Re: [PERFORM] Effects of cascading references in foreign keys

2007-09-26 Thread Bruce Momjian

This has been saved for the 8.4 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---

Tom Lane wrote:
> Michael Fuhr <[EMAIL PROTECTED]> writes:
> > On Sat, Oct 29, 2005 at 09:49:47AM -0500, Bruno Wolff III wrote:
> >> It looks like this feature was added last May, so I think it only applies
> >> to 8.1.
> 
> > Earlier versions appear to have at least some kind of optimization.
> 
> Yeah.  IIRC, for quite some time we've had tests inside the FK update
> triggers to not bother to search the other table if the key value hasn't
> changed.  What we did in 8.1 was to push that test further upstream, so
> that the trigger event isn't even queued if the key value hasn't
> changed.  (This is why you don't see the trigger shown as being called
> even once.)
> 
> Looking at this, I wonder if there isn't a bug or at least an
> inefficiency in 8.1.  The KeysEqual short circuit tests are still there
> in ri_triggers.c; aren't they now redundant with the test in triggers.c?
> And don't they need to account for the special case mentioned in the
> comment in triggers.c, that the RI check must still be done if we are
> looking at a row updated by the same transaction that created it?
> 
>   regards, tom lane
> 
> ---(end of broadcast)---
> TIP 6: explain analyze is your friend
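
For illustration, the 8.1 behavior described above can be observed directly; this is only a sketch with made-up table and column names:

    create table parent (id int primary key);
    create table child (
        parent_id int references parent (id),
        note text
    );
    insert into parent values (1);
    insert into child values (1, 'x');
    -- An update that leaves the key column untouched should show no FK
    -- trigger invocation at all in the EXPLAIN ANALYZE trigger summary on
    -- 8.1+ (on earlier releases the trigger fires but returns early):
    explain analyze update child set note = 'y' where parent_id = 1;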

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



[PERFORM] Incorrect row estimates in plan?

2007-09-26 Thread pgdba

Hi, I am having some trouble understanding a plan and was wondering if anyone
could guide me. The query in question here seems to be showing some
incorrect row counts. I have vacuumed and analyzed the table, but the
estimate versus the actual total seems to be way out (est 2870 vs actual
85k). Perhaps I am reading the plan incorrectly though. (hopefully the plan
below is readable)

db=# select version();
                                   version
-----------------------------------------------------------------------------
 PostgreSQL 8.2.4 on i486-pc-linux-gnu, compiled by GCC cc (GCC) 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)

db=# show shared_buffers ;
 shared_buffers

 300MB

#4GB ram, 2 SATA striped, XFS

db=# show default_statistics_target;
 default_statistics_target
---
 100

# stats have been raised to 1000 on both the destip and srcip columns
# create index slog_gri_idx on slog (gid,rule,(case when rule in (8,9) then
destip else srcip end)) WHERE (rule in (1, 2, 8, 9, 10));
# vacuum analyze verbose slog;

db=# show random_page_cost ;
 random_page_cost
--
 3

db=# select count(*) from slog
  count
-
 1,019,121

db=# select count(*) as total
from slog
where gid=1::INTEGER
and rule in (1,2,8,9,10)
and (case when rule in (8,9) then destip else srcip
end)='192.168.10.23'::INET;
 total
---
 83,538

# problematic query
explain analyze
select  coalesce(uri,host((case when rule in (8,9) then srcip else destip
end))) as
destip,
case when rule in (8,9) then 'ext' else 'int' end as tp,
count(*) as total,
coalesce(sum(destbytes),0)+coalesce(sum(srcbytes),0) as bytes
from slog
where gid=1::INTEGER
and rule in (1,2,8,9,10)
and (case when rule in (8,9) then destip else srcip
end)='192.168.10.23'::INET
group by destip,tp
order by bytes desc,total desc,destip limit 20


Limit  (cost=6490.18..6490.23 rows=20 width=61) (actual time=2036.968..2037.220 rows=20 loops=1)
  ->  Sort  (cost=6490.18..6490.90 rows=288 width=61) (actual time=2036.960..2037.027 rows=20 loops=1)
        Sort Key: (COALESCE(sum(destbytes), 0::numeric) + COALESCE(sum(srcbytes), 0::numeric)), count(*), COALESCE(uri, host(CASE WHEN (rule = ANY ('{8,9}'::integer[])) THEN srcip ELSE destip END))
        ->  HashAggregate  (cost=6470.50..6478.42 rows=288 width=61) (actual time=2008.478..2022.125 rows=2057 loops=1)
              ->  Bitmap Heap Scan on slog  (cost=82.98..6434.62 rows=2870 width=61) (actual time=50.235..1237.948 rows=83538 loops=1)
                    Recheck Cond: ((gid = 1) AND (rule = ANY ('{1,2,8,9,10}'::integer[])) AND (CASE WHEN (rule = ANY ('{8,9}'::integer[])) THEN destip ELSE srcip END = '192.168.10.23'::inet))
                    ->  Bitmap Index Scan on slog_gri_idx  (cost=0.00..82.26 rows=2870 width=0) (actual time=41.306..41.306 rows=83538 loops=1)
                          Index Cond: ((gid = 1) AND (rule = ANY ('{1,2,8,9,10}'::integer[])) AND (CASE WHEN (rule = ANY ('{8,9}'::integer[])) THEN destip ELSE srcip END = '192.168.10.23'::inet))
Total runtime: 2037.585 ms

Does anyone have any suggestions?

Thanks!
-- 
View this message in context: 
http://www.nabble.com/Incorrect-row-estimates-in-plan--tf4522692.html#a12902068
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.




Re: [PERFORM] Incorrect row estimates in plan?

2007-09-26 Thread Tom Lane
pgdba <[EMAIL PROTECTED]> writes:
> ->  Bitmap Heap Scan on slog  (cost=82.98..6434.62 rows=2870 width=61) (actual time=50.235..1237.948 rows=83538 loops=1)
>       Recheck Cond: ((gid = 1) AND (rule = ANY ('{1,2,8,9,10}'::integer[])) AND (CASE WHEN (rule = ANY ('{8,9}'::integer[])) THEN destip ELSE srcip END = '192.168.10.23'::inet))
>       ->  Bitmap Index Scan on slog_gri_idx  (cost=0.00..82.26 rows=2870 width=0) (actual time=41.306..41.306 rows=83538 loops=1)
>             Index Cond: ((gid = 1) AND (rule = ANY ('{1,2,8,9,10}'::integer[])) AND (CASE WHEN (rule = ANY ('{8,9}'::integer[])) THEN destip ELSE srcip END = '192.168.10.23'::inet))

[ blink... ]  Pray tell, what is the definition of this index?

With such a bizarre scan condition, it's unlikely you'll get any really
accurate row estimate.

regards, tom lane



Re: [PERFORM] Searching for the cause of a bad plan

2007-09-26 Thread Csaba Nagy
> Csaba, please can you copy that data into fresh tables, re-ANALYZE and
> then re-post the EXPLAINs, with stats data.

Here you go, fresh experiment attached.

Cheers,
Csaba.


db=# \d temp_table_a 
 Table "public.temp_table_a"
 Column |  Type  | Modifiers 
++---
 a  | bigint | not null
 b  | bigint | not null
Indexes:
"temp_pk_table_a" PRIMARY KEY, btree (a, b)

db=# \d temp_table_b1
Table "public.temp_table_b1"
 Column |  Type  | Modifiers 
++---
 b  | bigint | not null
Indexes:
"temp_pk_table_b1" PRIMARY KEY, btree (b)

db=# \d temp_table_b2
Table "public.temp_table_b2"
 Column |  Type  | Modifiers 
++---
 b  | bigint | not null
Indexes:
"temp_pk_table_b2" PRIMARY KEY, btree (b)
Foreign-key constraints:
"temp_fk_table_b2_b1" FOREIGN KEY (b) REFERENCES temp_table_b1(b)

db=# analyze verbose temp_table_a;
INFO:  analyzing "public.temp_table_a"
INFO:  "temp_table_a": scanned 3000 of 655299 pages, containing 1887000 live rows and 0 dead rows; 3000 rows in sample, 412183071 estimated total rows

db=# analyze verbose temp_table_b1;
INFO:  analyzing "public.temp_table_b1"
INFO:  "temp_table_b1": scanned 3000 of 57285 pages, containing 2232000 live rows and 0 dead rows; 3000 rows in sample, 42620040 estimated total rows

db=# analyze verbose temp_table_b2;
INFO:  analyzing "public.temp_table_b2"
INFO:  "temp_table_b2": scanned 57 of 57 pages, containing 41967 live rows and 0 dead rows; 3000 rows in sample, 41967 estimated total rows


db=# select * from pg_stats where tablename = 'temp_table_a';
-[ RECORD 1 ]-----+--------------------------------------------------------
schemaname        | public
tablename         | temp_table_a
attname           | a
null_frac         | 0
avg_width         | 8
n_distinct        | 1261
most_common_vals  | {74117,700022128,72317,79411,700023682,76025,72843,700014833,76505,78694}
most_common_freqs | {0.015,0.0116667,0.0097,0.0097,0.0097,0.009,0.0087,0.008,0.0077,0.0077}
histogram_bounds  | {70010872,70035,73086,75843,78974,700011369,700013305,700015988,700019048,700022257,83151}
correlation       | 0.850525
-[ RECORD 2 ]-----+--------------------------------------------------------
schemaname        | public
tablename         | temp_table_a
attname           | b
null_frac         | 0
avg_width         | 8
n_distinct        | -1
most_common_vals  |
most_common_freqs |
histogram_bounds  | {41708986,700707712,803042997,7004741432,7007455842,7009719495,7013869874,7016501748,7019139288,7025078292,7037930133}
correlation       | 0.646759

db=# select * from pg_stats where tablename = 'temp_table_b1';
-[ RECORD 1 ]-----+--------------------------------------------------------
schemaname        | public
tablename         | temp_table_b1
attname           | b
null_frac         | 0
avg_width         | 8
n_distinct        | -1
most_common_vals  |
most_common_freqs |
histogram_bounds  | {41719236,801608645,7003211583,7007403678,7011591097,7016707278,7021089839,7025573684,7029316772,7033888226,8002470137}
correlation       | 0.343186

db=# select * from pg_stats where tablename = 'temp_table_b2';
-[ RECORD 1 ]-----+--------------------------------------------------------
schemaname        | public
tablename         | temp_table_b2
attname           | b
null_frac         | 0
avg_width         | 8
n_distinct        | -1
most_common_vals  |
most_common_freqs |
histogram_bounds  | {70054,7000352893,7000357745,7000362304,7000367025,7000371629,7000376587,7000381229,7009567724,7023749432,7034300740}
correlation       | -0.216073


db=# prepare test_001(bigint) as
db-# SELECT tb.*
db-# FROM

Re: [PERFORM] Searching for the cause of a bad plan

2007-09-26 Thread Tom Lane
Csaba Nagy <[EMAIL PROTECTED]> writes:
> db=# analyze verbose temp_table_a;
> INFO:  analyzing "public.temp_table_a"
> INFO:  "temp_table_a": scanned 3000 of 655299 pages, containing 1887000 live 
> rows and 0 dead rows; 3000 rows in sample, 412183071 estimated total rows

Hmm.  So the bottom line here is probably that that's not a big enough
sample to derive a realistic n_distinct value.  Or maybe it is ... how
many values of "a" are there really, and what's the true distribution of
counts?  Do the plan estimates get closer to reality if you set a higher
statistics target?
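
For reference, the target can be raised on just one column without touching default_statistics_target; a sketch (the value 1000 is only an example):

    alter table temp_table_a alter column a set statistics 1000;
    analyze verbose temp_table_a;
    -- then see whether n_distinct moves closer to the real count:
    select n_distinct from pg_stats
     where tablename = 'temp_table_a' and attname = 'a';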

regards, tom lane



Re: [PERFORM] Incorrect row estimates in plan?

2007-09-26 Thread pgdba

Hi Tom,

Tom Lane-2 wrote:
> 
> pgdba <[EMAIL PROTECTED]> writes:
>> ->  Bitmap Heap Scan on slog  (cost=82.98..6434.62 rows=2870 width=61) (actual time=50.235..1237.948 rows=83538 loops=1)
>>       Recheck Cond: ((gid = 1) AND (rule = ANY ('{1,2,8,9,10}'::integer[])) AND (CASE WHEN (rule = ANY ('{8,9}'::integer[])) THEN destip ELSE srcip END = '192.168.10.23'::inet))
>>       ->  Bitmap Index Scan on slog_gri_idx  (cost=0.00..82.26 rows=2870 width=0) (actual time=41.306..41.306 rows=83538 loops=1)
>>             Index Cond: ((gid = 1) AND (rule = ANY ('{1,2,8,9,10}'::integer[])) AND (CASE WHEN (rule = ANY ('{8,9}'::integer[])) THEN destip ELSE srcip END = '192.168.10.23'::inet))
> 
> [ blink... ]  Pray tell, what is the definition of this index?
> 
> With such a bizarre scan condition, it's unlikely you'll get any really
> accurate row estimate.
> 
>   regards, tom lane
> 
> 

Original index: "create index slog_gri_idx on slog (gid,rule,(case when rule in (8,9) then destip else srcip end)) WHERE (rule in (1, 2, 8, 9, 10))"

The purpose of that index is to match a specific query (one that gets run
frequently and needs to be fast). It is using the destip when rule 8/9, and
srcip when other, but only for a subset of the rules (1,2,8,9,10). There are
about 18 rules in total, but I'm only interested in those 5. I have tried a
couple of indices like:
create index test_destip_idx on slog (gid,destip) where rule in (8,9);
create index test_srcip_idx on slog (gid,srcip) where rule in (1,2,10);

But the original slog_gri_idx index was used instead. Is there a way that I
can rewrite that index then? Not that I'm a fan of a CASE statement in a
functional index, but I'm at a loss as to how else I can create this. Or
what else I can look into to make this faster?
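
One idea, offered only as an untested sketch against the schema above: split the CASE into a UNION ALL, so that each arm carries a plain equality condition that the simpler partial indexes (test_destip_idx / test_srcip_idx) can match, and the planner can estimate each arm from ordinary column statistics:

    select coalesce(uri, host(srcip)) as dest, 'ext' as tp,
           count(*) as total,
           coalesce(sum(destbytes),0) + coalesce(sum(srcbytes),0) as bytes
    from slog
    where gid = 1 and rule in (8,9) and destip = '192.168.10.23'::inet
    group by 1, 2
    union all
    select coalesce(uri, host(destip)), 'int',
           count(*),
           coalesce(sum(destbytes),0) + coalesce(sum(srcbytes),0)
    from slog
    where gid = 1 and rule in (1,2,10) and srcip = '192.168.10.23'::inet
    group by 1, 2
    order by 4 desc, 3 desc, 1
    limit 20;

Since tp is constant within each arm, the per-arm grouping produces the same groups as the original single GROUP BY, and the ORDER BY/LIMIT apply to the combined result.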






Re: [PERFORM] Incorrect row estimates in plan?

2007-09-26 Thread Tom Lane
pgdba <[EMAIL PROTECTED]> writes:
> Tom Lane-2 wrote:
> ->  Bitmap Index Scan on slog_gri_idx 
> (cost=0.00..82.26
> rows=2870 width=0) (actual time=41.306..41.306 rows=83538 loops=1)
> Index Cond: ((gid = 1) AND (rule = ANY
> ('{1,2,8,9,10}'::integer[])) AND (CASE WHEN (rule = ANY
> ('{8,9}'::integer[])) THEN destip ELSE srcip END =
> '192.168.10.23'::inet))
>> 
>> [ blink... ]  Pray tell, what is the definition of this index?

> Original index: "create index slog_gri_idx on slog (gid,rule,(case when rule
> in (8,9) then
> destip else srcip end)) WHERE (rule in (1, 2, 8, 9, 10))"

> The purpose of that index is to match a specific query (one that gets run
> frequently and needs to be fast).

Ah.  I didn't think you would've put such a specific thing into an index
definition, but if you're stuck supporting such badly written queries,
maybe there's no other way.

I rather doubt that you're going to be able to make this query any
faster than it is, short of buying enough RAM to keep the whole table
RAM-resident.  Pulling 83K random rows in 1200 msec doesn't sound
all that slow to me.

The ultimate solution might be to rethink your table designs ...

regards, tom lane



Re: [PERFORM] Incorrect row estimates in plan?

2007-09-26 Thread pgdba



Tom Lane-2 wrote:
> 
> pgdba <[EMAIL PROTECTED]> writes:
>> Tom Lane-2 wrote:
>> ->  Bitmap Index Scan on slog_gri_idx 
>> (cost=0.00..82.26
>> rows=2870 width=0) (actual time=41.306..41.306 rows=83538 loops=1)
>> Index Cond: ((gid = 1) AND (rule = ANY
>> ('{1,2,8,9,10}'::integer[])) AND (CASE WHEN (rule = ANY
>> ('{8,9}'::integer[])) THEN destip ELSE srcip END =
>> '192.168.10.23'::inet))
>>> 
>>> [ blink... ]  Pray tell, what is the definition of this index?
> 
>> Original index: "create index slog_gri_idx on slog (gid,rule,(case when
>> rule
>> in (8,9) then
>> destip else srcip end)) WHERE (rule in (1, 2, 8, 9, 10))"
> 
>> The purpose of that index is to match a specific query (one that gets run
>> frequently and needs to be fast).
> 
> Ah.  I didn't think you would've put such a specific thing into an index
> definition, but if you're stuck supporting such badly written queries,
> maybe there's no other way.
> 
> I rather doubt that you're going to be able to make this query any
> faster than it is, short of buying enough RAM to keep the whole table
> RAM-resident.  Pulling 83K random rows in 1200 msec doesn't sound
> all that slow to me.
> 
> The ultimate solution might be to rethink your table designs ...
> 
>   regards, tom lane
> 

Badly written the query may be, but I do have the opportunity to change it.
Part of the problem is that I cannot come up with a better way of writing
it.

What about the discrepancy between the estimated row count and the actual
row count for that index access?
"Bitmap Index Scan on slog_gri_idx  (cost=0.00..82.26 rows=2870 width=0)
(actual time=41.306..41.306 rows=83538 loops=1)"

Is there anything I can do to influence that (not that it is likely to
change the plan, but...). I vacuumed and analyzed after I created the index,
so the stats should be at least be close (with stats target set to 1000
there).
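
As far as I recall, 8.2's ANALYZE does gather statistics for the index expression, but only at the default target; there is no per-expression target on that release. Raising the target on the index column itself only became possible much later (PostgreSQL 11 added ALTER INDEX ... SET STATISTICS). Purely for illustration, on such a release it would look like:

    -- PostgreSQL 11 or later only; "expr" is the name the server assigns
    -- to the expression column of the index:
    alter index slog_gri_idx alter column expr set statistics 1000;
    analyze slog;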





[PERFORM] Difference in query plan when using = or > in where clause

2007-09-26 Thread Radhika S
Hi,

I am curious as to why this occurs. Why does an = change the query plan so
drastically?

When my query is:

Select count(*) from View_A WHERE tradedate = '20070801';
The query plan is below; note that alloctbl is scanned via the
k_alloctbl_blockid_status index:

->  Bitmap Index Scan on idx_tradeblocktbl_tradeate  (cost=0.00..50.47 rows=1444 width=0) (actual time=0.040..0.040 rows=106 loops=1)
      Index Cond: ((tradedate >= '2007-08-01'::date) AND (tradedate <= '2007-09-24'::date))
->  Bitmap Heap Scan on alloctbl a  (cost=4.59..270.73 rows=70 width=16) (actual time=0.010..0.011 rows=1 loops=7)
      Recheck Cond: (tr.recid = a.blockid)
      ->  Bitmap Index Scan on k_alloctbl_blockid_status  (cost=0.00..4.59 rows=70 width=0) (actual time=0.007..0.007 rows=1 loops=7)
            Index Cond: (tr.recid = a.blockid)
Total runtime: 1.453 ms


But when my query is:
Select count(*) from View_A WHERE tradedate BETWEEN '20070801' and
'20070901';
The query plan is:

-
->  Bitmap Heap Scan on tradeblocktbl tr  (cost=50.47..2849.67 rows=1444 width=80) (actual time=0.095..0.218 rows=104 loops=1)
      Recheck Cond: ((tradedate >= '2007-08-01'::date) AND (tradedate <= '2007-09-24'::date))
      ->  Bitmap Index Scan on idx_tradeblocktbl_tradeate  (cost=0.00..50.47 rows=1444 width=0) (actual time=0.050..0.050 rows=106 loops=1)
            Index Cond: ((tradedate >= '2007-08-01'::date) AND (tradedate <= '2007-09-24'::date))
->  Sort  (cost=99007.79..100479.68 rows=588755 width=16) (actual time=2660.009..3150.887 rows=588755 loops=1)
      Sort Key: a.blockid
      ->  Seq Scan on alloctbl a  (cost=0.00..20442.55 rows=588755 width=16) (actual time=0.026..764.833 rows=588755 loops=1)

Total runtime: 3590.715 ms
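
If the misestimate on tradeblocktbl (1444 estimated vs 104 actual rows for the date range) is what pushes the planner into the sort/merge plan, one thing worth trying, as a sketch (the target value here is arbitrary):

    alter table tradeblocktbl alter column tradedate set statistics 500;
    analyze tradeblocktbl;
    explain analyze
    select count(*) from View_A
    where tradedate between '20070801' and '20070901';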

Thank you.
Radhika

-- 
It is all a matter of perspective. You choose your view by choosing where to
stand. --Larry Wall