Re: [PERFORM] Odd problem with performance in duplicate database

2003-08-14 Thread Josh Berkus
Ron,

 If the databases are on different machines, maybe the postgresql.conf
 or pg_hba.conf files are different, and the buffer counts are
 affecting the optimizer?

The databases are on the same machine, using the same postmaster.
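
(For anyone double-checking the settings theory, the relevant values can be
compared from psql on each side with, e.g.:

    SHOW shared_buffers;
    SHOW random_page_cost;
    SHOW effective_cache_size;
)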

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco




Re: [PERFORM] Odd problem with performance in duplicate database

2003-08-14 Thread Josh Berkus
Tom,

 Partly.  The numbers suggest that in ANALYZE's default sample of 3000
 rows, it's only finding about a dozen non-null tgroup_ids (yielding the
 0.996 null_frac value); and that in one case all dozen are different and
 in the other case there are two duplicates.  It would help if you
 boosted the stats target for this column by a factor of 10.  (You can
 do that in 7.2, btw --- IIRC the only problem is that a pg_dump won't
 show that you did so.)

Hmmm.  No dice.  I raised the statistics target to 1000, which increased 
n_distinct to 108, which is pretty close to accurate.  However, the planner 
still insists on using a seq scan on case_clients unless I drop random_page_cost 
to 1.5 (which is up from 1.2 but still somewhat unreasonable).
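
For reference, in 7.2 that amounts to something along these lines (table and
column names as discussed in this thread):

    ALTER TABLE cases ALTER COLUMN tgroup_id SET STATISTICS 1000;
    ANALYZE cases;
    -- then re-check the estimates ANALYZE produced:
    SELECT null_frac, n_distinct
      FROM pg_stats
     WHERE tablename = 'cases' AND attname = 'tgroup_id';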

 But the other part of the problem is that in 7.2, the join selectivity
 estimator is way off when you are joining a unique column (like the pkey
 on the other side) to a column with a very large fraction of nulls.
 We only discovered this recently; it's fixed as of 7.3.3:

OK, I'll talk to the client about upgrading.

-- 
-Josh Berkus

__AGLIO DATABASE SOLUTIONS___
Josh Berkus
   Complete information technology  [EMAIL PROTECTED]
and data management solutions   (415) 565-7293
   for law firms, small businesses   fax 621-2533
and non-profit organizations.   San Francisco




Re: [PERFORM] Odd problem with performance in duplicate database

2003-08-14 Thread Tom Lane
Josh Berkus [EMAIL PROTECTED] writes:
 Also, there doesn't seem to be any way in 7.2 for me to find out what the 
 current statistics target for a column is.   What am I missing?

There still isn't a handy command for it --- you have to look at
pg_attribute.attstattarget for the column.
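
For example, a query along these lines (table/column names from this thread)
will show it:

    SELECT a.attstattarget
      FROM pg_attribute a JOIN pg_class c ON a.attrelid = c.oid
     WHERE c.relname = 'cases' AND a.attname = 'tgroup_id';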

regards, tom lane



Re: [PERFORM] Odd problem with performance in duplicate database

2003-08-14 Thread Peter Darley
Josh,
I'm sure that you've thought of this, but it sounds like you may not have
run an ANALYZE in your new DB.
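
Worth checking; a dump-and-restore doesn't carry the planner statistics over,
so the copy needs them regathered, e.g.:

    ANALYZE;                -- whole database
    ANALYZE case_clients;   -- or just the tables in question
    ANALYZE cases;
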
Thanks,
Peter Darley

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]] On Behalf Of Josh Berkus
Sent: Monday, August 11, 2003 3:48 PM
To: Ron Johnson; PgSQL Performance ML
Subject: Re: [PERFORM] Odd problem with performance in duplicate
database


Ron,

 If the databases are on different machines, maybe the postgresql.conf
 or pg_hba.conf files are different, and the buffer counts are
 affecting the optimizer?

The databases are on the same machine, using the same postmaster.

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco







Re: [PERFORM] Odd problem with performance in duplicate database

2003-08-14 Thread Tom Lane
Josh Berkus [EMAIL PROTECTED] writes:
 Tom,
 Okay, here's our problem:
 
 live DB: tgroup_id  n_distinct = -1
 
 test DB: tgroup_id  n_distinct = 11
 
 The former estimate actually means that it thinks tgroup_id is a unique
 column, whereas the latter says there are only 11 distinct values in the
 column.  I assume the former is much nearer to the truth (how many rows
 in cases, and how many distinct tgroup_id values)?

 The real case is that there are 113 distinct tgroup_ids, which cover
 about 10% of the population of cases.  The other 90% is NULL.  The
 average tgroup_id is shared between 4.7 cases.

 So this seems like sampling error.

Partly.  The numbers suggest that in ANALYZE's default sample of 3000
rows, it's only finding about a dozen non-null tgroup_ids (yielding the
0.996 null_frac value); and that in one case all dozen are different and
in the other case there are two duplicates.  It would help if you
boosted the stats target for this column by a factor of 10.  (You can
do that in 7.2, btw --- IIRC the only problem is that a pg_dump won't
show that you did so.)

But the other part of the problem is that in 7.2, the join selectivity
estimator is way off when you are joining a unique column (like the pkey
on the other side) to a column with a very large fraction of nulls.
We only discovered this recently; it's fixed as of 7.3.3:

2003-04-15 01:18  tgl

* src/backend/utils/adt/selfuncs.c (REL7_3_STABLE): eqjoinsel's
logic for case where MCV lists are not present should account for
NULLs; in hindsight this is obvious since the code for the
MCV-lists case would reduce to this when there are zero entries in
both lists.  Per example from Alec Mitchell.

Possibly you could backpatch that into 7.2, although I'd think an update
to 7.3.4 would be a more profitable use of time.

regards, tom lane



Re: [PERFORM] Odd problem with performance in duplicate database

2003-08-14 Thread Josh Berkus
Tom,

 Partly.  The numbers suggest that in ANALYZE's default sample of 3000
 rows, it's only finding about a dozen non-null tgroup_ids (yielding the
 0.996 null_frac value); and that in one case all dozen are different and
 in the other case there are two duplicates.  It would help if you
 boosted the stats target for this column by a factor of 10.  (You can
 do that in 7.2, btw --- IIRC the only problem is that a pg_dump won't
 show that you did so.)

Also, there doesn't seem to be any way in 7.2 for me to find out what the 
current statistics target for a column is.   What am I missing?

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco




Re: [PERFORM] Odd problem with performance in duplicate database

2003-08-14 Thread Josh Berkus
Folks,

More followup on this:

The crucial difference between the two execution plans is this clause:

test db has:
->  Seq Scan on case_clients  (cost=0.00..3673.48 rows=11274 width=11)
    (actual time=0.02..302.20 rows=8822 loops=855)

whereas live db has:
->  Index Scan using idx_caseclients_case on case_clients  (cost=0.00..5.10
    rows=1 width=11) (actual time=0.03..0.04 rows=1 loops=471)

Setting enable_seqscan = false fixes this, but is obviously not a long-term 
solution.
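
(That test was just a session-level override, something like:

    SET enable_seqscan TO off;   -- discourage seq scans for this session only
    -- re-run the problem query, compare timings, then restore:
    SET enable_seqscan TO on;

not anything I'd want to leave in postgresql.conf.)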

I've re-created the test system from an immediate copy of the live database, 
and checked that the main tables and indexes were reproduced faithfully.

Lowering random_page_cost seems to do the trick.  But I'm still mystified; why 
would one identical database pick a different plan than its copy?

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco




Re: [PERFORM] Odd problem with performance in duplicate database

2003-08-14 Thread Josh Berkus
Tom,

 Let's see the pg_stats rows for case_clients in both databases.  The
 entries for trial_groups might be relevant too.

My reading is that the case is borderline; that is, because the correlation 
is about 10-20% higher on the test database (since it was restored clean 
from backup), the planner is resorting to a seq scan.
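
The rows I'm comparing on the two databases are along these lines:

    SELECT tablename, attname, correlation, null_frac, n_distinct
      FROM pg_stats
     WHERE tablename IN ('case_clients', 'trial_groups');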

At which point the spectre of random_page_cost less than 1.0 rears its ugly 
head again.  The planner seems to regard this as a borderline case, but it's 
far from borderline: the index scan takes 260ms, the seq scan 244,000ms.  Yet 
my random_page_cost is already set pretty low, at 1.5.

It seems like I'd have to set random_page_cost to less than 1.0 to make sure 
that the planner never used a seq scan.  Which kinda defies the meaning of 
the setting.

*sigh* wish the client would pay for an upgrade 

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco

