Re: [HACKERS] Simple postgresql.conf wizard

2008-12-19 Thread Tom Lane
Mark Wong mark...@gmail.com writes:
 On Mon, Dec 8, 2008 at 4:34 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Mark Wong mark...@gmail.com writes:
 On Tue, Dec 2, 2008 at 2:25 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Are any of the queries complicated enough to trigger GEQO planning?

 Sorry for the delay in responding, here's the queries and the number
 of tables in each:
 [ worst case is ]
 Q8:  8 (7 unique) + 1 temp table

Well, geqo_threshold is 12 by default, so that theory loses ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-18 Thread Mark Wong
On Mon, Dec 8, 2008 at 4:34 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Mark Wong mark...@gmail.com writes:
 On Tue, Dec 2, 2008 at 2:25 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Are any of the queries complicated enough to trigger GEQO planning?

 Is there a debug option that we could use to see?

 Well, you could set geqo=off and see if the behavior changes, but
 it'd be easier just to count how many tables are in each query ...

Sorry for the delay in responding, here's the queries and the number
of tables in each:

Q1:  1
Q2:  2
Q3:  3
Q4:  1
Q5:  6
Q6:  1
Q7:  6 (5 unique) + 1 temp table
Q8:  8 (7 unique) + 1 temp table
Q9:  6 + 1 temp table
Q10: 4
Q11: 3
Q12:  2
Q13:  3 + 1 temp table
Q14:  2
Q15:  5
Q16:  2
Q17:  2
Q18:  3
Q19:  2
Q20:  2
Q21:  4
Q22:  3 + 1 temp table

Regards,
Mark

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-10 Thread Kevin Grittner
 Tom Lane [EMAIL PROTECTED] wrote: 
 [ a bit off-topic for the thread, but ... ]
 
 Kevin Grittner [EMAIL PROTECTED] writes:
 I'll attach the query and plan.  You'll note that the query looks a
 little odd, especially all the (1=1) tests.
 
 FWIW, it would be better to use TRUE as a placeholder in your
 generated queries.  I don't suppose this would make a huge percentage
 difference in such complicated queries, but in and of itself there are
 a lot of cycles spent to parse 1=1 and then reduce it to constant TRUE.
 
Thanks, I'll put in a request for enhancement for our framework. 
(Not all databases we support handle boolean literals, so we need a
few lines in our plugin layer.)
 
In case anyone cares in terms of interpreting the timings I posted, on
the server where I just tested this change, the average plan time
dropped from 65.0 ms to 63.7 ms -- a 2% improvement.  Eliminating
pretty whitespace shaved off another 0.2 ms, or 0.3%.
 
So, worth doing on our end as a tuning measure, but not a significant
distortion in terms of the issues discussed on the thread.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-08 Thread Tom Lane
[ a bit off-topic for the thread, but ... ]

Kevin Grittner [EMAIL PROTECTED] writes:
 I'll attach the query and plan.  You'll note that the query looks a
 little odd, especially all the (1=1) tests.

FWIW, it would be better to use TRUE as a placeholder in your
generated queries.  I don't suppose this would make a huge percentage
difference in such complicated queries, but in and of itself there are
a lot of cycles spent to parse 1=1 and then reduce it to constant TRUE.
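
As an illustration, here is a sketch of what that looks like in a hypothetical query generator (made-up names, not the actual framework discussed in this thread):

def where_clause(criteria):
    # Unused criteria become the literal TRUE rather than (1=1), so the
    # parser does not have to build and then constant-fold a comparison.
    parts = []
    for c in criteria:
        if c is None:
            parts.append("TRUE")
        else:
            parts.append(c)
    return " AND ".join(parts)

print("SELECT caseNo FROM some_table WHERE "
      + where_clause([None, "countyNo = 13", None]))
# SELECT caseNo FROM some_table WHERE TRUE AND countyNo = 13 AND TRUE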

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-08 Thread Mark Wong
On Tue, Dec 2, 2008 at 2:25 AM, Tom Lane [EMAIL PROTECTED] wrote:
 Greg Smith [EMAIL PROTECTED] writes:
 ... where the Power Test seems to oscillate between degrees of good and bad
 behavior seemingly at random.

 Are any of the queries complicated enough to trigger GEQO planning?

Is there a debug option that we could use to see?

Regards,
Mark

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-08 Thread Tom Lane
Mark Wong [EMAIL PROTECTED] writes:
 On Tue, Dec 2, 2008 at 2:25 AM, Tom Lane [EMAIL PROTECTED] wrote:
 Are any of the queries complicated enough to trigger GEQO planning?

 Is there a debug option that we could use to see?

Well, you could set geqo=off and see if the behavior changes, but
it'd be easier just to count how many tables are in each query ...
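
For reference, checking the threshold and forcing GEQO off from psql looks roughly like this (a sketch assuming default settings; the EXPLAIN target is whichever query misbehaves):

postgres=# show geqo_threshold;
 geqo_threshold
----------------
 12
(1 row)

postgres=# set geqo = off;
SET
postgres=# explain select ...;   -- rerun the suspect Power Test query here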

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-05 Thread Gregory Stark

Kevin Grittner [EMAIL PROTECTED] writes:

 One more data point to try to help.
  
 While the jump from a default_statistics_target from 10 to 1000
 resulted in a plan time increase for a common query from 50 ms to 310
 ms, at a target of 50 the plan time was 53 ms.  Analyze time was 7.2
 minutes and 18.5 minutes for targets of 10 and 50.  This is an 842 GB
 database on an 8 processor (3.5 GHz Xeon) machine with 64 GB RAM
 running (soon to be updated) PostgreSQL 8.2.7.

So my half-finished DBT3 tests showed hardly any change in planning time for
default_statistics_targets up to 1000, and even Kevin Grittner's query, which is the
worst real example posted so far, only went from 50ms to 310ms.

So I started thinking perhaps substantially larger values might not have much
effect on planning at all. (Actual data is a lot more convincing than simple
assertions!).

Kevin's query was an OLTP query so 300ms is still way too much -- 300ms is on
the high end for OLTP response times for query *execution*. But seeing the
intermediate values would be interesting.

So I wondered what the worst-case would be for a synthetic case designed to
exercise the planner severely. This would also be useful for optimizing the
planner under gprof, though I suspect the hot spots are pretty obvious even
without empirical data.

So anyways, here's a script to create a table with a 75k pg_statistic record.
And a worst-case query where the plan time goes from 34ms to 1.2s for
histogram sizes between 10 and 1,000.

Looking at eqjoinsel I think it could be improved algorithmically if we keep
the mcv list in sorted order, even if it's just binary sorted order. But I'm
not sure what else uses those values and whether the current ordering is
significant. I'm also not sure it's the only O(n^2) algorithm there and
there's no algorithmic gain unless they're all knocked down.
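
Not the actual eqjoinsel() code, just a sketch of the algorithmic idea in Python: once both MCV lists are kept in sorted order, the matched fraction can be computed with a single merge pass instead of comparing every pair.

def mcv_matches_quadratic(mcv1, mcv2):
    # current style: every value in one list compared against every value
    # in the other, O(n1 * n2) equality tests
    matched = 0.0
    for v1, f1 in mcv1:
        for v2, f2 in mcv2:
            if v1 == v2:
                matched += f1 * f2
    return matched

def mcv_matches_sorted(mcv1, mcv2):
    # keep (or sort) the lists in value order, then walk them once
    a, b = sorted(mcv1), sorted(mcv2)
    i = j = 0
    matched = 0.0
    while i < len(a) and j < len(b):
        if a[i][0] == b[j][0]:
            matched += a[i][1] * b[j][1]
            i += 1
            j += 1
        elif a[i][0] < b[j][0]:
            i += 1
        else:
            j += 1
    return matched

# (value, frequency) pairs, values distinct within each list as in an MCV list
mcv_a = [("smith", 0.02), ("jones", 0.015), ("hill", 0.01)]
mcv_b = [("hill", 0.02), ("smith", 0.01)]
assert mcv_matches_quadratic(mcv_a, mcv_b) == mcv_matches_sorted(mcv_a, mcv_b)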

Incidentally this timing is with the 75kB toasted arrays in shared buffers
because the table has just been analyzed. If it was on a busy system then just
planning the query could involve 75kB of I/O which is what I believe was
happening to me way back when I last observed super-long plan times.


postgres=# create table tk as select random()::text||random()::text||random()::text||random()::text||random()::text||random()::text as r from generate_series(1,1000);
postgres=# insert into tk (select * from tk);
postgres=# insert into tk (select * from tk);
postgres=# insert into tk (select random()::text||random()::text||random()::text||random()::text||random()::text||random()::text as r from generate_series(1,2000));
postgres=# alter table tk alter r set statistics 1000;
postgres=# analyze tk;

postgres=# select pg_column_size(stavalues1) from pg_statistic where starelid = 'tk'::regclass;
 pg_column_size 
----------------
          75484
(1 row)
postgres=# explain select count(*) from (select * from tk as k, tk as l, tk as m, tk as n, tk as o, tk as p where k.r=l.r and k.r=m.r and k.r=n.r and k.r=o.r and k.r=p.r) as x;
 QUERY PLAN
------------------------------------------------------------------------------------------------
 Aggregate  (cost=41358.14..41358.15 rows=1 width=0)
   ->  Merge Join  (cost=3213.13..37713.13 rows=1458000 width=0)
         Merge Cond: (k.r = l.r)
         ->  Merge Join  (cost=2677.61..14092.61 rows=486000 width=510)
               Merge Cond: (k.r = m.r)
               ->  Merge Join  (cost=2142.09..5862.09 rows=162000 width=408)
                     Merge Cond: (k.r = n.r)
                     ->  Merge Join  (cost=1606.57..2761.57 rows=54000 width=306)
                           Merge Cond: (k.r = o.r)
                           ->  Merge Join  (cost=1071.04..1371.04 rows=18000 width=204)
                                 Merge Cond: (k.r = p.r)
                                 ->  Sort  (cost=535.52..550.52 rows=6000 width=102)
                                       Sort Key: k.r
                                       ->  Seq Scan on tk k  (cost=0.00..159.00 rows=6000 width=102)
                                 ->  Sort  (cost=535.52..550.52 rows=6000 width=102)
                                       Sort Key: p.r
                                       ->  Seq Scan on tk p  (cost=0.00..159.00 rows=6000 width=102)
                           ->  Sort  (cost=535.52..550.52 rows=6000 width=102)
                                 Sort Key: o.r
                                 ->  Seq Scan on tk o  (cost=0.00..159.00 rows=6000 width=102)
                     ->  Sort  (cost=535.52..550.52 rows=6000 width=102)
                           Sort Key: n.r
                           ->  Seq Scan on tk n  (cost=0.00..159.00 rows=6000 width=102)
               ->  Sort  (cost=535.52..550.52 rows=6000 width=102)
                     Sort Key: m.r
                     ->  Seq Scan on tk m  (cost=0.00..159.00 rows=6000 width=102)
         ->  Sort  (cost=535.52..550.52 rows=6000 

Re: [HACKERS] Simple postgresql.conf wizard

2008-12-05 Thread Kevin Grittner
 Gregory Stark [EMAIL PROTECTED] wrote: 
 
 Incidentally this timing is with the 75kB toasted arrays in shared buffers
 because the table has just been analyzed. If it was on a busy system then just
 planning the query could involve 75kB of I/O which is what I believe was
 happening to me way back when I last observed super-long plan times.
 
I'm pretty sure I saw that in some of my tests with larger targets. 
With a large database and a large target, some of the tables'
statistics apparently weren't still cached the first time I planned
the query, and I got an extremely long plan time on the first attempt,
and then it settled in within a pretty narrow range on repeated plans.
I discarded the initial plan time as immaterial for our purposes
because a query that's run 300,000 times per day is probably going to
keep its statistics in cache most of the time.
 
I was looking at trying to modify the perl script from Robert Haas to
run my query at a wide range of target values, collecting analyze and
plan times at each.  Now that you have an easy-to-create synthetic
example, is this still worth it, considering that it would be on 8.2? 
(If we wait a week or two, I could probably do it on 8.3.)  We do have
gprof on these systems, although I'd need advice on how to use it.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-05 Thread Kevin Grittner
 Gregory Stark [EMAIL PROTECTED] wrote: 

 Incidentally, here's a graph of the explain time for that plan. It looks
 pretty linear to me
 
Except for that sweet spot between 50 and 100.
Any idea what's up with that?
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-05 Thread Josh Berkus
All,

I'm thinking that default_statistics_target is disputable enough that we 
want to move discussion of it to pgsql-performance, and for version 0.1 of 
the tuning wizard, exclude it.

-- 
--Josh

Josh Berkus
PostgreSQL
San Francisco

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-05 Thread Nathan Boley
Thanks for putting out pgtune - it's a sorely needed tool.

I had a chance to look over the source code and have a few comments,
mostly about python specific coding conventions.

- The Windows-specific try block ( line 16 ) raises a ValueError vs
ImportError on my Debian machine. Maybe it would be more appropriate
to explicitly test platform.system() == 'Windows'?

- from ctypes import * ( 18 ) makes the block difficult to read and
pollutes the namespace.

- on line 45, the try block should probably catch exceptions derived
from Exception ( to avoid catching SystemExit and KeyboardInterrupt
errors ). ie, except Exception: return None. Also, printing out the
exception in debug mode would probably be a good idea ( ie except
Exception, e: print e\ return None )

- all classes ( 58, 135, 205 ) are 'old-style' classes.  I don't see
any reason to use classic classes ( unless Python 2.1 is a support
goal? ) To make classes 'new style' classes (
http://www.python.org/doc/2.5.2/ref/node33.html ) they should inherit
object. i.e. class PGConfigLine(object):

- The doc strings ( 59, 136, 206 ) don't follow standard conventions,
described here http://www.python.org/dev/peps/pep-0257/.

- Functions also support doc strings ( 342, 351, etc. )

- Tuple unpacking doesn't require the tuple boundaries ( 446 and
others ). ie, options, args = ReadOptions()  works.

This is more of a style comment about the 'Java-ish interface' ( line
112 ), feel free to ignore it.

overloading __str__ and __repr__ are the python ways to return string
representations of an object. ie, instead of toString use __str__ and
then ( on 197 ) print l or print str(l) instead of print l.toString()

for the other methods ( getValue, getLineNumber, isSetting ) I'm
assuming you didn't call the attributes directly because you didn't want
them to be accidentally overwritten. Have you considered the use of
properties ( http://www.python.org/download/releases/2.2.3/descrintro/#property
)? Also, it would be more clear to mark attributes as private ( i.e.
_name or __name, _readable, _lineNumber, _setsParameter ) if you don't
want them to be accessed directly.
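
To make a few of these concrete, here is a small sketch in the suggested style (hypothetical code, not the actual pgtune classes):

import platform

# explicit platform check instead of relying on an import failing
ON_WINDOWS = platform.system() == 'Windows'

class PGConfigLine(object):              # new-style class: inherits from object
    """One line read from postgresql.conf."""

    def __init__(self, text, line_number):
        self._text = text                # leading underscore: treat as private
        self._line_number = line_number

    @property
    def line_number(self):               # read-only property instead of getLineNumber()
        return self._line_number

    def __str__(self):                   # instead of a Java-style toString()
        return "%d: %s" % (self._line_number, self._text)

line = PGConfigLine("shared_buffers = 32MB", 115)
print(line)              # __str__ is used automatically
print(line.line_number)  # property access, no getter call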

Hope my comments are useful! Thanks again for writing this.

-Nathan

P.S. I'd be happy to officially review this if it gets to that.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-05 Thread Greg Smith

On Fri, 5 Dec 2008, Nathan Boley wrote:


- all classes ( 58, 135, 205 ) are 'old-style' classes.  I dont see
any reason to use classic classes ( unless Python 2.1 is a support
goal? )


I'm not targeting anything older than 2.4, as that's the oldest version I 
have installed anywhere.  2.4 is still an important target because it's 
the version that comes with the very popular RHEL4.  I've worked on some 
SuSE 9 boxes that only have 2.3 as recently as last year, but I consider 
that a legacy version I don't care much about mainly because the kinds of 
programs I'm usually writing tend to rely heavily on using subprocess, 
added in 2.4.


The answer to all of your other points is that I'm relatively new to 
Python and just didn't know any of that.  I'll convert to new style 
classes, use __str__ properly, and switch all the getters to use 
properties now that you've pointed me toward those.  Improving the 
rudimentary doc strings was already on my list, that's not a major target 
though because this project has its own documentation standards and I 
can't do anything useful with the output from pydoc.  I'm only interested 
in documenting things about the source code itself in there, anything 
user-related needs to go in the official docs.



I'd be happy to officially review this if it gets to that.


Just added you to the wiki in that role.  I should have a first version 
that's of commit candidate quality ready to go in the next week.  If you 
could make a second pass then and see how I did implementing your 
suggestions, that would be great.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-05 Thread Joshua D. Drake
On Fri, 2008-12-05 at 17:27 -0500, Greg Smith wrote:
 On Fri, 5 Dec 2008, Nathan Boley wrote:
 
  - all classes ( 58, 135, 205 ) are 'old-style' classes.  I dont see
  any reason to use classic classes ( unless Python 2.1 is a support
  goal? )
 
 I'm not targeting anything older than 2.4, as that's the oldest version I 
 have installed anywhere.  2.4 is still an important target because it's 
 the version that comes with the very popular RHEL4.

And RHEL5

Joshua D. Drake

-- 
PostgreSQL
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-05 Thread Robert Haas
 Looking at eqjoinsel I think it could be improved algorithmically if we keep
 the mcv list in sorted order, even if it's just binary sorted order. But I'm
 not sure what else uses those values and whether the current ordering is
 significant. I'm also not sure it's the only O(n^2) algorithm there and
 there's no algorithmic gain unless they're all knocked down.

The current code seems to be trying to handle pathological cases where:

(1) the operator for which it is invoked doesn't really represent
equality and/or
(2) the type has an equality operator but no comparison operator.

Those are important cases, but maybe we could create an alternative
version for the fairly common situation where neither holds?

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-04 Thread Gregory Stark
Greg Smith [EMAIL PROTECTED] writes:

 On Thu, 4 Dec 2008, Gregory Stark wrote:

 My point was more that you could have a data warehouse on a non-dedicated
 machine, you could have a web server on a non-dedicated machine, or you could
 have a mixed server on a non-dedicated machine.

 I should just finish the documentation, where there will be a big disclaimer
 saying THESE SETTINGS ASSUME A SERVER DEDICATED TO POSTGRESQL!  That's the
 context here.  Why, after you follow my tuning instructions, you're lucky if
 the server will run anything but the database afterwards.

So you're getting rid of the desktop mode altogether? That's more drastic
than I was suggesting. I was suggesting that you want to separate desktop
into a separate option from the workload list. 

What users are getting now is kind of like asking users "Would you like a
small, medium, large, or diet Coke?"


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's 24x7 Postgres support!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-04 Thread Kevin Grittner
 Gregory Stark [EMAIL PROTECTED] wrote: 
 
 That sounds like it would be an interesting query to analyze in more detail.
 Is there any chance you could run the complete graph and get a chart of
 analyze times for all statistics values from 1..1000? And log the explain
 plans to a file so we can look at what statistics targets the plan changed?
 
 Or if the data is public I would be interested in looking at doing it if you
 want to send it to me.
 
There are some very big tables in that query which contain some
confidential data.  It would be hard to do a lot of runs at high
default_statistics_target values because the database analyze time
goes so high.  If I pick out which tables are used by the query, I
might be able to put a script together which loops through analyze of
those tables with different targets and capturing run time.
 
There are two problems -- finding the time to set this up, and finding
server time windows where other things wouldn't be distorting the
results.  If you could help with setting up the test script, that
would go a long way toward solving the first problem.  I think I could
reserve a smaller multi-CPU machine with identical data but slower
CPUs to run the test.
 
I'll attach the query and plan.  You'll note that the query looks a
little odd, especially all the (1=1) tests.  This is because the
application allows users to plug in a variety of selection criteria,
and any that aren't used are stubbed out that way.  I picked one that
was not too atypical for the 300,000 runs per day.  I intentionally
didn't clean up the white space, but left it just as it was emitted
by our framework, in case any of that affected parse/plan time.
 
If you need schema info not obvious from the plan, let me know.
 
-Kevin
 



name-search.sql
Description: Binary data






 QUERY PLAN
------------------------------------------------------------------------------------------------
 Unique  (cost=473.48..473.57 rows=2 width=173)
   ->  Sort  (cost=473.48..473.49 rows=2 width=173)
         Sort Key: caseNo, filingDate, countyName, statusCodeDescr, nameF, nameM, nameL, suffix, dob, caption, countyNo, caseType, isSeal, isPartySeal, lastModified, searchName, isPetitionerSeal
         ->  Append  (cost=0.00..473.47 rows=2 width=173)
               ->  Subquery Scan *SELECT* 1  (cost=0.00..235.74 rows=1 width=169)
                     ->  Nested Loop  (cost=0.00..235.73 rows=1 width=169)
                           ->  Nested Loop  (cost=0.00..126.19 rows=1 width=163)
                                 ->  Nested Loop Left Join  (cost=0.00..124.31 rows=1 width=153)
                                       ->  Nested Loop Left Join  (cost=0.00..120.71 rows=1 width=153)
                                             ->  Nested Loop Left Join  (cost=0.00..117.82 rows=1 width=145)
                                                   Filter: 

Re: [HACKERS] Simple postgresql.conf wizard

2008-12-04 Thread Gregory Stark
Kevin Grittner [EMAIL PROTECTED] writes:

 There are some very big tables in that query which contain some
 confidential data.  

oh well.

 I'll attach the query and plan.  You'll note that the query looks a
 little odd, especially all the (1=1) tests.  

That is interesting. I seem to recall Tom is resistant to trying to optimize
such queries but actually I've written lots of queries like that myself so I
find them interesting.

I'll look at the query and see if I can write a similar one using dbt3.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-04 Thread Kevin Grittner
 Gregory Stark [EMAIL PROTECTED] wrote: 
 And log the explain plans to a
 file so we can look at what statistics targets the plan changed?
 
Well, I can give you explain analyze output for
default_statistics_target 10 and 50, for whatever that's worth.
 
Unfortunately I blew my save from the first run with target 50, but it
ran much faster than the first run of target 10 (attached); I think it
was about 2.8 seconds.  That may or may not be due to the right pages
being cached by coincidence.  The machine with the target 50 run was
serving the web app at the time, so there was significant other load,
while the other was idle except as a replication target at the time of
the run.
 
-Kevin
 

 Unique  (cost=457.72..457.81 rows=2 width=171) (actual 
time=94927.370..94932.613 rows=2388 loops=1)
   -  Sort  (cost=457.72..457.73 rows=2 width=171) (actual 
time=94927.366..94928.548 rows=2396 loops=1)
 Sort Key: caseNo, filingDate, countyName, statusCodeDescr, 
nameF, nameM, nameL, suffix, dob, caption, countyNo, caseType, 
isSeal, isPartySeal, lastModified, searchName, isPetitionerSeal
 -  Append  (cost=0.00..457.71 rows=2 width=171) (actual 
time=108.847..94913.855 rows=2396 loops=1)
   -  Subquery Scan *SELECT* 1  (cost=0.00..227.86 rows=1 
width=169) (actual time=108.846..93393.253 rows=2315 loops=1)
 -  Nested Loop  (cost=0.00..227.85 rows=1 width=169) 
(actual time=108.843..93383.339 rows=2315 loops=1)
   -  Nested Loop  (cost=0.00..122.30 rows=1 
width=163) (actual time=97.389..93174.181 rows=2315 loops=1)
 -  Nested Loop Left Join  (cost=0.00..120.42 
rows=1 width=153) (actual time=86.182..93144.680 rows=2315 loops=1)
   -  Nested Loop Left Join  
(cost=0.00..116.82 rows=1 width=153) (actual time=75.841..73181.110 rows=2315 
loops=1)
 -  Nested Loop Left Join  
(cost=0.00..113.88 rows=1 width=145) (actual time=60.527..52107.799 rows=2315 
loops=1)
   Filter: 
(((WPCT.profileName IS NOT NULL) OR (((C.caseType)::text = ANY 
(('{PA,JD}'::character varying[])::text[])) AND (NOT C.isConfidential))) 
AND (((WPCT.profileName)::text  'PUBLIC'::text) OR 
((C.caseType)::text  'FA'::text) OR ((C.wcisClsCode)::text  
'40501'::text)))
   -  Nested Loop  
(cost=0.00..113.56 rows=1 width=146) (actual time=53.104..52045.586 rows=2601 
loops=1)
 Join Filter: 
(P.partyType)::text = ANY (('{JV,CH}'::character varying[])::text[])) 
AND ((C.caseType)::text = 'ZZ'::text)) OR ((P.partyType)::text  ALL 
(('{JV,CH}'::character varying[])::text[]))) AND (((C.caseType)::text  
ALL (('{CF,CI,CM,CT,FO,TR}'::character varying[])::text[])) OR 
((P.partyType)::text = 'DE'::text)) AND C.caseType)::text = ANY 
(('{JA,JC,JG,JM,JO,JV,JI,TP}'::character varying[])::text[])) AND 
((P.partyType)::text = ANY (('{CH,JV}'::character varying[])::text[]))) OR 
(((C.caseType)::text  ALL (('{JA,JC,JG,JM,JO,JV,JI,TP}'::character 
varying[])::text[])) AND ((P.partyType)::text  ALL (('{CH,JV}'::character 
varying[])::text[] AND (((P.partyType)::text  ALL 
(('{PE,PL,JP}'::character varying[])::text[])) OR C.filingDate)::date  
'2008-11-01'::date) OR ((C.wcisClsCode)::text  '30709'::text)) AND 
(((C.caseType)::text  ALL (('{CV,FA}'::character varying[])::text[])) OR 
((C.wcisClsCode)::text  '30711'::text) OR (NOT (subplan))
 -  Index Scan using 
Party_SearchName on Party P  (cost=0.00..3.21 rows=1 width=81) (actual 
time=6.878..159.418 rows=4097 loops=1)
   Index Cond: 
(((searchName)::text = 'HILL,J'::character varying) AND 
((searchName)::text  'HILL,K'::character varying))
   Filter: ((NOT 
isSeal) AND ((searchName)::text ~~ 'HILL,J%'::text))
 -  Index Scan using 
Case_pkey on Case C  (cost=0.00..5.01 rows=1 width=87) (actual 
time=12.607..12.625 rows=1 loops=4097)
   Index Cond: 
(((C.countyNo)::smallint = (P.countyNo)::smallint) AND 
((C.caseNo)::text = (P.caseNo)::text))
   Filter: 
((isExpunge  true) AND (NOT (subplan)))
   SubPlan
 -  Index Scan 
using HiddenCase_pkey on HiddenCase HCA  (cost=0.00..2.14 rows=1 width=0) 
(actual time=4.487..4.487 rows=0 loops=4097)
   Index 
Cond: (((countyNo)::smallint = ($0)::smallint) AND ((caseNo)::text = 
($1)::text))
 

Re: [HACKERS] Simple postgresql.conf wizard

2008-12-04 Thread Ron Mayer

Greg Smith wrote:
I'm not the sort to be too concerned myself that 
the guy who thinks he's running a DW on a system with 64MB of RAM might 
get bad settings, but it's a fair criticism to point that out as a problem.


In defense of thinking about very small configurations, I've seen many
cases where an enterprise-software salesperson's laptop is running a
demo - either in a small virtual machine in the laptop, or on an
overloaded Windows box.  Even though the customer might end up
running with 64GB, the sales demo's more likely to be 64MB.

In some ways, I'd be more interested in seeing automation of those than
the large production systems.   Large production systems are likely to
have an administrator who's paid to read the documentation and learn
how to configure the database.  OTOH there tends to be less DBA time
available to tune the smaller demo instances that come and go as sales
people upgrade their laptops; so improved automation would be much
appreciated there.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-04 Thread Joshua D. Drake
On Thu, 2008-12-04 at 10:20 -0800, Ron Mayer wrote:
 Greg Smith wrote:
  I'm not the sort to be too concerned myself that 
  the guy who thinks he's running a DW on a system with 64MB of RAM might 
  get bad settings, but it's a fair criticism to point that out as a problem.
 
 In defense of thinking about very small configurations, I've seen many
 cases where an enterprise-software salesperson's laptop is running a
 demo - either in a small virtual machine in the laptop, or on an
 overloaded Windows box.  Even though the customer might end up
 running with 64GB, the sales demo's more likely to be 64MB.

Although I get your point, that is a job for SQLite, not PostgreSQL.
PostgreSQL is not an end-all-be-all solution and it is definitely not
designed to be embedded, which is essentially what you are suggesting
with that kind of configuration.

Joshua D. Drake


-- 
PostgreSQL
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-04 Thread Greg Stark
Well that's a bit of hyperbole. There's a gulf of difference between
an embedded use case, where it should fit within an acceptable
footprint for a desktop app component of maybe a megabyte or so of RAM
and disk if we're generous, and saying it should run comfortably
without having to spec out special server hardware for a demo.


That said, 64MB of RAM seems like hyperbole too. My NSLU2 has 32MB...

greg

On 4 Dec 2008, at 06:28 PM, Joshua D. Drake [EMAIL PROTECTED]  
wrote:



On Thu, 2008-12-04 at 10:20 -0800, Ron Mayer wrote:

Greg Smith wrote:

I'm not the sort to be too concerned myself that
the guy who thinks he's running a DW on a system with 64MB of RAM  
might
get bad settings, but it's a fair criticism to point that out as a  
problem.


In defense of thinking about very small configurations, I've seen  
many

cases where an enterprise-software salesperson's laptop is running a
demo - either in a small virtual machine in the laptop, or on an
overloaded Windows box.  Even though the customer might end up
running with 64GB, the sales demo's more likely to be 64MB.


Although I get your point, that is a job for SQLite, not PostgreSQL.
PostgreSQL is not an end-all-be-all solution and it is definitely not
designed to be embedded, which is essentially what you are suggesting
with that kind of configuration.

Joshua D. Drake


--
PostgreSQL
  Consulting, Development, Support, Training
  503-667-4564 - http://www.commandprompt.com/
  The PostgreSQL Company, serving since 1997



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-04 Thread Robert Haas
 In defense of thinking about very small configurations, I've seen many
 cases where an enterprise-software salesperson's laptop is running a
 demo - either in a small virtual machine in the laptop, or on an
 overloaded windows box.Even though the customer might end up
 running with 64GB, the sales demo's more likely to be 64MB.

 Although I get your point, that is a job for SQLite, not PostgreSQL.
 PostgreSQL is not an end-all-be-all solution and it is definitely not
 designed to be embedded, which is essentially what you are suggesting
 with that kind of configuration.

It's unlikely that someone would want to write a demo version of the
software that runs on a completely different database than the
production one...  it's also totally unnecessary because PostgreSQL
runs great on small systems.  In fact, it runs great on small systems
with NO TUNING AT ALL.  That's exactly why a wizard is needed to set
values for systems that aren't small.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-04 Thread Ron Mayer

Joshua D. Drake wrote:

On Thu, 2008-12-04 at 10:20 -0800, Ron Mayer wrote:

Greg Smith wrote:
I'm not the sort to be too concerned myself that 
the guy who thinks he's running a DW on a system with 64MB of RAM might 
get bad settings, but it's a fair criticism to point that out as a problem.

In defense of thinking about very small configurations, I've seen many
cases where an enterprise-software salesperson's laptop is running a
demo - either in a small virtual machine in the laptop, or on an
overloaded Windows box.  Even though the customer might end up
running with 64GB, the sales demo's more likely to be 64MB.


Although I get your point, that is a job for SQLite, not PostgreSQL.
PostgreSQL is not an end-all-be-all solution and it is definitely not
designed to be embedded, which is essentially what you are suggesting
with that kind of configuration.


But these sales people are selling a Postgres-based product. It'd be
both much less convincing to demo a different application stack and
not a very productive use of the developer's time.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-04 Thread Joshua D. Drake
On Thu, 2008-12-04 at 10:55 -0800, Ron Mayer wrote:
 Joshua D. Drake wrote: 
  Although I get your point, that is a job for SQLite, not PostgreSQL.
  PostgreSQL is not an end-all-be-all solution and it is definitely not
  designed to be embedded, which is essentially what you are suggesting
  with that kind of configuration.
 
 But these sales people are selling a postgres based product. It'd be
 both much less convincing to demo a different application stack; as
 well as not a very productive use of the developer's time.

Fair enough, then make sure you are demoing on a platform that can
handle PostgreSQL :)

Joshua D. Drake


 
 
-- 
PostgreSQL
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-04 Thread Kevin Grittner
 Joshua D. Drake [EMAIL PROTECTED] wrote: 
 
 Fair enough, then make sure you are demoing on a platform that can
 handle PostgreSQL :)
 
There are a lot of good reasons for people to be running an instance
of PostgreSQL on a small machine, running it on a machine with other
software, or running many clusters of PostgreSQL on a single machine. 
It may not be possible for this tool to generate useful values for all
of these situations, but it seems to me that should be viewed as a
limitation of the tool, rather than some moral failing on the part of
the people with those needs.  Let's cover what we can, document the
limitations, and avoid any user-hostile tones
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-04 Thread Joshua D. Drake
On Thu, 2008-12-04 at 14:05 -0600, Kevin Grittner wrote:
  Joshua D. Drake [EMAIL PROTECTED] wrote: 
  
  Fair enough, then make sure you are demoing on a platform that can
  handle PostgreSQL :)
  
 There are a lot of good reasons for people to be running an instance
 of PostgreSQL on a small machine, running it on a machine with other
 software, or running many clusters of PostgreSQL on a single machine. 
 It may not be possible for this tool to generate useful values for all
 of these situations, but it seems to me that should be viewed as a
 limitation of the tool, rather than some moral failing on the part of
 the people with those needs.  Let's cover what we can, document the
 limitations, and avoid any user-hostile tones

I didn't say don't run on a small machine :) I said make sure you run on
one that is up for the job. There is a difference.

Joshua D. Drake



  
 -Kevin
 
-- 
PostgreSQL
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-04 Thread Greg Smith

On Thu, 4 Dec 2008, Ron Mayer wrote:

OTOH there tends to be less DBA time available to tune the smaller demo 
instances that come and go as sales people upgrade their laptops; so 
improved automation would be much appreciated there.


I have a TODO list for things that might be interesting to add to a V2.0 
version of this tool.  I just added an item there for extending the 
tuning model usefully into working on systems with smaller amounts of RAM. 
I'm not opposed to the idea, just don't have any background doing that and 
I'm trying to stay focused on the more common big-machine problems for the 
first release.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-04 Thread Kevin Grittner
 Greg Smith [EMAIL PROTECTED] wrote: 
 On Thu, 4 Dec 2008, Ron Mayer wrote:
 
 OTOH there tends to be less DBA time available to tune the smaller demo
 instances that come and go as sales people upgrade their laptops; so
 improved automation would be much appreciated there.
 
 I have a TODO list for things that might be interesting to add to a V2.0
 version of this tool.  I just added an item there for extending the
 tuning model usefully into working on systems with smaller amounts of RAM.
 I'm not opposed to the idea, just don't have any background doing that and
 I'm trying to stay focused on the more common big-machine problems for the
 first release.
 
I think there needs to be some easy way to choose an option which
yields a configuration similar to what we've had in recent production
releases -- something that will start up and allow minimal testing on
even a small machine.
 
It also occurred to me that if initdb is generating its initial
configuration with this, some special handling might be needed for the
make check runs.  It isn't unusual to want to do a build and check
it on a production server.  If the generated configuration used in
regression tests is assuming it owns the machine there could be a
problem.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-04 Thread Greg Smith

On Thu, 4 Dec 2008, Kevin Grittner wrote:


I think there needs to be some easy way to choose an option which
yields a configuration similar to what we've had in recent production
releases -- something that will start up and allow minimal testing on
even a small machine.


But that's the goal of what comes out of initdb already; I'm missing how 
that is something this script would even get involved in.  Is your 
suggestion to add support for a minimal target that takes a tuned-up 
configuration file and returns it to that state, or did you have something 
else in mind?


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-04 Thread Kevin Grittner
 Greg Smith [EMAIL PROTECTED] wrote: 
 On Thu, 4 Dec 2008, Kevin Grittner wrote:
 
 I think there needs to be some easy way to choose an option which
 yields a configuration similar to what we've had in recent production
 releases -- something that will start up and allow minimal testing on
 even a small machine.
 
 But that's the goal of what comes out of initdb already; I'm missing how
 that is something this script would even get involved in.  Is your
 suggestion to add support for a minimal target that takes a tuned-up
 configuration file and returns it to that state, or did you have something
 else in mind?
 
Perhaps I misunderstood some earlier post -- I had gotten the
impression that initdb was going to use this utility for the initial
postgresql.conf file.  If that's not happening, then you can ignore my
last post as noise, with my apologies.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-04 Thread Josh Berkus

Greg Smith wrote:

On Thu, 4 Dec 2008, Ron Mayer wrote:

OTOH there tends to be less DBA time available to tune the smaller 
demo instances that come and go as sales people upgrade their laptops; so 
improved automation would be much appreciated there.


I have a TODO list for things that might be interesting to add to a V2.0 
version of this tool.  I just added an item to there for extending the 
tuning model usefully into working on systems with smaller amounts of 
RAM. I'm not opposed to the idea, just don't have any background doing 
that and I'm trying to stay focused on the more common big-machine 
problems for the first release.


We *have* a configuration for small amounts of RAM; it's our current 
default configuration.


However, I take the point that the workstation calculations should 
work down to 128MB of system RAM.  I'll check.


--Josh

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-04 Thread Robert Haas
On Thu, Dec 4, 2008 at 5:11 PM, Greg Smith [EMAIL PROTECTED] wrote:
 On Thu, 4 Dec 2008, Kevin Grittner wrote:

 I think there needs to be some easy way to choose an option which
 yields a configuration similar to what we've had in recent production
 releases -- something that will start up and allow minimal testing on
 even a small machine.

 But that's the goal of what comes out of initdb already; I'm missing how
 that is something this script would even get involved in.  Is your
 suggestion to add support for a minimal target that takes a tuned-up
 configuration file and returns it to that state, or did you have something
 else in mind?

I humbly suggest that the memory-related settings output by the tool
don't need to match what initdb outputs.  But the values of
checkpoint_segments, constraint_exclusion, and
default_statistics_target probably should, given a small mixed-mode
database.

You've probably all figured out by now that I, personally, in my own
opinion, think that default_statistics_target = 10 is just fine for
such a database, but if a decision is made to change that number, so
be it.  Just let's please change it both places, rather than letting
contrib/pgtune be a backdoor to get around not liking what initdb
does.  And similarly with the other parameters...

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-04 Thread Greg Smith

On Thu, 4 Dec 2008, Robert Haas wrote:

Just let's please change it both places, rather than letting 
contrib/pgtune be a backdoor to get around not liking what initdb does. 
And similarly with the other parameters...


Someone running pgtune has specifically asked for their database to be 
tuned for performance; someone running initdb has not.  It's not a 
backdoor, the defaults for a tuned small system and what comes out of 
initdb have completely different priorities.  The linking of the two that 
keeps happening in this thread makes no sense to me, and frankly I 
consider the whole topic an off-topic distraction.


I never had any intention of making changes to the basic configuration 
that comes out of initdb, the burden of proof for making a change there is 
far higher than I feel justified in clearing.  The last time I got an 
initdb setting changed I had days worth of focused test data to present 
with the suggestion.


If I take a poll of half a dozen experienced PostgreSQL administrators 
with performance tuning background (which is basically where the pgtune 
settings are coming from) and I hear the same story about a setting from 
most of them, that's good enough for me to justify a settings change for 
this tool; the whole idea is to pool expert opinion and try to distill it 
into code.  But that's not good enough for changing that setting for 
everybody who installs the database.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-04 Thread Joshua D. Drake
On Thu, 2008-12-04 at 21:51 -0500, Greg Smith wrote:
 On Thu, 4 Dec 2008, Robert Haas wrote:
 
  Just let's please change it both places, rather than letting 
  contrib/pgtune be a backdoor to get around not liking what initdb does. 
  And similarly with the other parameters...
 
 Someone running pgtune has specifically asked for their database to be 
 tuned for performance; someone running initdb has not.  It's not a 
 backdoor,

Right.

  the defaults for a tuned small system and what comes out of 
 initdb have completely different priorities.

IMO the priority of initdb is, "Get the damn thing running."

   The linking of the two that 
 keeps happening in this thread makes no sense to me, and frankly I 
 consider the whole topic an off-topic distraction.
 

Agreed.


Joshua D. Drake

-- 
PostgreSQL
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Robert Haas
 Looks like I need to add Python 2.5+Linux to my testing set.  I did not
 expect that the UNIX distributions of Python 2.5 would ship with wintypes.py
 at all.  I think I can fix this on the spot though.  On line 40, you'll find
 this bit:

 except ImportError:

 Change that to the following:

 except ImportError,ValueError:

That didn't work, same error message.

 And it should pass that point.  If it doesn't, you can try the completely
 general:

 except:

That worked.
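
A likely reason the first suggestion failed: in Python 2, except ImportError,ValueError: binds the exception to the name ValueError rather than catching both types; catching several types takes a tuple. A minimal sketch (the guarded import is assumed here, not copied from pgtune):

try:
    from ctypes.wintypes import *       # only usable on Windows; raises
                                        # ImportError or ValueError elsewhere
except (ImportError, ValueError):       # tuple form catches either exception
    pass                                # fall back to non-Windows behaviour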

The settings that this initially spit out (I guess it defaults to
mixed mode) didn't look too sane to me, because as discussed
elsewhere on this thread 50 is not a reasonable value for
default_statistics_target for my installation.  It also wanted to set
constraint_exclusion to on, which I'm pretty confident is useless.

Then I tried -T web and got what seemed like a more reasonable set
of values.  But I wasn't sure I needed that many connections, so I
added -c 150 to see how much difference that made.  Kaboom!

$ ./pgtune -i ~postgres/data/postgresql.conf -T web -c 150
Traceback (most recent call last):
  File "./pgtune", line 463, in <module>
    wizardTune(config,options,settings)
  File "./pgtune", line 403, in wizardTune
    'web':mem/con, 'oltp':mem/con,'dw':mem/con/2,
TypeError: unsupported operand type(s) for /: 'int' and 'str'
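
The traceback suggests the -c value comes back from option parsing as a string; a guess at the kind of fix (illustrative option handling, not pgtune's actual code):

from optparse import OptionParser

parser = OptionParser()
# giving optparse a type makes it return an int instead of a string
parser.add_option('-c', '--connections', dest='connections', type='int', default=100)
options, args = parser.parse_args(['-c', '150'])

mem = 2 * 1024 * 1024 * 1024          # pretend total RAM in bytes
print(mem // options.connections)     # arithmetic now works: no int/str TypeError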

I'm not sure what mixed mode is supposed to be, but based on what
I've seen so far, I'm skeptical of the idea that encouraging people
to raise default_statistics_target to 50 and turn on
constraint_exclusion is reasonable.  I'm also a bit surprised that
there doesn't seem to be anything here that depends on the size of the
database, even order-of-magnitude.  It seems like the right value for
checkpoint_segments, at least, might depend on that.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Joshua D. Drake
On Wed, 2008-12-03 at 13:30 -0500, Robert Haas wrote:
  Looks like I need to add Python 2.5+Linux to my testing set.  I did not
  expect that the UNIX distributions of Python 2.5 would ship with wintypes.py
  at all.  I think I can fix this on the spot though.  On line 40, you'll find
  this bit:
 
  except ImportError:
 
  Change that to the following:
 
  except ImportError,ValueError:
 
 That didn't work, same error message.
 
  And it should pass that point.  If it doesn't, you can try the completely
  general:
 
  except:
 
 That worked.
 
 The settings that this initially spit out (I guess it defaults to
 mixed mode) didn't look too sane to me, because as discussed
 elsewhere on this thread 50 is not a reasonable value for
 default_statistics_target for my installation.  It also wanted to set
 constraint_exclusion to on, which I'm pretty confident is useless.
 
 Then I tried -T web and got what seemed like a more reasonable set
 of values.  But I wasn't sure I needed that many connections, so I
 added -c 150 to see how much difference that made.  Kaboom!
 
 $ ./pgtune -i ~postgres/data/postgresql.conf -T web -c 150
 Traceback (most recent call last):
   File ./pgtune, line 463, in module
 wizardTune(config,options,settings)
   File ./pgtune, line 403, in wizardTune
 'web':mem/con, 'oltp':mem/con,'dw':mem/con/2,
 TypeError: unsupported operand type(s) for /: 'int' and 'str'
 
 I'm not sure what mixed mode is supposed to be, but based on what
 I've seen so far, I'm skeptical of the idea that encouraging people
 to raise default_statistics_target to 50 and turn on
 constraint_exclusion is reasonable.

Why?

   I'm also a bit surprised that
 there doesn't seem to be anything here that depends on the size of the
 database, even order-of-magnitude.  It seems like the right value for
 checkpoint_segments, at least, might depend on that.

What does checkpoint_segments have to do with the size of the database?

Joshua D. Drake




 
 ...Robert
 
-- 
PostgreSQL
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Robert Haas
 I'm not sure what mixed mode is supposed to be, but based on what
 I've seen so far, I'm skeptical of the idea that encouraging people
 to raise default_statistics_target to 50 and turn on
 constraint_exclusion is reasonable.

 Why?

Because both of those settings are strictly worse for my database than
the defaults.  I don't have any partitioned tables, and see:

http://archives.postgresql.org/pgsql-hackers/2008-11/msg01837.php

   I'm also a bit surprised that
 there doesn't seem to be anything here that depends on the size of the
 database, even order-of-magnitude.  It seems like the right value for
 checkpoint_segments, at least, might depend on that.

 What does checkpoint_segments have to do with the size of the database?

It seems unlikely that you would want 256 MB of checkpoint segments on
a database that is only 100 MB (or even 500 MB).  But you might very
well want that on a database that is 1 TB.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Gregory Stark
Joshua D. Drake [EMAIL PROTECTED] writes:

 On Wed, 2008-12-03 at 13:30 -0500, Robert Haas wrote:
 I'm not sure what mixed mode is supposed to be, but based on what
 I've seen so far, I'm skeptical of the idea that encouraging people
 to raise default_statistics_target to 50 and turn on
 constraint_exclusion is reasonable.

 Why?

Well did you have any response to what I posited before? I said mixed should
produce the same settings that the default initdb settings produce. At least
on a moderately low-memory machine that initdb targets.

It sure seems strange to me to have initdb which presumably is targeting a
mixed system -- where it doesn't know for sure what workload will be run --
produce a different set of values than the tuner on the same machine.

   I'm also a bit surprised that
 there doesn't seem to be anything here that depends on the size of the
 database, even order-of-magnitude.  It seems like the right value for
 checkpoint_segments, at least, might depend on that.

 What does checkpoint_segments have to do with the size of the database?

I had the same reaction but I think he's right.

checkpoint_segments is the maximum amount of space you want the WAL to take up
(ideally). Presumably on a small database you don't want hundreds of megabytes
of WAL for a 10M database. But on a terabyte data warehouse sitting on a big
SAN you're not going to be concerned with how much space the WAL files are
taking. In fact, really it would be nice if we allowed units of space (MB, GB,
etc) for checkpoint_segments.
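
As a rough worked example (using the 8.2-era rule of thumb that pg_xlog normally stays under about 2 * checkpoint_segments + 1 segments of 16MB each; the numbers below are illustrative, not from this thread):

# rough upper bound on pg_xlog size, per the 8.2-era rule of thumb
def approx_max_wal_mb(checkpoint_segments, segment_mb=16):
    return (2 * checkpoint_segments + 1) * segment_mb

print(approx_max_wal_mb(3))    # default setting: about 112 MB
print(approx_max_wal_mb(64))   # a tuned setting: about 2 GB -- painful on a 100 MB database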

I used to think of checkpoint_segments in terms of transaction rate and
maximum tolerable recovery time but really if those are your constraints
you're better off using checkpoint_timeout I think.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Robert Haas
 Well did you have any response to what I posited before? I said mixed should
 produce the same settings that the default initdb settings produce. At least
 on a moderately low-memory machine that initdb targets.

I'm actually really skeptical of this whole idea of modes.  The main
thing mode does, other than set max connections, is handle desktop
differently than other modes by decreasing shared_buffers and
effective_cache_size by 4x and work_mem by 3x.  And the default
settings for max_connections are a SWAG that could easily be way off
for any particular installation.  I think it would be more useful to
get rid of modes, accept the user is going to have to specify
max_connections if the default of, say, 100 is not reasonable, and
handle the desktop case by telling the user to rerun the tool
overriding the system memory with a lower value.
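
For concreteness, the per-mode scaling being described looks something like this (the baseline fractions are placeholders, not pgtune's actual formulas; only the desktop divisors come from the description above):

def memory_settings(total_mem_mb, mode):
    # baseline numbers are illustrative only; the desktop scaling
    # factors (1/4 and 1/3) are the ones mentioned in the discussion
    shared_buffers = total_mem_mb // 4
    effective_cache_size = total_mem_mb * 3 // 4
    work_mem = total_mem_mb // 100
    if mode == 'desktop':
        shared_buffers //= 4
        effective_cache_size //= 4
        work_mem //= 3
    return shared_buffers, effective_cache_size, work_mem

print(memory_settings(2048, 'mixed'))    # e.g. (512, 1536, 20)
print(memory_settings(2048, 'desktop'))  # e.g. (128, 384, 6)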

I'm not sure if you've thought about this, but there is also a
difference between max_connections and maximum LIKELY connections.
For example my apps don't have too many users, since they are
internal-facing.  But setting max_connections to 100 gives me a nice
buffer just in case everyone decides to log on at once.  Still, for
performance reasons, I'd prefer to calculate based on a more likely
scenario, where the concurrent user count might be only 10 or 20.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Robert Haas
 I can see an argument about constraint_exclusion but
 default_statistics_target I don't.

Why not?  I don't want to accept a big increase in ANALYZE times (or
planning times, though I'm really not seeing that at this point)
without some benefit.

 It seems unlikely that you would want 256 MB of checkpoint segments on
 a database that is only 100 MB (or even 500 MB).  But you might very
 well want that on a database that is 1 TB.

 It also seems unlikely that you would hit 256MB of checkpoint segments
 on a 100MB database before checkpoint_timeout and if you did, you
 certainly did need them.

So why do we have this parameter at all?

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Gregory Stark
Joshua D. Drake [EMAIL PROTECTED] writes:

 It also seems unlikely that you would hit 256MB of checkpoint segments
 on a 100MB database before checkpoint_timeout and if you did, you
 certainly did need them.

 Remember postgresql only creates the segments when it needs them. 

Should we change the initdb output then?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's 24x7 Postgres support!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Robert Haas
On Wed, Dec 3, 2008 at 4:41 PM, Joshua D. Drake [EMAIL PROTECTED] wrote:
 If you are concerned about the analyze time between 10, 50 and 150, I
 would suggest that you are concerned about the wrong things. Remember

I can't rule that out.  What things do you think I should be concerned
about?  ISTM that default_statistics_target trades off ANALYZE time
and query planning time vs. the possibility of better plans.  If the
former considerations are not an issue for dst = 50, then maybe we
should emit 50 by default.  But the limited evidence that has been
published in this forum thus far doesn't support that contention.

  It also seems unlikely that you would hit 256MB of checkpoint segments
  on a 100MB database before checkpoint_timeout and if you did, you
  certainly did need them.

 So why do we have this parameter at all?

 Excellent question, for a different thread :)

I think the rhetorical answer is so that we don't fill up the disk,
which gets us back to database size.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Joshua D. Drake
On Wed, 2008-12-03 at 17:33 -0500, Robert Haas wrote:
 On Wed, Dec 3, 2008 at 4:41 PM, Joshua D. Drake [EMAIL PROTECTED] wrote:
  If you are concerned about the analyze time between 10, 50 and 150, I
  would suggest that you are concerned about the wrong things. Remember
 
 I can't rule that out.  What things do you think I should be concerned
 about?

Your database's production performance with the change of the parameter.
Quite a bit more often than not, your problem (if you have one) isn't
going to be default_statistics_target is too high.

   ISTM that default_statistics_target trades off ANALYZE time
 and query planning time vs. the possibility of better plans.  If the
 former considerations are not an issue for dst = 50, then maybe we
 should emit 50 by default.  But the limited evidence that has been
 published in this forum thus far doesn't support that contention.
 

Actually there are years worth of evidence in these archives. Not that
the 50 is the right number but that the current settings are definitely
wrong and that higher ones are needed. That people generally start
around 100 and go from there, except where they don't and then someone
like Tom, I or some other person says, Oh you need to increase
default_statistics_target.

There is no empirical evidence that 50 is the right setting but there is
more than enough anecdotal evidence to suggest that 50 is a lot better
than 10 and that even higher than 50 is reasonable. In an effort to
follow the PostgreSQL conservative mantra, 50 is a good compromise. 


   It also seems unlikely that you would hit 256MB of checkpoint segments
   on a 100MB database before checkpoint_timeout and if you did, you
   certainly did need them.
 
  So why do we have this parameter at all?
 
  Excellent question, for a different thread :)
 
 I think the rhetorical answer is so that we don't fill up the disk,

I don't think at any time I have said to myself, I am going to set this
parameter low so I don't fill up my disk. If I am saying that to myself,
I have greatly underestimated the hardware for the task. Consider
that we are quarreling over what amounts to a nominal amount of hard
drive space, 1000 checkpoint_segments = 1.6G of space. My phone has more
capacity than that.

*If* hard drive space is (that much of) a concern then you already have
other problems that pgtune won't solve and you should be
manually tuning the conf in the first place.

Joshua D. Drake


 which gets us back to database size.
 
 ...Robert
 
-- 
PostgreSQL
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Kevin Grittner
 Robert Haas [EMAIL PROTECTED] wrote: 
 On Wed, Dec 3, 2008 at 4:41 PM, Joshua D. Drake [EMAIL PROTECTED] wrote:
 If you are concerned about the analyze time between 10, 50 and 150, I
 would suggest that you are concerned about the wrong things. Remember
 
 I can't rule that out.  What things do you think I should be concerned
 about?  ISTM that default_statistics_target trades off ANALYZE time
 and query planning time vs. the possibility of better plans.  If the
 former considerations are not an issue for dst = 50, then maybe we
 should emit 50 by default.  But the limited evidence that has been
 published in this forum thus far doesn't support that contention.
 
One more data point to try to help.
 
While the jump in default_statistics_target from 10 to 1000
resulted in a plan time increase for a common query from 50 ms to 310
ms, at a target of 50 the plan time was 53 ms.  Analyze time was 7.2
minutes and 18.5 minutes for targets of 10 and 50.  This is an 842 GB
database on an 8 processor (3.5 GHz Xeon) machine with 64 GB RAM
running (soon to be updated) PostgreSQL 8.2.7.
 
Based on the minimal plan time increase of this test, we're going to
try 50 in production and see how it goes.
 
It's worth pondering that at the target of 1000, had we put that into
production, running this query 300,000 times per day would have used
21 hours and 40 minutes of additional CPU time per day on planning the
runs of this one query, while a target of 50 only consumes an
additional 15 minutes of 3.5 GHz CPU time per day.
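 
(The arithmetic behind those figures, for anyone who wants to check it:)

    # Extra planner CPU per day from raising default_statistics_target,
    # for one query planned 300,000 times per day (numbers from above).
    runs_per_day = 300000
    plan_ms_at_10, plan_ms_at_50, plan_ms_at_1000 = 50.0, 53.0, 310.0

    extra_hours_at_1000 = runs_per_day * (plan_ms_at_1000 - plan_ms_at_10) / 1000 / 3600
    extra_minutes_at_50 = runs_per_day * (plan_ms_at_50 - plan_ms_at_10) / 1000 / 60

    print "%.1f extra CPU hours/day at target 1000" % extra_hours_at_1000   # ~21.7
    print "%.0f extra CPU minutes/day at target 50" % extra_minutes_at_50   # 15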
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Greg Smith

On Wed, 3 Dec 2008, Gregory Stark wrote:


It sure seems strange to me to have initdb which presumably is targeting a
mixed system -- where it doesn't know for sure what workload will be run --
produce a different set of values than the tuner on the same machine.


It's been a long time since the output from initdb was targeting anything 
but a minimal system with an untuned kernel and limited resources.  If you 
invert the normal tuning recommendations, as if its initial configuration 
were the output from typical practice, it would be aiming at a system with 
approximately 128MB of RAM.  That feels about right to me; when I had 
128MB of RAM in my high-end P2-300 server running PG 7.0, 32MB of 
shared_buffers was huge and 3 checkpoint segments was plenty.  I don't 
recall regularly dirtying things fast enough to see checkpoints occurring 
too fast then like you can do trivially nowadays.  Here in 2008, I push 
checkpoint_segments up to 10 even for the most trivial apps lest the logs 
fill with those buggers the first time I run an update on a table.


Right now, my program doesn't fiddle with any memory settings if you've 
got less than 256MB of RAM.  Were someone to champion the idea that 
*nothing* should be fiddled with in those cases, that's not an 
unreasonable position.  I'm not the sort to be too concerned myself that 
the guy who thinks he's running a DW on a system with 64MB of RAM might 
get bad settings, but it's a fair criticism to point that out as a 
problem.



In fact, really it would be nice if we allowed units of space (MB, GB,
etc) for checkpoint_segments.


That's a good way to think about this, let's run with that for a minute. 
The values I'm throwing in there look like this (if your tab stops aren't 
at 8 characters this will suck):


                 Completion  Max        Max
Type     Segs    Target      Segments   Usage
web      8       0.7         23         368MB
oltp     16      0.9         47         752MB
dw       64      0.9         187        3GB
mixed    16      0.9         47         752MB
desktop  3       0.5         9          144MB
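
The Max Segments and Max Usage columns are just the documented upper bound, 
(2 + checkpoint_completion_target) * checkpoint_segments + 1 files of 16MB 
apiece; a quick sketch that reproduces them:

    # Reproduce the "Max Segments" / "Max Usage" columns from the table above.
    profiles = [("web", 8, 0.7), ("oltp", 16, 0.9), ("dw", 64, 0.9),
                ("mixed", 16, 0.9), ("desktop", 3, 0.5)]
    for name, segs, target in profiles:
        max_segs = int((2 + target) * segs + 1 + 0.5)   # round to the nearest file
        print "%-8s %4d segments max, ~%d MB" % (name, max_segs, max_segs * 16)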

Is 368MB of overhead unreasonable for a web application database today, 
where you can get a mirrored pair of disks for under $1/GB?  It's only the 
DW case that even starts to leave trivial territory.  Your example of 
somebody who thinks the overhead is too high on their 10MB database is 
already being blown away even at the default of 3 segments (assuming that 
data has enough churn on it to go through that many segments ever--if it 
doesn't then the maximum doesn't matter anyway).


The reality here is that it's the recovery playback time that's the real 
bear.  If I were trying to argue against me, what would be more persuasive 
is some tests showing how long it takes to sort through, cleanup, and 
replay the appropriate portions of as many as 47 segments worth of WAL 
after an unclean shutdown when checkpoint_segments=16.  Given how long 
that takes, it might be possible to find that a modern system takes a while to 
process that much WAL volume.  It's pretty rare I run into that (usually 
only after I do something abusive), whereas complaints about the logs 
filling with checkpoint warnings on systems set to the default seem to pop 
up all the time.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Gregory Stark
Kevin Grittner [EMAIL PROTECTED] writes:

 One more data point to try to help.
  
  While the jump in default_statistics_target from 10 to 1000
 resulted in a plan time increase for a common query from 50 ms to 310
 ms, at a target of 50 the plan time was 53 ms.  

That sounds like it would be an interesting query to analyze in more detail.
Is there any chance you could run the complete graph and get a chart of analyze
times for all statistics values from 1..1000 ? And log the explain plans to a
file so we can look for at what statistics targets the plan changed?

Or if the data is public I would be interested in looking at doing it if you
want to send it to me.
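
Something along these lines would do it -- just a rough sketch, with the
connection string, query file, and output file names as placeholders:

    # Sweep default_statistics_target, timing ANALYZE and saving the EXPLAIN
    # output at each step so plan changes can be spotted afterwards.
    import time
    import psycopg2

    conn = psycopg2.connect("dbname=test")        # placeholder DSN
    conn.set_isolation_level(0)                   # autocommit, so SET/ANALYZE apply
    cur = conn.cursor()
    query = open("query.sql").read()              # the query under test

    for target in range(10, 1001, 10):            # adjust range/step as needed
        cur.execute("SET default_statistics_target = %d" % target)
        start = time.time()
        cur.execute("ANALYZE")                    # re-gather stats at this target
        analyze_secs = time.time() - start
        cur.execute("EXPLAIN " + query)
        plan = "\n".join(row[0] for row in cur.fetchall())
        open("plan_%04d.txt" % target, "w").write(plan)
        print "%4d  analyze: %.1fs" % (target, analyze_secs)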

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's 24x7 Postgres support!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Joshua D. Drake
On Thu, 2008-12-04 at 00:11 +, Gregory Stark wrote:
 Joshua D. Drake [EMAIL PROTECTED] writes:

  I
 started to do this for you last week but got side-tracked. Do you have any
 time for this?

I can do it if you have a script.

 So how big should a minimum postgres install be not including your data? 
 Is 100M reasonable? Should we say Postgres requires 200M? 500? 1G? Whatever
 number we pick (or allow the user to pick) will determine how large this value
 ought to be. And incidentally also provide a bound on
 autovacuum_max_freeze_age as Heikki pointed out on another thread.
 

I fail to see what any of the above paragraph has to do with
checkpoint_segments.

Anyway, I have made my arguments.

Joshua D. Drake


 
-- 
PostgreSQL
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Greg Smith

On Wed, 3 Dec 2008, Robert Haas wrote:


I'm not sure if you've thought about this, but there is also a
difference between max_connections and maximum LIKELY connections.


It's actually an implicit assumption of the model Josh threw out if you 
stare at the numbers.  The settings for work_mem are twice as high per 
connection in the Web+OLTP application cases, based on the assumption that 
you're just not going to get everybody doing sorting at once in those 
situations.  I toyed with exposing that as an explicit connection load 
duty factor, then remembered I was trying to deliver something rather 
than tweak the parameters forever.  It may be a bit too aggressive as 
written right now in those cases.
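
To make the concept concrete (this is only a sketch of the idea, not the
formula the script actually uses; the names and the 1/4-of-RAM figure are
invented for illustration):

    # A "connection load duty factor" scales work_mem up when we assume only
    # a fraction of the connections are sorting/hashing at the same time.
    def work_mem_kb(total_mem_mb, max_connections, duty_factor):
        pool_mb = total_mem_mb / 4          # pretend 1/4 of RAM goes to work_mem
        return int(pool_mb * 1024 / (max_connections * duty_factor))

    print work_mem_kb(4096, 200, 1.0)   # assume everyone sorts at once: ~5MB each
    print work_mem_kb(4096, 200, 0.5)   # web/OLTP assumption, half do:  ~10MB each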


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Greg Smith

On Wed, 3 Dec 2008, Guillaume Smet wrote:


- it would be really nice to make it work with Python 2.4 as RHEL 5 is
a Python 2.4 thing and it is a very widespread platform out there,


The 2.5 stuff is only required in order to detect memory on Windows.  My 
primary box is RHEL5 and runs 2.4, it works fine there.
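
For reference, detection along these lines is all that's involved (a
simplified sketch, not the script's exact code): parsing /proc/meminfo works
fine on 2.4, while the Windows branch is the part that wants ctypes, which
only ships with Python 2.5 and later:

    import platform

    def total_mem_bytes():
        if platform.system() == 'Windows':
            import ctypes                       # stdlib only from Python 2.5 on
            class MEMORYSTATUSEX(ctypes.Structure):
                _fields_ = [('dwLength', ctypes.c_ulong),
                            ('dwMemoryLoad', ctypes.c_ulong),
                            ('ullTotalPhys', ctypes.c_ulonglong),
                            ('ullAvailPhys', ctypes.c_ulonglong),
                            ('ullTotalPageFile', ctypes.c_ulonglong),
                            ('ullAvailPageFile', ctypes.c_ulonglong),
                            ('ullTotalVirtual', ctypes.c_ulonglong),
                            ('ullAvailVirtual', ctypes.c_ulonglong),
                            ('ullAvailExtendedVirtual', ctypes.c_ulonglong)]
            stat = MEMORYSTATUSEX()
            stat.dwLength = ctypes.sizeof(stat)
            ctypes.windll.kernel32.GlobalMemoryStatusEx(ctypes.byref(stat))
            return stat.ullTotalPhys
        for line in open('/proc/meminfo'):      # Linux: "MemTotal:  2059768 kB"
            if line.startswith('MemTotal:'):
                return int(line.split()[1]) * 1024
        return None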



- considering the audience of this tool, I think you should explain in
the usage text which type of workload implies each database type (DW,
OLTP, Web, Mixed, Desktop).


Once I'm done with the docs I'll refer over to those, it's too much to put 
into the usage without cluttering it.



- it would be nice to be able to define the architecture (32-64 bits)
from the command line (especially considering I won't be able to run
it on our target boxes which are all RHEL 5 :))


I'm starting to lean toward making everything that gets detected as also 
being possible to override, for these case.  I want to make this just work 
in as many cases as possible, but the situation where someone is 
configuring/testing on a system other than the server is pretty common. 
Now that I think about it I often setup configs on my 32-bit laptop and 
them move them over onto 64-bit servers.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Gregory Stark
Greg Smith [EMAIL PROTECTED] writes:

 On Wed, 3 Dec 2008, Gregory Stark wrote:

 It sure seems strange to me to have initdb which presumably is targeting a
 mixed system -- where it doesn't know for sure what workload will be run --
 produce a different set of values than the tuner on the same machine.

 It's been a long time since the output from initdb was targeting anything but 
 a
 minimal system with an untuned kernel and limited resources.  If you invert 
 the
 normal tuning recommendations, as if its initial configuration were the output
 from typical practice, it would be aiming at a system with approximately 128MB
 of RAM.  That feels about right to me; when I had 128MB of RAM in my high-end
 P2-300 server running PG 7.0, 32MB of shared_buffers was huge and 3 
  checkpoint
  segments was plenty.  I don't recall regularly dirtying things fast enough to
  see checkpoints occurring too fast then like you can do trivially nowadays.

Well I think there's also an assumption in initdb that Postgres can't assume
it's on a dedicated machine. So whether it's 32MB on a dedicated 128MB machine
or 32MB on a 256MB machine where it's only expected to be half the workload of
the machine it works out to about the same thing.

 Right now, my program doesn't fiddle with any memory settings if you've got
 less than 256MB of RAM.  

What I'm suggesting is that you shouldn't have to special case this. That you
should expect whatever formulas you're using to produce the same values as
initdb if they were run on the same machine initdb is targeting.

But actually I'm more concerned with the *non* memory related parameters. It
may make sense to tweak those one way or the other for oltp or dss but mixed
should be exactly what initdb produces since that's exactly what it's
targeting -- a system that will have a wide mixture of queries and must
function reasonably well for both data warehouse and oltp queries.

 Completion  Max   Max
 Type  SegsTarget  SegmentsUsage
 web   8   0.7 23  368MB
 oltp  16  0.9 47  752MB
 dw64  0.9 187 3GB
 mixed 16  0.9 47  752MB
 desktop   3   0.5 9   144MB

(incidentally using tab stops in emails is probably a bad idea because of
quoting as above)

Uhm, I hadn't actually seen this list before. I don't understand how web is
different from oltp. A web service really is just one (very typical) example
of an oltp application.

And desktop seems like an outlier here. I suppose it's meant to capture
whether postgres is on a dedicated box? But it's possible to have a
non-dedicated oltp application or non-dedicated data warehouse box just as
easily. It's an orthogonal issue from the oltp/data-warehouse axis.

 Is 368MB of overhead unreasonable for a web application database today

Well I think it's more than most people expect a single application install to
take up before they start putting data in it. It would probably work better if
we were asking how big their database was and then could say, well, you said
you had 10G of data so 300MB of overhead isn't going to be so bad.


 The reality here is that it's the recovery playback time that's the real bear.

I agree, but then that's what checkpoint_timeout is for, no? It might take
longer to replay but the recovery time should bear some relation to how long
it took to write out the wal. More so than to the sheer size of the wal. 

 whereas complaints about the logs filling with checkpoint warnings on systems
 set to the default seem to pop up all the time.

filling? The cure to having too much space taken up by logs is to take up
space with, well, logs?

The logs are filling up with warnings which explain exactly what parameter to
adjust. Are there really complaints about this?

I'm really beginning to think the root of the problem is the name. If it were
transaction_log_max_space and measured in megabytes people would be happy to
say ok, I'll make space for 100MB of logs or whatever. Today they don't know
what to set it to or what the impact of setting it will be.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Greg Smith

On Wed, 3 Dec 2008, Robert Haas wrote:


Then I tried -T web and got what seemed like a more reasonable set
of values.  But I wasn't sure I needed that many connections, so I
added -c 150 to see how much difference that made.  Kaboom!


That and the import errors fixed in the version attached (just replacing 
the script, not its data), thanks for testing that out.



I'm not sure what mixed mode is supposed to be, but based on what
I've seen so far, I'm a skeptical of the idea that encouraging people
to raise default_statistics_target to 50 and turn on
constraint_exclusion is reasonable.


The statistics stuff is obviously a broader discussion, will let that rage 
in existing threads.  The reason for setting constraint_exclusion in the 
mixed case is that people who just pick the defaults without reading 
anything will get a configuration that supports partitions usefully.  One 
of the HINTs I intend to throw out for that specific case is that they 
should turn it off if they don't ever intend to use partitions.


The idea of the mixed mode is that you want to reduce the odds someone 
will get a massively wrong configuration if they're not paying attention. 
Is it worse to suffer from additional query overhead if you're sloppy with 
the tuning tool, or to discover additional partitions didn't work as you 
expected?  That's a tough call; I could invert things, so that it defaults 
to off in mixed mode, as always, and just produces a HINT to turn it on. 
I don't have a really strong opinion there either way.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

pgtune.gz
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Gregory Stark
Greg Smith [EMAIL PROTECTED] writes:

 Is it worse to suffer from additional query overhead if you're sloppy with
  the tuning tool, or to discover additional partitions didn't work as you
 expected?

Surely that's the same question we faced when deciding what the Postgres
default should be? 

That and the unstated other question Is someone more likely to use partitions
without reading the manual or not use partitions without reading the manual
about the down-sides of constraint_exclusion (in the partitioning
section)


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Gregory Stark

Joshua D. Drake [EMAIL PROTECTED] writes:

 On Thu, 2008-12-04 at 00:11 +, Gregory Stark wrote:
 Joshua D. Drake [EMAIL PROTECTED] writes:

  I
 started to do this for you last week but got side-tracked. Do you have any
 time for this?

 I can do it if you have a script.

Well, I can send you what I have so far but it still needs more work. I only
got as far as the graphs I sent earlier which don't include scanning for
changed plans. Also, if you have any sample databases with skewed data sets
that would be interesting.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL 
training!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Alvaro Herrera
Gregory Stark escribió:
 Joshua D. Drake [EMAIL PROTECTED] writes:

  I don't think at any time I have said to my self, I am going to set this
  parameter low so I don't fill up my disk. If I am saying that to myself
  I have either greatly underestimated the hardware for the task. Consider
  that we are quarreling over what amounts to a nominal amount of hard
  drive space, 1000 checkpoint_segments = 1.6G of space. My phone has more
  capacity than that.
 
 Well my phone has 16G of RAM, why not 1 ?

I don't think the disk space used is the only consideration here.  You
also have to keep recovery time in mind.  If you set it to 1000,
recovery would take way too long.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Gregory Stark
Joshua D. Drake [EMAIL PROTECTED] writes:

 On Thu, 2008-12-04 at 00:11 +, Gregory Stark wrote:
 Joshua D. Drake [EMAIL PROTECTED] writes:

  I
 started to do this for you last week but got side-tracked. Do you have any
 time for this?

 I can do it if you have a script.

 So how big should a minimum postgres install be not including your data? 
 Is 100M reasonable? Should we say Postgres requires 200M? 500? 1G? Whatever
 number we pick (or allow the user to pick) will determine how large this 
 value
 ought to be. And incidentally also provide a bound on
 autovacuum_max_freeze_age as Heikki pointed out on another thread.
 

 I fail to see what any of the above paragraph has to do with
 checkpoint_segments.

Are we all on the same page on what checkpoint_segments does? It's the number
of segments of WAL log postgres will allow to accumulate before it triggers a
checkpoint and trims off any it doesn't need. 

That means even if your database is just churning updating the same records
over and over the WAL will grow to this size before Postgres makes any attempt
to trim it (unless it hits checkpoint_timeout but that's a separate tunable).
If you're loading data all your data will go into the heap *and* the wal log
until it hits this size and triggers a checkpoint.

So this is the minimum amount of extra space you need, beyond your data, for a
functioning postgres database install. If you don't anticipate postgres using
this much space and don't set aside enough space for it, your database is at
risk of randomly stopping and producing errors when it can't create the new
log files you told it it needs.

It's interesting that this is the *only* parameter we can come up with that
really directly depends on disk space. The tables are obviously going to be as
big as they have to be and there's not much to do about that. If we could
eliminate this parameter it would be a lot nicer. 
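
(If you want to see how much of that space a running install has actually
accumulated so far, something as simple as this shows it; the data directory
path is whatever your $PGDATA happens to be:)

    # WAL segment files have 24-hex-character names and are 16MB each.
    import os
    wal_dir = "/var/lib/pgsql/data/pg_xlog"     # adjust to your $PGDATA
    segs = [f for f in os.listdir(wal_dir) if len(f) == 24]
    print "%d segments = roughly %d MB" % (len(segs), 16 * len(segs))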


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Joshua D. Drake
On Wed, 2008-12-03 at 22:17 -0300, Alvaro Herrera wrote:
 Gregory Stark escribió:
  Joshua D. Drake [EMAIL PROTECTED] writes:
 
   I don't think at any time I have said to my self, I am going to set this
   parameter low so I don't fill up my disk. If I am saying that to myself
   I have either greatly underestimated the hardware for the task. Consider
   that we are quarreling over what amounts to a nominal amount of hard
   drive space, 1000 checkpoint_segments = 1.6G of space. My phone has more
   capacity than that.
  
  Well my phone has 16G of RAM, why not 1 ?
 
 I don't think the disk space used is the only consideration here.  You
 also have to keep recovery time in mind.  If you set it to 1000,
 recovery would take way too long.

Well certainly but the original argument that came back was, (from
Robert Haas):


It seems unlikely that you would want 256 MB of checkpoint segments on
a database that is only 100 MB (or even 500 MB).  But you might very
well want that on a database that is 1 TB.


My whole point is that:

1. It seems unlikely that you would hit 256MB of checkpoint segments on
a 100MB database before checkpoint_timeout and if you did, you certainly
did need them. (the checkpoint segments)

2. taking up space is such a minute concern in comparison to the
potential benefit.

Recovery is certainly a consideration but let's be realistic it is the
last consideration because it is the least likely to happen. What is
more likely to happen is IO spikes because we are recycling logs too
much.

I know we have some other facilities to deal with that now too but it
doesn't completely negate the problem and in my opinion, increasing the
checkpoint_segments provides no perceivable downside in production use
but does provide significant perceivable upside.

Joshua D. Drake




-- 
PostgreSQL
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Greg Smith

On Thu, 4 Dec 2008, Gregory Stark wrote:


Right now, my program doesn't fiddle with any memory settings if you've got
less than 256MB of RAM.


What I'm suggesting is that you shouldn't have to special case this. That you
should expect whatever formulas you're using to produce the same values as
initdb if they were run on the same machine initdb is targeting.


The reason that approach isn't taken is that the model here assumes the OS 
overhead is negligable relative to everything else going on.  If you've 
only got a small amount of RAM, that assumption is so badly broken that 
you can't just extend the curves for everything down to there and expect 
that what comes out will make any sense.  I started to make a more 
complicated bit that did scale down to the bottom by modeling the overhead 
better, but Josh talked me out of doing it for now.



But actually I'm more concerned with the *non* memory related parameters. It
may make sense to tweak those one way or the other for oltp or dss but mixed
should be exactly what initdb produces since that's exactly what it's
targeting -- a system that will have a wide mixture of queries and must
function reasonably well for both data warehouse and oltp queries.


The only way this line of discussion will go is toward talking about what 
should be changed in initdb to make it more representative of the current 
real world, and I know that's not going anywhere (see 
default_statistics_target=10).  The idea that the sample configuration 
is tuned usefully for any application whatsoever gets nothing from me but 
a chuckle.



And desktop seems like an outlier here. I suppose it's meant to capture
whether postgres is on a dedicated box? But it's possible to have a
non-dedicated oltp application or non-dedicated data warehouse box just as
easily.


That's the target for something that's not a dedicated server--a desktop 
PC you use as a general workstation, maybe you're installing PostgreSQL as 
a developer that's competing with your web server and other apps; 
something like that.  There might be a better name for that.



Is 368MB of overhead unreasonable for a web application database today


Well I think it's more than most people expect a single application install to
take up before they start putting data in it.


Segments don't get allocated until you churn through that much WAL 
activity; that figure is an upper-bound after you've pushed more than that 
worth of data through WAL and into the database.  The only example where 
this overhead isn't dwarfed by the size of the resulting database is where 
some small number of records are inserted, then constantly updated and 
vacuumed.  And you know what?  The person doing that is likely to really 
benefit from having checkpoint_segments set to a larger value.  Update and 
vacuum heavy workloads are exactly the sort where you end up checkpointing 
too often with the default parameters.



I'm really beginning to think the root of the problem is the name. If it were
transaction_log_max_space and measured in megabytes people would be happy to
say ok, I'll make space for 100MB of logs or whatever. Today they don't know
what to set it to or what the impact of setting it will be.


Unless they do something crazy like read the documentation:

http://www.postgresql.org/docs/8.3/static/wal-configuration.html
There will always be at least one WAL segment file, and will normally not 
be more than (2 + checkpoint_completion_target) * checkpoint_segments + 1 
files. Each segment file is normally 16 MB (though this size can be 
altered when building the server). You can use this to estimate space 
requirements for WAL.


Too complicated for most people you say?  I agree; that's why I put some 
annotated examples of what those translate into at 
http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server like 32 
(checkpoint every 512MB).


What fun.  I'm beginning to remember why nobody has ever managed to 
deliver a community tool that helps with this configuration task before.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Robert Haas
 The idea of the mixed mode is that you want to reduce the odds someone will
 get a massively wrong configuration if they're not paying attention. Is it
 worse to suffer from additional query overhead if you're sloppy with the
  tuning tool, or to discover additional partitions didn't work as you expected?
  That's a tough call; I could invert things, so that it defaults to off in
 mixed mode, as always, and just produces a HINT to turn it on. I don't have
 a really strong opinion there either way.

I think that the strong feelings about default_statistics_target and
constraint_exclusion come from the fact that when they are too low (in
the first case) or off (in the second case), you can get very, very
bad query plans.  The penalties in the opposite direction are more
subtle.  If they're so subtle that we don't care about incurring them,
then let's change initdb too.  If not, then let's not have the tuning
tool generate them by default either.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Robert Haas
 What fun.  I'm beginning to remember why nobody has ever managed to deliver
 a community tool that helps with this configuration task before.

I have to say I really like this tool.  It may not be perfect but it's
a lot easier than trying to do this analysis from scratch.  And we are
really only arguing about a handful of settings.

It wouldn't take a lot to convince me that checkpoint_segments=3 is
too low.  I easily blew through that testing the bulk-insert tuning
patch.  I'm curious why wal_buffers is being set to 512 *
checkpoint_segments.  Are they related?  The default value for
wal_buffers is only 64 kB, which means someone thought you shouldn't
need much space for this at all, but this suggests a setting in the
4-32 MB range, an increase of ~2 orders of magnitude.  For all I know
that could be right but it's a big increase.
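
For what it's worth, here's the arithmetic behind that 4-32 MB observation,
assuming the 512-per-segment figure is in kB terms (an assumption on my part,
not something verified against the script):

    for profile, segs in [("web", 8), ("oltp", 16), ("dw", 64),
                          ("mixed", 16), ("desktop", 3)]:
        wal_buffers_kb = 512 * segs
        print "%-8s checkpoint_segments=%-3d -> wal_buffers %5d kB (%.1f MB)" % (
            profile, segs, wal_buffers_kb, wal_buffers_kb / 1024.0)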

Regarding the religious war now in progress, I think it would be
awfully good for someone to offer some thoughts on how to figure out
which particular columns on which particular tables need a higher
statistics target.  That might allow us to either (a) build a wizard
that helps you find those problems that could perhaps be used
alongside this one or (b) incorporate those same smarts into core.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Gregory Stark

Greg Smith [EMAIL PROTECTED] writes:

 On Thu, 4 Dec 2008, Gregory Stark wrote:

 What I'm suggesting is that you shouldn't have to special case this. That you
 should expect whatever formulas you're using to produce the same values as
 initdb if they were run on the same machine initdb is targeting.

 The reason that approach isn't taken is that the model here assumes the OS
 overhead is negligable relative to everything else going on.  

ok that does make sense. But the non-memory parameters...

... I've cut part of my response for a separate thread ...

 And desktop seems like an outlier here. I suppose it's meant to capture
 whether postgres is on a dedicated box? But it's possible to have a
 non-dedicated oltp application or non-dedicated data warehouse box just as
 easily.

 That's the target for something that's not a dedicated server--a desktop PC 
 you
 use as a general workstation, maybe you're installing PostgreSQL as a 
 developer
 that's competing with your web server and other apps; something like that.
 There might be a better name for that.

My point was more that you could have a data warehouse on a non-dedicated
machine, you could have a web server on a non-dedicated machine, or you could
have a mixed server on a non-dedicated machine. I don't see how you would
decide whether to set enable_constraint_exclusion for desktop for example.

 Is 368MB of overhead unreasonable for a web application database today

 Well I think it's more than most people expect a single application install 
 to
 take up before they start putting data in it.

 Segments don't get allocated until you churn through that much WAL activity;
 that figure is an upper-bound after you've pushed more than that worth of data
 through WAL and into the database.  The only example where this overhead isn't
 dwarfed by the size of the resulting database 

Right, well, no, it won't be dwarfed -- it'll be about the same size. Ie, if
you load 100MB into the database there'll be about 100MB of logs generated. Up
to the point where you hit this maximum upper bound.

But yes, right that it's the upper bound for the extra space allocated in
addition to the size of the database. And how much extra space should we
allocate? 

I don't see why this extra space bound should depend on the type of OLTP vs.
DSS workload. Only on how much disk space is available that the admin is
willing to dedicate to Postgres. Assuming an admin of a 1TB server is willing
to dedicate 1GB to logs and the admin of a 1GB server would be annoyed to have
to throw more than a few hundred megs seems as reasonable a place as any to
start.

 is where some small number of records are inserted, then constantly updated
 and vacuumed. And you know what? The person doing that is likely to really
 benefit from having checkpoint_segments set to a larger value. Update and
 vacuum heavy workloads are exactly the sort where you end up checkpointing
 too often with the default parameters.

Well there are a few problems with this. a) we have HOT now so you don't need
any vacuums to be part of the picture. b) if you're updating the same pages
over and over again a checkpoint will be super-quick since there will only be
a few pages to write out so no you don't really need some large
checkpoint_segments for any performance reason.

Josh's logic is impeccable -- for the specific use case he's describing of a
truly dedicated server with enough disk space for a major production database.
But not every install is going to have gigabytes of space reserved for it and
not every admin is going to realize that he really should set aside gigabytes
of space even though he only expects his database to be a few megabytes.

 I'm really beginning to think the root of the problem is the name. If it were
 transaction_log_max_space and measured in megabytes people would be happy 
 to
 say ok, I'll make space for 100MB of logs or whatever. Today they don't 
 know
 what to set it to or what the impact of setting it will be.

 Unless they do something crazy like read the documentation:

Well we know nobody does that :/

It's great that Postgres has such great documentation but whenever we have the
chance to replace something with an option which doesn't need any
documentation that would be even better. I'm just exploring whether that's an
option here.

 What fun.  I'm beginning to remember why nobody has ever managed to deliver a
 community tool that helps with this configuration task before.

Well I don't think this is why. Nobody's even tried to do this side of things
before. They always got bogged down in trying to parse config files and such.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Greg Smith

On Thu, 4 Dec 2008, Gregory Stark wrote:


Greg Smith [EMAIL PROTECTED] writes:


Is it worse to suffer from additional query overhead if you're sloppy with
the tuning tool, or to discover additional partitions didn't work as you
expected?


Surely that's the same question we faced when deciding what the Postgres
default should be?


Gosh, you're right.  I'm really new here, and I just didn't understand how 
things work.  I should have known that there was lots of thorough research 
into that setting before the default was set.  (hangs head in shame)


Wait, what list am I on?  pgsql-hackers?  Oh, crap, that can't be right at 
all then.  This one is actually an interesting example of how this stuff 
ends up ossified without being revisited, I'm glad you brought it up.


First we have to visit the 8.1 and 8.2 documentation.  There we find the 
real reason it originally defaulted to off:


http://www.postgresql.org/docs/8.1/static/runtime-config-query.html 
Currently, constraint_exclusion is disabled by default because it risks 
incorrect results if query plans are cached: if a table constraint is 
changed or dropped, the previously generated plan might now be wrong, and 
there is no built-in mechanism to force re-planning.  It stayed off for 
that reason for years.


Then the plan invalidation stuff went into 8.3 that made this no longer 
true.  Bruce even removed the item from the TODO list that used to say 
that constraint_exclusion should be improved to allow it to be used for 
all statements with little performance impact.  Then a couple of months 
later, when the 8.3 docs were being worked on, Tom updated the text to 
remove the obsolete warning about the plan risks:


http://archives.postgresql.org/pgsql-committers/2007-03/msg00372.php

Leaving only the leftovers of the original caveat about how it can also 
cause some overhead as the reason for why it was still off--a concern 
which was certainly more serious when that text was written in 2005 than 
it is today for multiple reasons.


How much was that overhead lowered by the work done in 8.3?  I can't find 
any public information suggesting that was ever even discussed.  The only 
thing I found when poking around looking for it is that Tom had expressed 
some concerns that the proof overhead was still too large back in 2006: 
http://archives.postgresql.org/pgsql-committers/2006-02/msg00035.php


But you know what?  The cached proof comparison bit Tom committed a couple 
of weeks ago shifted the mechanics of the overhead for this specific case 
around, so even if we did have 8.3 results they'd need to get re-run at 
this point anyway.  See below for more on what might be different soon.


So, if you want to say that turning on constraint_exclusion by default is 
a horrible idea because it adds significant overhead, and you have any 
sort of evidence that will still be true for 8.4 on the kind of hardware 
8.4 is likely to run on, I would greatly appreciate that information.


But presuming that serious thought must have gone into every decision made 
about the defaults for all the performance-related parameters in the 
postgresql.conf is something we all know just ain't so.  What I see is a 
parameter that doesn't add enough overhead relative to query execution 
time on today's systems that I've noticed whether it was on or off, one 
that's set to off only by historical accident combined with basic 
conservatism (mainly from Tom far as I can tell, he's a nice reliable 
source for that).  Whereas if it's accidentally set wrong, it can lead to 
massively wrong plans.  I'm not sure what the right move here is, but the 
appeal to authority approach for defending the default here isn't going to 
work on me.



That and the unstated other question Is someone more likely to use partitions
without reading the manual or not use partitions without reading the manual
about the down-sides of constraint_exclusion (in the partitioning
section)


Have you started thinking about the implications of 
http://archives.postgresql.org/message-id/[EMAIL PROTECTED] 
yet?  It is a bold new world we are approaching, of people who partition with 
less time stuck in the manual first, and I was very much thinking about that 
when mulling over whether I agreed with Josh's suggestion to put that into 
the default mixed settings before I went with it (that's right--I wrote 
all the above and it wasn't even my idea originally).  If that doesn't 
make it into 8.4 I will yield to your statement of the boring, 
manual-reading status quo still being on target.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Gregory Stark
Greg Smith [EMAIL PROTECTED] writes:

 On Thu, 4 Dec 2008, Gregory Stark wrote:

 Greg Smith [EMAIL PROTECTED] writes:

 Is it worse to suffer from additional query overhead if you're sloppy with
 the tuning tool, or to discover addition partitions didn't work as you
 expected?

 Surely that's the same question we faced when deciding what the Postgres
 default should be?

 Gosh, you're right.  I'm really new here, and I just didn't understand how
 things work.  I should have known that there was lots of thorough research 
 into
 that setting before the default was set.  (hangs head in shame)

Oh no, in this case I meant just that if we want to change it we should change
it in *both* places. That the argument you're making applies just as much to
the Postgres default as it does to the mixed workload default in the tuner.

But I admit I totally didn't remember that the main reason it was originally
off was the lack of plan invalidation. That does rather change things. Perhaps
we should be enabling it now.

If we do though, it shouldn't default one way and then get randomly flipped by
a tool that has the same information to make its decision on. What I'm saying
is that mixed is the same information that initdb had about the workload.

If we do change this then I wonder if we need the parameter at all. I mean, we
don't generally have parameters to turn off random parts of the optimizer...

 How much was that overhead lowered by the work done in 8.3?  I can't find any
 public information suggesting that was ever even discussed.  

Well it does have to compare every constraint with every clause and do a
moderately complex analysis. It's never going to be super-fast for complex
queries. But on the other hand it should drop out pretty fast if the tables
haven't got any constraints so it does seem like it's only hurting people when
they would want it on anyways.

 What I see is a parameter that doesn't add enough overhead relative to query
 execution time on today's systems that I've noticed whether it was on or off

There's a danger in this. There's some famous, probably apocryphal, example of
a fast food restaurant that taste tested their menu and got great results.
Then they cheapened an ingredient and their testers couldn't taste the
difference. Then they did that with another ingredient and another and so on
and each time the testers couldn't taste a difference. And in the end they
ended up rolling out a dramatically inferior menu which people panned compared
to the original...

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL 
training!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Mark Wong
On Mon, Dec 1, 2008 at 9:32 PM, Greg Smith [EMAIL PROTECTED] wrote:
 On Mon, 1 Dec 2008, Mark Wong wrote:

 So then I attempted to see if there might have been difference between the
 executing time of each individual query with the above parameters. The
  queries that don't seem to be affected are Q1, Q4, Q12, Q13, and Q15.  Q17
 suggests that anything higher than default_statistics_target=10 is an
 improvement.  The rest of the queries appears not to follow any particular
 trend with respect to default_statistics_target.

 The interesting ones are Q2, Q9, Q17, Q18, and Q20, and that data is much
 more useful than the summary.  As you mention, Q17 improves significantly
 with a higher target.  All of the rest are dramatically slower in one or
 both tests going from default_statistics_target=10 to 100.  Those look like
 the most useful data points on the X axis--the increases from 100 up to 1000
 aren't particularly interesting in most of these, except in Q20 where the
 Power Test seems to oscillate between degrees of good and bad behavior
 seemingly at random.

 My picks for the most useful graphs from the long list Mark sent:

 http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q2.png
 http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q9.png
 http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q17.png
 http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q18.png
 http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q20.png

 I think the tests you could consider next is to graph the target going from
 10 to 100 in steps of 10 just for those 5 queries.  If it gradually
 degrades, that's interesting but hard to nail down.  But if there's a sharp
 transition, getting an explain plan for the two sides of that should provide
 some insight.  I'm really more interested in the ones that slowed down than
 the one that improved, understanding that might finally provide some
 evidence against increasing it by default.

I've updated the charts to include results from setting
default_statistics_target from 20-90.  The links to the charts are the
same.  The links to the raw data are in
http://207.173.203.223/~markwkm/community6/dbt3/ in directories 45-52.

Regards,
Mark

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Robert Haas
 If we do though, it shouldn't default one way and then get randomly flipped by
 a tool that has the same information to make its decision on. What I'm saying
 is that mixed is the same information that initdb had about the workload.

+1.

 If we do change this then I wonder if we need the parameter at all. I mean, we
 don't generally have parameters to turn off random parts of the optimizer...

It probably isn't a good idea to both change the default setting and
remove the parameter in the same release.  It would be awesome if this
is cheap enough now to have it on by default - but constraints are
pretty useful for maintaining data integrity, so it's conceivable to
me that someone could have a lot of constraints most of which are
unuseful for query planning.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Greg Smith

On Thu, 4 Dec 2008, Gregory Stark wrote:

My point was more that you could have a data warehouse on a 
non-dedicated machine, you could have a web server on a non-dedicated 
machine, or you could have a mixed server on a non-dedicated machine.


I should just finish the documentation, where there will be a big 
disclaimer saying THESE SETTINGS ASSUME A SERVER DEDICATED TO 
POSTGRESQL!  That's the context here.  Why, after you follow my tuning 
instructions, you're lucky if the server will run anything but the 
database afterwards.



Josh's logic is impeccable -- for the specific use case he's describing of a
truly dedicated server with enough disk space for a major production database.
But not every install is going to have gigabytes of space reserved for it and
not every admin is going to realize that he really should set aside gigabytes
of space even though he only expects his database to be a few megabytes.


It's really quite simple.  Josh and I don't care directly about disk space 
used by the WAL for people with trivial databases.  At all.  Whatsoever. 
Maybe once, long ago, when we were young and frugal and skinny[1]; not 
now, or probably ever again in the future.  If that's your concern, maybe 
there can be some companion utility named pgmiser that lowers parameters 
back down again.  Your mascot can be some sort of animal that efficiently 
lives off small scraps of food or something.[2]


The context here is pgtune, which is aiming to make a fat elephant of a 
server faster so that there's an answer to people who say My benchmarks 
are all running really slow, is this because my system with 16PT of RAM is 
only using 32MB of it for the database?  This sucks, I'm going back to 
Oracle which used all my RAM.  If there are people who instead think, 
hey, I'll run this tuning utility to make my database faster, then it 
will also be a lot smaller!, maybe we can find a class about space/time 
tradeoffs in algorithm design to send them to or something.[3]


There are exactly two important things here.  The first is how large 
checkpoint_segments needs to be in order for the considerable overhead 
of checkpoints to be bearable.  That drives the setting up.  Our super-fat 
DW application gets set to at least 64 so that when you bulk-load another 
TB of data into it, that doesn't get bottlenecked dumping gigabytes of 
dirty buffers every few seconds.  If the database crashes and recovery 
reads or writes a bunch of data, who cares about random writes because 
your SAN has a 4GB write cache on it and dozens of drives slaving away.


Driving the setting down is knowing how much time you'll have to wait for 
recovery to happen, which is really a measure of what your tolerance for 
downtime is.  We're thinking that someone who picks the Desktop tuning may 
have no tolerance for the database to be sluggish coming back up after 
Windows crashed and they rebooted, so tiny setting for them to make 
recovery super fast.


Everybody else in our sample profiles fall in the middle of those two 
extremes, which is why the values curve the way they do.  Web app? 
Probably not a lot of write volume, probably trouble if it's down a long 
time; how about 8, on the low side, but it gives checkpoints more time to 
spread out their I/O so worst-case latency isn't as bad.  That's the sort 
of analysis those numbers come from.  Do performance tuning and juggle 
these trade-offs for new people all the time, and after long enough you get a 
gut feel for the right ballpark an app should start at based on its type. 
The whole idea behind this tool is that we're taking some of that hard-won 
knowledge and trying to automate the distribution of it.



It's great that Postgres has such great documentation but whenever we have the
chance to replace something with an option which doesn't need any
documentation that would be even better. I'm just exploring whether that's an
option here.


I would be glad to have a post-CommitFest discussion of this very topic as 
it's quite a pain to me in its current form.  Just not right now because 
it's too late to touch it.


Nobody's even tried to do this side of things before. They always got 
bogged down in trying to parse config files and such.


It's actually because most of them were working in Perl, which encourages 
deviant behavior where people delight in converting useful ideas into 
illegible punctuation rather than actually getting anything done.  Except 
for that other Greg around here who's not involved in this discussion, his 
Perl is pretty good.


[1] Josh is being aggressively bulked up right now for his next sumo 
match.


[2] Like a rat, which would give you an excuse to add the long overdue 
PL/Ratfor.


[3] This wouldn't actually help them learn anything, but it would make 
their heads explode at which point all their problems are gone.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)

Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Robert Haas
 I think the tests you could consider next is to graph the target going from
 10 to 100 in steps of 10 just for those 5 queries.  If it gradually
 degrades, that's interesting but hard to nail down.  But if there's a sharp
 transition, getting an explain plan for the two sides of that should provide
 some insight.  I'm really more interested in the ones that slowed down than
 the one that improved, understanding that might finally provide some
 evidence against increasing it by default.

 I've updated the charts to include results from setting
 default_statistics_target from 20-90.  The links to the charts are the
 same.  The links to the raw data are in
 http://207.173.203.223/~markwkm/community6/dbt3/ in directories 45-52.

I still think we're missing the boat here because it's not really the
same query every time.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Greg Smith

On Wed, 3 Dec 2008, Mark Wong wrote:


http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q2.png
http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q9.png
http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q17.png
http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q18.png
http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q20.png


I've updated the charts to include results from setting
default_statistics_target from 20-90.


Interesting.

Q2: Explodes hard when the target hits 100, but not before.  Would be 
interesting to compare the explain plans at, say, 50 and 150 to see if 
it's possible to nail down what is shifting so much.


Q3: goes bonkers as soon as the target hits 20, so the interesting plans 
to compare are 10 and 20.


Q17:  our one from the original set that improved a bunch with the larger 
target gets all that benefit just from going to 20.  Would be interesting 
to compare the plans at 10 and 20 to see what changed so much with such a 
small difference.


Q18:  looks like it was in some sort of local bad area around 100-400 
before, with some more context that one doesn't look interesting anymore.


Q20:  also doesn't look very interesting anymore.  The results at 10 were 
nice, and the ones at 100 were among the unusually bad ones, but it's 
pretty random--if there was something inherently bad related to the stats, 
there wouldn't be low points around 200.


Out of those, the most interesting one to me (as someone who is trying to 
defend raising the target some but not going crazy with that) is Q3.  The 
reason I say that is that everything else is better or basically the same 
raising the target from 10, as long as you don't go too high (>=100). 
That one falls apart immediately with a larger target which seems weird.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-03 Thread Heikki Linnakangas

Alvaro Herrera wrote:

Gregory Stark escribió:

Joshua D. Drake [EMAIL PROTECTED] writes:



I don't think at any time I have said to myself, "I am going to set this
parameter low so I don't fill up my disk." If I am saying that to myself
I have greatly underestimated the hardware for the task. Consider
that we are quarreling over what amounts to a nominal amount of hard
drive space, 1000 checkpoint_segments = 1.6G of space. My phone has more
capacity than that.

Well my phone has 16G of RAM, why not 1 ?


I don't think the disk space used is the only consideration here.  You
also have to keep recovery time in mind.  If you set it to 1000,
recovery would take way too long.


Presumably if you set checkpoint_segments to a high value, you'd use 
checkpoint_timeout to limit recovery time.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-02 Thread Tom Lane
Greg Smith [EMAIL PROTECTED] writes:
 ... where the Power Test seems to oscillate between degrees of good and bad
 behavior seemingly at random.

Are any of the queries complicated enough to trigger GEQO planning?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-02 Thread Bruce Momjian
Gregory Stark wrote:
 Tom Lane [EMAIL PROTECTED] writes:
 
  Dann Corbit [EMAIL PROTECTED] writes:
  I also do not believe that there is any value that will be the right
  answer.  But a table of data might be useful both for people who want to
  toy with altering the values and also for those who want to set the
  defaults.  I guess that at one time such a table was generated to
  produce the initial estimates for default values.
 
  Sir, you credit us too much :-(.  The actual story is that the current
  default of 10 was put in when we first implemented stats histograms,
  replacing code that kept track of only a *single* most common value
  (and not very well, at that).  So it was already a factor of 10 more
  stats than we had experience with keeping, and accordingly conservatism
  suggested not boosting the default much past that.
 
 I think that's actually too little credit. The sample size is chosen quite
 carefully based on solid mathematics to provide a specific confidence interval
 estimate for queries covering ranges the size of a whole bucket.
 
 The actual number of buckets is more of an arbitrary choice. It depends entirely
 on how your data is distributed and how large a range your queries are
 covering. A uniformly distributed data set should only need a single bucket to
 generate good estimates. Less evenly distributed data sets need more.
 
 I wonder actually if there are algorithms for estimating the number of buckets
 needed for a histogram to achieve some measurable goal. That would close the
 loop. It would be much more reassuring to base the size of the sample on solid
 statistics than on hunches.

I have a few thoughts on this.  First, people are correct that there is
no perfect default_statistics_target value.  This is similar to the
problem with the pre-8.4 max_fsm_pages/max_fsm_relations, for which
there also was never a perfect value.  But, if the FSM couldn't store
all the free space, a server log message was issued that recommended
increasing these values;  the same is still done for
checkpoint_segments.  Is there a way we could emit a server log message
to recommend increasing the statistics targets for specific columns?

Also, is there a way to increase the efficiency of the statistics
targets lookups?  I assume the values are already sorted in the
pg_statistic arrays;  do we already do a binary lookup on those?  Does
that help?

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-02 Thread Guillaume Smet
Greg,

On Mon, Dec 1, 2008 at 3:17 AM, Greg Smith [EMAIL PROTECTED] wrote:
 ./pgtune -i ~/data/postgresql.conf

First, thanks for your work: it will really help a lot of people to
have a decent default configuration.

A couple of comments from reading the code (I didn't run it yet):
- it would be really nice to make it work with Python 2.4 as RHEL 5 is
a Python 2.4 thing and it is a very widespread platform out there,
- considering the audience of this tool, I think you should explain in
the usage text which type of workload implies each database type (DW,
OLTP, Web, Mixed, Desktop).
- not sure of the wording of "The file this needs to operate correctly
can be generated with:"
- it would be nice to be able to define the architecture (32-64 bits)
from the command line (especially considering I won't be able to run
it on our target boxes which are all RHEL 5 :))

I'll see if I have more feedback while testing it for real.

-- 
Guillaume

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-01 Thread Dave Page
On Mon, Dec 1, 2008 at 3:21 AM, Greg Smith [EMAIL PROTECTED] wrote:
 On Sun, 30 Nov 2008, Greg Smith wrote:

 Memory detection works on recent (>=2.5) version of Python for Windows
 now.

 I just realized that the provided configuration is really not optimal for
 Windows users because of the known limitations that prevent larger
 shared_buffers settings from being effective on that platform.  I know there
are some notes on that subject in the archives that I'll look through, but
 I'd appreciate a suggestion for what a good upper-limit for that setting is
 on Windows.  I also wonder whether any of the other parameters have similar
 restrictions on their useful range.

It's going to be of little use to 99% of Windows users anyway as it's
written in Python. What was wrong with C?

FWIW though, in some pgbench tests on XP Pro, on a 4GB machine, 512MB
seemed to be consistently the most effective size (out of tests on
32MB, 512MB and 1GB). There wasn't much between 32 and 512 though - my
suspicion is that 128 or 256 would be similarly effective. I didn't
have time to test that though.

-- 
Dave Page
EnterpriseDB UK:   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-01 Thread Magnus Hagander
Dave Page wrote:
 On Mon, Dec 1, 2008 at 3:21 AM, Greg Smith [EMAIL PROTECTED] wrote:
 On Sun, 30 Nov 2008, Greg Smith wrote:

 Memory detection works on recent (>=2.5) version of Python for Windows
 now.
 I just realized that the provided configuration is really not optimal for
 Windows users because of the known limitations that prevent larger
 shared_buffers settings from being effective on that platform.  I know there
 are some notes on that subject in the archives that I'll look through, but
 I'd appreciate a suggestion for what a good upper-limit for that setting is
 on Windows.  I also wonder whether any of the other parameters have similar
 restrictions on their useful range.
 
 It's going to be of little use to 99% of Windows users anyway as it's
 written in Python. What was wrong with C?

It could be shipped with a bundled python, I guess.

Python is a lot more common in these scenarios than any of the other
unixy languages - much more common than perl for example - on Windows
in my experience. It could be because it's easier to bundle or something?


 FWIW though, in some pgbench tests on XP Pro, on a 4GB machine, 512MB
 seemed to be consistently the most effective size (out of tests on
 32MB, 512MB and 1GB). There wasn't much between 32 and 512 though - my
 suspicion is that 128 or 256 would be similarly effective. I didn't
 have time to test that though.

That's about what I've heard around as well - I don't think I've ever
heard of a case where >512 has actually helped.

//Magnus

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-01 Thread Robert Haas
I just gave this a try and got:

$ ./pgtune
Traceback (most recent call last):
  File "./pgtune", line 20, in <module>
    from ctypes.wintypes import *
  File "/usr/lib/python2.5/ctypes/wintypes.py", line 21, in <module>
class VARIANT_BOOL(_SimpleCData):
ValueError: _type_ 'v' not supported

This is FC7, installed RPM is python-2.5-15.fc7.

(This is also my #1 beef with Python: too many utterly impenetrable
error messages... or at least impenetrable to me.)

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-01 Thread Greg Smith

On Mon, 1 Dec 2008, Dave Page wrote:


It's going to be of little use to 99% of Windows users anyway as it's
written in Python. What was wrong with C?


It's 471 lines of Python code that leans heavily on that language's 
Dictionary type to organize everything.  Had I insisted on writing 
directly to C first, it would have been of no use to 100% of all users in 
the 8.4 timeframe because it wouldn't be even remotely close to finished 
by now.


I'd ultimately like to use the Python version as a spec to produce a C 
implementation, because that's the only path to get something like this 
integrated into initdb itself.  There were just too many things to get 
under control for that to be practical just yet.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-01 Thread Tom Lane
Greg Smith [EMAIL PROTECTED] writes:
 I'd ultimately like to use the Python version as a spec to produce a C 
 implementation, because that's the only path to get something like this 
 integrated into initdb itself.

It won't get integrated into initdb in any case: a standalone tool is
the correct thing.  Or do you think people should have to initdb in
order to get fresh advice after changes in the environment?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-01 Thread Greg Smith

On Mon, 1 Dec 2008, Robert Haas wrote:


I just gave this a try and got:

$ ./pgtune
Traceback (most recent call last):
 File "./pgtune", line 20, in <module>
   from ctypes.wintypes import *
 File "/usr/lib/python2.5/ctypes/wintypes.py", line 21, in <module>
   class VARIANT_BOOL(_SimpleCData):
ValueError: _type_ 'v' not supported


Looks like I need to add Python 2.5+Linux to my testing set.  I did not 
expect that the UNIX distributions of Python 2.5 would ship with 
wintypes.py at all.  I think I can fix this on the spot though.  On line 
40, you'll find this bit:


except ImportError:

Change that to the following:

except (ImportError, ValueError):

And it should pass that point.  If it doesn't, you can try the completely 
general:


except:

And then it should be OK.  I should probably use that formulation in that 
particular section.
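
To spell out the guarded-import idea being discussed, here is a minimal
sketch using the explicit two-exception form (the flag and function names
are illustrative, not the actual pgtune code):

# Make the Windows-only memory probe optional: if the import fails for any
# of the known reasons, remember that and fall back to "unknown" later.
try:
    from ctypes.wintypes import *           # only meaningful on Windows
    HAVE_WINTYPES = True
except (ImportError, ValueError):           # ValueError seen on some Linux builds
    HAVE_WINTYPES = False

def detected_total_memory():
    """Return total RAM in bytes, or None if detection isn't possible here."""
    if not HAVE_WINTYPES:
        return None
    # Windows-specific detection (e.g. GlobalMemoryStatusEx) would go here.
    return None

print(detected_total_memory())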



(This is also my #1 beef with Python: too many utterly impenetrable
error messages... or at least impenetrable to me.)


That one made sense to me, but that is one of the sharper parts of its 
learning curve.  I think part of the problem is that much of the tutorial 
Python code out there never gets into error handling.  One of the things 
Java does that I miss is that if you create a try block, it won't work 
until you have addressed every possible exception the code in there can 
generate.  It may be impractical for Python to know that much, but I 
wonder if a lint-ish tool for the language might be able to do it.


The above mumbling is certainly off-topic for this list though.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-01 Thread Greg Smith

On Mon, 1 Dec 2008, Tom Lane wrote:


Greg Smith [EMAIL PROTECTED] writes:

I'd ultimately like to use the Python version as a spec to produce a C
implementation, because that's the only path to get something like this
integrated into initdb itself.


It won't get integrated into initdb in any case: a standalone tool is
the correct thing.  Or do you think people should have to initdb in
order to get fresh advice after changes in the environment?


The long-term goal Josh suggested aiming for (that I abandoned for now) 
was replacing the part of initdb that creates the initial postgresql.conf 
via the sample with a call to a program that generates it instead.  Said 
program could also run standalone for later updates to the configuration.


I have no interest in arguing about whether that's a good or bad idea at 
this point, I've accepted that the only useful statements to be made in 
this area come in the form of a working implementation to review.  Until I 
have one of those, this really isn't worth getting into.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-01 Thread Mark Wong
On Thu, Nov 13, 2008 at 11:53 AM, Tom Lane [EMAIL PROTECTED] wrote:
 Heikki Linnakangas [EMAIL PROTECTED] writes:
 A lot of people have suggested raising our default_statistics target,
 and it has been rejected because there's some O(n^2) behavior in the
 planner, and it makes ANALYZE slower, but it's not that crazy.

 I think everyone agrees it ought to be raised.  Where the rubber meets
 the road is deciding just *what* to raise it to.  We've got no
 convincing evidence in favor of any particular value.

 If someone actually wanted to put some effort into this, I'd suggest
 taking some reasonably complex benchmark (maybe TPCH or one of the DBT
 series) and plotting planner runtime for each query as a function of
 statistics_target, taking care to mark the breakpoints where it shifted
 to a better (or worse?) plan due to having better stats.

Hi all,

I have some data from a mostly un-tuned system.  I have a 10GB scale
factor using DBT-3 on a single 25 disk hardware RAID-0 lun.  Generally
speaking, it seems like on un-tuned systems increasing the
default_statistics_target for this workload doesn't have a clear
benefit.  Here is a histogram of the Power Test and Throughput Test
with default GUC values.  The Power and Throughput test results are
plotted against the default_statistics_target value.  For reminders,
the Power Test executes each query serially, while the Throughput Test
executes 8 streams of the 22 queries simultaneously.

Default values:
http://207.173.203.223/~markwkm/pgsql/default_statistics_target/default.png

Raw data in http://207.173.203.223/~markwkm/community6/dbt3/,
directories 1 - 11.

I started increasing the shared_buffers and effective_cache_size to
see if there might be anything more interesting in the results, but
still don't think so.

shared_buffers=2048MB, effective_cache_size=1024MB
http://207.173.203.223/~markwkm/pgsql/default_statistics_target/2048x1024.png

Raw data in http://207.173.203.223/~markwkm/community6/dbt3/,
directories 12 - 22.

-

shared_buffers=4096MB, effective_cache_size=2048MB
http://207.173.203.223/~markwkm/pgsql/default_statistics_target/4096x2048.png

Raw data in http://207.173.203.223/~markwkm/community6/dbt3/,
directories 23 - 33.

-

shared_buffers=20480MB
effective_cache_size=10240MB
checkpoint_segments=3000

http://207.173.203.223/~markwkm/pgsql/default_statistics_target/20480x10240.png

Raw data in http://207.173.203.223/~markwkm/community6/dbt3/,
directories 34 - 44.

So then I attempted to see if there might have been a difference between
the execution time of each individual query with the above parameters.
 The queries that don't seem to be affected are Q1, Q4, Q12, Q13, and
Q15.  Q17 suggests that anything higher than
default_statistics_target=10 is an improvement.  The rest of the
queries appear not to follow any particular trend with respect to
default_statistics_target.

Here's query 17 when default_statistics_target=10

 EXPLAIN ANALYZE select sum(l_extendedprice) / 7.0 as avg_yearly from
lineitem, part where p_partkey = l_partkey and p_brand = 'Brand#52'
and p_container = 'SM PACK' and l_quantity < ( select 0.2 *
avg(l_quantity) from lineitem where l_partkey = p_partkey );

QUERY PLAN
--
 Aggregate  (cost=15725486.01..15725486.02 rows=1 width=4) (actual time=31669.249..31669.249 rows=1 loops=1)
   ->  Hash Join  (cost=70662.08..15725435.96 rows=20019 width=4) (actual time=690.551..31666.031 rows=5413 loops=1)
         Hash Cond: (public.lineitem.l_partkey = part.p_partkey)
         Join Filter: (public.lineitem.l_quantity < (subplan))
         ->  Seq Scan on lineitem  (cost=0.00..1668441.22 rows=60052722 width=12) (actual time=0.007..11016.720 rows=60046144 loops=1)
         ->  Hash  (cost=70637.08..70637.08 rows=2000 width=4) (actual time=548.192..548.192 rows=1986 loops=1)
               ->  Seq Scan on part  (cost=0.00..70637.08 rows=2000 width=4) (actual time=0.066..547.106 rows=1986 loops=1)
                     Filter: ((p_brand = 'Brand#52'::bpchar) AND (p_container = 'SM PACK'::bpchar))
         SubPlan
           ->  Aggregate  (cost=227.86..227.87 rows=1 width=4) (actual time=0.141..0.141 rows=1 loops=59857)
                 ->  Index Scan using i_l_partkey on lineitem  (cost=0.00..227.72 rows=54 width=4) (actual time=0.071..0.132 rows=31 loops=59857)
                       Index Cond: (l_partkey = $0)
 Total runtime: 31669.340 ms
(13 rows)


Here's query 17 when default_statistics_target=1000

 EXPLAIN ANALYZE select sum(l_extendedprice) / 7.0 as avg_yearly from
lineitem, part where p_partkey = l_partkey and p_brand = 'Brand#21'
and p_container = 'LG BAG' and l_quantity < ( select 0.2 *
avg(l_quantity) from lineitem where l_partkey = p_partkey );

QUERY PLAN

Re: [HACKERS] Simple postgresql.conf wizard

2008-12-01 Thread Greg Smith

On Mon, 1 Dec 2008, Mark Wong wrote:

So then I attempted to see if there might have been a difference between 
the execution time of each individual query with the above parameters. 
The queries that don't seem to be affected are Q1, Q4, Q12, Q13, and 
Q15.  Q17 suggests that anything higher than 
default_statistics_target=10 is an improvement.  The rest of the queries 
appear not to follow any particular trend with respect to 
default_statistics_target.


The interesting ones are Q2, Q9, Q17, Q18, and Q20, and that data is much 
more useful than the summary.  As you mention, Q17 improves significantly 
with a higher target.  All of the rest are dramatically slower in one or 
both tests going from default_statistics_target=10 to 100.  Those look 
like the most useful data points on the X axis--the increases from 100 up 
to 1000 aren't particularly interesting in most of these, except in Q20 
where the Power Test seems to oscillate between degrees of good and bad 
behavior seemingly at random.


My picks for the most useful graphs from the long list Mark sent:

http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q2.png
http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q9.png
http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q17.png
http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q18.png
http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q20.png

I think the tests you could consider next is to graph the target going 
from 10 to 100 in steps of 10 just for those 5 queries.  If it gradually 
degrades, that's interesting but hard to nail down.  But if there's a 
sharp transition, getting an explain plan for the two sides of that should 
provide some insight.  I'm really more interested in the ones that slowed 
down than the one that improved, understanding that might finally provide 
some evidence against increasing it by default.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-12-01 Thread Robert Haas
 Hi all,

 I have some data [...]

Thanks for gathering this data.

The first thing I notice is that the two versions of Q17 that you are
running are actually not the exact same query - there are hard-coded
constants that are different in each case, and that matters.  The
substituted parameter doesn't affect the plan, but the constants that
are actually in the query do.

It seems that some of the other queries may be similar - for example,
Q2 has a couple of very fast runs (with default_stats_target 10 and
600) that probably got a better plan than the other runs, and Q6, Q9,
and Q20 look suspiciously like there may have been two different plans
in the mix as well, presumably because the actual queries varied
somewhat.

I think the only way to fix this is to run the test a large number of
times with each DST and average.  :-(

Q17 looks like a pretty illustrative example of how a higher
statistics target can (potentially) help: it enables the planner to
realize that a qual on the part table is highly selective, and
therefore switch to a nested loop w/index-scan instead of a hash join.
 (This may not be the only case, but I can't think of the others right
now.)  I haven't actually looked at the selectivity estimation code,
but I'm assuming that if we have n MCVs then we can estimate that any
non-MCV occurs with frequency < 1/n (in fact, < the frequency of the
least-frequent MCV, but 1/n at most).  So we want n to be large enough
that 1/n is below the cutoff for switching to an index scan
(otherwise, we'll end up using the hash join even when the qual
selects an extremely infrequent value).  It might be helpful to figure
out where that cutoff is and what factors it depends on.
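
A toy illustration of that 1/n argument (every number below is made up to
show the shape of the effect, not taken from the selectivity code):

# With n MCVs tracked, a value missing from the MCV list is estimated to
# occur no more often than roughly 1/n of the time (really no more than the
# least-common MCV).  Larger targets therefore shrink the row estimate for
# rare values, which is what eventually flips the plan to the index scan.
TABLE_ROWS = 2000000        # hypothetical table size
INDEX_CUTOFF = 50000        # hypothetical row estimate below which the
                            # planner would prefer the nested-loop/index plan

def non_mcv_row_estimate(table_rows, n_mcvs):
    return float(table_rows) / n_mcvs

for target in (10, 100, 1000):
    est = non_mcv_row_estimate(TABLE_ROWS, target)
    plan = "index scan" if est < INDEX_CUTOFF else "hash join"
    print("target=%4d  estimated rows=%9.0f  -> %s" % (target, est, plan))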

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-11-30 Thread Greg Smith

On Tue, 18 Nov 2008, Josh Berkus wrote:

Regarding the level of default_stats_target, it sounds like people agree 
that it ought to be raised for the DW use-case, but disagree how much. 
If that's the case, what if we compromise at 50 for mixed and 100 for 
DW?


That's what I ended up doing.  The attached version of this script and its 
data files (I dumped all the useful bits in the current HEAD pg_settings 
for it to use) now hits all of the initial goals I had for a useful 
working tool here.  Here's a sample run on a system with 8GB of RAM and 
the default mixed workload.  I told the tool absolutely nothing:


./pgtune -i ~/data/postgresql.conf

And it came up with a decent mixed starter configuration for this system 
appended to the input postgresql.conf:


default_statistics_target = 50 # pg_generate_conf wizard 2008-11-30
maintenance_work_mem = 480MB # pg_generate_conf wizard 2008-11-30
constraint_exclusion = on # pg_generate_conf wizard 2008-11-30
checkpoint_completion_target = 0.9 # pg_generate_conf wizard 2008-11-30
effective_cache_size = 5632MB # pg_generate_conf wizard 2008-11-30
work_mem = 48MB # pg_generate_conf wizard 2008-11-30
wal_buffers = 8MB # pg_generate_conf wizard 2008-11-30
checkpoint_segments = 16 # pg_generate_conf wizard 2008-11-30
shared_buffers = 1920MB # pg_generate_conf wizard 2008-11-30
max_connections = 80 # pg_generate_conf wizard 2008-11-30

I do plan to make that look a bit prettier.

Loose ends
--

-Kernel shm suggestions.  The main hard part there is implementing the 
formula from the documentation to figure out the total expected shared 
block size that will be allocated.  That just needs another block of time 
to finish, will fit that in this week.


-Providing hints (to stderr) for things that fall somewhere between 
changes to be made automatically and just things to put into the docs.  I 
want to warn about the implications of the default listen_addresses for 
example.


-Documentation.  I've got the whole outline sketched out and some text 
written, just haven't had time yet to finish implementing that as Docbook.


-Tests on more platforms (this was developed on Linux).  Memory detection 
works on recent (>=2.5) version of Python for Windows now.  I want to do 
some more tests there, on Mac OS X, and on Solaris.  Some *BSD testing 
would also be in order once I'm outputting shm suggestions, I don't have 
any systems to test that platform myself.
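
On the kernel shm item at the top of this list: a deliberately simplified
sketch of the kind of calculation involved.  Only the dominant
shared_buffers term is modeled; the per-connection, per-lock, and fixed
terms from the documentation's table are lumped into a flat overhead, so
treat every constant here as an assumption rather than the documented value:

# Rough estimate of the shared memory segment PostgreSQL will request, to
# compare against the kernel's SHMMAX.  Constants are approximations.
BLOCK_SIZE = 8192                   # default BLCKSZ
PER_BUFFER_OVERHEAD = 208           # per-buffer bookkeeping (assumed)
FLAT_OVERHEAD = 32 * 1024 * 1024    # connections, locks, fixed areas (assumed)

def estimated_shared_memory_bytes(shared_buffers_mb, wal_buffers_mb):
    n_buffers = (shared_buffers_mb * 1024 * 1024) // BLOCK_SIZE
    total = n_buffers * (BLOCK_SIZE + PER_BUFFER_OVERHEAD)
    total += wal_buffers_mb * 1024 * 1024
    return total + FLAT_OVERHEAD

total = estimated_shared_memory_bytes(shared_buffers_mb=1920, wal_buffers_mb=8)
print("estimated segment size: %.0f MB" % (total / 1048576.0))
print("kernel.shmmax should be at least %d" % total)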


Other than the docs and testing, the rest of these are really optional 
anyway.  What is completely done is the tuning model itself and the inputs 
it needs.  I made some small tweaks to what Josh suggested, documented 
below, and consider that frozen from my side except for review feedback. 
I'll get everything mentioned above done next weekend, and expect to 
submit something of commit candidate quality at that time.


Naming and promotion


Now that the finish line is in sight, I thought a bit about what to name 
this thing.  The original idea Josh threw out was based on the idea that 
this would generate the postgresql.conf file from scratch, which may still 
happen eventually but is not a goal for this release.  This is really just a 
tuning tool.  When searching for possible names that might fit, one of the 
ones I thought of was already suggested quite some time ago, in the 
original discussion that led to the TODO item I'm trying to complete:


http://archives.postgresql.org/pgsql-advocacy/2003-02/msg00085.php

Since that was the only claim I could find on the name, I've changed the 
name on this new version to pgtune and would propose that as its final 
name in contrib.


I'd like to add some text mentioning the availability of this tool to the 
sample postgresql.conf, and I'd certainly be in favor of the suggestion 
from that old message that a note suggesting you run it should even show up 
in the output from initdb.


Tuning model


The actual tuning part of the code is exactly 83 lines right now and 
pretty easy to read; the rest is mainly text and GUC-related baggage. 
For anyone who wants to dig into the model itself, here are the other 
changes I made from what Josh has suggested.


-Since I'm focused right now just on getting a good 8.4 version of this 
tool running, I dropped any attempt at tuning the FSM parameters since 
they're gone.  With that change, there's no need to even ask about 
database size anymore.  I'll put something related to that in a future 
release that also targets 8.3.


-The tool doesn't do anything memory-related if you have less than 256MB 
of RAM.  I wasn't sure how far down this model scaled and may revisit that 
cut-off.


-I was uncomfortable turning off auto-vacuum in the DW case, for fear it 
would cause trouble for a newbie admin who happens to be stuck with such a 
job.  Instead I'm putting a section in the documentation about the 
parameters that might be tuned by hand beyond what the tool does.  That 
can mention that further 

Re: [HACKERS] Simple postgresql.conf wizard

2008-11-30 Thread Greg Smith

On Sun, 30 Nov 2008, Greg Smith wrote:

Memory detection works on recent (>=2.5) version of Python for Windows 
now.


I just realized that the provided configuration is really not optimal for 
Windows users because of the known limitations that prevent larger 
shared_buffers settings from being effective on that platform.  I know 
there are some notes on that subject in the archives that I'll look 
through, but I'd appreciate a suggestion for what a good upper-limit for 
that setting is on Windows.  I also wonder whether any of the other 
parameters have similar restrictions on their useful range.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-11-30 Thread Martijn van Oosterhout
On Sun, Nov 30, 2008 at 09:17:37PM -0500, Greg Smith wrote:
 That's what I ended up doing.  The attached version of this script and its 
 data files (I dumped all the useful bits in the current HEAD pg_settings 
 for it to use) now hits all of the initial goals I had for a useful 
 working tool here.  Here's a sample run on a system with 8GB of RAM and 
 the default mixed workload.  I told the tool absolutely nothing:
 
 ./pgtune -i ~/data/postgresql.conf

Looks very nice.

 shared_buffers = 1920MB # pg_generate_conf wizard 2008-11-30

Do you have a check somewhere to see if this exceeds the total SYSV
memory allowed by the OS? Otherwise you've just output an unstartable
config. The output of /sbin/sysctl should tell you.
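
For what it's worth, a minimal sketch of that check on Linux (a
hypothetical helper, not something the script currently does):

# Read the SysV shared memory limit via sysctl and compare it against the
# shared_buffers value about to be written out.  Linux-specific.
import subprocess

def kernel_shmmax():
    """Return kernel.shmmax in bytes, or None if it can't be read."""
    try:
        proc = subprocess.Popen(["/sbin/sysctl", "-n", "kernel.shmmax"],
                                stdout=subprocess.PIPE)
        return int(proc.communicate()[0].strip())
    except (OSError, ValueError):
        return None

shared_buffers_bytes = 1920 * 1024 * 1024   # from the sample output above
limit = kernel_shmmax()
if limit is not None and shared_buffers_bytes > limit:
    print("warning: shared_buffers alone exceeds kernel.shmmax (%d bytes)" % limit)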

Have a nice day,
-- 
Martijn van Oosterhout   [EMAIL PROTECTED]   http://svana.org/kleptog/
 Please line up in a tree and maintain the heap invariant while 
 boarding. Thank you for flying nlogn airlines.




Re: [HACKERS] Simple postgresql.conf wizard

2008-11-30 Thread Greg Smith

On Mon, 1 Dec 2008, Martijn van Oosterhout wrote:


Do you have a check somewhere to see if this exceeds the total SYSV
memory allowed by the OS? Otherwise you've just output an unstartable
config. The output of /sbin/sysctl should tell you.


Something to address that is listed as the first thing in the Loose Ends 
section of the message.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-11-27 Thread Robert Haas
 Even though we all agree default_statistics_target = 10 is too low,
 proposing a 40X increase in the default value requires more evidence
 than this.  In particular, the prospect of a 1600-fold increase in
 the typical cost of eqjoinsel() is a mite scary.

I just did some very quick testing of a couple values for
default_statistics_target on a database for work whose contents I am
not at liberty to share.  Suffice it to say it is a database with
small tables, but a lot of the queries involve many joins.

ANALYZE with default_statistics_target set to 10 takes 13 s.  With
100, 92 s.  With 1000, 289 s.

I tried several representative queries and there was no measurable
change in run-time.  I compared a couple of the plans and the plans
didn't change either.

So then I said - gee, how much can I reduce this?  I looked at two of
the queries in more detail.  The lowest default_statistics_target that
produced the same plan for both queries was 5.  Reducing the
default_statistics_target all the way down to 1 changed the plans, but
the new plans were just as fast as the old plans.

Given the amount of clamor for a higher value for
default_statistics_target, I'm a little surprised by these results.
It may be that the queries I'm running are not ones for which more
statistics generate better plans, but I think they really are
representative of what we run.  Maybe someone could suggest some types
of query that would be likely to helped by better statistics?

A random thought: maybe the reason I'm not seeing any benefit is
because my tables are just too small - most contain at most a few
thousand rows, and some are much smaller.  Maybe
default_statistics_target should vary with the table size?  Something
like, 0.1% of the rows to a maximum of 100...  and then 0.01% of the
rows after that to some higher maximum but always a minimum of at
least 10.  I'm guessing that people with really big tables are less
likely to mind longer planning times and more likely to benefit from
finding better plans...
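
Taken literally, that heuristic looks something like the sketch below (the
100,000-row knee follows from 0.1% reaching the cap of 100; the hard cap of
1000 is an assumption since no upper value was named):

# Size-scaled statistics target as described above:
#   0.1% of rows up to a cap of 100 (reached at 100,000 rows), then
#   0.01% of the rows beyond that up to some higher cap, never below 10.
def scaled_stats_target(n_rows, hard_cap=1000):
    target = min(n_rows, 100000) * 0.001
    if n_rows > 100000:
        target += (n_rows - 100000) * 0.0001
    return max(10, min(int(round(target)), hard_cap))

for rows in (1000, 50000, 100000, 1000000, 10000000):
    print("%10d rows -> target %d" % (rows, scaled_stats_target(rows)))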

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-11-27 Thread Joshua Tolley
On Thu, Nov 27, 2008 at 05:15:04PM -0500, Robert Haas wrote:
 A random thought: maybe the reason I'm not seeing any benefit is
 because my tables are just too small - most contain at most a few
 thousand rows, and some are much smaller.  Maybe
 default_statistics_target should vary with the table size?  Something
 like, 0.1% of the rows to a maximum of 100...  and then 0.01% of the
 rows after that to some higher maximum but always a minimum of at
 least 10.  I'm guessing that people with really big tables are less
 likely to mind longer planning times and more likely to benefit from
 finding better plans...

Something like this makes an awful lot of sense to me. A higher
statistics target (in theory) better captures the fine details of a
distribution of values; tables with fewer rows are unlikely to
demonstrate fine details (as are tables with categorical, as
opposed to continuous, values). Returning to the original topic of a
configuration creator, perhaps such a tool should ask the user how big
(s)he expects the tables to grow, or perhaps it should be able to
investigate a particular column and recommend a statistics target based
on its distribution, size, etc.

- Josh / eggyknap




Re: [HACKERS] Simple postgresql.conf wizard

2008-11-27 Thread Gregory Stark
Robert Haas [EMAIL PROTECTED] writes:

 ANALYZE with default_statistics_target set to 10 takes 13 s.  With
 100, 92 s.  With 1000, 289 s.

That is interesting. It would also be interesting to total up the time it
takes to run EXPLAIN (without ANALYZE) for a large number of queries.

I did start looking at running the experiment Tom was asking for with DBT3
data sets but, uh, the most recent dbt-3 data generation source I could find
was four years old. Is there an active source repository for this stuff?

 Given the amount of clamor for a higher value for
 default_statistics_target, I'm a little surprised by these results.
 It may be that the queries I'm running are not ones for which more
 statistics generate better plans, but I think they really are
 representative of what we run.  Maybe someone could suggest some types
 of query that would be likely to helped by better statistics?

I think there are a lot of variables here. One is the distribution of the
data. Your data sounds like it's reasonably evenly distributed.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's 24x7 Postgres support!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-11-27 Thread tomas

On Thu, Nov 27, 2008 at 05:15:04PM -0500, Robert Haas wrote:
 [...]  Maybe
 default_statistics_target should vary with the table size?  Something
 like, 0.1% of the rows to a maximum of 100...  and then 0.01% of the
 rows after that to some higher maximum but always a minimum of at
 least 10.  I'm guessing that people with really big tables are less
 likely to mind longer planning times and more likely to benefit from
 finding better plans...

What was the running time of the planner? Quadratic in statistics_target?
Take the square root of table size. Exponential? The logarithm. Or
something like that.

My 1.75 cent
-- tomás

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-11-26 Thread Tom Lane
Joshua D. Drake [EMAIL PROTECTED] writes:
 On Tue, 2008-11-25 at 20:33 -0500, Tom Lane wrote:
 So we really don't have any methodically-gathered evidence about the
 effects of different stats settings.  It wouldn't take a lot to convince
 us to switch to a different default, I think, but it would be nice to
 have more than none.

 I know this is not empirical but really, 150 is very reasonable. Let's
 just set it to that by default and be done with it.

What happened to the "more than zero evidence" part of the discussion?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-11-26 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 Dann Corbit [EMAIL PROTECTED] writes:
 I also do not believe that there is any value that will be the right
 answer.  But a table of data might be useful both for people who want to
 toy with altering the values and also for those who want to set the
 defaults.  I guess that at one time such a table was generated to
 produce the initial estimates for default values.

 Sir, you credit us too much :-(.  The actual story is that the current
 default of 10 was put in when we first implemented stats histograms,
 replacing code that kept track of only a *single* most common value
 (and not very well, at that).  So it was already a factor of 10 more
 stats than we had experience with keeping, and accordingly conservatism
 suggested not boosting the default much past that.

I think that's actually too little credit. The sample size is chosen quite
carefully based on solid mathematics to provide a specific confidence interval
estimate for queries covering ranges the size of a whole bucket.

The actual number of buckets is more of an arbitrary choice. It depends entirely
on how your data is distributed and how large a range your queries are
covering. A uniformly distributed data set should only need a single bucket to
generate good estimates. Less evenly distributed data sets need more.

I wonder actually if there are algorithms for estimating the number of buckets
needed for a histogram to achieve some measurable goal. That would close the
loop. It would be much more reassuring to base the size of the sample on solid
statistics than on hunches.

 So we really don't have any methodically-gathered evidence about the
 effects of different stats settings.  It wouldn't take a lot to convince
 us to switch to a different default, I think, but it would be nice to
 have more than none.

I think the difficulty (aside from testing being laborious at the best of
times) is that it's heavily dependent on data sets which are hard to generate
good examples for. Offhand I would think the census data might make a good
starting point -- it should have columns which range from perfectly uniform to
highly skewed.
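
For anyone who wants to chase the sample-size math mentioned above: my
recollection is that ANALYZE samples 300 rows per unit of statistics target
(the multiplier coming from the Chaudhuri et al. bound used in analyze.c),
so the scan cost and the per-bucket coverage scale roughly like the sketch
below suggests; the 300 figure is worth verifying against the source before
leaning on it:

# How the ANALYZE sample and histogram bucket coverage scale with the
# statistics target.  ROWS_PER_TARGET = 300 is believed to match
# std_typanalyze() but should be checked.
ROWS_PER_TARGET = 300

def analyze_footprint(stats_target, table_rows):
    sample_rows = ROWS_PER_TARGET * stats_target
    rows_per_bucket = table_rows / float(stats_target)   # ~target buckets
    return sample_rows, rows_per_bucket

for target in (10, 100, 1000):
    sample, per_bucket = analyze_footprint(target, table_rows=60000000)
    print("target=%4d  sample=%7d rows  ~%11.0f table rows per bucket"
          % (target, sample, per_bucket))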

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard -- Statistics idea...

2008-11-26 Thread Joshua Tolley
On Tue, Nov 25, 2008 at 06:59:25PM -0800, Dann Corbit wrote:
 I do have a statistics idea/suggestion (possibly useful with some future
 PostgreSQL 9.x or something):
  It is a simple matter to calculate lots of interesting univariate summary
 statistics with a single pass over the data (perhaps during a vacuum
 full).
 For instance with numerical columns, you can calculate mean, min, max,
 standard deviation, skew, kurtosis and things like that with a single
 pass over the data.

Calculating interesting univariate summary statistics and having
something useful to do with them are two different things entirely. Note
also that whereas this is simple for numeric columns, it's a very
different story for non-numeric data types, that don't come from a
metric space. That said, the idea of a probability metric space is well
explored in the literature, and may have valuable application. The
current histogram implementation is effectively a description of the
probability metric space the column data live in.

 Now, if you store a few numbers calculated in this way, it can be used
 to augment your histogram data when you want to estimate the volume of a
 request. So (for instance) if someone asks for a scalar that is 
 value you can look to see what percentage of the tail will hang out in
 that neck of the woods using standard deviation and the mean.

Only if you know that the data follow a distribution that can be
described accurately with a standard deviation and a mean. If your data
don't follow a Gaussian distribution, this will give you bad estimates.
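
To make that caveat concrete, here is what the mean/standard-deviation idea
boils down to; it is exact for a normal distribution and can be badly wrong
otherwise (all numbers are hypothetical):

import math

def normal_tail_fraction(mean, stddev, value):
    """Estimated fraction of rows with column > value, assuming normality."""
    if stddev <= 0:
        return 1.0 if value < mean else 0.0
    z = (value - mean) / stddev
    # Survival function of the standard normal, via the complementary
    # error function.
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Hypothetical column with mean 100 and stddev 15: estimate "col > 130".
print("estimated selectivity: %.4f" % normal_tail_fraction(100.0, 15.0, 130.0))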

- Josh / eggyknap




Re: [HACKERS] Simple postgresql.conf wizard -- Statistics idea...

2008-11-26 Thread Decibel!

On Nov 25, 2008, at 8:59 PM, Dann Corbit wrote:
It is a simple matter to calculate lots of interesting univariate  
summary

statistics with a single pass over the data (perhaps during a vacuum
full).



I don't think that the problem we have is how to collect statistics  
(well, except for cross-field stuff); the problem is what to actually  
do with them. What we need people to look at is how we can improve  
query plan estimates across the board. Row count estimates, page  
access estimates, the cost estimates for accessing those pages, etc.  
This isn't a coding problem, it's an algorithm problem. It needs  
someone with an advanced (if not expert) grasp of statistics who can  
come up with better ways of estimating these things.


So, if you have a statistics hammer to wield, I think you'll find a  
lot of nails sticking up in the planner code. Hammer on those before  
worrying about additional stats to collect. :)

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-11-26 Thread Decibel!

On Nov 25, 2008, at 7:06 PM, Gregory Stark wrote:
The thought occurs to me that we're looking at this from the wrong side
of the coin. I've never, ever seen query plan time pose a problem with
Postgres, even without using prepared statements.


I certainly have seen plan times be a problem. I wonder if you have too
and just didn't realize it. With a default_stats_target of 1000 you'll
have hundreds of kilobytes of data to slog through to plan a moderately
complex query with a few text columns. Forget about prepared queries,
I've seen plan times be unusable for ad-hoc interactive queries before.



Can you provide any examples?

And no, I've never seen a system where a few milliseconds of plan  
time difference would pose a problem. I'm not saying they don't  
exist, only that I haven't seen them (including 2 years working as a  
consultant).


I'll also make the argument that anyone with a system that does have  
those kind of requirements will have also needed to actually tune  
their config, and tune it well. I can't see them being bothered by  
having to set one more parameter. There are a lot of systems that are  
being impacted by our ultra-low stats target, and a lot of those  
don't necessarily need a lot of hand tuning beyond the stats target.

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-11-26 Thread Gregory Stark
Decibel! [EMAIL PROTECTED] writes:

 On Nov 25, 2008, at 7:06 PM, Gregory Stark wrote:
 The thought occurs to me that we're looking at this from the wrong side of
 the coin. I've never, ever seen query plan time pose a problem with Postgres,
 even without using prepared statements.

 I certainly have seen plan times be a problem. I wonder if you have too and
 just didn't realize it. With a default_stats_target of 1000 you'll have
 hundreds of kilobytes of data to slog through to plan a moderately complex
 query with a few text columns. Forget about prepared queries, I've seen plan
 times be unusable for ad-hoc interactive queries before.

 Can you provide any examples?

At the time I couldn't understand what the problem was. In retrospect I'm
certain this was the problem. I had a situation where just running EXPLAIN
took 5-10 seconds. I suspect I had some very large toasted arrays which were
having to be detoasted each time. IIRC I actually reloaded the database with
pg_dump and the problem went away.

 And no, I've never seen a system where a few milliseconds of plan  time
 difference would pose a problem. I'm not saying they don't  exist, only that I
 haven't seen them (including 2 years working as a  consultant).

How many milliseconds does it take to read a few hundred kilobytes of toasted,
compressed data? These can easily be more data than the actual query is going
to read.

Now ideally this will all be cached but the larger the data set the less
likely it will be.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL 
training!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-11-26 Thread Kevin Grittner
 Decibel! [EMAIL PROTECTED] wrote: 
 On Nov 25, 2008, at 7:06 PM, Gregory Stark wrote:
 The thought occurs to me that we're looking at this from the wrong side
 of the coin. I've never, ever seen query plan time pose a problem with
 Postgres, even without using prepared statements.

 I certainly have seen plan times be a problem. I wonder if you have too
 and just didn't realize it. With a default_stats_target of 1000 you'll
 have hundreds of kilobytes of data to slog through to plan a moderately
 complex query with a few text columns. Forget about prepared queries,
 I've seen plan times be unusable for ad-hoc interactive queries before.
 
 Can you provide any examples?
 
 And no, I've never seen a system where a few milliseconds of plan
 time difference would pose a problem.
 
When we first brought the statewide circuit court data onto
PostgreSQL, on some early version of 8.1, we tried boosting the
statistics targets for a few dozen important columns, and had to back
off because of plan times up in the 20 to 30 second range.  I hadn't
tried it lately, so I just gave it a go with switching from a default
statistics target of 10 with no overrides to 1000.
 
The plan time for a fairly complex query which is run over 300,000
times per day went from 55 ms to 315 ms; however, with the particular
search criteria I used (which I knew to be challenging) the run time
went from something which exceeded my patience tolerance for the test
(over two minutes) to two seconds, so a better plan was definitely
found.
 
I'm not sure what this suggests in terms of a good default value, but
just to put some numbers out there from a real-world application
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-11-26 Thread Kevin Grittner
 Kevin Grittner [EMAIL PROTECTED] wrote: 
 I hadn't
 tried it lately, so I just gave it a go with switching from a
default
 statistics target of 10 with no overrides to 1000.
 
Oh, this was on 8.2.7, Linux, pretty beefy machine.  Do you want the
whole set of config info and the hardware specs, or would that just be
clutter?
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-11-25 Thread Decibel!

On Nov 19, 2008, at 11:51 PM, Tom Lane wrote:

Dann Corbit [EMAIL PROTECTED] writes:

I think the idea that there IS a magic number is the problem.

No amount of testing is ever going to refute the argument that, under
some other workload, a different value might be better.

But that doesn't amount to a reason to leave it the way it is.


Perhaps a table of experimental data could serve as a rough guideline.


The problem is not that anyone wants to leave it the way it is.
The problem is that no one has done even a lick of work to identify
a specific number that is demonstrably better than others -- on *any*
scale.  How about fewer complaints and more effort?


Is there even a good way to find out what planning time was? Is there  
a way to gather that stat for every query a session runs?


The thought occurs to me that we're looking at this from the wrong  
side of the coin. I've never, ever seen query plan time pose a  
problem with Postgres, even without using prepared statements. Anyone  
who actually cares that much about plan time is certainly going to  
use prepared statements, which makes the whole plan time argument  
moot (plan time, not parse time, but of course stats_target doesn't  
impact parsing at all).


What I *have* seen, on many different databases, was problems with  
bad plans due to default_stats_target being too low. Most of the time  
this was solved by simply setting them to 1000. The only case where I  
backed down from that and went with like 100 was a database that had  
150k tables.


We've been talking about changing default_stats_target for at least 2  
or 3 years now. We know that the current value is causing problems.  
Can we at least start increasing it? 30 is pretty much guaranteed to  
be better than 10, even if it's nowhere close to an ideal value. If  
we start slowly increasing it then at least we can start seeing where  
people start having issues with query plan time.

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-11-25 Thread Tom Lane
Decibel! [EMAIL PROTECTED] writes:
 The thought occurs to me that we're looking at this from the wrong  
 side of the coin. I've never, ever seen query plan time pose a  
 problem with Postgres, even without using prepared statements.

That tells more about the type of queries you tend to run than about
whether there's an issue in general.

 Anyone  
 who actually cares that much about plan time is certainly going to  
 use prepared statements,

This is simply false.  There's a significant performance hit caused
by using prepared statements in many cases where the planner needs
to know the parameter values in order to make good decisions.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simple postgresql.conf wizard

2008-11-25 Thread Gregory Stark

Decibel! [EMAIL PROTECTED] writes:

 Is there even a good way to find out what planning time was? Is there a way
 to gather that stat for every query a session runs?

\timing
explain select ...

 The thought occurs to me that we're looking at this from the wrong side of
 the coin. I've never, ever seen query plan time pose a problem with Postgres,
 even without using prepared statements.

I certainly have seen plan times be a problem. I wonder if you have too and
just didn't realize it. With a default_stats_target of 1000 you'll have
hundreds of kilobytes of data to slog through to plan a moderately complex
query with a few text columns. Forget about prepared queries, I've seen plan
times be unusable for ad-hoc interactive queries before.

 We've been talking about changing default_stats_target for at least 2 or 3
 years now. We know that the current value is causing problems.  Can we at
 least start increasing it? 30 is pretty much guaranteed to be better than 10,
 even if it's nowhere close to an ideal value. If we start slowly increasing
 it then at least we can start seeing where people start having issues with
 query plan time.

How would you see anything from doing that? We only hear from people who have
problems so we only see half the picture. You would have no way of knowing
whether your change has helped or hurt anyone.

In any case I don't see "we know that the current value is causing problems"
as a reasonable statement. It's the *default* stats target. There's a reason
there's a facility to raise the stats target for individual columns.

As Dann said, the idea that there IS a magic number is the problem. *Any*
value of default_stats_target will cause problems. Some columns will always
have skewed data sets which require unusually large samples, but most won't
and the system will run faster with a normal sample size for that majority.

The question is what value represents a good trade-off between the costs of
having large stats targets -- longer analyze, more data stored in
pg_statistics, more vacuuming of pg_statistics needed, longer plan times --
and the benefits of having larger stats targets -- fewer columns which need
raised stats targets.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL 
training!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

