Re: [HACKERS] Simple postgresql.conf wizard
Mark Wong mark...@gmail.com writes: On Mon, Dec 8, 2008 at 4:34 PM, Tom Lane t...@sss.pgh.pa.us wrote: Mark Wong mark...@gmail.com writes: On Tue, Dec 2, 2008 at 2:25 AM, Tom Lane t...@sss.pgh.pa.us wrote: Are any of the queries complicated enough to trigger GEQO planning? Sorry for the delay in responding, here's the queries and the number of tables in each: [ worst case is ] Q8: 8 (7 uniqie) + 1 temp table Well, geqo_threshold is 12 by default, so that theory loses ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
On Mon, Dec 8, 2008 at 4:34 PM, Tom Lane t...@sss.pgh.pa.us wrote: Mark Wong mark...@gmail.com writes: On Tue, Dec 2, 2008 at 2:25 AM, Tom Lane t...@sss.pgh.pa.us wrote: Are any of the queries complicated enough to trigger GEQO planning? Is there a debug option that we could use to see? Well, you could set geqo=off and see if the behavior changes, but it'd be easier just to count how many tables are in each query ... Sorry for the delay in responding, here's the queries and the number of tables in each: Q1: 1 Q2: 2 Q3: 3 Q4: 1 Q5: 6 Q6: 1 Q7: 6 (5 unique) + 1 temp table Q8: 8 (7 uniqie) + 1 temp table Q9: 6 + 1 temp table Q10: 4 Q11: 3 Q12: 2 Q13: 3 + 1 temp table Q14: 2 Q15: 5 Q16: 2 Q17: 2 Q18: 3 Q19: 2 Q20: 2 Q21: 4 Q22: 3 + 1 temp table Regards, Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
Tom Lane [EMAIL PROTECTED] wrote: [ a bit off-topic for the thread, but ... ] Kevin Grittner [EMAIL PROTECTED] writes: I'll attach the query and plan. You'll note that the query looks a little odd, especially all the (1=1) tests. FWIW, it would be better to use TRUE as a placeholder in your generated queries. I don't suppose this would make a huge percentage difference in such complicated queries, but in and of itself there are a lot of cycles spent to parse 1=1 and then reduce it to constant TRUE. Thanks, I'll put in a request for enhancement for our framework. (Not all databases we support handle boolean literals, so we need a few lines in our plugin layer.) In case anyone cares in terms of interpreting the timings I posted, on the server where I just tested this change, the average plan time dropped from 65.0 ms to 63.7 ms -- a 2% improvement. Eliminating pretty whitespace shaved off another 0.2 ms, or 0.3%. So, worth doing on our end as a tuning measure, but not a significant distortion in terms of the issues discussed on the thread. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
[ a bit off-topic for the thread, but ... ] Kevin Grittner [EMAIL PROTECTED] writes: I'll attach the query and plan. You'll note that the query looks a little odd, especially all the (1=1) tests. FWIW, it would be better to use TRUE as a placeholder in your generated queries. I don't suppose this would make a huge percentage difference in such complicated queries, but in and of itself there are a lot of cycles spent to parse 1=1 and then reduce it to constant TRUE. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
On Tue, Dec 2, 2008 at 2:25 AM, Tom Lane [EMAIL PROTECTED] wrote: Greg Smith [EMAIL PROTECTED] writes: ... where the Power Test seems to oscillate between degrees of good and bad behavior seemingly at random. Are any of the queries complicated enough to trigger GEQO planning? Is there a debug option that we could use to see? Regards, Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
Mark Wong [EMAIL PROTECTED] writes: On Tue, Dec 2, 2008 at 2:25 AM, Tom Lane [EMAIL PROTECTED] wrote: Are any of the queries complicated enough to trigger GEQO planning? Is there a debug option that we could use to see? Well, you could set geqo=off and see if the behavior changes, but it'd be easier just to count how many tables are in each query ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
Kevin Grittner [EMAIL PROTECTED] writes: One more data point to try to help. While the jump from a default_statistics_target from 10 to 1000 resulted in a plan time increase for a common query from 50 ms to 310 ms, at a target of 50 the plan time was 53 ms. Analyze time was 7.2 minutes and 18.5 minutes for targets of 10 and 50. This is an 842 GB database on an 8 processor (3.5 GHz Xeon) machine with 64 GB RAM running (soon to be updated) PostgreSQL 8.2.7. So my half-finished DBT3 tests showed hardly any change in planning time for default_statistics_targets 1000 and even Kevin Grittner's query which is the worst real example posted so far only went from 50ms to 310ms. So I started thinking perhaps substantially larger values might not have much effect on planning at all. (Actual data is a lot more convincing than simple assertions!). Kevin's query was an OLTP query so 300ms is still way too much -- 300ms is on the high end for OLTP response times for query *execution*. But seeing the intermediate values would be interesting. So I wondered what the worst-case would be for a synthetic case designed to exercise the planner severely. This would also be useful for optimizing the planner under gprof, though I suspect the hot spots are pretty obvious even without empirical data. So anyways, here's a script to create a table with a 75k pg_statistic record. And a worst-case query where the plan time goes from 34ms to 1.2s for histogram sizes between 10 and 1,000. Looking at eqjoinsel I think it could be improved algorithmically if we keep the mcv list in sorted order, even if it's just binary sorted order. But I'm not sure what else uses those values and whether the current ordering is significant. I'm also not sure it's the only O(n^2) algorithm there and there's no algorithmic gain unless they're all knocked down. Incidentally this timing is with the 75kB toasted arrays in shared buffers because the table has just been analyzed. If it was on a busy system then just planning the query could involve 75kB of I/O which is what I believe was happening to me way back when I last observed super-long plan times. postgres=# create table tk as select random()::text||random()::text||random()::text||random()::text||random()::text||random()::text as r from generate_series(1,1000); postgres=# insert into tk (select * from tk); postgres=# insert into tk (select * from tk); postgres=# insert into tk (select random()::text||random()::text||random()::text||random()::text||random()::text||random()::text as r from generate_series(1,2000)); postgres=# alter table tk alter r set statistics 1000; postgres=# analyze tk; postgres=# select pg_column_size(stavalues1) from pg_statistic where starelid = 'tk'::regclass; pg_column_size 75484 (1 row) postgres=# explain select count(*) from (select * from tk as k, tk as l,tk as m,tk as n,tk as o,tk as p where k.r=l.r and k.r=m.r and k.r=n.r and k.r=o.r and k.r=p.r) as x; QUERY PLAN - Aggregate (cost=41358.14..41358.15 rows=1 width=0) - Merge Join (cost=3213.13..37713.13 rows=1458000 width=0) Merge Cond: (k.r = l.r) - Merge Join (cost=2677.61..14092.61 rows=486000 width=510) Merge Cond: (k.r = m.r) - Merge Join (cost=2142.09..5862.09 rows=162000 width=408) Merge Cond: (k.r = n.r) - Merge Join (cost=1606.57..2761.57 rows=54000 width=306) Merge Cond: (k.r = o.r) - Merge Join (cost=1071.04..1371.04 rows=18000 width=204) Merge Cond: (k.r = p.r) - Sort (cost=535.52..550.52 rows=6000 width=102) Sort Key: k.r - Seq Scan on tk k (cost=0.00..159.00 rows=6000 width=102) - Sort (cost=535.52..550.52 rows=6000 width=102) Sort Key: p.r - Seq Scan on tk p (cost=0.00..159.00 rows=6000 width=102) - Sort (cost=535.52..550.52 rows=6000 width=102) Sort Key: o.r - Seq Scan on tk o (cost=0.00..159.00 rows=6000 width=102) - Sort (cost=535.52..550.52 rows=6000 width=102) Sort Key: n.r - Seq Scan on tk n (cost=0.00..159.00 rows=6000 width=102) - Sort (cost=535.52..550.52 rows=6000 width=102) Sort Key: m.r - Seq Scan on tk m (cost=0.00..159.00 rows=6000 width=102) - Sort (cost=535.52..550.52 rows=6000
Re: [HACKERS] Simple postgresql.conf wizard
Gregory Stark [EMAIL PROTECTED] wrote: Incidentally this timing is with the 75kB toasted arrays in shared buffers because the table has just been analyzed. If it was on a busy system then just planning the query could involve 75kB of I/O which is what I believe was happening to me way back when I last observed super-long plan times. I'm pretty sure I saw that in some of my tests with larger targets. With a large database and a large target, some of the tables' statistics apparently weren't still cached the first time I planned the query, and I got an extremely long plan time on the first attempt, and then it settled in within a pretty narrow range on repeated plans. I discarded the initial plan time as immaterial for our purposes because a query that's run 300,000 times per day is probably going to keep its statistics in cache most of the time. I was looking at trying to modify the perl script from Robert Haas to run my query at a wide range of target values, collecting analyze and plan times at each. Now that you have an easy-to-create synthetic example, is this still worth it, considering that it would be on 8.2? (If we wait a week or two, I could probably do it on 8.3.) We do have gprof on these systems, although I'd need advice on how to use it. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
Gregory Stark [EMAIL PROTECTED] wrote: Incidentally, here's a graph of the explain time for that plan. It looks pretty linear to me Except for that sweet spot between 50 and 100. Any idea what's up with that? -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
All, I'm thinking that default_statistics_target is disputable enough that we want to move discussion of it to pgsql-performance, and for version 0.1 of the tuning wizard, exclude it. -- --Josh Josh Berkus PostgreSQL San Francisco -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
Thanks for putting out pgtune - it's a sorely needed tool. I had a chance to look over the source code and have a few comments, mostly about python specific coding conventions. - The windows specific try block ( line 16 ) raises a ValueError vs ImportError on my debian machine. Maybe it would be more appropriate to explicitly test platform.system()==Windows? - from ctypes import * ( 18 ) makes the block difficult to read and pollutes the namespace. - on line 45, the try block should probably catch exceptions derived from Exception ( to avoid catching SystemExit and KeyboardInterrupt errors ). ie, except Exception: return None. Also, printing out the expection in debug mode would probably be a good idea ( ie except Exception, e: print e\ return None ) - all classes ( 58, 135, 205 ) are 'old-style' classes. I dont see any reason to use classic classes ( unless Python 2.1 is a support goal? ) To make classes 'new style' classes ( http://www.python.org/doc/2.5.2/ref/node33.html ) they should inherit object. i.e. class PGConfigLine(object): - The doc strings ( 59, 136, 206 ) don't follow standard conventions, described here http://www.python.org/dev/peps/pep-0257/. - Functions also support doc strings ( 342, 351, etc. ) - Tuple unpacking doesn't require the tuple boundaries ( 446 and others ). ie, options, args = ReadOptions() works. This is more of a style comment about the 'Java-ish interface' ( line 112 ), feel free to ignore it. overloading __str__ and __repr__ are the python ways to return string representations of an object. ie, instead of toSting use __str__ and then ( on 197 ) print l or print str(l) instead of print l.toString() for the other methods ( getValue, getLineNumber, isSetting ) I'm assuming you didnt call the attributes directly because you didnt want them to be accidently overwritten. Have you considered the use of properties ( http://www.python.org/download/releases/2.2.3/descrintro/#property )? Also, it would be more clear to mark attributes as private ( i.e. _name or __name, _readable, _lineNumber, _setsParameter ) if you dont want them to be accessed directly. Hope my comments are useful! Thanks again for writing this. -Nathan P.S. I'd be happy to officially review this if it gets to that. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
On Fri, 5 Dec 2008, Nathan Boley wrote: - all classes ( 58, 135, 205 ) are 'old-style' classes. I dont see any reason to use classic classes ( unless Python 2.1 is a support goal? ) I'm not targeting anything older then 2.4, as that's the oldest version I have installed anywhere. 2.4 is still an important target because it's the version that comes with the very popular RHEL4. I've worked on some SuSE 9 boxes that only have 2.3 as recently as last year, but I consider that a legacy version I don't care much about mainly because the kinds of programs I'm usually writing tend to rely heavily on using subprocess, added in 2.4. The answers to all of your other points is that I'm relatively new to Python and just didn't know any of that. I'll convert to new style classes, use __str__ properly, and switch all the getters to use properties now that you've pointed me toward those. Improving the rudimentary doc strings was already on my list, that's not a major target though because this project has its own documentation standards and I can't do anything useful with the output from pydoc. I'm only interested in documenting things about the source code itself in there, anything user-related needs to go in the official docs. I'd be happy to officially review this if it gets to that. Just added you to the wiki in that role. I should have a first version that's of commit candidate quality ready to go in the next week. If you could make a second pass then and see how I did implementing your suggestions, that would be great. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
On Fri, 2008-12-05 at 17:27 -0500, Greg Smith wrote: On Fri, 5 Dec 2008, Nathan Boley wrote: - all classes ( 58, 135, 205 ) are 'old-style' classes. I dont see any reason to use classic classes ( unless Python 2.1 is a support goal? ) I'm not targeting anything older then 2.4, as that's the oldest version I have installed anywhere. 2.4 is still an important target because it's the version that comes with the very popular RHEL4. And RHEL5 Joshua D. Drake -- PostgreSQL Consulting, Development, Support, Training 503-667-4564 - http://www.commandprompt.com/ The PostgreSQL Company, serving since 1997 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
Looking at eqjoinsel I think it could be improved algorithmically if we keep the mcv list in sorted order, even if it's just binary sorted order. But I'm not sure what else uses those values and whether the current ordering is significant. I'm also not sure it's the only O(n^2) algorithm there and there's no algorithmic gain unless they're all knocked down. The current code seems to be trying to handle pathological cases where: (1) the operator for which it is invoked doesn't really represent equality and/or (2) the type has an equality operator but no comparison operator. Those are important cases, but maybe we could create an alternative version for the fairly common situation where neither holds? ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
Greg Smith [EMAIL PROTECTED] writes: On Thu, 4 Dec 2008, Gregory Stark wrote: My point was more that you could have a data warehouse on a non-dedicated machine, you could have a web server on a non-dedicated machine, or you could have a mixed server on a non-dedicated machine. I should just finish the documentation, where there will be a big disclaimer saying THESE SETTINGS ASSUME A SERVER DEDICATED TO POSTGRESQL! That's the context here. Why, after you follow my tuning instructions, you're lucky if the server will run anything but the database afterwards. So you're getting rid of the desktop mode altogether? That's more drastic than I was suggesting. I was suggesting that you want to separate desktop into a separate option form the workload list. What users are getting now is kind of like asking users Would you like a small, medium, large, or diet Coke? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
Gregory Stark [EMAIL PROTECTED] wrote: That sounds like it would be an interesting query to analyze in more detail. Is there any chance to could run the complete graph and get a chart of analyze times for all statistics values from 1..1000 ? And log the explain plans to a file so we can look for at what statistics targets the plan changed? Or if the data is public I would be interested in looking at doing it if you want to send it to me. There are some very big tables in that query which contain some confidential data. It would be hard do a lot of runs at high default_statistics_target values because the database analyze time goes so high. If I pick out which tables are used by the query, I might be able to put a script together which loops through analyze of those tables with different targets and capturing run time. There are two problems -- finding the time to set this up, and finding server time windows where other things wouldn't be distorting the results. If you could help with setting up the test script, that would go a long way toward solving the first problem. I think I could reserve a smaller multi-CPU machine with identical data but slower CPUs to run the test. I'll attach the query and plan. You'll note that the query looks a little odd, especially all the (1=1) tests. This is because the application allows users to plug in a variety of selection criteria, and any that aren't used are stubbed out that way. I picked one that was not too atypical for the 300,000 runs per day. I intentionally didn't clean up the white space, but left it just as it was emitted by our framework, in case any of that affected parse/plan time. If you need schema info not obvious from the plan, let me know. -Kevin name-search.sql Description: Binary data QUERY PLAN Unique (cost=473.48..473.57 rows=2 width=173) - Sort (cost=473.48..473.49 rows=2 width=173) Sort Key: caseNo, filingDate, countyName, statusCodeDescr, nameF, nameM, nameL, suffix, dob, caption, countyNo, caseType, isSeal, isPartySeal, lastModified, searchName, isPetitionerSeal - Append (cost=0.00..473.47 rows=2 width=173) - Subquery Scan *SELECT* 1 (cost=0.00..235.74 rows=1 width=169) - Nested Loop (cost=0.00..235.73 rows=1 width=169) - Nested Loop (cost=0.00..126.19 rows=1 width=163) - Nested Loop Left Join (cost=0.00..124.31 rows=1 width=153) - Nested Loop Left Join (cost=0.00..120.71 rows=1 width=153) - Nested Loop Left Join (cost=0.00..117.82 rows=1 width=145) Filter:
Re: [HACKERS] Simple postgresql.conf wizard
Kevin Grittner [EMAIL PROTECTED] writes: There are some very big tables in that query which contain some confidential data. oh well. I'll attach the query and plan. You'll note that the query looks a little odd, especially all the (1=1) tests. That is interesting. I seem to recall Tom is resistant to trying to optimize such queries but actually I've written lots of queries like that myself so I find them interesting. I'll look at the query and see if I can write a similar one using dbt3. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
Gregory Stark [EMAIL PROTECTED] wrote: And log the explain plans to a file so we can look for at what statistics targets the plan changed? Well, I can give you explain analyze output for default_statistics_target 10 and 50, for whatever that's worth. Unfortunately I blew my save from the first run with target 50, but it ran much faster than the first run of target 10 (attached), I think it was about 2.8 seconds. That may or may not be due to the right pages being cached by coincidence. The machine with the target 50 run was serving the web app at the time, so there was significant other load, while the other was idle except as a replicaton target at the time of the run. -Kevin Unique (cost=457.72..457.81 rows=2 width=171) (actual time=94927.370..94932.613 rows=2388 loops=1) - Sort (cost=457.72..457.73 rows=2 width=171) (actual time=94927.366..94928.548 rows=2396 loops=1) Sort Key: caseNo, filingDate, countyName, statusCodeDescr, nameF, nameM, nameL, suffix, dob, caption, countyNo, caseType, isSeal, isPartySeal, lastModified, searchName, isPetitionerSeal - Append (cost=0.00..457.71 rows=2 width=171) (actual time=108.847..94913.855 rows=2396 loops=1) - Subquery Scan *SELECT* 1 (cost=0.00..227.86 rows=1 width=169) (actual time=108.846..93393.253 rows=2315 loops=1) - Nested Loop (cost=0.00..227.85 rows=1 width=169) (actual time=108.843..93383.339 rows=2315 loops=1) - Nested Loop (cost=0.00..122.30 rows=1 width=163) (actual time=97.389..93174.181 rows=2315 loops=1) - Nested Loop Left Join (cost=0.00..120.42 rows=1 width=153) (actual time=86.182..93144.680 rows=2315 loops=1) - Nested Loop Left Join (cost=0.00..116.82 rows=1 width=153) (actual time=75.841..73181.110 rows=2315 loops=1) - Nested Loop Left Join (cost=0.00..113.88 rows=1 width=145) (actual time=60.527..52107.799 rows=2315 loops=1) Filter: (((WPCT.profileName IS NOT NULL) OR (((C.caseType)::text = ANY (('{PA,JD}'::character varying[])::text[])) AND (NOT C.isConfidential))) AND (((WPCT.profileName)::text 'PUBLIC'::text) OR ((C.caseType)::text 'FA'::text) OR ((C.wcisClsCode)::text '40501'::text))) - Nested Loop (cost=0.00..113.56 rows=1 width=146) (actual time=53.104..52045.586 rows=2601 loops=1) Join Filter: (P.partyType)::text = ANY (('{JV,CH}'::character varying[])::text[])) AND ((C.caseType)::text = 'ZZ'::text)) OR ((P.partyType)::text ALL (('{JV,CH}'::character varying[])::text[]))) AND (((C.caseType)::text ALL (('{CF,CI,CM,CT,FO,TR}'::character varying[])::text[])) OR ((P.partyType)::text = 'DE'::text)) AND C.caseType)::text = ANY (('{JA,JC,JG,JM,JO,JV,JI,TP}'::character varying[])::text[])) AND ((P.partyType)::text = ANY (('{CH,JV}'::character varying[])::text[]))) OR (((C.caseType)::text ALL (('{JA,JC,JG,JM,JO,JV,JI,TP}'::character varying[])::text[])) AND ((P.partyType)::text ALL (('{CH,JV}'::character varying[])::text[] AND (((P.partyType)::text ALL (('{PE,PL,JP}'::character varying[])::text[])) OR C.filingDate)::date '2008-11-01'::date) OR ((C.wcisClsCode)::text '30709'::text)) AND (((C.caseType)::text ALL (('{CV,FA}'::character varying[])::text[])) OR ((C.wcisClsCode)::text '30711'::text) OR (NOT (subplan)) - Index Scan using Party_SearchName on Party P (cost=0.00..3.21 rows=1 width=81) (actual time=6.878..159.418 rows=4097 loops=1) Index Cond: (((searchName)::text = 'HILL,J'::character varying) AND ((searchName)::text 'HILL,K'::character varying)) Filter: ((NOT isSeal) AND ((searchName)::text ~~ 'HILL,J%'::text)) - Index Scan using Case_pkey on Case C (cost=0.00..5.01 rows=1 width=87) (actual time=12.607..12.625 rows=1 loops=4097) Index Cond: (((C.countyNo)::smallint = (P.countyNo)::smallint) AND ((C.caseNo)::text = (P.caseNo)::text)) Filter: ((isExpunge true) AND (NOT (subplan))) SubPlan - Index Scan using HiddenCase_pkey on HiddenCase HCA (cost=0.00..2.14 rows=1 width=0) (actual time=4.487..4.487 rows=0 loops=4097) Index Cond: (((countyNo)::smallint = ($0)::smallint) AND ((caseNo)::text = ($1)::text))
Re: [HACKERS] Simple postgresql.conf wizard
Greg Smith wrote: I'm not the sort to be too concerned myself that the guy who thinks he's running a DW on a system with 64MB of RAM might get bad settings, but it's a fair criticism to point that out as a problem. In defense of thinking about very small configurations, I've seen many cases where an enterprise-software salesperson's laptop is running a demo - either in a small virtual machine in the laptop, or on an overloaded windows box.Even though the customer might end up running with 64GB, the sales demo's more likely to be 64MB. In some ways, I'd be more interested in seeing automation of those than the large production systems. Large production systems are likely to have an administrator who's paid to read the documentation and learn how to configure the database. OTOH there tends to be less DBA time available to tune the smaller demo instances that comego as sales people upgrade their laptops; so improved automation would be much appreciated there. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
On Thu, 2008-12-04 at 10:20 -0800, Ron Mayer wrote: Greg Smith wrote: I'm not the sort to be too concerned myself that the guy who thinks he's running a DW on a system with 64MB of RAM might get bad settings, but it's a fair criticism to point that out as a problem. In defense of thinking about very small configurations, I've seen many cases where an enterprise-software salesperson's laptop is running a demo - either in a small virtual machine in the laptop, or on an overloaded windows box.Even though the customer might end up running with 64GB, the sales demo's more likely to be 64MB. Although I get your point, that is a job for sqllite not postgresql. PostgreSQL is not a end all be all solution and it is definitely not designed to be embedded which is essentially what you are suggesting with that kind of configuration. Joshua D. Drake -- PostgreSQL Consulting, Development, Support, Training 503-667-4564 - http://www.commandprompt.com/ The PostgreSQL Company, serving since 1997 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
Well that's a bit if hyperbole. There's a gulf of difference between an embedded use case where it should fit within an acceptable footprint for a desktop app component of maybe a megabyte or so of ram and disk - if we're generous and saying it should run comfortably without having to spec out special server hardware for a demo. That said 64mb of ram seems like hyperbole too. My NSLU2 has 32mb... greg On 4 Dec 2008, at 06:28 PM, Joshua D. Drake [EMAIL PROTECTED] wrote: On Thu, 2008-12-04 at 10:20 -0800, Ron Mayer wrote: Greg Smith wrote: I'm not the sort to be too concerned myself that the guy who thinks he's running a DW on a system with 64MB of RAM might get bad settings, but it's a fair criticism to point that out as a problem. In defense of thinking about very small configurations, I've seen many cases where an enterprise-software salesperson's laptop is running a demo - either in a small virtual machine in the laptop, or on an overloaded windows box.Even though the customer might end up running with 64GB, the sales demo's more likely to be 64MB. Although I get your point, that is a job for sqllite not postgresql. PostgreSQL is not a end all be all solution and it is definitely not designed to be embedded which is essentially what you are suggesting with that kind of configuration. Joshua D. Drake -- PostgreSQL Consulting, Development, Support, Training 503-667-4564 - http://www.commandprompt.com/ The PostgreSQL Company, serving since 1997 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
In defense of thinking about very small configurations, I've seen many cases where an enterprise-software salesperson's laptop is running a demo - either in a small virtual machine in the laptop, or on an overloaded windows box.Even though the customer might end up running with 64GB, the sales demo's more likely to be 64MB. Although I get your point, that is a job for sqllite not postgresql. PostgreSQL is not a end all be all solution and it is definitely not designed to be embedded which is essentially what you are suggesting with that kind of configuration. It's unlikely that someone would want to write a demo version of the software that runs on a completely different database than the production one... it's also totally unnecessary because PostgreSQL runs great on small systems. In fact, it runs great on small systems with NO TUNING AT ALL. That's exactly why a wizard is needed to set values for systems that aren't small. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
Joshua D. Drake wrote: On Thu, 2008-12-04 at 10:20 -0800, Ron Mayer wrote: Greg Smith wrote: I'm not the sort to be too concerned myself that the guy who thinks he's running a DW on a system with 64MB of RAM might get bad settings, but it's a fair criticism to point that out as a problem. In defense of thinking about very small configurations, I've seen many cases where an enterprise-software salesperson's laptop is running a demo - either in a small virtual machine in the laptop, or on an overloaded windows box.Even though the customer might end up running with 64GB, the sales demo's more likely to be 64MB. Although I get your point, that is a job for sqllite not postgresql. PostgreSQL is not a end all be all solution and it is definitely not designed to be embedded which is essentially what you are suggesting with that kind of configuration. But these sales people are selling a postgres based product. It'd be both much less convincing to demo a different application stack; as well as not a very productive use of the developer's time. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
On Thu, 2008-12-04 at 10:55 -0800, Ron Mayer wrote: Joshua D. Drake wrote: Although I get your point, that is a job for sqllite not postgresql. PostgreSQL is not a end all be all solution and it is definitely not designed to be embedded which is essentially what you are suggesting with that kind of configuration. But these sales people are selling a postgres based product. It'd be both much less convincing to demo a different application stack; as well as not a very productive use of the developer's time. Fair enough, then make sure you are demoing on a platform that can handle PostgreSQL :) Joshua D. Drake -- PostgreSQL Consulting, Development, Support, Training 503-667-4564 - http://www.commandprompt.com/ The PostgreSQL Company, serving since 1997 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
Joshua D. Drake [EMAIL PROTECTED] wrote: Fair enough, then make sure you are demoing on a platform that can handle PostgreSQL :) There are a lot of good reasons for people to be running an instance of PostgreSQL on a small machine, running it on a machine with other software, or running many clusters of PostgreSQL on a single machine. It may not be possible for this tool to generate useful values for all of these situations, but it seems to me that should be viewed as a limitation of the tool, rather than some moral failing on the part of the people with those needs. Let's cover what we can, document the limitations, and avoid any user-hostile tones -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
On Thu, 2008-12-04 at 14:05 -0600, Kevin Grittner wrote: Joshua D. Drake [EMAIL PROTECTED] wrote: Fair enough, then make sure you are demoing on a platform that can handle PostgreSQL :) There are a lot of good reasons for people to be running an instance of PostgreSQL on a small machine, running it on a machine with other software, or running many clusters of PostgreSQL on a single machine. It may not be possible for this tool to generate useful values for all of these situations, but it seems to me that should be viewed as a limitation of the tool, rather than some moral failing on the part of the people with those needs. Let's cover what we can, document the limitations, and avoid any user-hostile tones I didn't say don't run on a small machine :) I said make sure you run on one that is up for the job. There is a difference. Joshua D. Drake -Kevin -- PostgreSQL Consulting, Development, Support, Training 503-667-4564 - http://www.commandprompt.com/ The PostgreSQL Company, serving since 1997 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
On Thu, 4 Dec 2008, Ron Mayer wrote: OTOH there tends to be less DBA time available to tune the smaller demo instances that comego as sales people upgrade their laptops; so improved automation would be much appreciated there. I have a TODO list for things that might be interesting to add to a V2.0 version of this tool. I just added an item to there for extending the tuning model usefully into working on systems with smaller amounts of RAM. I'm not opposed to the idea, just don't have any background doing that and I'm trying to stay focused on the more common big-machine problems for the first release. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
Greg Smith [EMAIL PROTECTED] wrote: On Thu, 4 Dec 2008, Ron Mayer wrote: OTOH there tends to be less DBA time available to tune the smaller demo instances that comego as sales people upgrade their laptops; so improved automation would be much appreciated there. I have a TODO list for things that might be interesting to add to a V2.0 version of this tool. I just added an item to there for extending the tuning model usefully into working on systems with smaller amounts of RAM. I'm not opposed to the idea, just don't have any background doing that and I'm trying to stay focused on the more common big-machine problems for the first release. I think there needs to be some easy way to choose an option which yields a configuration similar to what we've had in recent production releases -- something that will start up and allow minimal testing on even a small machine. It also occurred to me that if initdb is generating its initial configuration with this, some special handling might be needed for the make check runs. It isn't unusual to want to do a build and check it on a production server. If the generated configuration used in regression tests is assuming it owns the machine there could be a problem. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
On Thu, 4 Dec 2008, Kevin Grittner wrote: I think there needs to be some easy way to choose an option which yields a configuration similar to what we've had in recent production releases -- something that will start up and allow minimal testing on even a small machine. But that's the goal of what comes out of initdb already; I'm missing how that is something this script would even get involved in. Is your suggestion to add support for a minimal target that takes a tuned-up configuration file and returns it to that state, or did you have something else in mind? -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
Greg Smith [EMAIL PROTECTED] wrote: On Thu, 4 Dec 2008, Kevin Grittner wrote: I think there needs to be some easy way to choose an option which yields a configuration similar to what we've had in recent production releases -- something that will start up and allow minimal testing on even a small machine. But that's the goal of what comes out of initdb already; I'm missing how that is something this script would even get involved in. Is your suggestion to add support for a minimal target that takes a tuned-up configuration file and returns it to that state, or did you have something else in mind? Perhaps I misunderstood some earlier post -- I had gotten the impression that initdb was going to use this utility for the initial postgresql.conf file. If that's not happening, then you can ignore my last post as noise, with my apologies. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
Greg Smith wrote: On Thu, 4 Dec 2008, Ron Mayer wrote: OTOH there tends to be less DBA time available to tune the smaller demo instances that comego as sales people upgrade their laptops; so improved automation would be much appreciated there. I have a TODO list for things that might be interesting to add to a V2.0 version of this tool. I just added an item to there for extending the tuning model usefully into working on systems with smaller amounts of RAM. I'm not opposed to the idea, just don't have any background doing that and I'm trying to stay focused on the more common big-machine problems for the first release. We *have* a configuration for small amounts of RAM; it's our current default configuration. However, I take the point that the workstation calculations should work down to 128MB of system RAM. I'll check. --Josh -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
On Thu, Dec 4, 2008 at 5:11 PM, Greg Smith [EMAIL PROTECTED] wrote: On Thu, 4 Dec 2008, Kevin Grittner wrote: I think there needs to be some easy way to choose an option which yields a configuration similar to what we've had in recent production releases -- something that will start up and allow minimal testing on even a small machine. But that's the goal of what comes out of initdb already; I'm missing how that is something this script would even get involved in. Is your suggestion to add support for a minimal target that takes a tuned-up configuration file and returns it to that state, or did you have something else in mind? I humbly suggest that the memory-related settings output by the tool don't need to match what initdb outputs. But the values of checkpoint_segments, constraint_exclusion, and default_statistics_target probably should, given a small mixed-mode database. You've probably all figured out by now that I, personally, in my own opinion, think that default_statistics_target = 10 is just fine for such a database, but if a decision is made to change that number, so be it. Just let's please change it both places, rather than letting contrib/pgtune be a backdoor to get around not liking what initdb does. And similarly with the other parameters... ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
On Thu, 4 Dec 2008, Robert Haas wrote: Just let's please change it both places, rather than letting contrib/pgtune be a backdoor to get around not liking what initdb does. And similarly with the other parameters... Someone running pgtune has specifically asked for their database to be tuned for performance; someone running initdb has not. It's not a backdoor, the defaults for a tuned small system and what comes out of initdb have completely different priorities. The linking of the two that keeps happening in this thread makes no sense to me, and frankly I consider the whole topic an off-topic distraction. I never had any intention of making changes to the basic configuration that comes out of initdb, the burden of proof for making a change there is far higher than I feel justified in clearing. The last time I got an initdb setting changed I had days worth of focused test data to present with the suggestion. If I take a poll of half a dozen experienced PostgreSQL administrators with performance tuning background (which is basically where the pgtune settings are coming from) and I hear the same story about a setting from most of them, that's good enough for me to justify a settings change for this tool; the whole idea is to pool expert opinion and try to distill it into code. But that's not good enough for changing that setting for everybody who installs the database. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
On Thu, 2008-12-04 at 21:51 -0500, Greg Smith wrote: On Thu, 4 Dec 2008, Robert Haas wrote: Just let's please change it both places, rather than letting contrib/pgtune be a backdoor to get around not liking what initdb does. And similarly with the other parameters... Someone running pgtune has specifically asked for their database to be tuned for performance; someone running initdb has not. It's not a backdoor, Right. the defaults for a tuned small system and what comes out of initdb have completely different priorities. IMO the priority of initdb is, Get the damn thing running The linking of the two that keeps happening in this thread makes no sense to me, and frankly I consider the whole topic an off-topic distraction. Agreed. Joshua D. Drake -- PostgreSQL Consulting, Development, Support, Training 503-667-4564 - http://www.commandprompt.com/ The PostgreSQL Company, serving since 1997 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
Looks like I need to add Python 2.5+Linux to my testing set. I did not expect that the UNIX distributions of Python 2.5 would ship with wintypes.py at all. I think I can fix this on the spot though. On line 40, you'll find this bit: except ImportError: Change that to the following: except ImportError,ValueError: That didn't work, same error message. And it should pass that point. If it doesn't, you can try the completely general: except: That worked. The settings that this initially spit out (I guess it defaults to mixed mode) didn't look too sane to me, because as discussed elsewhere on this thread 50 is not a reasonable value for default_statistics_target for my installation. It also wanted to set constraint_exclusion to on, which I'm pretty confident is useless. Then I tried -T web and got what seemed like a more reasonable set of values. But I wasn't sure I needed that many connections, so I added -c 150 to see how much difference that made. Kaboom! $ ./pgtune -i ~postgres/data/postgresql.conf -T web -c 150 Traceback (most recent call last): File ./pgtune, line 463, in module wizardTune(config,options,settings) File ./pgtune, line 403, in wizardTune 'web':mem/con, 'oltp':mem/con,'dw':mem/con/2, TypeError: unsupported operand type(s) for /: 'int' and 'str' I'm not sure what mixed mode is supposed to be, but based on what I've seen so far, I'm a skeptical of the idea that encouraging people to raise default_statistics_target to 50 and turn on constraint_exclusion is reasonable. I'm also a bit surprised that there doesn't seem to be anything here that depends on the size of the database, even order-of-magnitude. It seems like the right value for checkpoint_segments, at least, might depend on that. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
On Wed, 2008-12-03 at 13:30 -0500, Robert Haas wrote: Looks like I need to add Python 2.5+Linux to my testing set. I did not expect that the UNIX distributions of Python 2.5 would ship with wintypes.py at all. I think I can fix this on the spot though. On line 40, you'll find this bit: except ImportError: Change that to the following: except ImportError,ValueError: That didn't work, same error message. And it should pass that point. If it doesn't, you can try the completely general: except: That worked. The settings that this initially spit out (I guess it defaults to mixed mode) didn't look too sane to me, because as discussed elsewhere on this thread 50 is not a reasonable value for default_statistics_target for my installation. It also wanted to set constraint_exclusion to on, which I'm pretty confident is useless. Then I tried -T web and got what seemed like a more reasonable set of values. But I wasn't sure I needed that many connections, so I added -c 150 to see how much difference that made. Kaboom! $ ./pgtune -i ~postgres/data/postgresql.conf -T web -c 150 Traceback (most recent call last): File ./pgtune, line 463, in module wizardTune(config,options,settings) File ./pgtune, line 403, in wizardTune 'web':mem/con, 'oltp':mem/con,'dw':mem/con/2, TypeError: unsupported operand type(s) for /: 'int' and 'str' I'm not sure what mixed mode is supposed to be, but based on what I've seen so far, I'm a skeptical of the idea that encouraging people to raise default_statistics_target to 50 and turn on constraint_exclusion is reasonable. Why? I'm also a bit surprised that there doesn't seem to be anything here that depends on the size of the database, even order-of-magnitude. It seems like the right value for checkpoint_segments, at least, might depend on that. What does checkpoint_segments have to do with the size of the database? Joshua D. Drake ...Robert -- PostgreSQL Consulting, Development, Support, Training 503-667-4564 - http://www.commandprompt.com/ The PostgreSQL Company, serving since 1997 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
I'm not sure what mixed mode is supposed to be, but based on what I've seen so far, I'm a skeptical of the idea that encouraging people to raise default_statistics_target to 50 and turn on constraint_exclusion is reasonable. Why? Because both of those settings are strictly worse for my database than the defaults. I don't have any partitioned tables, and see: http://archives.postgresql.org/pgsql-hackers/2008-11/msg01837.php I'm also a bit surprised that there doesn't seem to be anything here that depends on the size of the database, even order-of-magnitude. It seems like the right value for checkpoint_segments, at least, might depend on that. What does checkpoint_segments have to do with the size of the database? It seems unlikely that you would want 256 MB of checkpoint segments on a database that is only 100 MB (or even 500 MB). But you might very well want that on a database that is 1 TB. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
Joshua D. Drake [EMAIL PROTECTED] writes: On Wed, 2008-12-03 at 13:30 -0500, Robert Haas wrote: I'm not sure what mixed mode is supposed to be, but based on what I've seen so far, I'm a skeptical of the idea that encouraging people to raise default_statistics_target to 50 and turn on constraint_exclusion is reasonable. Why? Well did you have any response to what I posited before? I said mixed should produce the same settings that the default initdb settings produce. At least on a moderately low-memory machine that initdb targets. It sure seems strange to me to have initdb which presumably is targeting a mixed system -- where it doesn't know for sure what workload will be run -- produce a different set of values than the tuner on the same machine. I'm also a bit surprised that there doesn't seem to be anything here that depends on the size of the database, even order-of-magnitude. It seems like the right value for checkpoint_segments, at least, might depend on that. What does checkpoint_segments have to do with the size of the database? I had the same reaction but I think he's right. checkpoint_segments is the maximum amount of space you want the WAL to take up (ideally). Presumably on a small database you don't want hundreds of megabytes of WAL for a 10M database. But on a terabyte data warehouse sitting on a big SAN you're not going to be concerned with how much space the WAL files are taking. In fact, really it would be nice if we allowed units of space (MB, GB, etc) for checkpoint_segments. I used to think of checkpoint_segments in terms of transaction rate and maximum tolerable recovery time but really if those are your constraints you're better off using checkpoint_timeout I think. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
Well did you have any response to what I posited before? I said mixed should produce the same settings that the default initdb settings produce. At least on a moderately low-memory machine that initdb targets. I'm actually really skeptical of this whole idea of modes. The main thing mode does, other than set max connections, is handle desktop differently than other modes by decreasing shared_buffers and effective_cache_size by 4x and work_mem by 3x. And the default settings for max_connections are a SWAG that could easily be way off for any particular installation. I think it would be more useful to get rid of modes, accept the user is going to have to specify max_connections if the default of, say, 100 is not reasonable, and handle the desktop case by telling the user to rerun the tool overriding the system memory with a lower value. I'm not sure if you've thought about this, but there is also a difference between max_connections and maximum LIKELY connections. For example my apps don't have too many users, since they are internal-facing. But setting max_connections to 100 gives me a nice buffer just in case everyone decides to log on at once. Still, for performance reasons, I'd prefer to calculate based on a more likely scenario, where the concurrent user count might be only 10 or 20. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
I can see an argument about constraint_exclusion but default_statistics_target I don't. Why not? I don't want to accept a big increase in ANALYZE times (or planning times, though I'm really not seeing that at this point) without some benefit. It seems unlikely that you would want 256 MB of checkpoint segments on a database that is only 100 MB (or even 500 MB). But you might very well want that on a database that is 1 TB. It also seems unlikely that you would hit 256MB of checkpoint segments on a 100MB database before checkpoint_timeout and if you did, you certainly did need them. So why do we have this parameter at all? ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
Joshua D. Drake [EMAIL PROTECTED] writes: It also seems unlikely that you would hit 256MB of checkpoint segments on a 100MB database before checkpoint_timeout and if you did, you certainly did need them. Remember postgresql only creates the segments when it needs them. Should we change the initdb output then? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
On Wed, Dec 3, 2008 at 4:41 PM, Joshua D. Drake [EMAIL PROTECTED] wrote: If you are concerned about the analyze time between 10, 50 and 150, I would suggest that you are concerned about the wrong things. Remember I can't rule that out. What things do you think I should be concerned about? ISTM that default_statistics_target trades off ANALYZE time and query planning time vs. the possibility of better plans. If the former considerations are not an issue for dst = 50, then maybe we should emit 50 by default. But the limited evidence that has been published in this forum thus far doesn't support that contention. It also seems unlikely that you would hit 256MB of checkpoint segments on a 100MB database before checkpoint_timeout and if you did, you certainly did need them. So why do we have this parameter at all? Excellent question, for a different thread :) I think the rhetorical answer is so that we don't fill up the disk, which gets us back to database size. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
On Wed, 2008-12-03 at 17:33 -0500, Robert Haas wrote: On Wed, Dec 3, 2008 at 4:41 PM, Joshua D. Drake [EMAIL PROTECTED] wrote: If you are concerned about the analyze time between 10, 50 and 150, I would suggest that you are concerned about the wrong things. Remember I can't rule that out. What things do you think I should be concerned about? Your databases production performance with the change of the parameter. Quite a bit more often than not, your problem (if you have one) isn't going to be default_statistics_target is too high. ISTM that default_statistics_target trades off ANALYZE time and query planning time vs. the possibility of better plans. If the former considerations are not an issue for dst = 50, then maybe we should emit 50 by default. But the limited evidence that has been published in this forum thus far doesn't support that contention. Actually there are years worth of evidence in these archives. Not that the 50 is the right number but that the current settings are definitely wrong and that higher ones are needed. That people generally start around 100 and go from there, except where they don't and then someone like Tom, I or some other person says, Oh you need to increase default_statistics_target. There is no empirical evidence that 50 is the right setting but there is more than enough anecdotal evidence to suggest that 50 is a lot better than 10 and that even higher than 50 is reasonable. In an effort to follow the PostgereSQL conservative mantra, 50 is a good compromise. It also seems unlikely that you would hit 256MB of checkpoint segments on a 100MB database before checkpoint_timeout and if you did, you certainly did need them. So why do we have this parameter at all? Excellent question, for a different thread :) I think the rhetorical answer is so that we don't fill up the disk, I don't think at any time I have said to my self, I am going to set this parameter low so I don't fill up my disk. If I am saying that to myself I have either greatly underestimated the hardware for the task. Consider that we are quarreling over what amounts to a nominal amount of hard drive space, 1000 checkpoint_segments = 1.6G of space. My phone has more capacity than that. *If* hard drive is a space (that much of) a concern then you are having other problems already that pgtune won't satisfy and you should be manually tuning the conf in the first place. Joshua D. Drake which gets us back to database size. ...Robert -- PostgreSQL Consulting, Development, Support, Training 503-667-4564 - http://www.commandprompt.com/ The PostgreSQL Company, serving since 1997 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
Robert Haas [EMAIL PROTECTED] wrote: On Wed, Dec 3, 2008 at 4:41 PM, Joshua D. Drake [EMAIL PROTECTED] wrote: If you are concerned about the analyze time between 10, 50 and 150, I would suggest that you are concerned about the wrong things. Remember I can't rule that out. What things do you think I should be concerned about? ISTM that default_statistics_target trades off ANALYZE time and query planning time vs. the possibility of better plans. If the former considerations are not an issue for dst = 50, then maybe we should emit 50 by default. But the limited evidence that has been published in this forum thus far doesn't support that contention. One more data point to try to help. While the jump from a default_statistics_target from 10 to 1000 resulted in a plan time increase for a common query from 50 ms to 310 ms, at a target of 50 the plan time was 53 ms. Analyze time was 7.2 minutes and 18.5 minutes for targets of 10 and 50. This is an 842 GB database on an 8 processor (3.5 GHz Xeon) machine with 64 GB RAM running (soon to be updated) PostgreSQL 8.2.7. Based on the minimal plan time increase of this test, we're going to try 50 in production and see how it goes. It's worth pondering that at the target of 1000, had we put that into production, running this query 300,000 times per day would have used 21 hours and 40 minutes of additional CPU time per day on planning the runs of this one query, while a target of 50 only consumes an additional 15 minutes of 3.5 GHz CPU time per day. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
On Wed, 3 Dec 2008, Gregory Stark wrote: It sure seems strange to me to have initdb which presumably is targeting a mixed system -- where it doesn't know for sure what workload will be run -- produce a different set of values than the tuner on the same machine. It's been a long time since the output from initdb was targeting anything but a minimal system with an untuned kernel and limited resources. If you invert the normal tuning recommendations, as if its initial configuration were the output from typical practice, it would be aiming at a system with approximately 128MB of RAM. That feels about right to me; when I had 128MB of RAM in my high-end P2-300 server running PG 7.0, 32MB of shared_buffers was huge and 3 checkpoints segments was plenty. I don't recall regularly dirtying things fast enough to see checkpoints occuring too fast then like you can do trivially nowadays. Here in 2008, I push checkpoint_segments up to 10 even for the most trivial apps lest the logs fill with those buggers the first time I run an update on a table. Right now, my program doesn't fiddle with any memory settings if you've got less than 256MB of RAM. Were someone to champion the idea that *nothing* should be fiddled with in those cases, that's not an unreasonable position. I'm not the sort to be too concerned myself that the guy who thinks he's running a DW on a system with 64MB of RAM might get bad settings, but it's a fair criticism to point that out as a problem. In fact, really it would be nice if we allowed units of space (MB, GB, etc) for checkpoint_segments. That's a good way to think about this, let's run with that for a minute. The values I'm throwing in there look like this (if your tab stops aren't at 8 characters this will suck): Completion Max Max TypeSegsTarget SegmentsUsage web 8 0.7 23 368MB oltp16 0.9 47 752MB dw 64 0.9 187 3GB mixed 16 0.9 47 752MB desktop 3 0.5 9 144MB Is 368MB of overhead unreasonable for a web application database today, where you can get a mirrored pair of disks for under $1/GB? It's only the DW case that even starts to leave trivial territory. Your example of somebody who thinks the overhead is too high on their 10MB database is already being blown away even at the default of 3 segments (assuming that data has enough churn on it to go through that many segments ever--if it doesn't then the maximum doesn't matter anyway). The reality here is that it's the recovery playback time that's the real bear. If I were trying to argue against me, what would be more persuasive is some tests showing how long it takes to sort through, cleanup, and replay the appropriate portions of as many as 47 segments worth of WAL after an unclean shutdown when checkpoint_segments=16. Given how long that takes, it might be possible to find a modern system takes a while to process that much WAL volume. It's pretty rare I run into that (usually only after I do something abusive), whereas complaints about the logs filling with checkpoint warnings on systems set to the default seem to pop up all the time. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
Kevin Grittner [EMAIL PROTECTED] writes: One more data point to try to help. While the jump from a default_statistics_target from 10 to 1000 resulted in a plan time increase for a common query from 50 ms to 310 ms, at a target of 50 the plan time was 53 ms. That sounds like it would be an interesting query to analyze in more detail. Is there any chance to could run the complete graph and get a chart of analyze times for all statistics values from 1..1000 ? And log the explain plans to a file so we can look for at what statistics targets the plan changed? Or if the data is public I would be interested in looking at doing it if you want to send it to me. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
On Thu, 2008-12-04 at 00:11 +, Gregory Stark wrote: Joshua D. Drake [EMAIL PROTECTED] writes: I started to do this for you last week but got side-tracked. Do you have any time for this? I can do it if you have a script. So how big should a minimum postgres install be not including your data? Is 100M reasonable? Should we say Postgres requires 200M? 500? 1G? Whatever number we pick (or allow the user to pick) will determine how large this value ought to be. And incidentally also provide a bound on autovacuum_max_freeze_age as Heikki pointed out on another thread. I fail to see what any of the above paragraph has to do with checkpoint_segments. Anyway, I have made my arguments. Joshua D. Drake -- PostgreSQL Consulting, Development, Support, Training 503-667-4564 - http://www.commandprompt.com/ The PostgreSQL Company, serving since 1997 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
On Wed, 3 Dec 2008, Robert Haas wrote: I'm not sure if you've thought about this, but there is also a difference between max_connections and maximum LIKELY connections. It's actually an implicit assumption of the model Josh threw out if you stare at the numbers. The settings for work_mem are twice as high per connection in the Web+OLTP application cases, based on the assumption that you're just not going to get everybody doing sorting at once in those situations. I toyed with exposing that as an explicit connection load duty factor, then remembered I was trying to deliver something rather than tweak the parameters forever. It may be a bit too aggressive as written right now in those cases. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
On Wed, 3 Dec 2008, Guillaume Smet wrote: - it would be really nice to make it work with Python 2.4 as RHEL 5 is a Python 2.4 thing and it is a very widespread platform out there, The 2.5 stuff is only required in order to detect memory on Windows. My primary box is RHEL5 and runs 2.4, it works fine there. - considering the audience of this tool, I think you should explain in the usage text which type of workload implies each database type (DW, OLTP, Web, Mixed, Desktop). Once I'm done with the docs I'll refer over to those, it's too much to put into the usage without cluttering it. - it would be nice to be able to define the architecture (32-64 bits) from the command line (especially considering I won't be able to run it on our target boxes which are all RHEL 5 :)) I'm starting to lean toward making everything that gets detected as also being possible to override, for these case. I want to make this just work in as many cases as possible, but the situation where someone is configuring/testing on a system other than the server is pretty common. Now that I think about it I often setup configs on my 32-bit laptop and them move them over onto 64-bit servers. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
Greg Smith [EMAIL PROTECTED] writes: On Wed, 3 Dec 2008, Gregory Stark wrote: It sure seems strange to me to have initdb which presumably is targeting a mixed system -- where it doesn't know for sure what workload will be run -- produce a different set of values than the tuner on the same machine. It's been a long time since the output from initdb was targeting anything but a minimal system with an untuned kernel and limited resources. If you invert the normal tuning recommendations, as if its initial configuration were the output from typical practice, it would be aiming at a system with approximately 128MB of RAM. That feels about right to me; when I had 128MB of RAM in my high-end P2-300 server running PG 7.0, 32MB of shared_buffers was huge and 3 checkpoints segments was plenty. I don't recall regularly dirtying things fast enough to see checkpoints occuring too fast then like you can do trivially nowadays. Well I think there's also an assumption in initdb that Postgres can't assume it's on a dedicated machine. So whether it's 32MB on a dedicated 128MB machine or 32MB on a 256MB machine where it's only expected to be half the workload of the machine it works out to about the same thing. Right now, my program doesn't fiddle with any memory settings if you've got less than 256MB of RAM. What I'm suggesting is that you shouldn't have to special case this. That you should expect whatever formulas you're using to produce the same values as initdb if they were run on the same machine initdb is targeting. But actually I'm more concerned with the *non* memory related parameters. It may make sense to tweak those one way or the other for oltp or dss but mixed should be exactly what initdb produces since that's exactly what it's targeting -- a system that will have a wide mixture of queries and must function reasonably well for both data warehouse and oltp queries. Completion Max Max Type SegsTarget SegmentsUsage web 8 0.7 23 368MB oltp 16 0.9 47 752MB dw64 0.9 187 3GB mixed 16 0.9 47 752MB desktop 3 0.5 9 144MB (incidentally using tab stops in emails is probably a bad idea because of quoting as above) Uhm, I hadn't actually seen this list before. I don't understand how web is different from oltp. A web service really is just one (very typical) example of an oltp application. And desktop seems like an outlier here. I suppose it's meant to capture whether postgres is on a dedicated box? But it's possible to have a non-dedicated oltp application or non-dedicated data warehouse box just as easily. It's an orthogonal issue from the oltp/data-warehouse axis. Is 368MB of overhead unreasonable for a web application database today Well I think it's more than most people expect a single application install to take up before they start putting data in it. It would probably work better if we were asking how big their database was and then could say, well, you said you had 10G of data so 300MB of overhead isn't going to be so bad. The reality here is that it's the recovery playback time that's the real bear. I agree, but then that's what checkpoint_timeout is for, no? It might take longer to replay but the recovery time should bear some relation to how long it took to write out the wal. More so than to the sheer size of the wal. whereas complaints about the logs filling with checkpoint warnings on systems set to the default seem to pop up all the time. filling? The cure to having too much space taken up by logs is to take up space with, well, logs? The logs are filling up with warnings which explain exactly what parameter to adjust. Are there really complaints about this? I'm really beginning to think the root of the problem is the name. If it were transaction_log_max_space and measured in megabytes people would be happy to say ok, I'll make space for 100MB of logs or whatever. Today they don't know what to set it to or what the impact of setting it will be. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
On Wed, 3 Dec 2008, Robert Haas wrote: Then I tried -T web and got what seemed like a more reasonable set of values. But I wasn't sure I needed that many connections, so I added -c 150 to see how much difference that made. Kaboom! That and the import errors fixed in the version attached (just replacing the script, not its data), thanks for testing that out. I'm not sure what mixed mode is supposed to be, but based on what I've seen so far, I'm a skeptical of the idea that encouraging people to raise default_statistics_target to 50 and turn on constraint_exclusion is reasonable. The statistics stuff is obviously a broader discussion, will let that rage in existing threads. The reason for setting constraint_exclusion in the mixed case is that people who just pick the defaults without reading anything will get a configuration that supports partitions usefully. One of the HINTs I intend to throw out for that specific case is that they should turn it off if they don't ever intend to use paritions. The idea of the mixed mode is that you want to reduce the odds someone will get a massively wrong configuration if they're not paying attention. Is it worse to suffer from additional query overhead if you're sloppy with the tuning tool, or to discover addition partitions didn't work as you expected? That's a tough call; I could invert things, so that it defaults to off in mixed mode, as always, and just produces a HINT to turn it on. I don't have a really strong opinion there either way. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD pgtune.gz Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
Greg Smith [EMAIL PROTECTED] writes: Is it worse to suffer from additional query overhead if you're sloppy with the tuning tool, or to discover addition partitions didn't work as you expected? Surely that's the same question we faced when deciding what the Postgres default should be? That and the unstated other question Is someone more likely to use partitions without reading the manual or not use partitions without reading the manual about the down-sides of constraint_exclusion (in the partitioning section) -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's PostGIS support! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
Joshua D. Drake [EMAIL PROTECTED] writes: On Thu, 2008-12-04 at 00:11 +, Gregory Stark wrote: Joshua D. Drake [EMAIL PROTECTED] writes: I started to do this for you last week but got side-tracked. Do you have any time for this? I can do it if you have a script. Well, I can send you what I have so far but it still needs more work. I only got as far as the graphs I sent earlier which don't include scanning for changed plans. Also, if you have any sample databases with skewed data sets that would be interesting. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
Gregory Stark escribió: Joshua D. Drake [EMAIL PROTECTED] writes: I don't think at any time I have said to my self, I am going to set this parameter low so I don't fill up my disk. If I am saying that to myself I have either greatly underestimated the hardware for the task. Consider that we are quarreling over what amounts to a nominal amount of hard drive space, 1000 checkpoint_segments = 1.6G of space. My phone has more capacity than that. Well my phone has 16G of RAM, why not 1 ? I don't think the disk space used is the only consideration here. You also have to keep recovery time in mind. If you set it to 1000, recovery would take way too long. -- Alvaro Herrerahttp://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
Joshua D. Drake [EMAIL PROTECTED] writes: On Thu, 2008-12-04 at 00:11 +, Gregory Stark wrote: Joshua D. Drake [EMAIL PROTECTED] writes: I started to do this for you last week but got side-tracked. Do you have any time for this? I can do it if you have a script. So how big should a minimum postgres install be not including your data? Is 100M reasonable? Should we say Postgres requires 200M? 500? 1G? Whatever number we pick (or allow the user to pick) will determine how large this value ought to be. And incidentally also provide a bound on autovacuum_max_freeze_age as Heikki pointed out on another thread. I fail to see what any of the above paragraph has to do with checkpoint_segments. Are we all on the same page on what checkpoint_segments does? It's the number of segments of WAL log postgres will allow to accumulate before it triggers a checkpoint and trims off any it doesn't need. That means even if your database is just churning updating the same records over and over the WAL will grow to this size before Postgres makes any attempt to trim it (unless it hits checkpoint_timeout but that's a separate tunable). If you're loading data all your data will go into the heap *and* the wal log until it hits this size and triggers a checkpoint. So this is the minimum amount of extra space you need in addition to your data for a functioning postgres database install not including your data. If you don't anticipate postgres using this much space and set aside enough space for it, your database is at risk of randomly stopping and producing errors when it can't create new log files you told it needs. It's interesting that this is the *only* parameter we can come up with that really directly depends on disk space. The tables are obviously going to be as big as they have to be and there's not much to do about that. If we could eliminate this parameter it would be a lot nicer. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
On Wed, 2008-12-03 at 22:17 -0300, Alvaro Herrera wrote: Gregory Stark escribió: Joshua D. Drake [EMAIL PROTECTED] writes: I don't think at any time I have said to my self, I am going to set this parameter low so I don't fill up my disk. If I am saying that to myself I have either greatly underestimated the hardware for the task. Consider that we are quarreling over what amounts to a nominal amount of hard drive space, 1000 checkpoint_segments = 1.6G of space. My phone has more capacity than that. Well my phone has 16G of RAM, why not 1 ? I don't think the disk space used is the only consideration here. You also have to keep recovery time in mind. If you set it to 1000, recovery would take way too long. Well certainly but the original argument that came back was, (from Robert Haas): It seems unlikely that you would want 256 MB of checkpoint segments on a database that is only 100 MB (or even 500 MB). But you might very well want that on a database that is 1 TB. My whole point is that: 1. It seems unlikely that you would hit 256MB of checkpoint segments on a 100MB database before checkpoint_timeout and if you did, you certainly did need them. (the checkpoint segments) 2. taking up space is such a minute concern in comparison to the potential benefit. Recovery is certainly a consideration but let's be realistic it is the last consideration because it is the least likely to happen. What is more likely to happen is IO spikes because we are recycling logs too much. I know we have some other facilities to deal with that now too but it doesn't completely negate the problem and in my opinion, increasing the checkpoint_segments provides no perceivable downside in production use but does provide significant perceivable upside. Joshua D. Drake -- PostgreSQL Consulting, Development, Support, Training 503-667-4564 - http://www.commandprompt.com/ The PostgreSQL Company, serving since 1997 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
On Thu, 4 Dec 2008, Gregory Stark wrote: Right now, my program doesn't fiddle with any memory settings if you've got less than 256MB of RAM. What I'm suggesting is that you shouldn't have to special case this. That you should expect whatever formulas you're using to produce the same values as initdb if they were run on the same machine initdb is targeting. The reason that approach isn't taken is that the model here assumes the OS overhead is negligable relative to everything else going on. If you've only got a small amount of RAM, that assumption is so badly broken that you can't just extend the curves for everything down to there and expect that what comes out will make any sense. I started to make a more complicated bit that did scale down to the bottom by modeling the overhead better, Josh talked me out of doing it for now. But actually I'm more concerned with the *non* memory related parameters. It may make sense to tweak those one way or the other for oltp or dss but mixed should be exactly what initdb produces since that's exactly what it's targeting -- a system that will have a wide mixture of queries and must function reasonably well for both data warehouse and oltp queries. The only way this line of discussion will go is toward talking about what should be changed in initdb to make it more representative of the current real world, and I know that's not going anywhere (see default_statistics_target=10). The idea that the sample configuration is tuned usefully for any application whatsoever gets nothing from me but a chuckle. And desktop seems like an outlier here. I suppose it's meant to capture whether postgres is on a dedicated box? But it's possible to have a non-dedicated oltp application or non-dedicated data warehouse box just as easily. That's the target for something that's not a dedicated server--a desktop PC you use as a general workstation, maybe you're installing PostgreSQL as a developer that's competing with your web server and other apps; something like that. There might be a better name for that. Is 368MB of overhead unreasonable for a web application database today Well I think it's more than most people expect a single application install to take up before they start putting data in it. Segments don't get allocated until you churn through that much WAL activity; that figure is an upper-bound after you've pushed more than that worth of data through WAL and into the database. The only example where this overhead isn't dwarfed by the size of the resulting database is where some small number of records are inserted, then constantly updated and vacuumed. And you know what? The person doing that is likely to really benefit from having checkpoint_segments set to a larger value. Update and vacuum heavy workloads are exactly the sort where you end up checkpointing too often with the default parameters. I'm really beginning to think the root of the problem is the name. If it were transaction_log_max_space and measured in megabytes people would be happy to say ok, I'll make space for 100MB of logs or whatever. Today they don't know what to set it to or what the impact of setting it will be. Unless they do something crazy like read the documentation: http://www.postgresql.org/docs/8.3/static/wal-configuration.html There will always be at least one WAL segment file, and will normally not be more than (2 + checkpoint_completion_target) * checkpoint_segments + 1 files. Each segment file is normally 16 MB (though this size can be altered when building the server). You can use this to estimate space requirements for WAL. Too complicated for most people you say? I agree; that's why I put some annotated examples for what those translate into http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server like 32 (checkpoint every 512MB). What fun. I'm beginning to remember why nobody has ever managed to deliver a community tool that helps with this configuration task before. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
The idea of the mixed mode is that you want to reduce the odds someone will get a massively wrong configuration if they're not paying attention. Is it worse to suffer from additional query overhead if you're sloppy with the tuning tool, or to discover addition partitions didn't work as you expected? That's a tough call; I could invert things, so that it defaults to off in mixed mode, as always, and just produces a HINT to turn it on. I don't have a really strong opinion there either way. I think that the strong feelings about default_statistics_target and constraint_exclusion come from the fact that when they are too low (in the first case) or off (in the second case), you can get very, very bad query plans. The penalties in the opposite direction are more subtle. If they're so subtle that we don't care about incurring them, then let's change initdb too. If not, then let's not have the tuning tool generate them by default either. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
What fun. I'm beginning to remember why nobody has ever managed to deliver a community tool that helps with this configuration task before. I have to say I really like this tool. It may not be perfect but it's a lot easier than trying to do this analysis from scratch. And we are really only arguing about a handful of settings. It wouldn't take a lot to convince me that checkpoint_segments=3 is too low. I easily blew through that testing the bulk-insert tuning patch. I'm curious why wal_buffers is being set to 512 * checkpoint_segments. Are they related? The default value for wal_buffers is only 64 kB, which means someone thought you shouldn't need much space for this at all, but this suggests a setting in the 4-32 MB range, an increase of ~2 orders of magnitude. For all I know that could be right but it's a big increase. Regarding the religious war now in progress, I think it would be awfully good for someone to offer some thoughts on how to figure out which particular columns on which particular tables need a higher statistics target. That might allow us to either (a) build a wizard that helps you find those problems that could perhaps be used alongside this one or (b) incorporate those same smarts into core. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
Greg Smith [EMAIL PROTECTED] writes: On Thu, 4 Dec 2008, Gregory Stark wrote: What I'm suggesting is that you shouldn't have to special case this. That you should expect whatever formulas you're using to produce the same values as initdb if they were run on the same machine initdb is targeting. The reason that approach isn't taken is that the model here assumes the OS overhead is negligable relative to everything else going on. ok that does make sense. But the non-memory parameters... ... I've cut part of my response for a separate thread ... And desktop seems like an outlier here. I suppose it's meant to capture whether postgres is on a dedicated box? But it's possible to have a non-dedicated oltp application or non-dedicated data warehouse box just as easily. That's the target for something that's not a dedicated server--a desktop PC you use as a general workstation, maybe you're installing PostgreSQL as a developer that's competing with your web server and other apps; something like that. There might be a better name for that. My point was more that you could have a data warehouse on a non-dedicated machine, you could have a web server on a non-dedicated machine, or you could have a mixed server on a non-dedicated machine. I don't see how you would decide whether to set enable_constraint_exclusion for desktop for example. Is 368MB of overhead unreasonable for a web application database today Well I think it's more than most people expect a single application install to take up before they start putting data in it. Segments don't get allocated until you churn through that much WAL activity; that figure is an upper-bound after you've pushed more than that worth of data through WAL and into the database. The only example where this overhead isn't dwarfed by the size of the resulting database Right, well, no, it won't be dwarfed -- it'll be about the same size. Ie, if you load 100MB into the database there'll be about 100MB of logs generated. Up to the point where you hit this maximum upper bound. But yes, right that it's the upper bound for the extra space allocated in addition to the size of the database. And how much extra space should we allocate? I don't see why this extra space bound should depend on the type of OLTP vss DSS workload. Only on how much disk space is available that the admin is willing to dedicate to Postgres. Assuming an admin of a 1TB server is willing to dedicate 1GB to logs and the admin of a 1GB server would be annoyed to have to throw more than a few hundred megs seems as reasonable a place as any to start. is where some small number of records are inserted, then constantly updated and vacuumed. And you know what? The person doing that is likely to really benefit from having checkpoint_segments set to a larger value. Update and vacuum heavy workloads are exactly the sort where you end up checkpointing too often with the default parameters. Well there are a few problems with this. a) we have HOT now so you don't need any vacuums to be part of the picture. b) if you're updating the same pages over and over again a checkpoint will be super-quick since there will only be a few pages to write out so no you don't really need some large checkpoint_segments for any performance reason. Josh's logic is impeccable -- for the specific use case he's describing of a truly dedicated server with enough disk space for a major production database. But not every install is going to have gigabytes of space reserved for it and not every admin is going to realize that he really should set aside gigabytes of space even though he only expects his database to be a few megabytes. I'm really beginning to think the root of the problem is the name. If it were transaction_log_max_space and measured in megabytes people would be happy to say ok, I'll make space for 100MB of logs or whatever. Today they don't know what to set it to or what the impact of setting it will be. Unless they do something crazy like read the documentation: Well we know nobody does that :/ It's great that Postgres has such great documentation but whenever we have the chance to replace something with an option which doesn't need any documentation that would be even better. I'm just exploring whether that's an option here. What fun. I'm beginning to remember why nobody has ever managed to deliver a community tool that helps with this configuration task before. Well I don't think this is why. Nobody's even tried to do this side of things before. They always got bogged down in trying to parse config files and such. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
On Thu, 4 Dec 2008, Gregory Stark wrote: Greg Smith [EMAIL PROTECTED] writes: Is it worse to suffer from additional query overhead if you're sloppy with the tuning tool, or to discover addition partitions didn't work as you expected? Surely that's the same question we faced when deciding what the Postgres default should be? Gosh, you're right. I'm really new here, and I just didn't understand how things work. I should have known that there was lots of thorough research into that setting before the default was set. (hangs head in shame) Wait, what list am I on? pgsql-hackers? Oh, crap, that can't be right at all then. This one is actually an interesting example of how this stuff ends up ossified without being revisited, I'm glad you brought it up. First we have to visit the 8.1 and 8.2 documentation. There we find the real reason it originally defaulted to off: http://www.postgresql.org/docs/8.1/static/runtime-config-query.html Currently, constraint_exclusion is disabled by default because it risks incorrect results if query plans are cached if a table constraint is changed or dropped, the previously generated plan might now be wrong, and there is no built-in mechanism to force re-planning. It stayed off for that reason for years. Then the plan invalidation stuff went into 8.3 that made this no longer true. Bruce even removed the item from the TODO list that used to say that constraint_exclusion should be improved to allow it to be used for all statements with little performance impact. Then a couple of months later, when the 8.3 docs were being worked on, Tom updated the text to remove the obsolete warning about the plan risks: http://archives.postgresql.org/pgsql-committers/2007-03/msg00372.php Leaving only the leftovers of the original caveat about how it can also cause some overhead as the reason for why it was still off--a concern which was certainly more serious when that text was written in 2005 than it is today for multiple reasons. How much was that overhead lowered by the work done in 8.3? I can't find any public information suggesting that was ever even discussed. The only thing I found when poking around looking for it is that Tom had expressed some concerns that the proof overhead was too still large back in 2006: http://archives.postgresql.org/pgsql-committers/2006-02/msg00035.php But you know what? The cached proof comparison bit Tom commited a couple of weeks ago shifted the mechanics of the overhead for this specific case around, so even if we did have 8.3 results they'd need to get re-run at this point anyway. See below for more on what might be different soon. So, if you want to say that turning on constraint_exclusion by default is a horrible idea because it adds significant overhead, and you have any sort of evidence that will still be true for 8.4 on the kind of hardware 8.4 is likely to run on, I would greatly appreciate that information. But presuming that serious thought must have went into every decision made about what the defaults for all the performance-related parameter in the postgresql.conf is something we all know just ain't so. What I see is a parameter that doesn't add enough overhead relative to query execution time on today's systems that I've noticed whether it was on or off, one that's set to off only by historical accident combined with basic conservatism (mainly from Tom far as I can tell, he's a nice reliable source for that). Whereas if it's accidentally set wrong, it can lead to massively wrong plans. I'm not sure what the right move here is, but the appeal to authority approach for defending the default here isn't going to work on me. That and the unstated other question Is someone more likely to use partitions without reading the manual or not use partitions without reading the manual about the down-sides of constraint_exclusion (in the partitioning section) Have you started thinking about the implications of http://archives.postgresql.org/message-id/[EMAIL PROTECTED] yet? It is a bold new world of people who partition with less time stuck in the manual first we approach, and I was very much thinking about that when mulling over whether I agreed with Josh's suggestion to put that into the default mixed settings before I went with it (that's right--I wrote all the above and it wasn't even my idea originally). If that doesn't make it into 8.4 I will yield to your statement of the boring, manual-reading status quo still being on target. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
Greg Smith [EMAIL PROTECTED] writes: On Thu, 4 Dec 2008, Gregory Stark wrote: Greg Smith [EMAIL PROTECTED] writes: Is it worse to suffer from additional query overhead if you're sloppy with the tuning tool, or to discover addition partitions didn't work as you expected? Surely that's the same question we faced when deciding what the Postgres default should be? Gosh, you're right. I'm really new here, and I just didn't understand how things work. I should have known that there was lots of thorough research into that setting before the default was set. (hangs head in shame) Oh no, in this case I meant just that if we want to change it we should change it in *both* places. That the argument you're making applies just as much to the Postgres default as it does to the mixed workload default in the tuner. But I admit I totally didn't remember that the main reason it was originally off was the lack of plan invalidation. That does rather change things. Perhaps we should be enabling it now. If we do though, it shouldn't default one way and then get randomly flipped by a tool that has the same information to make its decision on. What I'm saying is that mixed is the same information that initdb had about the workload. If we do change this then I wonder if we need the parameter at all. I mean, we don't generally have parameters to turn off random parts of the optimizer... How much was that overhead lowered by the work done in 8.3? I can't find any public information suggesting that was ever even discussed. Well it does have to compare every constraint with every clause and do a moderately complex analysis. It's never going to be super-fast for complex queries. But on the other hand it should drop out pretty fast if the tables haven't got any constraints so it does seem like it's only hurting people when they would want it on anyways. What I see is a parameter that doesn't add enough overhead relative to query execution time on today's systems that I've noticed whether it was on or off There's a danger in this. There's some famous, probably apocryphal, example of a fast food restaurant that taste tested their menu and got great results. Then they cheapened an ingredient and their testers couldn't taste the difference. Then they did that with another ingredient and another and so on and each time the testers couldn't taste a difference. And in the end they ended up rolling out a dramatically inferior menu which people panned compared to the original... -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
On Mon, Dec 1, 2008 at 9:32 PM, Greg Smith [EMAIL PROTECTED] wrote: On Mon, 1 Dec 2008, Mark Wong wrote: So then I attempted to see if there might have been difference between the executing time of each individual query with the above parameters. The queries that don't seem to be effected are Q1, Q4, Q12, Q13, and Q15. Q17 suggests that anything higher than default_statistics_target=10 is an improvement. The rest of the queries appears not to follow any particular trend with respect to default_statistics_target. The interesting ones are Q2, Q9, Q17, Q18, and Q20, and that data is much more useful than the summary. As you mention, Q17 improves significantly with a higher target. All of the rest are dramatically slower in one or both tests going from default_statistics_target=10 to 100. Those look like the most useful data points on the X axis--the increases from 100 up to 1000 aren't particularly interesting in most of these, except in Q20 where the Power Test seems to oscillate between degrees of good and bad behavior seemingly at random. My picks for the most useful graphs from the long list Mark sent: http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q2.png http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q9.png http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q17.png http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q18.png http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q20.png I think the tests you could consider next is to graph the target going from 10 to 100 in steps of 10 just for those 5 queries. If it gradually degrades, that's interesting but hard to nail down. But if there's a sharp transition, getting an explain plan for the two sides of that should provide some insight. I'm really more interested in the ones that slowed down than the one that improved, understanding that might finally provide some evidence against increasing it by default. I've updated the charts to include results from setting default_statistics_target from 20-90. The links to the charts are the same. The links to the raw data are in http://207.173.203.223/~markwkm/community6/dbt3/ in directories 45-52. Regards, Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
If we do though, it shouldn't default one way and then get randomly flipped by a tool that has the same information to make its decision on. What I'm saying is that mixed is the same information that initdb had about the workload. +1. If we do change this then I wonder if we need the parameter at all. I mean, we don't generally have parameters to turn off random parts of the optimizer... It probably isn't a good idea to both change the default setting and remove the parameter in the same release. It would be awesome if this is cheap enough now to have it on by default - but constraints are pretty useful for maintaining data integrity, so it's conceivable to me that someone could have a lot of constraints most of which are unuseful for query planning. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
On Thu, 4 Dec 2008, Gregory Stark wrote: My point was more that you could have a data warehouse on a non-dedicated machine, you could have a web server on a non-dedicated machine, or you could have a mixed server on a non-dedicated machine. I should just finish the documentation, where there will be a big disclaimer saying THESE SETTINGS ASSUME A SERVER DEDICATED TO POSTGRESQL! That's the context here. Why, after you follow my tuning instructions, you're lucky if the server will run anything but the database afterwards. Josh's logic is impeccable -- for the specific use case he's describing of a truly dedicated server with enough disk space for a major production database. But not every install is going to have gigabytes of space reserved for it and not every admin is going to realize that he really should set aside gigabytes of space even though he only expects his database to be a few megabytes. It's really quite simple. Josh and I don't care directly about disk space used by the WAL for people with trivial databases. At all. Whatsoever. Maybe once, long ago, when we were young and frugal and skinny[1]; not now, or probably ever again the future. If that's your concern, maybe there can be some companion utility named pgmiser that lowers parameters back down again. Your mascot can be some sort of animal that efficiently lives off small scraps of food or something.[2] The context here is pgtune, which is aiming to make a fat elephant of a server faster so that there's an answer to people who say My benchmarks are all running really slow, is this because my system with 16PT of RAM is only using 32MB of it for the database? This sucks, I'm going back to Oracle which used all my RAM. If there are people who instead think, hey, I'll run this tuning utility to make my database faster, then it will also be a lot smaller!, maybe we can find a class about space/time tradeoffs in algorithm design to send them to or something.[3] There are exactly two important things here. The first is how large checkpoint_settings needs to be in order to for the considerable overhead of checkpoints to be bearable. That drives the setting up. Our super-fat DW application gets set to at least 64 so that when you bulk-load another TB of data into it, that doesn't get bottlenecked dumping gigabytes of dirty buffers every few seconds. If the database crashes and recovery reads or writes a bunch of data, who cares about random writes because your SAN has a 4GB write cache on it and dozens of drives slaving away. Driving the setting down is knowing how much time you'll have to wait for recovery to happen, which is really a measure of what your tolerance for downtime is. We're thinking that someone who picks the Desktop tuning may have no tolerance for the database to be sluggish coming back up after Windows crashed and they rebooted, so tiny setting for them to make recovery super fast. Everybody else in our sample profiles fall in the middle of those two extremes, which is why the values curve the way they do. Web app? Probably not a lot of write volume, probably trouble if it's down a long time; how about 8, on the low side, but it gives checkpoints more time to spread out their I/O so worst-case latency isn't as bad. That's the sort of analysis those numbers come from. Do performance tuning and juggle these trade-offs for long enough for new people all the time, you get a gut feel for the right ballpark an app should start at based on its type. The whole idea behind this tool is that we're taking some of that hard-won knowledge and trying to automate the distribution of it. It's great that Postgres has such great documentation but whenever we have the chance to replace something with an option which doesn't need any documentation that would be even better. I'm just exploring whether that's an option here. I would be glad to have a post-CommitFest discussion of this very topic as it's quite a pain to me in its current form. Just not right now because it's too late to touch it. Nobody's even tried to do this side of things before. They always got bogged down in trying to parse config files and such. It's actually because most of them were working in Perl, which encourages deviant behavior where people delight in converting useful ideas into illegible punctuation rather than actually getting anything done. Except for that other Greg around here who's not involved in this discussion, his Perl is pretty good. [1] Josh is being aggressively bulked up right now for his next sumo match. [2] Like a rat, which would give you an excuse to add the long overdue PL/Ratfor. [3] This wouldn't actually help them learn anything, but it would make their heads explode at which point all their problems are gone. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
Re: [HACKERS] Simple postgresql.conf wizard
I think the tests you could consider next is to graph the target going from 10 to 100 in steps of 10 just for those 5 queries. If it gradually degrades, that's interesting but hard to nail down. But if there's a sharp transition, getting an explain plan for the two sides of that should provide some insight. I'm really more interested in the ones that slowed down than the one that improved, understanding that might finally provide some evidence against increasing it by default. I've updated the charts to include results from setting default_statistics_target from 20-90. The links to the charts are the same. The links to the raw data are in http://207.173.203.223/~markwkm/community6/dbt3/ in directories 45-52. I still think we're missing the boat here because it's not really the same query every time. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
On Wed, 3 Dec 2008, Mark Wong wrote: http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q2.png http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q9.png http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q17.png http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q18.png http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q20.png I've updated the charts to include results from setting default_statistics_target from 20-90. Interesting. Q2: Explodes hard when the target hits 100, but not before. Would be interesting to compare the explain plans at, say, 50 and 150 to see if it's possible to nail down what is shifting so much. Q3: goes bonkers as soon as the target hits 20, so the interesting plans to compare are 10 and 20. Q17: our one from the original set that improved a bunch with the larger target gets all that behefit just from going to 20. Would be interesting to compare the plans at 10 and 20 to see what changed so much with such a small difference. Q18: looks like it was in some sort of local bad area around 100-400 before, with some more context that one doesn't look interesting anymore. Q20: also doesn't look very interesting anymore. The results at 10 were nice, and the ones at 100 were among the unusuallly bad ones, but it's pretty random--if there was something inherantly bad related to the stats, there wouldn't be low points around 200. Out of those, the most interesting one to me (as someone who is trying to defend raising the target some but not going crazy with that) is Q3. The reason I say that is that everything else is better or basically the same raising the target from 10, as long as you don't go too high (=100). That one falls apart immediately with a larger target which seems weird. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
Alvaro Herrera wrote: Gregory Stark escribió: Joshua D. Drake [EMAIL PROTECTED] writes: I don't think at any time I have said to my self, I am going to set this parameter low so I don't fill up my disk. If I am saying that to myself I have either greatly underestimated the hardware for the task. Consider that we are quarreling over what amounts to a nominal amount of hard drive space, 1000 checkpoint_segments = 1.6G of space. My phone has more capacity than that. Well my phone has 16G of RAM, why not 1 ? I don't think the disk space used is the only consideration here. You also have to keep recovery time in mind. If you set it to 1000, recovery would take way too long. Presumably if you set checkpoint_segments to a high value, you'd use checkpoint_timeout to limit recovery time. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
Greg Smith [EMAIL PROTECTED] writes: ... where the Power Test seems to oscillate between degrees of good and bad behavior seemingly at random. Are any of the queries complicated enough to trigger GEQO planning? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
Gregory Stark wrote: Tom Lane [EMAIL PROTECTED] writes: Dann Corbit [EMAIL PROTECTED] writes: I also do not believe that there is any value that will be the right answer. But a table of data might be useful both for people who want to toy with altering the values and also for those who want to set the defaults. I guess that at one time such a table was generated to produce the initial estimates for default values. Sir, you credit us too much :-(. The actual story is that the current default of 10 was put in when we first implemented stats histograms, replacing code that kept track of only a *single* most common value (and not very well, at that). So it was already a factor of 10 more stats than we had experience with keeping, and accordingly conservatism suggested not boosting the default much past that. I think that's actually too little credit. The sample size is chosen quite carefully based on solid mathematics to provide a specific confidence interval estimate for queries covering ranges the size of a whole bucket. The actual number of buckets more of an arbitrary choice. It depends entirely on how your data is distributed and how large a range your queries are covering. A uniformly distributed data set should only need a single bucket to generate good estimates. Less evenly distributed data sets need more. I wonder actually if there are algorithms for estimating the number of buckets needed for a histogram to achieve some measurable goal. That would close the loop. It would be much more reassuring to base the size of the sample on solid statistics than on hunches. I have a few thoughts on this. First, people are correct that there is no perfect default_statistics_target value. This is similar to the problem with the pre-8.4 max_fsm_pages/max_fsm_relations, for which there also was never a perfect value. But, if the FSM couldn't store all the free space, a server log message was issued that recommended increasing these values; the same is still done for checkpoint_segments. Is there a way we could emit a server log message to recommend increasing the statistics targets for specific columns? Also, is there a way to increase the efficiency of the statistics targets lookups? I assume the values are already sorted in the pg_statistic arrays; do we already do a binary lookup on those? Does that help? -- Bruce Momjian [EMAIL PROTECTED]http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
Greg, On Mon, Dec 1, 2008 at 3:17 AM, Greg Smith [EMAIL PROTECTED] wrote: ./pgtune -i ~/data/postgresql.conf First, thanks for your work: it will really help a lot of people to have a decent default configuration. A couple of comments from reading the code (I didn't run it yet): - it would be really nice to make it work with Python 2.4 as RHEL 5 is a Python 2.4 thing and it is a very widespread platform out there, - considering the audience of this tool, I think you should explain in the usage text which type of workload implies each database type (DW, OLTP, Web, Mixed, Desktop). - not sure of the wording of The file this needs to operate correctly can be generated with: - it would be nice to be able to define the architecture (32-64 bits) from the command line (especially considering I won't be able to run it on our target boxes which are all RHEL 5 :)) I'll see if I have more feedback while testing it for real. -- Guillaume -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
On Mon, Dec 1, 2008 at 3:21 AM, Greg Smith [EMAIL PROTECTED] wrote: On Sun, 30 Nov 2008, Greg Smith wrote: Memory detection works on recent (=2.5) version of Python for Windows now. I just realized that the provided configuration is really not optimal for Windows users because of the known limitations that prevent larger shared_buffers settings from being effective on that platform. I know there are some notes on that subject in the archives that I'll look though, but I'd appreciate a suggestion for what a good upper-limit for that setting is on Windows. I also wonder whether any of the other parameters have similar restrictions on their useful range. It's going to be of little use to 99% of Windows users anyway as it's written in Python. What was wrong with C? FWIW though, in some pgbench tests on XP Pro, on a 4GB machine, 512MB seemed to be consistently the most effective size (out of tests on 32MB, 512MB and 1GB). There wasn't much between 32 and 512 though - my suspicion is that 128 or 256 would be similarly effective. I didn't have time to test that though. -- Dave Page EnterpriseDB UK: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
Dave Page wrote: On Mon, Dec 1, 2008 at 3:21 AM, Greg Smith [EMAIL PROTECTED] wrote: On Sun, 30 Nov 2008, Greg Smith wrote: Memory detection works on recent (=2.5) version of Python for Windows now. I just realized that the provided configuration is really not optimal for Windows users because of the known limitations that prevent larger shared_buffers settings from being effective on that platform. I know there are some notes on that subject in the archives that I'll look though, but I'd appreciate a suggestion for what a good upper-limit for that setting is on Windows. I also wonder whether any of the other parameters have similar restrictions on their useful range. It's going to be of little use to 99% of Windows users anyway as it's written in Python. What was wrong with C? It could be shipped with a bundled python, I guess. Python is a lot more common in these scenarios than any of the other unixy languages - much more common than perl for example - on Windows in my experience. It could be because it's easier to bundle or something? FWIW though, in some pgbench tests on XP Pro, on a 4GB machine, 512MB seemed to be consistently the most effective size (out of tests on 32MB, 512MB and 1GB). There wasn't much between 32 and 512 though - my suspicion is that 128 or 256 would be similarly effective. I didn't have time to test that though. That's about what I've heard around as well - I don't think I've ever heard of a case where 512 has actually helped. //Magnus -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
I just gave this a try and got: $ ./pgtune Traceback (most recent call last): File ./pgtune, line 20, in module from ctypes.wintypes import * File /usr/lib/python2.5/ctypes/wintypes.py, line 21, in module class VARIANT_BOOL(_SimpleCData): ValueError: _type_ 'v' not supported This is FC7, installed RPM is python-2.5-15.fc7. (This is also my #1 beef with Python: too many utterly impenetrable error messages... or at least impenetrable to me.) ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
On Mon, 1 Dec 2008, Dave Page wrote: It's going to be of little use to 99% of Windows users anyway as it's written in Python. What was wrong with C? It's 471 lines of Python code that leans heavily on that language's Dictionary type to organize everything. Had I insisted on writing directly to C first, it would have be of no use to 100% of all users in the 8.4 timeframe because it wouldn't be even remotely close to finished by now. I'd ultimately like to use the Python version as a spec to produce a C implementation, because that's the only path to get something like this integrated into initdb itself. There were just too many thing to get under control for that to practical just yet. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
Greg Smith [EMAIL PROTECTED] writes: I'd ultimately like to use the Python version as a spec to produce a C implementation, because that's the only path to get something like this integrated into initdb itself. It won't get integrated into initdb in any case: a standalone tool is the correct thing. Or do you think people should have to initdb in order to get fresh advice after changes in the environment? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
On Mon, 1 Dec 2008, Robert Haas wrote: I just gave this a try and got: $ ./pgtune Traceback (most recent call last): File ./pgtune, line 20, in module from ctypes.wintypes import * File /usr/lib/python2.5/ctypes/wintypes.py, line 21, in module class VARIANT_BOOL(_SimpleCData): ValueError: _type_ 'v' not supported Looks like I need to add Python 2.5+Linux to my testing set. I did not expect that the UNIX distributions of Python 2.5 would ship with wintypes.py at all. I think I can fix this on the spot though. On line 40, you'll find this bit: except ImportError: Change that to the following: except ImportError,ValueError: And it should pass that point. If it doesn't, you can try the completely general: except: And then it should be OK. I should probably use that formulation in that particular section. (This is also my #1 beef with Python: too many utterly impenetrable error messages... or at least impenetrable to me.) That one made sense to me, but that is one of the sharper parts of its learning curve. I think part of the problem is that much of the tutorial Python code out there never gets into error handling. One of the things Java does that I miss is that if you create a try block, it won't work until you have addressed every possible exception the code in there can generate. It may be impractical for Python to know that much, but I wonder if a lint-ish tool for the language might be able to do it. The above mumbling is certainly off-topic for this list though. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
On Mon, 1 Dec 2008, Tom Lane wrote: Greg Smith [EMAIL PROTECTED] writes: I'd ultimately like to use the Python version as a spec to produce a C implementation, because that's the only path to get something like this integrated into initdb itself. It won't get integrated into initdb in any case: a standalone tool is the correct thing. Or do you think people should have to initdb in order to get fresh advice after changes in the environment? The long-term goal Josh suggested aiming for (that I abandoned for now) was replacing the part of initdb that creates the initial postgresql.conf via the sample with a call to a program that generates it instead. Said program could also run standalone for later updates to the configuration. I have no interest in arguing about whether that's a good or bad idea at this point, I've accepted that the only useful statements to be made in this area come in the form of a working implementation to review. Until I have one of those, this really isn't worth getting into. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
On Thu, Nov 13, 2008 at 11:53 AM, Tom Lane [EMAIL PROTECTED] wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: A lot of people have suggested raising our default_statistics target, and it has been rejected because there's some O(n^2) behavior in the planner, and it makes ANALYZE slower, but it's not that crazy. I think everyone agrees it ought to be raised. Where the rubber meets the road is deciding just *what* to raise it to. We've got no convincing evidence in favor of any particular value. If someone actually wanted to put some effort into this, I'd suggest taking some reasonably complex benchmark (maybe TPCH or one of the DBT series) and plotting planner runtime for each query as a function of statistics_target, taking care to mark the breakpoints where it shifted to a better (or worse?) plan due to having better stats. Hi all, I have some data from a mostly un-tuned system. I have a 10GB scale factor using DBT-3 on a single 25 disk hardware RAID-0 lun. Generally speaking, it seems like on un-tuned systems increasing the default_statistics_target for this workload doesn't have a clear benefit. Here is a histogram of the Power Test and Throughput Test with default GUC values. The Power and Throughput test results are plotted against the default_statistics_target value. For reminders, the Power Test executes each query serially, while the Throughput Test executes 8 streams of the 22 queries simultaneously. Default values: http://207.173.203.223/~markwkm/pgsql/default_statistics_target/default.png Raw data in http://207.173.203.223/~markwkm/community6/dbt3/, directories 1 - 11. I started increasing the shared_buffers and effective_cache_size to see if there might be anything more interesting in the results, but still don't think so. shared_buffers=2048MB, effective_cache_size=1024MB http://207.173.203.223/~markwkm/pgsql/default_statistics_target/2048x1024.png Raw data in http://207.173.203.223/~markwkm/community6/dbt3/, directories 12 - 22. - shared_buffers=4096MB, effective_cache_size=2048MB http://207.173.203.223/~markwkm/pgsql/default_statistics_target/4096x2048.png Raw data in http://207.173.203.223/~markwkm/community6/dbt3/, directories 23 - 33. - shared_buffers=20480MB effective_cache_size=10240MB checkpoint_segments=3000 http://207.173.203.223/~markwkm/pgsql/default_statistics_target/20480x10240.png Raw data in http://207.173.203.223/~markwkm/community6/dbt3/, directories 34 - 44. So then I attempted to see if there might have been difference between the executing time of each individual query with the above parameters. The queries that don't seem to be effected are Q1, Q4, Q12, Q13, and Q15. Q17 suggests that anything higher than default_statistics_target=10 is an improvement. The rest of the queries appears not to follow any particular trend with respect to default_statistics_target. Here's query 17 when default_statistics_target=10 EXPLAIN ANALYZE select sum(l_extendedprice) / 7.0 as avg_yearly from lineitem, part where p_partkey = l_partkey and p_brand = 'Brand#52' and p_container = 'SM PACK' and l_quantity ( select 0.2 * avg(l_quantity) from lineitem where l_partkey = p_partkey ); QUERY PLAN -- Aggregate (cost=15725486.01..15725486.02 rows=1 width=4) (actual time=31669.249..31669.249 rows=1 loops=1) - Hash Join (cost=70662.08..15725435.96 rows=20019 width=4) (actual time=690.551..31666.031 rows=5413 loops=1) Hash Cond: (public.lineitem.l_partkey = part.p_partkey) Join Filter: (public.lineitem.l_quantity (subplan)) - Seq Scan on lineitem (cost=0.00..1668441.22 rows=60052722 width=12) (actual time=0.007..11016.720 rows=60046144 loops=1) - Hash (cost=70637.08..70637.08 rows=2000 width=4) (actual time=548.192..548.192 rows=1986 loops=1) - Seq Scan on part (cost=0.00..70637.08 rows=2000 width=4) (actual time=0.066..547.106 rows=1986 loops=1) Filter: ((p_brand = 'Brand#52'::bpchar) AND (p_container = 'SM PACK'::bpchar)) SubPlan - Aggregate (cost=227.86..227.87 rows=1 width=4) (actual time=0.141..0.141 rows=1 loops=59857) - Index Scan using i_l_partkey on lineitem (cost=0.00..227.72 rows=54 width=4) (actual time=0.071..0.132 rows=31 loops=59857) Index Cond: (l_partkey = $0) Total runtime: 31669.340 ms (13 rows) Here's query 17 when default_statistics_target=1000 EXPLAIN ANALYZE select sum(l_extendedprice) / 7.0 as avg_yearly from lineitem, part where p_partkey = l_partkey and p_brand = 'Brand#21' and p_container = 'LG BAG' and l_quantity ( select 0.2 * avg(l_quantity) from lineitem where l_partkey = p_partkey ); QUERY PLAN
Re: [HACKERS] Simple postgresql.conf wizard
On Mon, 1 Dec 2008, Mark Wong wrote: So then I attempted to see if there might have been difference between the executing time of each individual query with the above parameters. The queries that don't seem to be effected are Q1, Q4, Q12, Q13, and Q15. Q17 suggests that anything higher than default_statistics_target=10 is an improvement. The rest of the queries appears not to follow any particular trend with respect to default_statistics_target. The interesting ones are Q2, Q9, Q17, Q18, and Q20, and that data is much more useful than the summary. As you mention, Q17 improves significantly with a higher target. All of the rest are dramatically slower in one or both tests going from default_statistics_target=10 to 100. Those look like the most useful data points on the X axis--the increases from 100 up to 1000 aren't particularly interesting in most of these, except in Q20 where the Power Test seems to oscillate between degrees of good and bad behavior seemingly at random. My picks for the most useful graphs from the long list Mark sent: http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q2.png http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q9.png http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q17.png http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q18.png http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q20.png I think the tests you could consider next is to graph the target going from 10 to 100 in steps of 10 just for those 5 queries. If it gradually degrades, that's interesting but hard to nail down. But if there's a sharp transition, getting an explain plan for the two sides of that should provide some insight. I'm really more interested in the ones that slowed down than the one that improved, understanding that might finally provide some evidence against increasing it by default. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
Hi all, I have some data [...] Thanks for gathering this data. The first thing I notice is that the two versions of Q17 that you are running are actually not the exact same query - there are hard-coded constants that are different in each case, and that matters. The substituted parameter doesn't affect the plan, but the constants that are actually in the query do. It seems that some of the other queries may be similar - for example, Q2 has a couple of very fast runs (with default_stats_target 10 and 600) that probably got a better plan than the other runs, and Q6, Q9, and Q20 look suspiciously like there may have been two different plans in the mix as well, presumably because the actual queries varied somewhat. I think the only way to fix this is to run the test a large number of times with each DST and average. :-( Q17 looks like a pretty illustrative example of how a higher statistics target can (potentially) help: it enables the planner to realize that a qual on the part table is highly selective, and therefore switch to a nested loop w/index-scan instead of a hash join. (This may not be the only case, but I can't think of the others right now.) I haven't actually looked at the selectivity estimation code, but I'm assuming that if we have n MCVs then we can estimate that any non-MCV occurs with frequency 1/n (in fact, the frequency of the least-frequent MCV, but 1/n at most). So we want n to be large enough that 1/n is below the cutoff for switching to an index scan (otherwise, we'll end up using the hash join even when the qual selects an extremely infrequent value). It might be helpful to figure out where that cutoff is and what factors it depends on. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
On Tue, 18 Nov 2008, Josh Berkus wrote: Regarding the level of default_stats_target, it sounds like people agree that it ought to be raised for the DW use-case, but disagree how much. If that's the case, what if we compromize at 50 for mixed and 100 for DW? That's what I ended up doing. The attached version of this script and its data files (I dumped all the useful bits in the current HEAD pg_settings for it to use) now hits all of the initial goals I had for a useful working tool here. Here's a sample run on a system with 8GB of RAM and the default mixed workload. I told the tool absolutely nothing: ./pgtune -i ~/data/postgresql.conf And it came up with a decent mixed starter configuration for this system appended to the input postgresql.conf: default_statistics_target = 50 # pg_generate_conf wizard 2008-11-30 maintenance_work_mem = 480MB # pg_generate_conf wizard 2008-11-30 constraint_exclusion = on # pg_generate_conf wizard 2008-11-30 checkpoint_completion_target = 0.9 # pg_generate_conf wizard 2008-11-30 effective_cache_size = 5632MB # pg_generate_conf wizard 2008-11-30 work_mem = 48MB # pg_generate_conf wizard 2008-11-30 wal_buffers = 8MB # pg_generate_conf wizard 2008-11-30 checkpoint_segments = 16 # pg_generate_conf wizard 2008-11-30 shared_buffers = 1920MB # pg_generate_conf wizard 2008-11-30 max_connections = 80 # pg_generate_conf wizard 2008-11-30 I do plan to make that look a bit prettier. Loose ends -- -Kernel shm suggestions. The main hard part there is implementing the formula from the documentation to figure out the total expected shared block size that will be allocated. That just needs another block of time to finish, will fit that in this week. -Providing hints (to stderr) for things that fall somewhere between changes to be made automatically and just things to put into the docs. I want to warn about the implications of the default listen_addresses for example. -Documentation. I've got the whole outline sketched out and some text written, just haven't had time yet to finish implementing that as Docbook. -Tests on more platforms (this was developed on Linux). Memory detection works on recent (=2.5) version of Python for Windows now. I want to do some more tests there, on Mac OS X, and on Solaris. Some *BSD testing would also be in order once I'm outputting shm suggestions, I don't have any systems to test that platform myself. Other than the docs and testing, the rest of these are really optional anyway. What is completely done is the tuning model itself and the inputs it needs. I made some small tweaks to what Josh suggested, documented below, and consider that frozen from my side except for review feedback. I'll get everything mentioned above done next weekend, and expect to submit something of commit candidate quality at that time. Naming and promotion Now that the finish line is in sight, I thought a bit about what to name this thing. The original idea Josh threw out was based on the idea that this would generate the postgresql.conf file from scratch, which may still happen eventually but is not a goal for this release. This really just a tuning tool. When searching for possible names that might fit, one of the ones I thought of was already suggested quite some time ago, in the original discussion that led to the TODO item I'm trying to complete: http://archives.postgresql.org/pgsql-advocacy/2003-02/msg00085.php Since that was the only claim I could find on the name, I've changed the name on this new version to pgtune and would propose that as its final name in contrib. I'd like to add some text mentioning the availability of this tool to the sample postgresql.conf, and I'd certainly be in favor of the suggestion from that old message that a note suggesing you run it should even show up in the output from initdb. Tuning model The actual tuning part of the code is exactly 83 lines right now and pretty easy to read, the rest is mainly text and GUC-related baggage. For anyone who wants to dig into the model itself, here are the other changes I made from what Josh has suggested. -Since I'm focused right now just on getting a good 8.4 version of this tool running, I dropped any attempt at tuning the FSM parameters since they're gone. With that change, there's no need to even ask about database size anymore. I'll put something related to that in a future release that also targets 8.3. -The tool doesn't do anything memory-related if you have less than 256MB of RAM. I wasn't sure how far down this model scaled and may revisit that cut-off. -I was uncomfortable turning off auto-vacuum in the DW case, for fear it would cause trouble for a newbie admin who happens to be stuck with such a job. Instead I'm putting a section in the documentation about the parameters that might be turned by hand beyond what the tool does. That can mention that further
Re: [HACKERS] Simple postgresql.conf wizard
On Sun, 30 Nov 2008, Greg Smith wrote: Memory detection works on recent (=2.5) version of Python for Windows now. I just realized that the provided configuration is really not optimal for Windows users because of the known limitations that prevent larger shared_buffers settings from being effective on that platform. I know there are some notes on that subject in the archives that I'll look though, but I'd appreciate a suggestion for what a good upper-limit for that setting is on Windows. I also wonder whether any of the other parameters have similar restrictions on their useful range. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
On Sun, Nov 30, 2008 at 09:17:37PM -0500, Greg Smith wrote: That's what I ended up doing. The attached version of this script and its data files (I dumped all the useful bits in the current HEAD pg_settings for it to use) now hits all of the initial goals I had for a useful working tool here. Here's a sample run on a system with 8GB of RAM and the default mixed workload. I told the tool absolutely nothing: ./pgtune -i ~/data/postgresql.conf Looks very nice. shared_buffers = 1920MB # pg_generate_conf wizard 2008-11-30 Do you have a check somewhere to see if this exceeds the total SYSV memory allowed by the OS. Otherwise you've just output an unstartable config. The output of /sbin/sysctl should tell you. Have a nice day, -- Martijn van Oosterhout [EMAIL PROTECTED] http://svana.org/kleptog/ Please line up in a tree and maintain the heap invariant while boarding. Thank you for flying nlogn airlines. signature.asc Description: Digital signature
Re: [HACKERS] Simple postgresql.conf wizard
On Mon, 1 Dec 2008, Martijn van Oosterhout wrote: Do you have a check somewhere to see if this exceeds the total SYSV memory allowed by the OS. Otherwise you've just output an unstartable config. The output of /sbin/sysctl should tell you. Something to address that is listed as the first thing in the Loose Ends section of the message. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
Even though we all agree default_statistics_target = 10 is too low, proposing a 40X increase in the default value requires more evidence than this. In particular, the prospect of a 1600-fold increase in the typical cost of eqjoinsel() is a mite scary. I just did some very quick testing of a couple values for default_statistics_target on a database for work whose contents I am not at liberty to share. Suffice it to say it is a database with small tables, but a lot of the queries involve many joins. ANALYZE with default_statistics_target set to 10 takes 13 s. With 100, 92 s. With 1000, 289 s. I tried several representative queries and there was no measurable change in run-time. I compared a couple of the plans and the plans didn't change either. So then I said - gee, how much can I reduce this? I looked at two of the queries in more detail. The lowest default_statistics_target that produced the same plan for both queries was 5. Reducing the default_statistics_target all the way down to 1 changed the plans, but the new plans were just as fast as the old plans. Given the amount of clamor for a higher value for default_statistics_target, I'm a little surprised by these results. It may be that the queries I'm running are not ones for which more statistics generate better plans, but I think they really are representative of what we run. Maybe someone could suggest some types of query that would be likely to helped by better statistics? A random thought: maybe the reason I'm not seeing any benefit is because my tables are just too small - most contain at most a few thousand rows, and some are much smaller. Maybe default_statistics_target should vary with the table size? Something like, 0.1% of the rows to a maximum of 100... and then 0.01% of the rows after that to some higher maximum but always a minimum of at least 10. I'm guessing that people with really big tables are less likely to mind longer planning times and more likely to benefit from finding better plans... ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
On Thu, Nov 27, 2008 at 05:15:04PM -0500, Robert Haas wrote: A random thought: maybe the reason I'm not seeing any benefit is because my tables are just too small - most contain at most a few thousand rows, and some are much smaller. Maybe default_statistics_target should vary with the table size? Something like, 0.1% of the rows to a maximum of 100... and then 0.01% of the rows after that to some higher maximum but always a minimum of at least 10. I'm guessing that people with really big tables are less likely to mind longer planning times and more likely to benefit from finding better plans... Something like this makes an awful lot of sense to me. A higher statistics target (in theory) better captures the fine details of a distribution of values; tables with fewer rows are unlikely to demonstrate fine details (as are tables with categorical, as opposed to continuous, values). Returning to the original topic of a configuration creator, perhaps such a tool should ask the user how big (s)he expects the tables to grow, or perhaps it should be able to investigate a particular column and recommend a statistics target based on its distribution, size, etc. - Josh / eggyknap signature.asc Description: Digital signature
Re: [HACKERS] Simple postgresql.conf wizard
Robert Haas [EMAIL PROTECTED] writes: ANALYZE with default_statistics_target set to 10 takes 13 s. With 100, 92 s. With 1000, 289 s. That is interesting. It would also be interesting to total up the time it takes to run EXPLAIN (without ANALYZE) for a large number of queries. I did start looking at running the experiment Tom was asking for with DBT3 data sets but, uh, the most recent dbt-3 data generation source I could find was four years old. Is there an active source repository for this stuff? Given the amount of clamor for a higher value for default_statistics_target, I'm a little surprised by these results. It may be that the queries I'm running are not ones for which more statistics generate better plans, but I think they really are representative of what we run. Maybe someone could suggest some types of query that would be likely to helped by better statistics? I think there are a lot of variables here. One is the distribution of the data. Your data sounds like it's reasonably evenly distributed. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Thu, Nov 27, 2008 at 05:15:04PM -0500, Robert Haas wrote: [...] Maybe default_statistics_target should vary with the table size? Something like, 0.1% of the rows to a maximum of 100... and then 0.01% of the rows after that to some higher maximum but always a minimum of at least 10. I'm guessing that people with really big tables are less likely to mind longer planning times and more likely to benefit from finding better plans... What was the running time of the planner? Quadratic in statistics_target? Take the square root of table size. Exponential? The logarithm. Or something like that. My 1.75 cent - -- tomás -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFJL6HyBcgs9XrR2kYRAqfzAJ47w720VJLHVXeS8WKH6zUjAHkbeACfTtDh rFq512eztVnib1ozjw9sibs= =JXyS -END PGP SIGNATURE- -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
Joshua D. Drake [EMAIL PROTECTED] writes: On Tue, 2008-11-25 at 20:33 -0500, Tom Lane wrote: So we really don't have any methodically-gathered evidence about the effects of different stats settings. It wouldn't take a lot to convince us to switch to a different default, I think, but it would be nice to have more than none. I don't this is not empirical but really, 150 is very reasonable. Let's just set it to that by default and be done with it. What happened to the more than zero evidence part of the discussion? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
Tom Lane [EMAIL PROTECTED] writes: Dann Corbit [EMAIL PROTECTED] writes: I also do not believe that there is any value that will be the right answer. But a table of data might be useful both for people who want to toy with altering the values and also for those who want to set the defaults. I guess that at one time such a table was generated to produce the initial estimates for default values. Sir, you credit us too much :-(. The actual story is that the current default of 10 was put in when we first implemented stats histograms, replacing code that kept track of only a *single* most common value (and not very well, at that). So it was already a factor of 10 more stats than we had experience with keeping, and accordingly conservatism suggested not boosting the default much past that. I think that's actually too little credit. The sample size is chosen quite carefully based on solid mathematics to provide a specific confidence interval estimate for queries covering ranges the size of a whole bucket. The actual number of buckets more of an arbitrary choice. It depends entirely on how your data is distributed and how large a range your queries are covering. A uniformly distributed data set should only need a single bucket to generate good estimates. Less evenly distributed data sets need more. I wonder actually if there are algorithms for estimating the number of buckets needed for a histogram to achieve some measurable goal. That would close the loop. It would be much more reassuring to base the size of the sample on solid statistics than on hunches. So we really don't have any methodically-gathered evidence about the effects of different stats settings. It wouldn't take a lot to convince us to switch to a different default, I think, but it would be nice to have more than none. I think the difficulty (aside from testing being laborious at the best of times) is that it's heavily dependent on data sets which are hard to generate good examples for. Offhand I would think the census data might make a good starting point -- it should have columns which range from perfectly uniform to highly skewed. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard -- Statistics idea...
On Tue, Nov 25, 2008 at 06:59:25PM -0800, Dann Corbit wrote: I do have a statistics idea/suggestion (possibly useful with some future PostgreSQL 9.x or something): It is a simple matter to calculate lots of interesting univarate summary statistics with a single pass over the data (perhaps during a vacuum full). For instance with numerical columns, you can calculate mean, min, max, standard deviation, skew, kurtosis and things like that with a single pass over the data. Calculating interesting univariate summary statistics and having something useful to do with them are two different things entirely. Note also that whereas this is simple for numeric columns, it's a very different story for non-numeric data types, that don't come from a metric space. That said, the idea of a probability metric space is well explored in the literature, and may have valuable application. The current histogram implementation is effectively a description of the probability metric space the column data live in. Now, if you store a few numbers calculated in this way, it can be used to augment your histogram data when you want to estimate the volume of a request. So (for instance) if someone asks for a scalar that is value you can look to see what percentage of the tail will hang out in that neck of the woods using standard deviation and the mean. Only if you know that the data follow a distribution that can be described accurately with a standard deviation and a mean. If your data don't follow a Gaussian distribution, this will give you bad estimates. - Josh / eggyknap signature.asc Description: Digital signature
Re: [HACKERS] Simple postgresql.conf wizard -- Statistics idea...
On Nov 25, 2008, at 8:59 PM, Dann Corbit wrote: It is a simple matter to calculate lots of interesting univarate summary statistics with a single pass over the data (perhaps during a vacuum full). I don't think that the problem we have is how to collect statistics (well, except for cross-field stuff); the problem is what to actually do with them. What we need people to look at is how we can improve query plan estimates across the board. Row count estimates, page access estimates, the cost estimates for accessing those pages, etc. This isn't a coding problem, it's an algorithm problem. It needs someone with an advanced (if not expert) grasp of statistics who can come up with better ways of estimating these things. So, if you have a statistics hammer to wield, I think you'll find a lot of nails sticking up in the planner code. Hammer on those before worrying about additional stats to collect. :) -- Decibel!, aka Jim C. Nasby, Database Architect [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
On Nov 25, 2008, at 7:06 PM, Gregory Stark wrote: The thought occurs to me that we're looking at this from the wrong side of the coin. I've never, ever seen query plan time pose a problem with Postgres, even without using prepared statements. I certainly have seen plan times be a problem. I wonder if you have too and just didn't realize it. With a default_stats_target of 1000 you'll have hundreds of kilobytes of data to slog through to plan a moderately complex query with a few text columns. Forget about prepared queries, I've seen plan times be unusable for ad-hoc interactive queries before. Can you provide any examples? And no, I've never seen a system where a few milliseconds of plan time difference would pose a problem. I'm not saying they don't exist, only that I haven't seen them (including 2 years working as a consultant). I'll also make the argument that anyone with a system that does have those kind of requirements will have also needed to actually tune their config, and tune it well. I can't see them being bothered by having to set one more parameter. There are a lot of systems that are being impacted by our ultra-low stats target, and a lot of those don't necessarily need a lot of hand tuning beyond the stats target. -- Decibel!, aka Jim C. Nasby, Database Architect [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
Decibel! [EMAIL PROTECTED] writes: On Nov 25, 2008, at 7:06 PM, Gregory Stark wrote: The thought occurs to me that we're looking at this from the wrong side of the coin. I've never, ever seen query plan time pose a problem with Postgres, even without using prepared statements. I certainly have seen plan times be a problem. I wonder if you have too and just didn't realize it. With a default_stats_target of 1000 you'll have hundreds of kilobytes of data to slog through to plan a moderately complex query with a few text columns. Forget about prepared queries, I've seen plan times be unusable for ad-hoc interactive queries before. Can you provide any examples? At the time I couldn't understand what the problem was. In retrospect I'm certain this was the problem. I had a situation where just running EXPLAIN took 5-10 seconds. I suspect I had some very large toasted arrays which were having to be detoasted each time. IIRC I actually reloaded the database with pg_dump and the problem went away. And no, I've never seen a system where a few milliseconds of plan time difference would pose a problem. I'm not saying they don't exist, only that I haven't seen them (including 2 years working as a consultant). How many milliseconds does it take to read a few hundred kilobytes of toasted, compressed data? These can easily be more data than the actual query is going to read. Now ideally this will all be cached but the larger the data set the less likely it will be. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
Decibel! [EMAIL PROTECTED] wrote: On Nov 25, 2008, at 7:06 PM, Gregory Stark wrote: The thought occurs to me that we're looking at this from the wrong side of the coin. I've never, ever seen query plan time pose a problem with Postgres, even without using prepared statements. I certainly have seen plan times be a problem. I wonder if you have too and just didn't realize it. With a default_stats_target of 1000 you'll have hundreds of kilobytes of data to slog through to plan a moderately complex query with a few text columns. Forget about prepared queries, I've seen plan times be unusable for ad-hoc interactive queries before. Can you provide any examples? And no, I've never seen a system where a few milliseconds of plan time difference would pose a problem. When we first brought the statewide circuit court data onto PostgreSQL, on some early version of 8.1, we tried boosting the statistics targets for a few dozen important columns, and had to back off because of plan times up in the 20 to 30 second range. I hadn't tried it lately, so I just gave it a go with switching from a default statistics target of 10 with no overrides to 1000. The plan time for a fairly complex query which is run over 300,000 times per day went from 55 ms to 315 ms; however, with the particular search criteria I used (which I knew to be challenging) the run time went from something which exceeded my patience tolerance for the test (over two minutes) to two seconds, so a better plan was definitely found. I'm not sure what this suggests in terms of a good default value, but just to put some numbers out there from a real-world application -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
Kevin Grittner [EMAIL PROTECTED] wrote: I hadn't tried it lately, so I just gave it a go with switching from a default statistics target of 10 with no overrides to 1000. Oh, this was on 8.2.7, Linux, pretty beefy machine. Do you want the whole set of config info and the hardware specs, or would that just be clutter? -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
On Nov 19, 2008, at 11:51 PM, Tom Lane wrote: Dann Corbit [EMAIL PROTECTED] writes: I think the idea that there IS a magic number is the problem. No amount of testing is ever going to refute the argument that, under some other workload, a different value might better. But that doesn't amount to a reason to leave it the way it is. Perhaps a table of experimental data could serve as a rough guideline. The problem is not that anyone wants to leave it the way it is. The problem is that no one has done even a lick of work to identify a specific number that is demonstrably better than others -- on *any* scale. How about fewer complaints and more effort? Is there even a good way to find out what planning time was? Is there a way to gather that stat for every query a session runs? The thought occurs to me that we're looking at this from the wrong side of the coin. I've never, ever seen query plan time pose a problem with Postgres, even without using prepared statements. Anyone who actually cares that much about plan time is certainly going to use prepared statements, which makes the whole plan time argument moot (plan time, not parse time, but of course stats_target doesn't impact parsing at all). What I *have* seen, on many different databases, was problems with bad plans due to default_stats_target being too low. Most of the time this was solved by simply setting them to 1000. The only case where I backed down from that and went with like 100 was a database that had 150k tables. We've been talking about changing default_stats_target for at least 2 or 3 years now. We know that the current value is causing problems. Can we at least start increasing it? 30 is pretty much guaranteed to be better than 10, even if it's nowhere close to an ideal value. If we start slowly increasing it then at least we can start seeing where people start having issues with query plan time. -- Decibel!, aka Jim C. Nasby, Database Architect [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
Decibel! [EMAIL PROTECTED] writes: The thought occurs to me that we're looking at this from the wrong side of the coin. I've never, ever seen query plan time pose a problem with Postgres, even without using prepared statements. That tells more about the type of queries you tend to run than about whether there's an issue in general. Anyone who actually cares that much about plan time is certainly going to use prepared statements, This is simply false. There's a significant performance hit caused by using prepared statements in many cases where the planner needs to know the parameter values in order to make good decisions. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
Decibel! [EMAIL PROTECTED] writes: Is there even a good way to find out what planning time was? Is there a way to gather that stat for every query a session runs? \timing explain select ... The thought occurs to me that we're looking at this from the wrong side of the coin. I've never, ever seen query plan time pose a problem with Postgres, even without using prepared statements. I certainly have seen plan times be a problem. I wonder if you have too and just didn't realize it. With a default_stats_target of 1000 you'll have hundreds of kilobytes of data to slog through to plan a moderately complex query with a few text columns. Forget about prepared queries, I've seen plan times be unusable for ad-hoc interactive queries before. We've been talking about changing default_stats_target for at least 2 or 3 years now. We know that the current value is causing problems. Can we at least start increasing it? 30 is pretty much guaranteed to be better than 10, even if it's nowhere close to an ideal value. If we start slowly increasing it then at least we can start seeing where people start having issues with query plan time. How would you see anything from doing that? We only hear from people who have problems so we only see half the picture. You would have no way of knowing whether your change has helped or hurt anyone. In any case I don't see we know that the current value is causing problems as a reasonable statement. It's the *default* stats target. There's a reason there's a facility to raise the stats target for individual columns. As Dann said, the idea that there IS a magic number is the problem. *Any* value of default_stats_target will cause problems. Some columns will always have skewed data sets which require unusually large samples, but most won't and the system will run faster with a normal sample size for that majority. The question is what value represents a good trade-off between the costs of having large stats targets -- longer analyze, more data stored in pg_statistics, more vacuuming of pg_statistics needed, longer plan times -- and the benefits of having larger stats targets -- fewer columns which need raised stats targets. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers