Re: [PERFORM] test / live environment, major performance difference
I wonder if my dump/restore routine isn't causing this issue, seeing that I do the db development on my laptop (the fast one) and then restore it on the other two machines. I have confirmed that all the indexes are present after a restore. This is the routine:

/usr/local/pgsql/bin/pg_dump -t layer mapdb | gzip > layer.gz
rsync --progress --rsh=ssh layer.gz [EMAIL PROTECTED]:/home/postgres/layer.gz
--
/usr/local/pgsql/bin/pg_dump -t visiblelayer mapdb | gzip > visiblelayer.gz
rsync --progress --rsh=ssh visiblelayer.gz [EMAIL PROTECTED]:/home/postgres/visiblelayer.gz
--
/usr/local/pgsql/bin/pg_dump -t style mapdb | gzip > style.gz
rsync --progress --rsh=ssh style.gz [EMAIL PROTECTED]:/home/postgres/style.gz
--
/usr/local/pgsql/bin/pg_dump -t layertype mapdb | gzip > layertype.gz
rsync --progress --rsh=ssh layertype.gz [EMAIL PROTECTED]:/home/postgres/layertype.gz
--
DROP TABLE visiblelayer;
DROP TABLE style;
DROP TABLE layer;
DROP TABLE layertype;

gunzip -c layertype.gz | /usr/local/pgsql/bin/psql mapdb
gunzip -c style.gz | /usr/local/pgsql/bin/psql mapdb
gunzip -c visiblelayer.gz | /usr/local/pgsql/bin/psql mapdb
gunzip -c layer.gz | /usr/local/pgsql/bin/psql mapdb

/usr/local/pgsql/bin/vacuumdb -d mapdb -z -v

Craig James wrote:

On 2007-06-11 Christo Du Preez wrote: I really hope someone can shed some light on my problem. I'm not sure if this is a postgres or postgis issue. Anyway, we have 2 development laptops and one live server; somehow I managed to get the same query to perform very well on my laptop, but on both the server and the other laptop it's really performing badly.

One simple possibility that bit me in the past: if you do pg_dump/pg_restore to create a copy of the database, you have to ANALYZE the newly-restored database. I mistakenly assumed that pg_restore would do this, but you have to run ANALYZE explicitly after a restore.
Craig

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly

--
Christo Du Preez
Senior Software Engineer
Mecola IT
Mobile: +27 [0]83 326 8087
Skype: christodupreez
Website: http://www.locateandtrade.co.za
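Craig's advice can be folded directly into the routine from the message above. A sketch for one of the four tables, assuming the same paths and database name as the original (`user@host` is a placeholder for the redacted rsync target):

```shell
# Dump one table, compress, and ship it (note the redirection into the .gz file)
/usr/local/pgsql/bin/pg_dump -t layer mapdb | gzip > layer.gz
rsync --progress --rsh=ssh layer.gz user@host:/home/postgres/layer.gz

# On the target machine: restore, then refresh planner statistics explicitly
gunzip -c layer.gz | /usr/local/pgsql/bin/psql mapdb
/usr/local/pgsql/bin/psql mapdb -c "ANALYZE layer"
```

The original routine's trailing `vacuumdb -z -v` accomplishes the same thing database-wide, so in principle the restored copies should already have statistics; running ANALYZE per table just makes the dependency explicit.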
Re: [PERFORM] test / live environment, major performance difference
Where do I set the planner settings, or are you referring to settings in postgres.conf that may affect the planner?

The one badly performing laptop is the same as mine (the fast one) and the server is much more powerful.

Laptops: Intel Centrino Duo T2600 @ 2.16GHz, 1.98 GB RAM
Server: 2 x Intel Pentium D CPU 3.00GHz, 4 GB RAM

All three systems are running Suse 10.2, with the same PostgreSQL, same configs, same databases. As far as I know, same everything.

PostgreSQL 8.2.3 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20061115 (prerelease) (SUSE Linux)
POSTGIS=1.2.1 GEOS=3.0.0rc4-CAPI-1.3.3 PROJ=Rel. 4.5.0, 22 Oct 2006 USE_STATS

Thanx for all the advice

Dave Dutcher wrote:

-Original Message- From: Christo Du Preez Sent: Monday, June 11, 2007 10:10 AM: I have narrowed down the problem (I think) and it's the query planner using different plans and I haven't got a clue why. Can anyone please shed some light on this?

Different plans can be caused by several different things, like different server versions, different planner settings in the config file, different schemas, or different statistics. You say the server versions are the same, so that's not it. Is the schema the same? One isn't missing indexes that the other has? Do they both have the same data, or at least very close to the same data? Have you run ANALYZE on both of them to update their statistics? Do they have the same planner settings in the config file? I would check that stuff out and see if it helps.

Dave

--
Christo Du Preez
Senior Software Engineer
Mecola IT
Mobile: +27 [0]83 326 8087
Skype: christodupreez
Website: http://www.locateandtrade.co.za
Re: [PERFORM] test / live environment, major performance difference
Good day,

I have noticed that my server never uses indexing, no matter what I do. As an example I took a table with about 650 rows, having a parentid field with an index on parentid.

EXPLAIN ANALYZE SELECT * FROM layertype WHERE parentid = 300;

On my laptop the explain analyze looks like this:

Index Scan using fki_layertype_parentid on layertype (cost=0.00..8.27 rows=1 width=109)
  Index Cond: (parentid = 300)

and on the problem server:

Seq Scan on layertype (cost=0.00..20.39 rows=655 width=110)
  Filter: (parentid = 300)

I have dropped the index, recreated it, vacuumed the table, just about everything I could think of, and there is just no way I can get the query planner to use the index.

PostgreSQL 8.2.3 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20061115 (prerelease) (SUSE Linux)
POSTGIS=1.2.1 GEOS=3.0.0rc4-CAPI-1.3.3 PROJ=Rel. 4.5.0, 22 Oct 2006 USE_STATS
Re: [PERFORM] test / live environment, major performance difference
try it with a table with 650K rows...

On Tue, 2007-06-12 at 15:32 +0200, Christo Du Preez wrote:

Good day, I have noticed that my server never uses indexing. No matter what I do. As an example I took a table with about 650 rows, having a parentid field with an index on parentid.

EXPLAIN ANALYZE SELECT * FROM layertype where parentid = 300;

On my laptop the explain analyze looks like this:

Index Scan using fki_layertype_parentid on layertype (cost=0.00..8.27 rows=1 width=109)
  Index Cond: (parentid = 300)

and on the problem server:

Seq Scan on layertype (cost=0.00..20.39 rows=655 width=110)
  Filter: (parentid = 300)

I have dropped the index, recreated it, vacuumed the table, just about everything I could think of, and there is just no way I can get the query planner to use the index.

PostgreSQL 8.2.3 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20061115 (prerelease) (SUSE Linux)
POSTGIS=1.2.1 GEOS=3.0.0rc4-CAPI-1.3.3 PROJ=Rel. 4.5.0, 22 Oct 2006 USE_STATS
Re: [PERFORM] test / live environment, major performance difference
On Jun 12, 2007, at 8:32, Christo Du Preez wrote:

I have noticed that my server never uses indexing. No matter what I do. As an example I took a table with about 650 rows, having a parentid field with an index on parentid. EXPLAIN ANALYZE SELECT * FROM layertype where parentid = 300;

The planner weighs the cost of the different access methods and chooses the one that it believes is lowest in cost. An index scan is not always faster than a sequential scan. With so few rows, it's probably faster for the server to read the whole table rather than reading the index and looking up the corresponding rows. If you want to test this, you can set enable_seqscan to false and try running your query again.

http://www.postgresql.org/docs/8.2/interactive/runtime-config-query.html#RUNTIME-CONFIG-QUERY-ENABLE

Michael Glaesemann
grzm seespotcode net
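Michael's test can be done in a single psql session. A sketch using the table and predicate from the thread:

```sql
-- Disable sequential scans for this session only: a diagnostic tool,
-- not something to leave enabled in production
SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT * FROM layertype WHERE parentid = 300;

-- Compare the actual times against the seqscan plan, then restore the default
SET enable_seqscan = on;
```

If the forced index scan is not actually faster, the planner's choice of a seqscan was correct all along.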
Re: [PERFORM] test / live environment, major performance difference
Christo Du Preez [EMAIL PROTECTED] writes:

On my laptop the explain analyze looks like this:

Index Scan using fki_layertype_parentid on layertype (cost=0.00..8.27 rows=1 width=109)
  Index Cond: (parentid = 300)

That's not explain analyze, that's just plain explain.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
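For readers unfamiliar with Gregory's distinction: EXPLAIN prints only the planner's cost and row estimates, while EXPLAIN ANALYZE also executes the query and appends the actual counts and timings. A sketch with the query from the thread:

```sql
-- Estimates only; the query is not executed
EXPLAIN SELECT * FROM layertype WHERE parentid = 300;

-- Executes the query; each plan node gains an
-- "(actual time=... rows=... loops=...)" section, which is what
-- lets you compare the planner's estimates against reality
EXPLAIN ANALYZE SELECT * FROM layertype WHERE parentid = 300;
```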
Re: [PERFORM] test / live environment, major performance difference
The actual table where I noticed the problem has a million rows, and it still doesn't use indexing.

Reid Thompson wrote:

try it with a table with 650K rows...

On Tue, 2007-06-12 at 15:32 +0200, Christo Du Preez wrote:

Good day, I have noticed that my server never uses indexing. No matter what I do. As an example I took a table with about 650 rows, having a parentid field with an index on parentid. EXPLAIN ANALYZE SELECT * FROM layertype where parentid = 300; On my laptop the explain analyze looks like this: Index Scan using fki_layertype_parentid on layertype (cost=0.00..8.27 rows=1 width=109) Index Cond: (parentid = 300) and on the problem server: Seq Scan on layertype (cost=0.00..20.39 rows=655 width=110) Filter: (parentid = 300). I have dropped the index, recreated it, vacuumed the table, just about everything I could think of, and there is just no way I can get the query planner to use the index. PostgreSQL 8.2.3 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20061115 (prerelease) (SUSE Linux) POSTGIS=1.2.1 GEOS=3.0.0rc4-CAPI-1.3.3 PROJ=Rel. 4.5.0, 22 Oct 2006 USE_STATS

--
Christo Du Preez
Senior Software Engineer
Mecola IT
Mobile: +27 [0]83 326 8087
Skype: christodupreez
Website: http://www.locateandtrade.co.za
[PERFORM] How much memory PostgreSQL is going to use?
Hi all,

I have a server with 4GB of memory and I'm tweaking the PostgreSQL configuration. This server will be dedicated to running PostgreSQL, so I'd like to dedicate as much RAM as possible to it. I have dedicated 1GB to shared_buffers (shared_buffers=131072), but I'm not sure whether this will be the maximum memory used by PostgreSQL, or whether it will take more memory in addition to this. Because if shared_buffers is the maximum, I could raise that value even more.

Cheers!
--
Arnau
Re: [PERFORM] test / live environment, major performance difference
From: Christo Du Preez Sent: Tuesday, June 12, 2007 2:38 AM: Where do I set the planner settings or are you referring to settings in postgres.conf that may affect the planner?

Yes, I'm referring to settings in postgres.conf. I'm wondering if enable_indexscan or something got turned off on the server for some reason. Here is a description of those settings:

http://www.postgresql.org/docs/8.2/interactive/runtime-config-query.html

So when you move data from the laptop to the server, I see that your script correctly runs an analyze after the load, so have you run analyze on the fast laptop lately? Hopefully running analyze wouldn't make the planner choose a worse plan on the laptop, but if we are trying to get things consistent between the laptop and server, that is something I would try. If the consistency problem really is a problem of the planner not using index scans on the server, then if you can, please post the table definition for the table with a million rows and an EXPLAIN ANALYZE of a query which selects a few rows from the table.

Dave
Re: [PERFORM] test / live environment, major performance difference
On Tue, Jun 12, 2007 at 03:32:40PM +0200, Christo Du Preez wrote: As an example I took a table with about 650 rows, having a parentid field with an index on parentid.

Try a bigger table. Using an index for only 650 rows is almost always suboptimal, so it's no wonder the planner doesn't use the index.

/* Steinar */
--
Homepage: http://www.sesse.net/
Re: [PERFORM] How much memory PostgreSQL is going to use?
In response to Arnau [EMAIL PROTECTED]:

Hi all, I have a server with 4GB of memory and I'm tweaking the PostgreSQL configuration. This server will be dedicated to run PostgreSQL so I'd like to dedicate as much RAM as possible to it. I have dedicated 1GB to shared_buffers (shared_buffers=131072) but I'm not sure if this will be the maximum memory used by PostgreSQL or additional to this it will take more memory. Because if shared_buffers is the maximum I could raise that value even more.

Individual backend processes will allocate more memory above shared_buffers for processing individual queries. See work_mem.

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/
[EMAIL PROTECTED]
Phone: 412-422-3463x4023
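Bill's point can be turned into a rough upper-bound calculation. This is a deliberate simplification (it ignores maintenance_work_mem, per-connection overhead, and the fact that a single complex query can use several work_mem-sized allocations at once), and the max_connections and work_mem figures below are illustrative, not from Arnau's actual config:

```python
PAGE_SIZE_KB = 8  # default PostgreSQL block size

def estimate_max_memory_mb(shared_buffers_pages, max_connections, work_mem_mb):
    """shared_buffers is a fixed shared allocation; on top of it, each
    backend may use roughly work_mem per sort/hash step, so the total
    can exceed shared_buffers under load."""
    shared_mb = shared_buffers_pages * PAGE_SIZE_KB / 1024
    backend_mb = max_connections * work_mem_mb
    return shared_mb + backend_mb

# Arnau's shared_buffers = 131072 pages (1 GB), plus e.g. 100 backends
# each allowed 4 MB of work_mem:
print(estimate_max_memory_mb(131072, 100, 4))  # → 1424.0 (MB)
```

So with 4GB of RAM, raising shared_buffers further is fine only if the backend headroom (and the OS page cache) is budgeted for as well.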
Re: [PERFORM] test / live environment, major performance difference
On Tue, Jun 12, 2007 at 04:11:33PM +0200, Christo Du Preez wrote: The actual table I noticed the problem has a million rows and it still doesn't use indexing

Then please post an EXPLAIN ANALYZE of the query that is slow, along with the table definition and indexes.

/* Steinar */
--
Homepage: http://www.sesse.net/
Re: [PERFORM] Variable (degrading) performance
Heikki,

Thanks for the response.

Heikki Linnakangas wrote:

Vladimir Stankovic wrote: I'm running write-intensive, TPC-C like tests. The workload consists of 150 to 200 thousand transactions. The performance varies dramatically, between 5 and more than 9 hours (I don't have the exact figure for the longest experiment). Initially the server is relatively fast. It finishes the first batch of 50k transactions in an hour. This is probably due to the fact that the database is RAM-resident during this interval. As soon as the database grows bigger than the RAM, the performance, not surprisingly, degrades because of the slow disks. My problem is that the performance is rather variable and, to me, non-deterministic. A 150k test can finish in approx. 3h30mins, but conversely it can take more than 5h to complete. Preferably I would like to see *steady-state* performance (where my interpretation of steady state is that the average throughput/response time does not change over time). Is steady state achievable despite MVCC and the inherent non-determinism between experiments? What could be the reasons for the variable performance?

Steadiness is relative; you'll never achieve perfectly steady performance where every transaction takes exactly X milliseconds. That said, PostgreSQL is not as steady as many other DBMSs by nature, because of the need to vacuum. Another significant source of unsteadiness is checkpoints, though it's not as bad with fsync=off, like you're running.

What I am hoping to see is NOT the same value for all executions of the same type of transaction (after some transient period). Instead, I'd like to see that if I take an appropriately-sized set of transactions I will see at least steady growth in average transaction times, if not exactly the same average. Each chunk would possibly include a sudden performance drop due to the necessary vacuums and checkpoints. The performance might be influenced by changes in the data set too.

I am unhappy about the fact that durations of experiments can differ by as much as 30% (bearing in mind that they are not exactly the same due to the non-determinism on the client side). I would like to eliminate this variability. Are my expectations reasonable? What could be the cause(s) of this variability?

I'd suggest using vacuum_cost_delay to throttle vacuums so that they don't disturb other transactions as much. You might also want to set up manual vacuums for the bigger tables, instead of relying on autovacuum, because until the recent changes in CVS HEAD, autovacuum can only vacuum one table at a time, and while it's vacuuming a big table, the smaller heavily-updated tables are neglected.

The database server version is 8.1.5 running on Fedora Core 6.

How about upgrading to 8.2? You might also want to experiment with CVS HEAD to get the autovacuum improvements, as well as a bunch of other performance improvements.

I will try these, but as I said my primary goal is to have steady/'predictable' performance, not necessarily to obtain the fastest PG results.

Best regards,
Vladimir

--
Vladimir Stankovic T: +44 20 7040 0273
Research Student/Research Assistant F: +44 20 7040 8585
Centre for Software Reliability E: [EMAIL PROTECTED]
City University
Northampton Square, London EC1V 0HB
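Heikki's vacuum_cost_delay suggestion is a postgresql.conf change. A sketch with illustrative values (8.1-era parameters; the numbers are starting points to tune, not recommendations from the thread):

```
# Cost-based vacuum throttling: vacuum sleeps vacuum_cost_delay ms
# each time it has accumulated vacuum_cost_limit "cost" units,
# smoothing its I/O impact on concurrent transactions
vacuum_cost_delay = 20
vacuum_cost_limit = 200
```

The trade-off is that throttled vacuums take longer to finish, so the bigger tables may need their manual vacuums scheduled more generously.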
[PERFORM] VACUUM vs auto-vacuum daemon
Hi there,

Using the VACUUM command explicitly gives me the opportunity to fine-tune my VACUUM scheduling parameters, after I analyze the log generated by VACUUM VERBOSE. On the other hand, I'd like to use the auto-vacuum mechanism because of its facilities. Unfortunately, after I made some initial estimations for autovacuum_naptime and set the specific data into the pg_autovacuum table, I have no feedback from the auto-vacuum mechanism to check whether it works well or not. It would be nice to have some kind of log similar to the one generated by VACUUM VERBOSE. Is the auto-vacuum mechanism able to provide such a useful log?

TIA,
Sabin
Re: [PERFORM] Best way to delete unreferenced rows?
Craig James wrote:

Tyrrill, Ed wrote:

QUERY PLAN
---
Merge Left Join (cost=38725295.93..42505394.70 rows=13799645 width=8) (actual time=6503583.342..8220629.311 rows=93524 loops=1)
  Merge Cond: (outer.record_id = inner.record_id)
  Filter: (inner.record_id IS NULL)
  -> Index Scan using backupobjects_pkey on backupobjects (cost=0.00..521525.10 rows=13799645 width=8) (actual time=15.955..357813.621 rows=13799645 loops=1)
  -> Sort (cost=38725295.93..39262641.69 rows=214938304 width=8) (actual time=6503265.293..7713657.750 rows=214938308 loops=1)
       Sort Key: backup_location.record_id
       -> Seq Scan on backup_location (cost=0.00..3311212.04 rows=214938304 width=8) (actual time=11.175..1881179.825 rows=214938308 loops=1)
Total runtime: 8229178.269 ms
(8 rows)

I ran vacuum analyze after the last time any inserts, deletes, or updates were done, and before I ran the query above. I've attached my postgresql.conf. The machine has 4 GB of RAM.

I thought maybe someone with more expertise than me might answer this, but since they haven't I'll just make a comment. It looks to me like the sort of 214 million rows is what's killing you. I suppose you could try to increase the sort memory, but that's a lot of memory. It seems to me an index merge of a relation this large would be faster, but that's a topic for the experts.

On a theoretical level, the problem is that it's sorting the largest table. Perhaps you could re-cast the query so that it only has to sort the smaller table, something like

select a.id from a where a.id not in (select distinct b.id from b)

where b is the smaller table. There's still no guarantee that it won't do a sort on a, though. In fact one of the clever things about Postgres is that it can convert a query like the one above into a regular join, unless you do something like select ... offset 0 which blocks the optimizer from doing the rearrangement. But I think the first approach is to try to tune for a better plan using your original query.
Craig

Thanks for the input Craig. I actually started out with a query similar to what you suggest, but it took days to complete back when the larger table, backup_location, was still under 100 million rows. The current query is the best performer to date. I have been playing around with work_mem, and doubling it to 128MB did result in some improvement, but doubling it again to 256MB showed no further gain. Here is the explain analyze with work_mem increased to 128MB:

mdsdb=# explain analyze select backupobjects.record_id from backupobjects left outer join backup_location using (record_id) where backup_location.record_id is null;

QUERY PLAN
Merge Left Join (cost=36876242.28..40658535.53 rows=13712990 width=8) (actual time=5795768.950..5795768.950 rows=0 loops=1)
  Merge Cond: (outer.record_id = inner.record_id)
  Filter: (inner.record_id IS NULL)
  -> Index Scan using backupobjects_pkey on backupobjects (cost=0.00..520571.89 rows=13712990 width=8) (actual time=2.490..201516.228 rows=13706121 loops=1)
  -> Sort (cost=36876242.28..37414148.76 rows=215162592 width=8) (actual time=4904205.255..5440137.309 rows=215162559 loops=1)
       Sort Key: backup_location.record_id
       -> Seq Scan on backup_location (cost=0.00..3314666.92 rows=215162592 width=8) (actual time=4.186..1262641.774 rows=215162559 loops=1)
Total runtime: 5796322.535 ms
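For reference, Craig's re-cast written against the actual table names in the thread; a sketch, with an important caveat for a table this size:

```sql
-- On 8.2-era PostgreSQL a NOT IN subquery can be answered by a hashed
-- subplan, but only when the subquery result fits in work_mem; with
-- ~215 million record_ids it will not, and the plan degrades badly,
-- which likely explains why Ed saw this form take days.
SELECT record_id
FROM backupobjects
WHERE record_id NOT IN (SELECT record_id FROM backup_location);
```

That caveat is also why raising work_mem helped the later poster in this thread whose tables were small enough to hash.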
Re: [PERFORM] VACUUM vs auto-vacuum daemon
Sabin Coanda wrote:

Hi there, Using explicitly VACUUM command give me the opportunity to fine tune my VACUUM scheduling parameters, after I analyze the log generated by VACUUM VERBOSE. On the other hand I'd like to use the auto-vacuum mechanism because of its facilities. Unfortunately, after I made some initial estimations for autovacuum_naptime, and I set the specific data into pg_autovacuum table, I have not a feedback from the auto-vacuum mechanism to check that it works well or not. It would be nice to have some kind of log similar with the one generated by VACUUM VERBOSE. Is the auto-vacuum mechanism able to provide such a useful log ?

No, sorry, autovacuum is not currently very good at reporting its activities. It's a lot better in 8.3, but even there it doesn't report the full VACUUM VERBOSE log. It looks like this:

LOG: automatic vacuum of table alvherre.public.foo: index scans: 0
  pages: 45 removed, 0 remain
  tuples: 1 removed, 0 remain
  system usage: CPU 0.00s/0.00u sec elapsed 0.01 sec
LOG: automatic analyze of table alvherre.public.foo
  system usage: CPU 0.00s/0.00u sec elapsed 0.00 sec

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Re: [PERFORM] test / live environment, major performance difference
Christo Du Preez [EMAIL PROTECTED] writes:

On my laptop the explain analyze looks like this:

Index Scan using fki_layertype_parentid on layertype (cost=0.00..8.27 rows=1 width=109)
  Index Cond: (parentid = 300)

OK ...

and on the problem server:

Seq Scan on layertype (cost=0.00..20.39 rows=655 width=110)
  Filter: (parentid = 300)

The server thinks that every row of the table matches the WHERE clause. That being the case, it's making the right choice to use a seqscan. The question is why is the rows estimate so far off? Have you ANALYZEd the table lately?

regards, tom lane
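Tom's question points at the immediate thing to try; a sketch:

```sql
-- Refresh the statistics the planner uses for its row estimates
ANALYZE layertype;

-- Then see whether the estimate (rows=655 above) comes back down
-- toward the true selectivity of the predicate
EXPLAIN SELECT * FROM layertype WHERE parentid = 300;
```

If the estimate stays at the full table size after a fresh ANALYZE, the statistics themselves are worth inspecting, which is where the thread goes next.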
Re: [PERFORM] VACUUM vs auto-vacuum daemon
In response to Sabin Coanda [EMAIL PROTECTED]:

Hi there, Using explicitly VACUUM command give me the opportunity to fine tune my VACUUM scheduling parameters, after I analyze the log generated by VACUUM VERBOSE. On the other hand I'd like to use the auto-vacuum mechanism because of its facilities. Unfortunately, after I made some initial estimations for autovacuum_naptime, and I set the specific data into pg_autovacuum table, I have not a feedback from the auto-vacuum mechanism to check that it works well or not. It would be nice to have some kind of log similar with the one generated by VACUUM VERBOSE. Is the auto-vacuum mechanism able to provide such a useful log ?

Ditto what Alvaro said. However, you can get some measure of tracking by running VACUUM VERBOSE on a regular basis to see how well autovacuum is keeping up. There's no problem with running manual vacuum and autovacuum together, and you'll be able to gather _some_ information about how well autovacuum is keeping up.

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/
[EMAIL PROTECTED]
Phone: 412-422-3463x4023
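One way to run the manual vacuum on a regular basis, as Bill suggests: a crontab entry (hypothetical schedule and log path; the binary path and database name are taken from earlier in the thread, and vacuumdb's -v flag passes VERBOSE while -z adds ANALYZE):

```
# Nightly at 03:00: vacuum + analyze the whole database, keeping the
# verbose output for comparison against autovacuum's behaviour
0 3 * * * /usr/local/pgsql/bin/vacuumdb -d mapdb -v -z >> /var/log/pg_vacuum.log 2>&1
```

The dead-tuple counts in successive logs show whether autovacuum kept up between runs.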
Re: [PERFORM] VACUUM vs auto-vacuum daemon
Hi Bill,

... However, you can get some measure of tracking by running VACUUM VERBOSE on a regular basis to see how well autovacuum is keeping up. There's no problem with running manual vacuum and autovacuum together, and you'll be able to gather _some_ information about how well autovacuum is keeping up.

Well, I think that is only useful if I am able to synchronize autovacuum to always run after I run VACUUM VERBOSE. But I don't know how to do that. Do you?

Sabin
Re: [PERFORM] test / live environment, major performance difference
Yes, I have just about tried every combination of vacuum on the database. Just to make 100% sure.

Tom Lane wrote:

Christo Du Preez [EMAIL PROTECTED] writes: On my laptop the explain analyze looks like this: Index Scan using fki_layertype_parentid on layertype (cost=0.00..8.27 rows=1 width=109) Index Cond: (parentid = 300)

OK ...

and on the problem server: Seq Scan on layertype (cost=0.00..20.39 rows=655 width=110) Filter: (parentid = 300)

The server thinks that every row of the table matches the WHERE clause. That being the case, it's making the right choice to use a seqscan. The question is why is the rows estimate so far off? Have you ANALYZEd the table lately?

regards, tom lane

--
Christo Du Preez
Senior Software Engineer
Mecola IT
Mobile: +27 [0]83 326 8087
Skype: christodupreez
Website: http://www.locateandtrade.co.za
Re: [PERFORM] test / live environment, major performance difference
Christo Du Preez [EMAIL PROTECTED] writes: Yes, I have just about tried every combination of vacuum on the database. Just to make 100% sure.

Well, there's something mighty wacko about that rowcount estimate; even if you didn't have stats, the estimate for a simple equality constraint oughtn't be a 100% match. What do you get from

SELECT * FROM pg_stats WHERE tablename = 'layertype'

on both systems?

regards, tom lane
Re: [PERFORM] test / live environment, major performance difference
Fast:

public;layertype;id;0;4;-1;;;{1,442,508,575,641,708,774,840,907,973,1040};0.95
public;layertype;label;0;14;-0.971429;{arch,bank,bench,canyon,gap,hill,hills,levee,mountain,mountains};{0.00300752,0.00300752,0.00300752,0.00300752,0.00300752,0.00300752,0.00300752,0.00300752,0.00300752,0.00300752};{abandoned airfield,boatyard,corridor,forest(s),intermittent lake,metro station,park headquarters,reefs,section of bank,swamp,zoo};0.107307
public;layertype;parentid;0.98797;4;2;{4,1};{0.00902256,0.00300752};;-0.142857
public;layertype;zorder;0;4;9;{0};{0.98797};{1,2,3,4,5,6,7,8};0.928955
public;layertype;description;0.100752;74;-0.888722;{a branch of a canyon or valley,a low, isolated, rounded hill,a near-level shallow, natural depression or basin, usually containing an intermittent lake, pond, or pool,a relatively shallow, wide depression, the bottom of which usually has a continuous gradient,a shore zone of coarse unconsolidated sediment that extends from the low-water line to the highest reach of storm waves,a surface-navigation hazard composed of consolidated material,a surface-navigation hazard composed of unconsolidated material};{0.00300752,0.00300752,0.00300752,0.00300752,0.00300752,0.00300752,0.00300752};{a barrier constructed across a stream to impound water,a comparatively depressed area on an icecap,a facility for pumping oil through a pipeline,a large house, mansion, or chateau, on a large estate,an area drained by a stream,an elongate (tongue-like) extension of a flat sea floor into an adjacent higher feature,a place where caravans stop for rest,a series of associated ridges or seamounts,a sugar mill no longer used as a sugar mill,bowl-like hollows partially surrounded by cliffs or steep slopes at the head of a glaciated valley,well-delineated subdivisions of a large and complex positive feature};-0.0178932
public;layertype;code;0.0135338;9;-1;;;{A.ADM1,H.HBRX,H.STMM,L.RGNL,S.BUSTN,S.HTL,S.PKLT,S.TRIG,T.MTS,U.GAPU,V.VINS};0.995628

Slow:
public;layertype;id;0;4;-1;;;{1,437,504,571,638,705,772,839,906,973,1040};-0.839432
public;layertype;label;0;15;-0.965723;{arch,bank,bench,canyon,country,gap,hill,hills,levee,mountain};{0.00298063,0.00298063,0.00298063,0.00298063,0.00298063,0.00298063,0.00298063,0.00298063,0.00298063,0.00298063};{abandoned airfield,boatyard,cotton plantation,fork,intermittent oxbow lake,military installation,park headquarters,reef,second-order administrative division,swamp,zoo};-0.0551452
public;layertype;parentid;0.00745157;4;7;{300};{0.976155};{1,1,4,5,8,12};0.92262
public;layertype;zorder;0;4;8;{0};{0.971684};{1,2,3,3,5,7,7};0.983028
public;layertype;description;0.110283;74;-0.879285;{a branch of a canyon or valley,a low, isolated, rounded hill,a near-level shallow, natural depression or basin, usually containing an intermittent lake, pond, or pool,a relatively shallow, wide depression, the bottom of which usually has a continuous gradient,a shore zone of coarse unconsolidated sediment that extends from the low-water line to the highest reach of storm waves,a surface-navigation hazard composed of consolidated material,a surface-navigation hazard composed of unconsolidated material};{0.00298063,0.00298063,0.00298063,0.00298063,0.00298063,0.00298063,0.00298063};{a barrier constructed across a stream to impound water,a comparatively depressed area on an icecap,a facility for pumping water from a major well or through a pipeline,a large inland body of standing water,an area drained by a stream,an embankment bordering a canyon, valley, or seachannel,a place where diatomaceous earth is extracted,a series of associated ridges or seamounts,a sugar mill no longer used as a sugar mill,bowl-like hollows partially surrounded by cliffs or steep slopes at the head of a glaciated valley,well-delineated subdivisions of a large and complex positive feature};0.0103485
public;layertype;code;0.023845;9;-1;;;{A.ADM1,H.INLT,H.STMM,L.RNGA,S.BUSTN,S.HUT,S.PKLT,S.TRIG,T.MTS,U.GAPU,V.VINS};-0.852108

This table contains identical data.

Thanx for your help Tom

Tom Lane wrote:

Christo Du Preez [EMAIL PROTECTED] writes: Yes, I have just about tried every combination of vacuum on the database. Just to make 100% sure.

Well, there's something mighty wacko about that rowcount estimate; even if you didn't have stats, the estimate for a simple equality constraint oughtn't be a 100% match. What do you get from SELECT * FROM pg_stats WHERE tablename = 'layertype' on both systems?

regards, tom lane

--
Christo Du Preez
Senior Software Engineer
Mecola IT
Mobile: +27 [0]83 326 8087
Skype: christodupreez
Website: http://www.locateandtrade.co.za
Re: [PERFORM] Variable (degrading) performance
Vladimir Stankovic wrote:

What I am hoping to see is NOT the same value for all the executions of the same type of transaction (after some transient period). Instead, I'd like to see that if I take appropriately-sized set of transactions I will see at least steady-growth in transaction average times, if not exactly the same average. Each chunk would possibly include sudden performance drop due to the necessary vacuum and checkpoints. The performance might be influenced by the change in the data set too. I am unhappy about the fact that durations of experiments can differ even 30% (having in mind that they are not exactly the same due to the non-determinism on the client side). I would like to eliminate this variability. Are my expectations reasonable? What could be the cause(s) of this variability?

You should see that if you define your chunk to be long enough. "Long enough" is probably hours, not minutes or seconds. As I said earlier, checkpoints and vacuum are a major source of variability.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Re: [PERFORM] Best use of second controller with faster disks?
On Jun 11, 2007, at 9:14 PM, Francisco Reyes wrote: RAID card 1 with 8 drives. 7200 RPM SATA RAID10. RAID card 2 with 4 drives. 10K RPM SATA RAID10.

what raid card have you got? i'm playing with an external enclosure which has an areca sata raid in it and connects to the host via fibre channel. it is wicked fast, and supports a RAID6 which seems to be as fast as the RAID10 in my initial testing on this unit.

What drives are you booting from? If you're booting from the 4-drive RAID10, perhaps split that into a pair of RAID1's and boot from one and use the other as the pg log disk. however, I must say that with my 16-disk array, peeling the log off the main volume actually slowed it down a bit. I think the raid card is just so fast at doing the RAID6 computations that the striping is a big gain over the dedicated RAID1 for the log. Right now I'm testing an 8-disk RAID6 configuration on the same device; it seems slower than the 16-disk RAID6, but I haven't tried an 8-disk RAID10 with dedicated log yet.

Besides having pg_xlog on the 10K RPM drives, what else can I do to best use those drives other than putting some data on them? Iostat shows the drives getting used very little, even during constant updates and vacuum. Some of the postgresql.conf settings that may be relevant: wal_buffers = 64, checkpoint_segments = 64.

i'd bump checkpoint_segments up to 256 given the amount of disk you've got dedicated to it. be sure to increase checkpoint_timeout too. And if you can move to FreeBSD 6.2 you should pick up some speed on the network layer and possibly the disk I/O.
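Vivek's suggestion, written out as a postgresql.conf fragment (the checkpoint_timeout value is illustrative, not from the thread; tune it to your workload):

```
# postgresql.conf -- illustrative values for a box with plenty of
# dedicated WAL disk; checkpoint_segments and checkpoint_timeout
# together control how much work accumulates between checkpoints.
wal_buffers = 64
checkpoint_segments = 256      # up from 64, per the advice above
checkpoint_timeout = 30min     # raise the time-based trigger as well
```

Fewer, larger checkpoints spread the I/O spikes further apart, at the cost of longer crash recovery.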
Re: [PERFORM] performance drop on 8.2.4, reverting to 8.1.4
Thanks Tom and Alvaro. To follow up on this, I rewrote and tweaked a number of queries (including the one provided) to change LEFT OUTER JOIN ... WHERE col IS NULL clauses to WHERE col NOT IN (...) clauses. This has brought performance to an acceptable level on 8.2. Thanks for your time, Steve

On 6/7/07, Tom Lane [EMAIL PROTECTED] wrote: Alvaro Herrera [EMAIL PROTECTED] writes: Tom Lane wrote: I was hoping that the auto plan invalidation code in CVS HEAD would get it out of this problem, but it seems not to for the problem-as-given. The trouble is that it won't change plans until autovacuum analyzes the tables, and that won't happen until the transaction commits and sends off its I-inserted-lotsa-rows report to the stats collector. I think there is something we can do about this -- drop the default value for analyze threshold. Maybe worth doing, but it doesn't help for Steve's example. regards, tom lane
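The rewrite Steve describes, sketched on hypothetical tables (orders and shipped_orders are made-up names; note the two forms are only equivalent when the subquery column cannot be NULL, since NOT IN returns no rows if the list contains a NULL):

```sql
-- Before: anti-join spelled as an outer join
SELECT o.id
FROM orders o
LEFT OUTER JOIN shipped_orders s ON s.order_id = o.id
WHERE s.order_id IS NULL;

-- After: the form that planned better on 8.2 for this workload
SELECT o.id
FROM orders o
WHERE o.id NOT IN (SELECT order_id FROM shipped_orders
                   WHERE order_id IS NOT NULL);
```

Which form wins depends on the planner's estimates for the outer join versus the hashed subplan, so it is worth checking both with EXPLAIN ANALYZE.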
Re: [PERFORM] Best use of second controller with faster disks?
Vivek Khera writes: what raid card have you got? 2 3ware cards. I believe both are 9550SX. i'm playing with an external enclosure which has an areca sata raid in it and connects to the host via fibre channel. What is the OS? FreeBSD? One of the reasons I stick with 3ware is that it is well supported in FreeBSD and has a pretty decent management program. it is wicked fast, and supports a RAID6 which seems to be as fast as the RAID10 in my initial testing on this unit. For my next large machine I am also leaning towards RAID6. The space difference is just too big to ignore. 3ware recommends RAID6 for 5+ drives. What drives are you booting from? Booting from the 8-drive RAID. If you're booting from the 4-drive RAID10, perhaps split that into a pair of RAID1's and boot from one and use the other as the pg log disk. Maybe for the next machine. however, I must say that with my 16-disk array, peeling the log off the main volume actually slowed it down a bit. I think that the raid card is just so fast at doing the RAID6 computations and having the striping is a big gain over the dedicated RAID1 for the log. Could be. Seems like RAID6 is supposed to be a good balance between performance and available space. Right now I'm testing an 8-disk RAID6 configuration on the same device; it seems slower than the 16-disk RAID6, but I haven't yet tried 8-disk RAID10 with dedicated log yet. Is all this within the same controller? i'd bump checkpoint_segments up to 256 given the amount of disk you've got dedicated to it. be sure to increase checkpoint_timeout too. Thanks. Will try that.