Re: [PERFORM] PostgreSQL vs MySQL, and FreeBSD
On Fri, 2007-11-09 at 16:41 +0100, Sebastian Hennebrueder wrote:
> If the queries are complex, this is understandable. I had a
> performance review of a Hibernate project (Java object-relational
> mapping) using MySQL. ORM tools easily produce complex queries with
> joins and subqueries. MySQL uses nested loops for subqueries, which
> leads to performance issues as the database grows.

Even for PostgreSQL, nested loops are still evil and hamper performance.

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at
http://www.postgresql.org/about/donate
Re: [PERFORM] Curious about dead rows.
On Wed, 2007-11-14 at 17:46 -0500, Tom Lane wrote:
> Russell Smith [EMAIL PROTECTED] writes:
> > Is it possible that ANALYZE is not getting the number of dead rows
> > right?
>
> Hah, I think you are on to something. ANALYZE is telling the truth
> about how many dead rows it saw, but its notion of "dead" is "not
> good according to SnapshotNow". Thus, rows inserted by a
> not-yet-committed transaction would be counted as dead. So if these
> are background auto-analyzes being done in parallel with inserting
> transactions that run for awhile, seeing a few not-yet-committed
> rows would be unsurprising.
>
> I wonder if that is worth fixing? I'm not especially concerned about
> the cosmetic aspect of it, but if we mistakenly launch an autovacuum
> on the strength of an inflated estimate of dead rows, that could be
> costly.

Sounds to me like that could result in autovacuum kicking off while
doing large data loads. This sounds suspiciously like the problem
someone on -novice was having - tripping over a Windows autovac bug
while doing a data load:
http://archives.postgresql.org/pgsql-novice/2007-11/msg00025.php

--
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.
[PERFORM] autovacuum: recommended?
hi,

we are moving one database from postgresql-7.4 to postgresql-8.2.4. we
have some cronjobs set up that vacuum the database (some tables more
often, some tables less often). now, in pg82, there is the possibility
of using the autovacuum.

my question is: is it recommended to use it? or, in other words, should
i use only autovacuum? or is it better to use manual vacuuming? which
one is the way of the future :) ? or should i use both auto-vacuum and
manual vacuum?

in other words, i'd like to find out whether we should simply stay with
the vacuuming cronjobs, or move to using autovacuum. and if we should
move, should we try to set it up so that no manual vacuuming is needed
anymore?

thanks,
gabor
Re: [PERFORM] autovacuum: recommended?
On Fri, Nov 16, 2007 at 10:40:43AM +0100, Gábor Farkas wrote:
> we are moving one database from postgresql-7.4 to postgresql-8.2.4.

any particular reason why not 8.2.5?

> my question is: is it recommended to use it? or, in other words,
> should i use only autovacuum? or is it better to use manual
> vacuuming? which one is the way of the future :) ? or should i use
> both auto-vacuum and manual vacuum?

autovacuum is definitely preferred (for most of the cases). you might
want to set vacuum delays though.

depesz

--
quicksil1er: "postgres is excellent, but like any DB it requires a
highly paid DBA. here's my CV!" :)
http://www.depesz.com/ - blog dla ciebie (i moje CV)
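[The vacuum delay knobs mentioned above live in postgresql.conf. A
sketch of what tuning them might look like on 8.2 - the values below
are illustrative starting points, not recommendations from this
thread:]

```
# postgresql.conf fragment (PostgreSQL 8.2); values are illustrative.
autovacuum = on
autovacuum_naptime = 60                 # seconds between autovacuum checks
autovacuum_vacuum_scale_factor = 0.2    # vacuum after ~20% of a table changes
autovacuum_vacuum_cost_delay = 20       # ms to sleep each time cost_limit
autovacuum_vacuum_cost_limit = 200      # is reached; throttles I/O impact
```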
Re: [PERFORM] PostgreSQL vs MySQL, and FreeBSD
-----Original Message-----
From: Ow Mun Heng
Subject: Re: [PERFORM] PostgreSQL vs MySQL, and FreeBSD

> Even for PostgreSQL, nested loops are still evil and hamper
> performance.

I don't know about that. There are times when it is the right plan:

explain analyze select * from table1 t1 inner join table2 t2
on t1.f_id = t2.id where t1.id = 'xyzzy';

                                  QUERY PLAN
 Nested Loop  (cost=0.00..17.65 rows=1 width=344) (actual time=0.080..0.096 rows=1 loops=1)
   ->  Index Scan using table1_pkey on table1 t  (cost=0.00..9.18 rows=1 width=238) (actual time=0.044..0.048 rows=1 loops=1)
         Index Cond: ((id)::text = 'xyzzy'::text)
   ->  Index Scan using table2_pkey on table2 i  (cost=0.00..8.46 rows=1 width=106) (actual time=0.019..0.023 rows=1 loops=1)
         Index Cond: (t.f_id = i.id)
 Total runtime: 0.224 ms

set enable_nestloop=off;
SET

explain analyze select * from table1 t1 inner join table2 t2
on t1.f_id = t2.id where t1.id = 'xyzzy';

                                  QUERY PLAN
 Hash Join  (cost=9.18..72250.79 rows=1 width=344) (actual time=13493.572..15583.049 rows=1 loops=1)
   Hash Cond: (i.id = t.f_id)
   ->  Seq Scan on table2 i  (cost=0.00..61297.40 rows=2188840 width=106) (actual time=0.015..8278.347 rows=2188840 loops=1)
   ->  Hash  (cost=9.18..9.18 rows=1 width=238) (actual time=0.056..0.056 rows=1 loops=1)
         ->  Index Scan using table1_pkey on table1 t  (cost=0.00..9.18 rows=1 width=238) (actual time=0.040..0.045 rows=1 loops=1)
               Index Cond: ((id)::text = 'xyzzy'::text)
 Total runtime: 15583.212 ms

(I changed the table names, but everything else is real.)
[PERFORM] Clustered/covering indexes (or lack thereof :-)
This is probably a FAQ, but I can't find a good answer...

So - are there common techniques to compensate for the lack of
clustered/covering indexes in PostgreSQL?

To be more specific - here is my table (simplified):

    topic_id  int
    post_id   int
    post_text varchar(1024)

The most used query is:

    SELECT post_id, post_text FROM Posts WHERE topic_id = XXX;

Normally I would have created a clustered index on topic_id, and the
whole query would take ~1 disk seek. What would be the common way to
handle this in PostgreSQL, provided that I can't afford 1 disk seek
per record returned?
Re: [PERFORM] Clustered/covering indexes (or lack thereof :-)
adrobj wrote:
> This is probably a FAQ, but I can't find a good answer...
>
> So - are there common techniques to compensate for the lack of
> clustered/covering indexes in PostgreSQL?
>
> To be more specific - here is my table (simplified):
>
>     topic_id  int
>     post_id   int
>     post_text varchar(1024)
>
> The most used query is:
>     SELECT post_id, post_text FROM Posts WHERE topic_id = XXX;
>
> Normally I would have created a clustered index on topic_id, and the
> whole query would take ~1 disk seek. What would be the common way to
> handle this in PostgreSQL, provided that I can't afford 1 disk seek
> per record returned?

You can cluster the table, see
http://www.postgresql.org/docs/8.2/interactive/sql-cluster.html.

--
Heikki Linnakangas
EnterpriseDB  http://www.enterprisedb.com
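[Using the simplified schema from the question, the 8.2 invocation
would look something like the following - the index name is
illustrative. Note that CLUSTER takes an exclusive lock and rewrites
the whole table, so it is typically run off-peak:]

```sql
-- One-time physical reordering of posts by topic_id (PostgreSQL 8.2 syntax).
CREATE INDEX posts_topic_id_idx ON posts (topic_id);
CLUSTER posts_topic_id_idx ON posts;  -- 8.3+ spells this: CLUSTER posts USING posts_topic_id_idx
ANALYZE posts;                        -- refresh planner statistics after the rewrite
```

[The ordering is not maintained for rows inserted afterwards, which is
why the thread below suggests re-running CLUSTER periodically.]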
Re: [PERFORM] autovacuum: recommended?
> That being said, we have some huge tables in our database and pretty
> heavy traffic, and got quite some performance problems when the
> autovacuum kicked in and started vacuuming those huge tables, so
> we're currently running without it. Autovacuum can be tuned to not
> touch those tables, but we've chosen to leave it off.

We had some performance problems with autovacuum on large and
frequently modified tables too - but after a little bit of playing
with the parameters, the overall performance is much better than it
was before autovacuuming.

The table was quite huge (say 20k products along with detailed
descriptions etc.) and was completely updated about 12x each day, i.e.
it grew to about 12x the original size (and 11/12 of the rows were
dead). This caused a serious slowdown of the application each day, as
the database had to scan 12x more data.

We set up autovacuuming with the default parameters, but it interfered
with the usual traffic - we had to play a little with the parameters
(increase the delays, decrease the duration or something like that)
and now it runs much better than before. No nightly vacuuming, no
serious performance degradation during the day, etc.

So yes - autovacuuming is recommended, but in some cases the default
parameters have to be tuned a little bit.

tomas
Re: [PERFORM] PostgreSQL vs MySQL, and FreeBSD
On Fri, 16 Nov 2007 11:06:11 -0500
"Jonah H. Harris" [EMAIL PROTECTED] wrote:

> On Nov 16, 2007 10:56 AM, Dave Dutcher [EMAIL PROTECTED] wrote:
> > I don't know about that. There are times when it is the right plan:
>
> Agreed. IMHO, there's nothing wrong with a nested-loop join as long
> as it's being used properly.

Can you explain further please? (I'm not disagreeing with you, I just
want to know when nested loops are not used properly - does the
planner make mistakes that you have to watch out for?)

Thx,

Josh
Re: [PERFORM] Clustered/covering indexes (or lack thereof :-)
In response to Jeff Davis [EMAIL PROTECTED]:

> On Sun, 2007-11-11 at 22:59 -0800, adrobj wrote:
> > This is probably a FAQ, but I can't find a good answer...
> >
> > So - are there common techniques to compensate for the lack of
> > clustered/covering indexes in PostgreSQL?
> >
> > To be more specific - here is my table (simplified):
> >
> >     topic_id  int
> >     post_id   int
> >     post_text varchar(1024)
> >
> > The most used query is:
> >     SELECT post_id, post_text FROM Posts WHERE topic_id = XXX;
> >
> > Normally I would have created a clustered index on topic_id, and
> > the whole query would take ~1 disk seek. What would be the common
> > way to handle this in PostgreSQL, provided that I can't afford 1
> > disk seek per record returned?
>
> Periodically CLUSTER the table on the topic_id index. The table will
> not be perfectly clustered at all times, but it will be close enough
> that it won't make much difference. There's still the hit of
> performing a CLUSTER, however.
>
> Another option, if you have a relatively small number of topic_ids,
> is to break it into separate tables, one for each topic_id.

Or materialize the data, if performance is the utmost requirement.

Create a second table:

    materialized_topics (
        topic_id   int,
        post_ids   int[],
        post_texts text[]
    )

Now add a trigger to your original table that updates
materialized_topics any time the first table is altered. Thus you
always have fast lookups.

Of course, this may be non-optimal if that table sees a lot of updates.

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/
[EMAIL PROTECTED]
Phone: 412-422-3463x4023
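[A minimal sketch of the trigger idea above, assuming the simplified
posts(topic_id, post_id, post_text) table from the thread. It handles
INSERT only - UPDATE and DELETE would need their own trigger logic -
and all names are illustrative, not from the original post:]

```sql
-- Denormalized per-topic arrays, maintained by trigger on INSERT.
CREATE TABLE materialized_topics (
    topic_id   int PRIMARY KEY,
    post_ids   int[],
    post_texts text[]
);

CREATE OR REPLACE FUNCTION append_post() RETURNS trigger AS $$
BEGIN
    -- Append the new post to its topic's arrays, or start a new row.
    UPDATE materialized_topics
       SET post_ids   = post_ids   || NEW.post_id,
           post_texts = post_texts || NEW.post_text::text
     WHERE topic_id = NEW.topic_id;
    IF NOT FOUND THEN
        INSERT INTO materialized_topics (topic_id, post_ids, post_texts)
        VALUES (NEW.topic_id, ARRAY[NEW.post_id], ARRAY[NEW.post_text::text]);
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER posts_materialize
    AFTER INSERT ON posts
    FOR EACH ROW EXECUTE PROCEDURE append_post();
```

[With this in place, `SELECT post_ids, post_texts FROM
materialized_topics WHERE topic_id = XXX` fetches one row - at the
cost of an extra UPDATE per insert and a steadily growing row per
topic.]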
Re: [PERFORM] PostgreSQL vs MySQL, and FreeBSD
On Nov 16, 2007 10:56 AM, Dave Dutcher [EMAIL PROTECTED] wrote:
> I don't know about that. There are times when it is the right plan:

Agreed. IMHO, there's nothing wrong with a nested-loop join as long as
it's being used properly.

--
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation                | fax: 732.331.1301
499 Thornall Street, 2nd Floor          | [EMAIL PROTECTED]
Edison, NJ 08837                        | http://www.enterprisedb.com/
Re: [PERFORM] Curious about dead rows.
Craig James wrote:
> Alvaro Herrera wrote:
> > To recap:
> >
> > - your app only does inserts
> > - there has been no rollback lately
> > - there are no updates
> > - there are no deletes
> >
> > The only other source of dead rows I can think of is triggers ...
> > do you have any? (Not necessarily on this table -- perhaps
> > triggers on other tables can cause updates on this one.)
> >
> > Oh, a rolled-back COPY can cause dead rows too.
>
> What about an unreliable network that causes lots of disconnects?
> Wouldn't the server process do a rollback?

Perhaps in theory, but in practice my client and the PostgreSQL server
are on the same machine, and 127.0.0.1 is pretty reliable:

lo    Link encap:Local Loopback
      inet addr:127.0.0.1  Mask:255.0.0.0
      inet6 addr: ::1/128 Scope:Host
      UP LOOPBACK RUNNING  MTU:16436  Metric:1
      RX packets:30097919 errors:0 dropped:0 overruns:0 frame:0
      TX packets:30097919 errors:0 dropped:0 overruns:0 carrier:0
      collisions:0 txqueuelen:0
      RX bytes:931924602 (888.7 MiB)  TX bytes:931924602 (888.7 MiB)

--
Jean-David Beyer          Registered Linux User 85642.
PGP-Key: 9A2FC99A         Registered Machine 241939.
Shrewsbury, New Jersey    http://counter.li.org
Re: [PERFORM] PostgreSQL vs MySQL, and FreeBSD
Dimitri wrote:
> Reading this article I'm just happy for them to see progress done on
> FreeBSD :-) As well, to demonstrate OS parallelism it's not so
> impressive to see 4-CPU server results rather than 8-CPU or
> 32-threaded Niagara... Don't know why they did not present similar
> performance graphs for those platforms, strange, no?...

Well, most of the results in the document
(http://people.freebsd.org/~kris/scaling/7.0%20Preview.pdf) are for
8-CPU machines, which is about the most you can get with off-the-shelf
hardware (2x 4-core CPUs; the document has both Xeon and Opteron
results). Niagara support is unfinished, so there's nothing to report
there. On the other hand, the document does compare several versions
of Linux, FreeBSD, NetBSD and DragonflyBSD, with both MySQL and
PostgreSQL, so you can draw your conclusions (if any) from there.
Re: [PERFORM] Clustered/covering indexes (or lack thereof :-)
On Sun, 2007-11-11 at 22:59 -0800, adrobj wrote:
> This is probably a FAQ, but I can't find a good answer...
>
> So - are there common techniques to compensate for the lack of
> clustered/covering indexes in PostgreSQL?
>
> To be more specific - here is my table (simplified):
>
>     topic_id  int
>     post_id   int
>     post_text varchar(1024)
>
> The most used query is:
>     SELECT post_id, post_text FROM Posts WHERE topic_id = XXX;
>
> Normally I would have created a clustered index on topic_id, and the
> whole query would take ~1 disk seek. What would be the common way to
> handle this in PostgreSQL, provided that I can't afford 1 disk seek
> per record returned?

Periodically CLUSTER the table on the topic_id index. The table will
not be perfectly clustered at all times, but it will be close enough
that it won't make much difference. There's still the hit of
performing a CLUSTER, however.

Another option, if you have a relatively small number of topic_ids, is
to break it into separate tables, one for each topic_id.

Regards,
Jeff Davis
Re: [PERFORM] PostgreSQL vs MySQL, and FreeBSD
On Nov 16, 2007 3:36 PM, Josh Trutwin [EMAIL PROTECTED] wrote:
> > Agreed. IMHO, there's nothing wrong with a nested-loop join as
> > long as it's being used properly.
>
> Can you explain further please? (I'm not disagreeing with you, I
> just want to know when nested loops are not used properly - does
> the planner make mistakes that you have to watch out for?)

As long as statistics are updated properly, it's generally not an
issue. You just don't want the system using a nested-loop join
incorrectly (like when table sizes are equal, the outer table is
larger than the inner table, or the inner table itself is overly
large).

--
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation                | fax: 732.331.1301
499 Thornall Street, 2nd Floor          | [EMAIL PROTECTED]
Edison, NJ 08837                        | http://www.enterprisedb.com/
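[Keeping those statistics fresh is mostly a matter of ANALYZE. A
sketch, with illustrative table and column names - not from this
thread:]

```sql
-- Refresh the row-count and value-distribution estimates the planner
-- uses when choosing between nested-loop, hash, and merge joins.
ANALYZE orders;

-- For a skewed join column, a larger statistics sample can help
-- (the default statistics target is 10 in 8.2):
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 100;
ANALYZE orders;

-- Then verify: in EXPLAIN ANALYZE output, estimated rows should be
-- reasonably close to actual rows at each plan node.
EXPLAIN ANALYZE
SELECT * FROM orders o JOIN customers c ON o.customer_id = c.id;
```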
Re: [PERFORM] Curious about dead rows.
On Nov 16, 2007 10:56 AM, Brad Nicholson [EMAIL PROTECTED] wrote:
> On Wed, 2007-11-14 at 17:46 -0500, Tom Lane wrote:
> > Russell Smith [EMAIL PROTECTED] writes:
> > > Is it possible that ANALYZE is not getting the number of dead
> > > rows right?
> >
> > Hah, I think you are on to something. ANALYZE is telling the truth
> > about how many dead rows it saw, but its notion of "dead" is "not
> > good according to SnapshotNow". Thus, rows inserted by a
> > not-yet-committed transaction would be counted as dead. So if
> > these are background auto-analyzes being done in parallel with
> > inserting transactions that run for awhile, seeing a few
> > not-yet-committed rows would be unsurprising.
> >
> > I wonder if that is worth fixing? I'm not especially concerned
> > about the cosmetic aspect of it, but if we mistakenly launch an
> > autovacuum on the strength of an inflated estimate of dead rows,
> > that could be costly.
>
> Sounds to me like that could result in autovacuum kicking off while
> doing large data loads. This sounds suspiciously like the problem
> someone on -novice was having - tripping over a Windows autovac bug
> while doing a data load:
> http://archives.postgresql.org/pgsql-novice/2007-11/msg00025.php

I am almost 100% sure I've seen this behavior in the field...

merlin
Re: [PERFORM] Curious about dead rows.
>>> On Fri, Nov 16, 2007 at 4:01 PM, Merlin Moncure
>>> [EMAIL PROTECTED] wrote:
> On Nov 16, 2007 10:56 AM, Brad Nicholson [EMAIL PROTECTED] wrote:
> > [snip - Tom Lane's explanation of not-yet-committed rows being
> > counted as dead by ANALYZE, quoted upthread]
> >
> > Sounds to me like that could result in autovacuum kicking off
> > while doing large data loads. This sounds suspiciously like the
> > problem someone on -novice was having - tripping over a Windows
> > autovac bug while doing a data load:
> > http://archives.postgresql.org/pgsql-novice/2007-11/msg00025.php
>
> I am almost 100% sure I've seen this behavior in the field...

I know I've seen bulk loads go significantly faster with autovacuum
turned off. It always seemed like a bigger difference than what the
ANALYZE would cause. I bet this explains it.

-Kevin
Re: [PERFORM] autovacuum: recommended?
[Gábor Farkas - Fri at 10:40:43AM +0100]
> my question is: is it recommended to use it? or, in other words,
> should i use only autovacuum? or is it better to use manual
> vacuuming? which one is the way of the future :) ? or should i use
> both auto-vacuum and manual vacuum?

Nightly vacuums are great if the activity on the database is very low
at night time. A combination is also good; the autovacuum will benefit
from the nightly vacuum. My gut feeling says it's a good idea to leave
autovacuum on, regardless of whether the nightly vacuums have been
turned on or not.

That being said, we have some huge tables in our database and pretty
heavy traffic, and got quite some performance problems when the
autovacuum kicked in and started vacuuming those huge tables, so we're
currently running without it. Autovacuum can be tuned to not touch
those tables, but we've chosen to leave it off.
Re: [PERFORM] autovacuum: recommended?
On Fri, 2007-11-16 at 12:13 +0100, Tobias Brox wrote:
[snip]
> > should i use both auto-vacuum and manual-vacuum?

I would say for 8.2 that's the best strategy (which might change with
8.3 and its multiple-vacuum-workers thingy).

> That being said, we have some huge tables in our database and pretty
> heavy traffic, and got quite some performance problems when the
> autovacuum kicked in and started vacuuming those huge tables, so
> we're currently running without it. Autovacuum can be tuned to not
> touch those tables, but we've chosen to leave it off.

We are doing that here, i.e. we set up autovacuum not to touch big
tables, and cover those with nightly vacuums if there is still some
activity on them, plus one weekly complete vacuum of the whole DB
(vacuum without other params, preferably as the postgres user to cover
system tables too).

In fact we also have a few very frequently updated small tables; those
are also covered by very frequent crontab vacuums, because in 8.2
autovacuum can spend quite some time vacuuming some medium-sized
tables, and in that interval the small but frequently updated ones get
bloated. This should be better with 8.3 and multiple autovacuum
workers.

For the per-table disabling of autovacuum, search for pg_autovacuum in
the docs.

So I would say: autovacuum + disabled autovacuum on big tables +
nightly vacuum + weekly vacuumdb + frequent crontab vacuum of heavily
updated small tables works well in 8.2. One thing which could be
needed is to also schedule continuous vacuuming of big tables which
are frequently updated, with big delay settings to throttle the
resources used by the vacuum. We don't need that here because we don't
update our big tables frequently...

Cheers,
Csaba.
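[For reference, the per-table disable mentioned above is done in 8.2
by inserting a row into the pg_autovacuum system catalog (this catalog
was later replaced by per-table storage parameters). The table name
below is illustrative, and -1 in a column means "use the global
default":]

```sql
-- Turn autovacuum off for one big table (PostgreSQL 8.2).
INSERT INTO pg_autovacuum
       (vacrelid, enabled,
        vac_base_thresh, vac_scale_factor,
        anl_base_thresh, anl_scale_factor,
        vac_cost_delay, vac_cost_limit,
        freeze_min_age, freeze_max_age)
VALUES ('big_table'::regclass, false, -1, -1, -1, -1, -1, -1, -1, -1);
```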