[PERFORM] Postgres Benchmark Results
I felt the world needed a new benchmark ;)

So: a forum-style benchmark simulating many users posting to and viewing forums and topics on a PHP website.

http://home.peufeu.com/ftsbench/forum1.png

One of those curves is a very popular open-source database which claims to offer unparalleled speed. The other one is, of course, Postgres 8.2.3, which by popular belief is full-featured but slow.

What is your guess?
Re: [PERFORM] Postgres Benchmark Results
I assume red is PostgreSQL and green is MySQL. That reflects my own benchmarks with those two.

But I don't fully understand what the graph displays. Does it reflect the ability of the underlying database to support a certain number of users per second given a certain database size? Or is the growth of the database part of the benchmark?

Btw, did you consider that older topics are normally read much less and almost never get new postings? I think the size of the active data set depends more on the number of active members than on the actual amount of data available. That can greatly reduce the impact of the database size, although we saw very nice gains in performance on our forum (over 22GB of messages) when replacing the database server with one with twice the memory, CPUs and I/O.

Best regards,

Arjen

On 20-5-2007 16:58 PFC wrote:
> I felt the world needed a new benchmark ;) So: a forum-style benchmark simulating many users posting to and viewing forums and topics on a PHP website. http://home.peufeu.com/ftsbench/forum1.png One of those curves is a very popular open-source database which claims to offer unparalleled speed. The other one is, of course, Postgres 8.2.3, which by popular belief is full-featured but slow. What is your guess?
Re: [PERFORM] Postgres Benchmark Results
> I assume red is PostgreSQL and green is MySQL. That reflects my own benchmarks with those two.

Well, since you answered first, and right, you win XD

The little curve that dives into the ground is MySQL with InnoDB. The Energizer bunny that keeps going is Postgres.

> But I don't fully understand what the graph displays. Does it reflect the ability of the underlying database to support a certain number of users per second given a certain database size? Or is the growth of the database part of the benchmark?

Basically I have a test client which simulates a certain number of concurrent users browsing a forum and posting (the posting rate is artificially high, in order to fill the tables quicker than the months it would take in real life). Since the fake users pick which topics to view and post in by browsing the pages, like people would do, they tend to pick the topics on the first few pages of the forum, those with the most recent posts. So, like in real life, some topics fall off the first pages and go down to rot at the bottom, while others grow much more.

So, as the database grows (X axis), the total number of web pages served per second (viewings + postings) is on the Y axis, representing the user's experience (fast / slow / dead server).

The number of concurrent HTTP or Postgres connections is not plotted; it doesn't really matter for benchmarking purposes anyway. You need enough to keep the server busy, but not too many or you're just wasting RAM. For a LAN that's about 30 HTTP connections and about 8 PHP processes, each with a database connection. Since I use lighttpd, I don't really care about the number of actual slow clients (i.e. real concurrent HTTP connections). Everything is funneled through those 8 PHP processes, so Postgres never sees huge concurrency. About 2/3 of the CPU is used by PHP anyway, only 1/3 by Postgres ;)

> Btw, did you consider that older topics are normally read much less and almost never get new postings? I think the size of the active data set depends more on the number of active members than on the actual amount of data available.

Yes, see above. The posts table is clustered on (topic_id, post_id) and this is key to performance.

> That can greatly reduce the impact of the database size, although we saw very nice gains in performance on our forum (over 22GB of messages) when replacing the database server with one with twice the memory, CPUs and I/O.

Well, you can see on the curve where it hits I/O-bound behaviour.

I'm writing a full report, but I'm having a lot of problems with MySQL; I'd like to give it a fair chance, but it shows real obstinacy about NOT working.
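[For readers who want to try the same trick, a minimal sketch of what clustering a posts table on (topic_id, post_id) could look like. The table definition below is invented for illustration, since the benchmark's actual schema is not shown in this thread, and the CLUSTER syntax used is the 8.2-era form.]

-- Hypothetical posts table; names and columns are illustrative only.
CREATE TABLE posts (
    topic_id integer NOT NULL,
    post_id  integer NOT NULL,
    author   integer,
    body     text,
    PRIMARY KEY (topic_id, post_id)
);

-- Physically reorder the heap in (topic_id, post_id) order, so that all
-- posts of one topic sit in a handful of adjacent disk blocks. Viewing a
-- topic page then needs far fewer random reads once the table no longer
-- fits in RAM.
CLUSTER posts_pkey ON posts;

-- CLUSTER is a one-shot reorganisation; it has to be re-run periodically
-- (e.g. from a maintenance job) as new posts are appended.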
Re: [PERFORM] Diminishing bandwidth performance with multiple quad core X5355s
On 14-5-2007 0:00 jlmarin wrote:
> I wanted to post this even if it's a bit late on the thread, because right now I have exactly this kind of problem. We're trying to figure out if a dual quad-core (Xeon) will be better (cost/benefit wise) than a 4-way dual-core Opteron, for *our* program.

We've benchmarked the Sun Fire X4600 (with the older socket 939 CPUs) and compared it to a much cheaper dual quad-core Xeon X5355. As you can see at the end of this page: http://tweakers.net/reviews/674/8 the 4-way dual-core Opteron performs worse (in our benchmark) than the 2-way quad-core Xeon. Our benchmark does not consume a lot of memory, but I don't know which of the two profits most from that. Obviously it may well be that the Socket F Opterons with support for DDR2 memory perform better, but we haven't seen much proof of that.

Given the cost of a 4-way dual-core Opteron vs a 2-way quad-core Xeon, I'd go for the latter for now. The savings can be used to build a system with heavier I/O and/or more memory, which normally yields bigger gains in database land. For example, a Dell 2900 with 2x X5355 + 16GB of memory costs about 7000 euros less than a Dell 6950 with 4x 8220 + 16GB. You can buy an additional MD1000 with 15x 15k rpm disks for that... And I doubt you'll find any real-world database benchmark that will favour the Opteron system if you look at the price/performance picture. Of course this picture might very well change as soon as the new 'Barcelona' quad-core Opterons are finally available.

> As you say, Opterons do definitely have a much better memory system. But then a 4-way mobo is WAY more expensive than a dual-socket one...

And it might be limited by NUMA and the relatively simple broadcast architecture for cache coherency.

Best regards,

Arjen van der Meijden
Re: [PERFORM] Postgres Benchmark Results
On 20-5-2007 19:09 PFC wrote:
> Since I use lighttpd, I don't really care about the number of actual slow clients (i.e. real concurrent HTTP connections). Everything is funneled through those 8 PHP processes, so Postgres never sees huge concurrency.

Well, that would only be in favour of Postgres anyway; in our benchmarks it scales better to multiple CPUs and multiple clients, and apparently in yours to larger datasets. MySQL seems to be faster up until a certain number of concurrent clients (close to the number of CPUs available) and beyond that can collapse dramatically.

> I'm writing a full report, but I'm having a lot of problems with MySQL; I'd like to give it a fair chance, but it shows real obstinacy about NOT working.

Yeah, it displayed very odd behaviour when doing benchmarks here too. If you haven't done so already, you can try the newest 5.0 version (5.0.41?), which eliminates several scaling issues in InnoDB, but afaik not all of them. Besides that, it can just be pretty painful to get a certain query fast, although we've not very often seen it fail completely in the last few years.

Best regards,

Arjen van der Meijden
Re: [PERFORM] Postgres Benchmark Results
PFC [EMAIL PROTECTED] writes:
> The little curve that dives into the ground is MySQL with InnoDB. The Energizer bunny that keeps going is Postgres.

Just for comparison's sake it would be interesting to see a curve for mysql/myisam. MySQL's claim to speed is mostly based on measurements taken with myisam tables, but I think that doesn't hold up very well under concurrent load.

regards, tom lane
Re: [PERFORM] Postgres Benchmark Results
PFC wrote:
> I felt the world needed a new benchmark ;) So: a forum-style benchmark simulating many users posting to and viewing forums and topics on a PHP website. http://home.peufeu.com/ftsbench/forum1.png One of those curves is a very popular open-source database which claims to offer unparalleled speed. The other one is, of course, Postgres 8.2.3, which by popular belief is full-featured but slow. What is your guess?

Red is PostgreSQL. The advertised unparalleled speed must surely mean they only benchmarked single-client access on the no-name DB. ;-)

I also went into benchmarking mode last night for my own amusement, when I read on the linux-kernel ML that NCQ support for nForce5 chips was released. I tried current PostgreSQL 8.3devel CVS. pgbench over a local TCP connection with 25 clients and 3000 transactions/client gave me around 445 tps before applying NCQ support, and 680 tps after. It went over 840 tps after adding the HOT v7 patch, still with 25 clients, and topped out at 1062 tps with 3-4 clients.

I used a single Seagate 320GB SATA2 drive for the test, which has less than 40GB free, so it's already at the end of the disk, giving smaller transfer rates than at the beginning. The filesystem is ext3. Dual-core Athlon64 X2 4200 in 64-bit mode. I have never seen such performance before on a desktop machine.

--
Zoltán Böszörményi
Cybertec Geschwinde Schönig GmbH
http://www.postgresql.at/
Re: [PERFORM] Postgres Benchmark Results
On Sun, 20 May 2007 19:26:38 +0200, Tom Lane [EMAIL PROTECTED] wrote:
> PFC [EMAIL PROTECTED] writes:
>> The little curve that dives into the ground is MySQL with InnoDB. The Energizer bunny that keeps going is Postgres.
> Just for comparison's sake it would be interesting to see a curve for mysql/myisam. MySQL's claim to speed is mostly based on measurements taken with myisam tables, but I think that doesn't hold up very well under concurrent load.

I'm doing that now. Here is what I wrote in the report:

Using prepared statements (important), Postgres beats MyISAM on simple selects, as they say, as well as on complex selects, even with 1 thread.

MyISAM caused massive data corruption: posts and topics disappear, storage engine errors pop up, random trashed rows appear in the forums table and screw up everything, etc. In short: it doesn't work. But, since no one in their right mind would use MyISAM for critical data, I include this result anyway, as a curiosity.

I had to write a repair SQL script to fix the corruption in order to see how MySQL fares when it gets bigger than RAM...
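[As a side note for readers, a minimal sketch of the kind of prepared statement a forum page could use; the statement name, table and columns here are made up for illustration, not taken from the benchmark code.]

-- Hypothetical topic-page query; names are illustrative only.
PREPARE topic_page (integer, integer) AS
    SELECT post_id, author, body
    FROM posts
    WHERE topic_id = $1
    ORDER BY post_id
    LIMIT 30 OFFSET $2;

-- Parsing and planning happen once per connection; every page view after
-- that only pays for execution.
EXECUTE topic_page (1234, 0);   -- first page of topic 1234
EXECUTE topic_page (1234, 30);  -- second page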
Re: [PERFORM] Postgres Benchmark Results
> I'm writing a full report, but I'm having a lot of problems with MySQL; I'd like to give it a fair chance, but it shows real obstinacy about NOT working.

Well, that matches up well with my experience. Better yet, file a performance bug with the commercial support and you'll get an explanation of why your schema (or your hardware - well, anything but the database software used) is the guilty factor.

But you know, those IT manager journals consider MySQL to be the relevant open-source database. Guess it matches their expectations better than PG or, say, MaxDB (the artist formerly known as SAP DB).

Andreas
Re: [PERFORM] Ever Increasing IOWAIT
Ralph Mason wrote:
>> We have a database running on a 4 processor machine. As time goes by the IO gets worse and worse, peaking at about 200% as the machine loads up. The weird thing is that if we restart postgres it's fine for hours, but over time it goes bad again. (CPU usage graph here: http://www.flickr.com/photos/[EMAIL PROTECTED]/502596262/ ) You can clearly see where the restart happens in the IO area.

> I'm assuming here we're talking about that big block of iowait at about 4-6am?

Actually no - that is a vacuum of the whole database, to double check. It's not a vacuuming problem (I am sure it's not). The restart is at 22:00, where you see the IO drop to nothing; the database is still doing the same work.

> I take it vmstat/iostat show a corresponding increase in disk activity at that time.

I didn't know you could have IO wait without disk activity - I will check that out.

> The question is - what? Does the number of PG processes increase at that time? If that's not intentional then you might need to see what your applications are up to.

No, the number of connections is stable and the jobs they do stay the same; just this deterioration of I/O wait over time.

> Do you have a vacuum/backup scheduled for that time? Do you have some other process doing a lot of file I/O at that time?

>> This is Postgres 8.1.4 64bit.

> You'll want to upgrade to the latest patch release - you're missing 5 lots of bug-fixes there.

Thanks - will try that.
Re: [PERFORM] Ever Increasing IOWAIT
> You're not swapping, are you? One explanation could be that PG is configured to think it has access to a little more memory than the box can really provide, which forces it to swap once it's been running for long enough to fill up its shared buffers, or after a certain number of concurrent connections are opened.
>
> -- Mark Lewis

No - no swap on this machine. The number of connections is stable.

Ralph

> On Fri, 2007-05-18 at 10:45 +1200, Ralph Mason wrote:
>> We have a database running on a 4 processor machine. As time goes by the IO gets worse and worse, peaking at about 200% as the machine loads up. The weird thing is that if we restart postgres it's fine for hours, but over time it goes bad again. (CPU usage graph here: http://www.flickr.com/photos/[EMAIL PROTECTED]/502596262/ ) You can clearly see where the restart happens in the IO area. This is Postgres 8.1.4 64bit. Anyone have any ideas? Thanks, Ralph
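[A quick sketch of how one might check from psql what the server believes it can use, following the memory-sizing point above; the parameter names are the 8.1-era ones, and the actual swap check still has to happen at the OS level with tools like free or vmstat.]

-- Settings that determine how much memory the server will try to use.
SHOW shared_buffers;          -- shared cache, allocated once at startup
SHOW work_mem;                -- per sort/hash node, so multiply by concurrent use
SHOW maintenance_work_mem;    -- used by VACUUM, CREATE INDEX, ...
SHOW max_connections;         -- each backend adds its own overhead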
Re: [PERFORM] Ever Increasing IOWAIT
Ralph Mason [EMAIL PROTECTED] writes:
> Ralph Mason wrote:
>> We have a database running on a 4 processor machine. As time goes by the IO gets worse and worse, peaking at about 200% as the machine loads up. The weird thing is that if we restart postgres it's fine for hours but over time it goes bad again.

Do you by any chance have stats collection enabled and stats_reset_on_server_start set to true? If so, maybe this is explained by growth in the size of the stats file over time. It'd be interesting to keep an eye on the size of $PGDATA/global/pgstat.stat over a fast-to-slow cycle.

regards, tom lane
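[For anyone wanting to follow Tom's suggestion, a small sketch of the checks involved; the parameter names are the 8.1-era ones, and the file-size check itself has to be done from the shell.]

-- Confirm whether the stats collector is running and whether its file is
-- reset at server start (both settings exist in 8.1).
SHOW stats_start_collector;
SHOW stats_row_level;
SHOW stats_reset_on_server_start;

-- The stats file itself lives outside SQL; watch its size from the shell
-- over a fast-to-slow cycle, e.g.:
--   ls -l $PGDATA/global/pgstat.stat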
[PERFORM] Rewriting DISTINCT and losing performance
Hi all,

I know we've covered this before, but I'm having trouble with it today.

I have some geographic data in tables that I'm working with: a country, state and city table. I was selecting the country_name out of the country table, but discovered that some countries (like Antarctica) didn't have cities in the city table. I resolved to query the country table for only country_names whose country_ids appear in the city table - meaning the country has cities listed.

The problem was I had a couple of different sources (in separate tables) with some extraneous column data, so I chose to consolidate the city tables from the different sources and drop the column data that I don't need, because I don't have the hardware to support it. That was the end of my query time.

Here's the original table and query:

# \d geo.world_city
           Table "geo.world_city"
   Column   |          Type          | Modifiers
------------+------------------------+-----------
 city_id    | integer                | not null
 state_id   | smallint               |
 country_id | smallint               |
 rc         | smallint               |
 latitude   | numeric(9,7)           |
 longitude  | numeric(10,7)          |
 dsg        | character(5)           |
 cc1        | character(2)           |
 adm1       | character(2)           |
 city_name  | character varying(200) |
Indexes:
    "world_city_pk" PRIMARY KEY, btree (city_id)
    "idx_world_city_cc1" btree (cc1)
    "idx_world_city_cc1_adm1" btree (cc1, adm1)
    "idx_world_city_country_id" btree (country_id)
    "idx_world_city_name_first_letter" btree (state_id, substring(lower(city_name::text), 1, 1))
    "idx_world_city_state_id" btree (state_id)

explain analyze
SELECT country_id, country_name
FROM geo.country
WHERE country_id IN (SELECT country_id FROM geo.world_city);

                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop IN Join  (cost=0.00..167.97 rows=155 width=15) (actual time=85.502..3479.449 rows=231 loops=1)
   ->  Seq Scan on country  (cost=0.00..6.44 rows=244 width=15) (actual time=0.089..0.658 rows=244 loops=1)
   ->  Index Scan using idx_world_city_country_id on world_city  (cost=0.00..8185.05 rows=12602 width=2) (actual time=14.250..14.250 rows=1 loops=244)
         Index Cond: (country.country_id = world_city.country_id)
 Total runtime: 3479.921 ms

Odd that it took 3 seconds, because every previous run has been much quicker. The next run was:

                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop IN Join  (cost=0.00..167.97 rows=155 width=15) (actual time=0.087..6.967 rows=231 loops=1)
   ->  Seq Scan on country  (cost=0.00..6.44 rows=244 width=15) (actual time=0.028..0.158 rows=244 loops=1)
   ->  Index Scan using idx_world_city_country_id on world_city  (cost=0.00..8185.05 rows=12602 width=2) (actual time=0.026..0.026 rows=1 loops=244)
         Index Cond: (country.country_id = world_city.country_id)
 Total runtime: 7.132 ms
(5 rows)

But that was irrelevant. I created a new table, eliminated the data I don't need, and it looks like this:

# \d geo.city
             Table "geo.city"
   Column   |          Type          | Modifiers
------------+------------------------+-----------
 city_id    | integer                | not null
 state_id   | smallint               |
 country_id | smallint               |
 latitude   | numeric(9,7)           |
 longitude  | numeric(10,7)          |
 city_name  | character varying(100) |
Indexes:
    "city_pk" PRIMARY KEY, btree (city_id)
    "idx_city_country_id" btree (country_id) CLUSTER
Foreign-key constraints:
    "city_state_id_fk" FOREIGN KEY (state_id) REFERENCES geo.state(state_id) ON UPDATE CASCADE ON DELETE CASCADE

explain analyze
SELECT country_id, country_name
FROM geo.country
WHERE country_id IN (SELECT country_id FROM geo.city);
-- won't complete in a reasonable amount of time.

This one won't use the country_id index. The two tables have almost the same number of rows:

cmi=# select count(*) from geo.world_city;
  count
---------
 1953314
(1 row)

cmi=# select count(*) from geo.city;
  count
---------
 2122712
(1 row)

I tried to force it and didn't see any improvement.
I've vacuumed, analyzed, clustered. Can someone help me get only the countries that have cities in the city table in a reasonable amount of time?
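[The subject line mentions rewriting DISTINCT, so for completeness here are two equivalent formulations that are often worth comparing with EXPLAIN ANALYZE; these are generic rewrites of the query above, not something the original poster reports having run.]

-- Alternative 1: correlated EXISTS. Semantically the same as the IN form;
-- the planner may or may not pick a better plan, so compare with EXPLAIN ANALYZE.
SELECT c.country_id, c.country_name
FROM geo.country AS c
WHERE EXISTS (
    SELECT 1 FROM geo.city ci WHERE ci.country_id = c.country_id
);

-- Alternative 2: join against the distinct set of country_ids that actually
-- occur in geo.city; that set has at most a few hundred rows even though
-- geo.city has ~2 million.
SELECT c.country_id, c.country_name
FROM geo.country AS c
JOIN (SELECT DISTINCT country_id FROM geo.city) AS used
  ON used.country_id = c.country_id;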
Re: [PERFORM] Ever Increasing IOWAIT
> Ralph Mason [EMAIL PROTECTED] writes:
>> Ralph Mason wrote:
>>> We have a database running on a 4 processor machine. As time goes by the IO gets worse and worse, peaking at about 200% as the machine loads up. The weird thing is that if we restart postgres it's fine for hours but over time it goes bad again.
> Do you by any chance have stats collection enabled and stats_reset_on_server_start set to true? If so, maybe this is explained by growth in the size of the stats file over time. It'd be interesting to keep an eye on the size of $PGDATA/global/pgstat.stat over a fast-to-slow cycle.

We do, because we use the stats to figure out when we will vacuum. Our vacuum process reads that table and, when it runs, resets it using pg_stat_reset() to clear it down each time (about every 60 seconds when the db is very busy); stats_reset_on_server_start is off.

Interestingly, after a suggestion here I went and looked at the IO stats at the same time. They show the writes as expected, picking up exactly where they were before the reset, but the reads drop dramatically - like it's reading far less data after the reset.

I will watch the size of the pgstat.stat file.

Ralph
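[A hedged sketch of what a stats-driven vacuum check like the one Ralph describes might look like; his actual script is not shown in the thread, the 10000-row threshold is invented, and the column names are the ones available in 8.1's pg_stat_all_tables.]

-- Hypothetical check: find the tables with the most write activity since
-- the counters were last cleared.
SELECT schemaname, relname,
       n_tup_ins + n_tup_upd + n_tup_del AS churn
FROM pg_stat_all_tables
WHERE n_tup_ins + n_tup_upd + n_tup_del > 10000
ORDER BY churn DESC;

-- After acting on the numbers, clear the counters so the next run only
-- sees new activity (this is the reset Ralph mentions).
SELECT pg_stat_reset();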