[PERFORM] Postgres Benchmark Results

2007-05-20 Thread PFC


I felt the world needed a new benchmark ;)
	So : Forum style benchmark with simulation of many users posting and  
viewing forums and topics on a PHP website.


http://home.peufeu.com/ftsbench/forum1.png

	One of those curves is a very popular open-source database which claims  
to offer unparallelled speed.
	The other one is of course Postgres 8.2.3 which by popular belief is  
full-featured but slow


What is your guess ?

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Postgres Benchmark Results

2007-05-20 Thread Arjen van der Meijden
I assume red is PostgreSQL and green is MySQL. That reflects my own 
benchmarks with those two.


But I don't fully understand what the graph displays. Does it reflect 
the ability of the underlying database to support a certain amount of 
users per second given a certain database size? Or is the growth of the 
database part of the benchmark?


Btw, did you consider that older topics are normally read much less and 
almost never get new postings? I think the size of the active data set 
is more dependent on the amount of active members than on the actual 
amount of data available.
That can reduce the impact of the size of the database greatly, although 
we saw very nice gains in performance on our forum (over 22GB of 
messages) when replacing the databaseserver with one with twice the 
memory, cpu's and I/O.


Best regards,

Arjen

On 20-5-2007 16:58 PFC wrote:


I felt the world needed a new benchmark ;)
So : Forum style benchmark with simulation of many users posting and 
viewing forums and topics on a PHP website.


http://home.peufeu.com/ftsbench/forum1.png

One of those curves is a very popular open-source database which 
claims to offer unparallelled speed.
The other one is of course Postgres 8.2.3 which by popular belief is 
full-featured but slow


What is your guess ?

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Postgres Benchmark Results

2007-05-20 Thread PFC


I assume red is PostgreSQL and green is MySQL. That reflects my own  
benchmarks with those two.


Well, since you answered first, and right, you win XD

The little curve that dives into the ground is MySQL with InnoDB.
The Energizer bunny that keeps going is Postgres.

But I don't fully understand what the graph displays. Does it reflect  
the ability of the underlying database to support a certain amount of  
users per second given a certain database size? Or is the growth of the  
database part of the benchmark?


	Basically I have a test client which simulates a certain number of  
concurrent users browsing a forum, and posting (posting rate is  
artificially high in order to fill the tables quicker than the months it  
would take in real life).


	Since the fake users pick which topics to view and post in by browsing  
the pages, like people would do, it tends to pick the topics in the first  
few pages of the forum, those with the most recent posts. So, like in real  
life, some topics fall through the first pages, and go down to rot at the  
bottom, while others grow much more.


	So, as the database grows (X axis) ; the total number of webpages served  
per second (viewings + postings) is on the Y axis, representing the user's  
experience (fast / slow / dead server)


	The number of concurrent HTTP or Postgres connections is not plotted, it  
doesn't really matter anyway for benchmarking purposes, you need to have  
enough to keep the server busy, but not too much or you're just wasting  
RAM. For a LAN that's about 30 HTTP connections and about 8 PHP processes  
with each a database connection.
	Since I use lighttpd, I don't really care about the number of actual slow  
clients (ie. real concurrent HTTP connections). Everything is funneled  
through those 8 PHP processes, so postgres never sees huge concurrency.

About 2/3 of the CPU is used by PHP anyway, only 1/3 by Postgres ;)

Btw, did you consider that older topics are normally read much less and  
almost never get new postings? I think the size of the active data set  
is more dependent on the amount of active members than on the actual  
amount of data available.


Yes, see above.
	The posts table is clustered on (topic_id, post_id) and this is key to  
performance.


That can reduce the impact of the size of the database greatly, although  
we saw very nice gains in performance on our forum (over 22GB of  
messages) when replacing the databaseserver with one with twice the  
memory, cpu's and I/O.


Well, you can see on the curve when it hits IO-bound behaviour.

	I'm writing a full report, but I'm having a lot of problems with MySQL,  
I'd like to give it a fair chance, but it shows real obstination in NOT  
working.



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Diminishing bandwidth performance with multiple quad core X5355s

2007-05-20 Thread Arjen van der Meijden

On 14-5-2007 0:00 jlmarin wrote:

I wanted to post this even if it's a bit late on the thread because
right now I have exactly this kind of problem.
We're trying to figure out if a dual-Quadcore (Xeon) will be better
(cost/benefit wise) than a 4-way Opteron dualcore, for *our* program.


We've benchmarked the Sun Fire x4600 (with the older socket 939 cpu's) 
and compared it to a much cheaper dual quad core xeon X5355.


As you can see on the end of this page:
http://tweakers.net/reviews/674/8

The 4-way dual core opteron performs less (in our benchmark) than the 
2-way quad core xeon. Our benchmark does not consume a lot of memory, 
but I don't know which of the two profits most of that. Obviously it may 
well be that the Socket F opterons with support for DDR2 memory perform 
better, but we haven't seen much proof of that.
Given the cost of a 4-way dual core opteron vs a 2-way quad core xeon, 
I'd go for the latter for now. The savings can be used to build a system 
with heavier I/O and/or more memory, which normally yield bigger gains 
in database land.
For example a Dell 2900 with 2x X5355 + 16GB of memory costs about 7000 
euros less than a Dell 6950 with 4x 8220 + 16GB. You can buy an 
additional MD1000 with 15x 15k rpm disks for that... And I doubt you'll 
find any real-world database benchmark that will favour the 
opteron-system if you look at the price/performance-picture.


Of course this picture might very well change as soon as the new 
'Barcelona' quad core opterons are finally available.



As you say, Opterons do definitely have a much better memory system.
But then a 4-way mobo is WAY more expensive that a dual-socket one...


And it might be limited by NUMA and the relatively simple broadcast 
architecture for cache coherency.


Best regards,

Arjen van der Meijden

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Postgres Benchmark Results

2007-05-20 Thread Arjen van der Meijden

On 20-5-2007 19:09 PFC wrote:
Since I use lighttpd, I don't really care about the number of actual 
slow clients (ie. real concurrent HTTP connections). Everything is 
funneled through those 8 PHP processes, so postgres never sees huge 
concurrency.


Well, that would only be in favour of postgres anyway, it scales in our 
benchmarks better to multiple cpu's, multiple clients and appaerantly in 
yours to larger datasets. MySQL seems to be faster up untill a certain 
amount of concurrent clients (close to the amount of cpu's available) 
and beyond that can collapse dramatically.


I'm writing a full report, but I'm having a lot of problems with 
MySQL, I'd like to give it a fair chance, but it shows real obstination 
in NOT working.


Yeah, it displayed very odd behaviour when doing benchmarks here too. If 
you haven't done already, you can try the newest 5.0-verion (5.0.41?) 
which eliminates several scaling issues in InnoDB, but afaik not all of 
them. Besides that, it just can be pretty painful to get a certain query 
fast, although we've not very often seen it failing completely in the 
last few years.


Best regards,

Arjen van der Meijden

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Postgres Benchmark Results

2007-05-20 Thread Tom Lane
PFC [EMAIL PROTECTED] writes:
   The little curve that dives into the ground is MySQL with InnoDB.
   The Energizer bunny that keeps going is Postgres.

Just for comparison's sake it would be interesting to see a curve for
mysql/myisam.  Mysql's claim to speed is mostly based on measurements
taken with myisam tables, but I think that doesn't hold up very well
under concurrent load.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Postgres Benchmark Results

2007-05-20 Thread Zoltan Boszormenyi

PFC írta:


I felt the world needed a new benchmark ;)
So : Forum style benchmark with simulation of many users posting 
and viewing forums and topics on a PHP website.


http://home.peufeu.com/ftsbench/forum1.png

One of those curves is a very popular open-source database which 
claims to offer unparallelled speed.
The other one is of course Postgres 8.2.3 which by popular belief 
is full-featured but slow


What is your guess ?


Red is PostgreSQL.

The advertised unparallelled speed must surely mean
benchmarking only single-client access on the noname DB. ;-)

I also went into benchmarking mode last night for my own
amusement when I read on the linux-kernel ML that
NCQ support for nForce5 chips was released.
I tried current PostgreSQL 8.3devel CVS.
pgbench over local TCP connection with
25 clients and 3000 transacts/client gave me
around 445 tps before applying NCQ support.
680 tps after.

It went over 840 tps after adding HOT v7 patch,
still with 25 clients. It topped at 1062 tps with 3-4 clients.
I used a single Seagate 320GB SATA2 drive
for the test, which only has less than 40GB free.
So it's already at the end of the disk giving smaller
transfer rates then at the beginning. Filesystem is ext3.
Dual core Athlon64 X2 4200 in 64-bit mode.
I have never seen such a performance before
on a desktop machine.

--
--
Zoltán Böszörményi
Cybertec Geschwinde  Schönig GmbH
http://www.postgresql.at/


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [PERFORM] Postgres Benchmark Results

2007-05-20 Thread PFC

On Sun, 20 May 2007 19:26:38 +0200, Tom Lane [EMAIL PROTECTED] wrote:


PFC [EMAIL PROTECTED] writes:

The little curve that dives into the ground is MySQL with InnoDB.
The Energizer bunny that keeps going is Postgres.


Just for comparison's sake it would be interesting to see a curve for
mysql/myisam.  Mysql's claim to speed is mostly based on measurements
taken with myisam tables, but I think that doesn't hold up very well
under concurrent load.

regards, tom lane



I'm doing that now. Here is what I wrote in the report :

	Using prepared statements (important), Postgres beats MyISAM on simple  
selects as they say, as well as complex selects, even with 1 thread.


	MyISAM caused massive data corruption : posts and topics disappear,  
storage engine errors pop off, random thrashed rows appear in the forums  
table, therefore screwing up everything, etc. In short : it doesn't work.  
But, since noone in their right mind would use MyISAM for critical data, I  
include this result anyway, as a curiosity.


	I had to write a repair SQL script to fix the corruption in order to see  
how MySQL will fare when it gets bigger than RAM...



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Postgres Benchmark Results

2007-05-20 Thread Andreas Kostyrka

   I'm writing a full report, but I'm having a 
 lot of problems with MySQL,  
 I'd like to give it a fair chance, but it shows 
 real obstination in NOT  
 working.

Well that matches up well with my experience, better even yet, file a 
performance bug to the commercial support and you'll get an explanation why 
your schema (or your hardware, well anything but the database software used) is 
the guilty factor.

but you know these IT manager journals consider mysql as the relevant 
opensource database. Guess it matches better with their expection than PG or 
say MaxDB (the artist known formerly as Sap DB).

Andreas


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Ever Increasing IOWAIT

2007-05-20 Thread Ralph Mason
Ralph Mason wrote:
 We have a database running on a 4 processor machine.  As time goes by 
 the IO gets worse and worse peeking at about 200% as the machine loads up.
 
 The weird thing is that if we restart postgres it’s fine for hours but 
 over time it goes bad again.
 
 (CPU usage graph here HYPERLINK
 http://www.flickr.com/photos/[EMAIL PROTECTED]/502596262/http://www.flickr
 .com/p hotos/[EMAIL PROTECTED]/502596262/ )  You can clearly see where the 
 restart happens in the IO area
I'm assuming here we're talking about that big block of iowait at about 
4-6am?

Actually no - that is a vacuum of the whole database to double check It's
not a vacuuming problem (I am sure it's not).  The restart is at at 22:00
where you see the io drop to nothing, the database is still doing the same
work.

I take it vmstat/iostat show a corresponding increase in disk activity 
at that time.

I didn't know you could have IO/wait without disk activity - I will check
that out. 

The question is - what?
Does the number of PG processes increase at that time? If that's not 
intentional then you might need to see what your applications are up to.

No the number of connections is stable and the jobs they do stays the same,
just this deteriorating of i/o wait over time.

Do you have a vacuum/backup scheduled for that time? Do you have some 
other process doing a lot of file I/O at that time?

 This is Postgres  8.1.4 64bit.

You'll want to upgrade to the latest patch release - you're missing 5 
lots of bug-fixes there.

Thanks - will try that.


-- 
Internal Virus Database is out-of-date.
Checked by AVG Free Edition.
Version: 7.5.432 / Virus Database: 268.15.9/573 - Release Date: 5/12/2006
4:07 p.m.
 


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Ever Increasing IOWAIT

2007-05-20 Thread Ralph Mason


You're not swapping are you?  One explanation could be that PG is
configured to think it has access to a little more memory than the box
can really provide, which forces it to swap once it's been running for
long enough to fill up its shared buffers or after a certain number of
concurrent connections are opened.

-- Mark Lewis

No - no swap on this machine. The number of connections is stable.

Ralph


On Fri, 2007-05-18 at 10:45 +1200, Ralph Mason wrote:
 We have a database running on a 4 processor machine.  As time goes by
 the IO gets worse and worse peeking at about 200% as the machine loads
 up.
 
  
 
 The weird thing is that if we restart postgres it’s fine for hours but
 over time it goes bad again.
 
  
 
 (CPU usage graph here
 http://www.flickr.com/photos/[EMAIL PROTECTED]/502596262/ )  You can clearly
 see where the restart happens in the IO area
 
  
 
 This is Postgres  8.1.4 64bit.
 
  
 
 Anyone have any ideas?
 
  
 
 Thanks
 
 Ralph
 
  
 
 
 
 --
 Internal Virus Database is out-of-date.
 Checked by AVG Free Edition.
 Version: 7.5.432 / Virus Database: 268.15.9/573 - Release Date:
 5/12/2006 4:07 p.m.
 
 

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

-- 
Internal Virus Database is out-of-date.
Checked by AVG Free Edition.
Version: 7.5.432 / Virus Database: 268.15.9/573 - Release Date: 5/12/2006 4:07 
p.m.
 

-- 
Internal Virus Database is out-of-date.
Checked by AVG Free Edition.
Version: 7.5.432 / Virus Database: 268.15.9/573 - Release Date: 5/12/2006 4:07 
p.m.
 


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Ever Increasing IOWAIT

2007-05-20 Thread Tom Lane
Ralph Mason [EMAIL PROTECTED] writes:
 Ralph Mason wrote:
 We have a database running on a 4 processor machine.  As time goes by 
 the IO gets worse and worse peeking at about 200% as the machine loads up.
  
 The weird thing is that if we restart postgres it's fine for hours but
 over time it goes bad again.

Do you by any chance have stats collection enabled and
stats_reset_on_server_start set to true? If so, maybe this is explained
by growth in the size of the stats file over time.  It'd be interesting
to keep an eye on the size of $PGDATA/global/pgstat.stat over a fast-to-
slow cycle.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[PERFORM] Rewriting DISTINCT and losing performance

2007-05-20 Thread Chuck D.
Hi all,

I know we've covered this before but I'm having trouble with it today.

I have some geographic data in tables that I'm working with.  I have a
country, state and city table.  I was selecting the country_name out of the
country table but discovered that some countries (like Antarctica) didn't
have cities in the city table.

I resolved to query the country table for only country_name's which had
country_id's in the city table - meaning the country had cities listed.

The problem was I had a couple different sources (in separate tables) with
some extraneous column  data so I chose to consolidate the city tables from
the different sources and column data that I don't need because I don't have
the hardware to support it.

That was the end of my query time.

Here's the original table and query:

# \d geo.world_city
 Table geo.world_city
   Column   |  Type  | Modifiers
++---
 city_id| integer| not null
 state_id   | smallint   |
 country_id | smallint   |
 rc | smallint   |
 latitude   | numeric(9,7)   |
 longitude  | numeric(10,7)  |
 dsg| character(5)   |
 cc1| character(2)   |
 adm1   | character(2)   |
 city_name  | character varying(200) |
Indexes:
world_city_pk PRIMARY KEY, btree (city_id)
idx_world_city_cc1 btree (cc1)
idx_world_city_cc1_adm1 btree (cc1, adm1)
idx_world_city_country_id btree (country_id)
idx_world_city_name_first_letter btree
(state_id, substring(lower(city_name::text), 1, 1))
idx_world_city_state_id btree (state_id)

explain analyze
SELECT   country_id, country_name
FROM geo.country
WHERE country_id IN
 (select country_id FROM geo.world_city)
;

  QUERY
PLAN
-
--
 Nested Loop IN Join  (cost=0.00..167.97 rows=155 width=15) (actual
time=85.502..3479.449 rows=231 loops=1)
   -  Seq Scan on country  (cost=0.00..6.44 rows=244 width=15) (actual
time=0.089..0.658 rows=244 loops=1)
   -  Index Scan using idx_world_city_country_id on world_city
(cost=0.00..8185.05 rows=12602 width=2) (actual time=14.250..14.250 rows=1
loops=244)
 Index Cond: (country.country_id = world_city.country_id)
 Total runtime: 3479.921 ms

Odd that it took 3 seconds because every previous run has been much quicker.
The next run was:

 QUERY
PLAN
-

 Nested Loop IN Join  (cost=0.00..167.97 rows=155 width=15) (actual
time=0.087..6.967 rows=231 loops=1)
   -  Seq Scan on country  (cost=0.00..6.44 rows=244 width=15) (actual
time=0.028..0.158 rows=244 loops=1)
   -  Index Scan using idx_world_city_country_id on world_city
(cost=0.00..8185.05 rows=12602 width=2) (actual time=0.026..0.026 rows=1
loops=244)
 Index Cond: (country.country_id = world_city.country_id)
 Total runtime: 7.132 ms
(5 rows)


But that was irrelevant.  I created a new table and eliminated the data and
 it looks like this:

# \d geo.city
Table geo.city
   Column   |  Type  | Modifiers
++---
 city_id| integer| not null
 state_id   | smallint   |
 country_id | smallint   |
 latitude   | numeric(9,7)   |
 longitude  | numeric(10,7)  |
 city_name  | character varying(100) |
Indexes:
city_pk PRIMARY KEY, btree (city_id)
idx_city_country_id btree (country_id) CLUSTER
Foreign-key constraints:
city_state_id_fk FOREIGN KEY (state_id) REFERENCES geo.state(state_id)
ON UPDATE CASCADE ON DELETE CASCADE

explain analyze
SELECT   country_id, country_name
FROM geo.country
WHERE country_id IN
 (select country_id FROM geo.city)
;

-- won't complete in a reasonable amount of time.

This one won't use the country_id index. The two tables have almost the same
number of rows:

cmi=# select count(*) from geo.world_city;
  count
-
 1953314
(1 row)

cmi=# select count(*) from geo.city;
  count
-
 2122712
(1 row)


 I tried to force it and didn't see any improvement.  I've vacuummed,
analyzed, clustered.  Can someone help me to get only the countries who have
cities in the city table in a reasonable amount of time?

---

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Ever Increasing IOWAIT

2007-05-20 Thread Ralph Mason
Ralph Mason [EMAIL PROTECTED] writes:
 Ralph Mason wrote:
 We have a database running on a 4 processor machine.  As time goes by 
 the IO gets worse and worse peeking at about 200% as the machine loads
up.
  
 The weird thing is that if we restart postgres it's fine for hours but
 over time it goes bad again.

Do you by any chance have stats collection enabled and
stats_reset_on_server_start set to true? If so, maybe this is explained
by growth in the size of the stats file over time.  It'd be interesting
to keep an eye on the size of $PGDATA/global/pgstat.stat over a fast-to-
slow cycle.

We do because we use the stats to figure out when we will vacuum.  Our
vacuum process reads that table and when it runs resets it using
pg_stat_reset() to clear it down each time it runs (about ever 60 seconds
when the db is very busy), stats_reset_on_server_restart is off.

Interestingly after a suggestion here I went and looked at the IO stat at
the same time.  It shows the writes as expected and picking up exactly where
they were before the reset, but the reads drop dramatically - like it's
reading far less data after the reset.

I will watch the size of the pgstat.stat table.

Ralph




-- 
Internal Virus Database is out-of-date.
Checked by AVG Free Edition.
Version: 7.5.432 / Virus Database: 268.15.9/573 - Release Date: 5/12/2006
4:07 p.m.
 


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly