[[I'm posting this on behalf of my co-worker who cannot post to this list at the moment]]


I have installed PostgreSQL on a 4-way AMD Opteron 875 (dual core) and the performance is not at the expected level.

The "old" server is a 4-way XEON MP 3.0 GHz with 4 MB L3 cache, 32 GB RAM (PC1600) and local FC-RAID 10. Hyper-Threading is off. (DL580)
The "old" server is using Red Hat Enterprise Linux 3 Update 5.
The "new" server is a 4-way Opteron 875 with 1 MB L2 cache, 32 GB RAM (PC3200) and the same local FC-RAID 10. (HP DL585)
The "new" server is using Red Hat Enterprise Linux 4 (with the latest x86_64 kernel from Red Hat - 2.6.9-11.ELsmp #1 SMP Fri May 20 18:25:30 EDT 2005 x86_64)
I use PostgreSQL version 8.0.3.

The issue is that the Opteron is slower than the XEON MP under high load. I have created a test with parallel queries that are typical for my application. The queries range from small queries (0.1 seconds) to larger queries using joins (15 seconds).
The test starts parallel clients. Each client runs the queries in a random order. The test makes sure that a given client always uses the same random order, so that the results are comparable between runs.
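
To make the test method easier to reproduce, here is a minimal sketch of the idea in Python. It is not the actual test program: the names `client_order`, `run_test` and `run_query` are made up for illustration, and `run_query` stands in for whatever executes one SQL statement on that client's own database connection. Seeding the shuffle with the client number is just one way to give each client the same order on every run.

```python
import random
import threading
import time

def client_order(queries, client_id):
    """Return the fixed pseudo-random query order for one client."""
    order = list(queries)
    # Seeding with the client id makes the shuffle repeatable across runs.
    random.Random(client_id).shuffle(order)
    return order

def run_test(queries, n_clients, duration_s, run_query):
    """Run n_clients threads for duration_s seconds.

    Returns the number of queries each client finished.
    run_query is a hypothetical callable that executes one query.
    """
    finished = [0] * n_clients
    deadline = time.monotonic() + duration_s

    def worker(client_id):
        order = client_order(queries, client_id)
        i = 0
        while time.monotonic() < deadline:
            run_query(order[i % len(order)])
            finished[client_id] += 1  # each thread only writes its own slot
            i += 1

    threads = [threading.Thread(target=worker, args=(c,)) for c in range(n_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return finished
```

The sum of the returned list is the "queries finished in a fixed period" figure for one run with that number of clients.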

Here is the number of queries each server finished in a fixed period of time.
I used a PostgreSQL 8.1 snapshot from last week, compiled as a 64-bit binary, for DL585-64bit.
I used PostgreSQL 8.0.3, compiled as a 32-bit binary, for DL585-32bit and DL580.
During the tests everything that is needed is in the file cache. I saw no disk read activity.
Context switch spikes are over 50000 during the test on both servers. My feeling is that the XEON has slightly more context switches.
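
For anyone who wants to compare context switch rates on their own hardware, this is a small sketch of one way to sample them on Linux. It assumes the kernel exposes the cumulative counter as a `ctxt` line in /proc/stat; the function names `parse_ctxt` and `ctxt_per_second` are made up here, and this is not necessarily how the figures above were collected.

```python
import time

def parse_ctxt(stat_text):
    """Extract the cumulative context-switch count from /proc/stat content."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "ctxt":
            return int(fields[1])
    raise ValueError("no ctxt line found")

def ctxt_per_second(interval_s=1.0, path="/proc/stat"):
    """Sample the counter twice and return context switches per second."""
    with open(path) as f:
        first = parse_ctxt(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        second = parse_ctxt(f.read())
    return (second - first) / interval_s
```

Calling `ctxt_per_second()` in a loop while the test runs gives a per-second series that can be compared between the two servers.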

PostgreSQL params:
max_locks_per_transaction = 256
shared_buffers = 40000
effective_cache_size = 3840000
work_mem = 300000
maintenance_work_mem = 512000
wal_buffers = 32
checkpoint_segments = 24
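
To put these values into perspective, here is a small sketch that converts them into sizes. It assumes the 8.0 units as I understand them: shared_buffers, effective_cache_size and wal_buffers are counted in blocks of the default 8 kB block size, while work_mem and maintenance_work_mem are in kB. The helper names `to_mib` and `worst_case_work_mem_mib` are made up for this example, and the worst case assumes one sort or hash step per running query, which a real query may exceed.

```python
BLOCK = 8192  # assumed default PostgreSQL block size in bytes
KB = 1024

# (value from postgresql.conf, assumed unit size in bytes)
settings = {
    "shared_buffers": (40000, BLOCK),
    "effective_cache_size": (3840000, BLOCK),
    "work_mem": (300000, KB),
    "maintenance_work_mem": (512000, KB),
    "wal_buffers": (32, BLOCK),
}

def to_mib(name):
    """Convert one setting to MiB under the unit assumptions above."""
    value, unit = settings[name]
    return value * unit / (1024 * 1024)

def worst_case_work_mem_mib(n_clients, steps_per_query=1):
    """Memory used if every client runs steps_per_query sort/hash steps at once."""
    return n_clients * steps_per_query * to_mib("work_mem")

if __name__ == "__main__":
    for name in settings:
        print(f"{name}: {to_mib(name):.2f} MiB")
    print(f"work_mem, 8 clients: {worst_case_work_mem_mib(8):.2f} MiB")
```

Under these assumptions shared_buffers is about 312 MiB and each sort or hash step may use up to about 293 MiB, so 8 parallel clients could use a few GiB for sorting alone.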

I was expecting twice as many queries on the DL585. In production use, the DL585 with PostgreSQL 8.0.3 32-bit melts down earlier than the XEON. Please compare the results for 4 clients and 8 clients: with 4 clients the Opteron is in front, but with 8 clients the XEON does not melt down as badly as the Opteron.

I have no idea what causes this. Benchmarks like SAP's SD 2-tier show that the DL585 can handle nearly three times as much load as the DL580 with XEON 3.0. We chose the 4-way Opteron 875 based on such benchmarks to replace the 4-way XEON MP.

Does anyone have comments or ideas on where I should focus my work?

My guess is that the shared buffer code causes the meltdown when too many clients are accessing the same data.
I don't understand why the 4-way XEON MP 3.0 can deal with this better than the 4-way Opteron 875.
The system load on the Opteron is never over 3.0. The XEON MP has a load of up to 4.0.

Should I try other settings for PostgreSQL in postgresql.conf?
Should I try other settings for the compilation?

I will compile the latest PostgreSQL 8.1 snapshot for 32bit to evaluate the new shared buffer code from Tom.
I think the 64-bit build is slow because my queries are CPU intensive.

Can someone provide a commercial support contact for this issue?


