Re: [PERFORM] Really bad diskio

2005-07-16 Thread Ron Wills
At Fri, 15 Jul 2005 14:39:36 -0600,
Ron Wills wrote:

  I just wanted to thank everyone for their help. I believe we found a
solution that will help with this problem, with the hardware
configuration and caching the larger tables into smaller data sets. 
A valuable lesson learned from this ;)

 Hello all
 
   I'm running a postgres 7.4.5, on a dual 2.4Ghz Athlon, 1Gig RAM and
 an 3Ware SATA raid. Currently the database is only 16G with about 2
 tables with 50+ row, one table 20+ row and a few small
 tables. The larger tables get updated about every two hours. The
 problem I having with this server (which is in production) is the disk
 IO. On the larger tables I'm getting disk IO wait averages of
 ~70-90%. I've been tweaking the linux kernel as specified in the
 PostgreSQL documentations and switched to the deadline
 scheduler. Nothing seems to be fixing this. The queries are as
 optimized as I can get them. fsync is off in an attempt to help
 preformance still nothing. Are there any setting I should be look at
 the could improve on this???
 
 Thanks for and help in advance.
 
 Ron
 
 ---(end of broadcast)---
 TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Really bad diskio

2005-07-15 Thread Joshua D. Drake

Ron Wills wrote:

Hello all

  I'm running a postgres 7.4.5, on a dual 2.4Ghz Athlon, 1Gig RAM and
an 3Ware SATA raid. 


2 drives?
4 drives?
8 drives?

RAID 1? 0? 10? 5?


Currently the database is only 16G with about 2

tables with 50+ row, one table 20+ row and a few small
tables. The larger tables get updated about every two hours. The
problem I having with this server (which is in production) is the disk
IO. On the larger tables I'm getting disk IO wait averages of
~70-90%. I've been tweaking the linux kernel as specified in the
PostgreSQL documentations and switched to the deadline
scheduler. Nothing seems to be fixing this. The queries are as
optimized as I can get them. fsync is off in an attempt to help
preformance still nothing. Are there any setting I should be look at
the could improve on this???

Thanks for and help in advance.

Ron

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly



--
Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240
PostgreSQL Replication, Consulting, Custom Programming, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[PERFORM] Really bad diskio

2005-07-15 Thread Ron Wills
Hello all

  I'm running a postgres 7.4.5, on a dual 2.4Ghz Athlon, 1Gig RAM and
an 3Ware SATA raid. Currently the database is only 16G with about 2
tables with 50+ row, one table 20+ row and a few small
tables. The larger tables get updated about every two hours. The
problem I having with this server (which is in production) is the disk
IO. On the larger tables I'm getting disk IO wait averages of
~70-90%. I've been tweaking the linux kernel as specified in the
PostgreSQL documentations and switched to the deadline
scheduler. Nothing seems to be fixing this. The queries are as
optimized as I can get them. fsync is off in an attempt to help
preformance still nothing. Are there any setting I should be look at
the could improve on this???

Thanks for and help in advance.

Ron

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Really bad diskio

2005-07-15 Thread Ron Wills
At Fri, 15 Jul 2005 13:45:07 -0700,
Joshua D. Drake wrote:
 
 Ron Wills wrote:
  Hello all
  
I'm running a postgres 7.4.5, on a dual 2.4Ghz Athlon, 1Gig RAM and
  an 3Ware SATA raid. 
 
 2 drives?
 4 drives?
 8 drives?

  3 drives raid 5. I don't believe it's the raid. I've tested this by
moving the database to the mirrors software raid where the root is
found and onto the the SATA raid. Neither relieved the IO problems.
  I was also was thinking this could be from the transactional
subsystem getting overloaded? There are several automated processes
that use the DB. Most are just selects, but the data updates and one
that updates the smaller tables that are the heavier queries. On
their own they seem to work ok, (still high IO, but fairly quick). But
if even the simplest select is called during the heavier operation,
then everything goes out through the roof. Maybe there's something I'm
missing here as well?

 RAID 1? 0? 10? 5?
 
 
 Currently the database is only 16G with about 2
  tables with 50+ row, one table 20+ row and a few small
  tables. The larger tables get updated about every two hours. The
  problem I having with this server (which is in production) is the disk
  IO. On the larger tables I'm getting disk IO wait averages of
  ~70-90%. I've been tweaking the linux kernel as specified in the
  PostgreSQL documentations and switched to the deadline
  scheduler. Nothing seems to be fixing this. The queries are as
  optimized as I can get them. fsync is off in an attempt to help
  preformance still nothing. Are there any setting I should be look at
  the could improve on this???
  
  Thanks for and help in advance.
  
  Ron
  
  ---(end of broadcast)---
  TIP 1: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly
 
 
 -- 
 Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240
 PostgreSQL Replication, Consulting, Custom Programming, 24x7 support
 Managed Services, Shared and Dedicated Hosting
 Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Really bad diskio

2005-07-15 Thread Dan Harris


On Jul 15, 2005, at 2:39 PM, Ron Wills wrote:


Hello all

  I'm running a postgres 7.4.5, on a dual 2.4Ghz Athlon, 1Gig RAM and
an 3Ware SATA raid.


Operating System?  Which file system are you using?  I was having a  
similar problem just a few days ago and learned that ext3 was the  
culprit.


-Dan

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Really bad diskio

2005-07-15 Thread Alvaro Herrera
On Fri, Jul 15, 2005 at 03:04:35PM -0600, Ron Wills wrote:

  Ron Wills wrote:
   Hello all
   
 I'm running a postgres 7.4.5, on a dual 2.4Ghz Athlon, 1Gig RAM and
   an 3Ware SATA raid. 
  
   3 drives raid 5. I don't believe it's the raid. I've tested this by
 moving the database to the mirrors software raid where the root is
 found and onto the the SATA raid. Neither relieved the IO problems.

What filesystem is this?

-- 
Alvaro Herrera (alvherre[a]alvh.no-ip.org)
Si no sabes adonde vas, es muy probable que acabes en otra parte.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Really bad diskio

2005-07-15 Thread Jeffrey W. Baker
On Fri, 2005-07-15 at 15:04 -0600, Ron Wills wrote:
 At Fri, 15 Jul 2005 13:45:07 -0700,
 Joshua D. Drake wrote:
  
  Ron Wills wrote:
   Hello all
   
 I'm running a postgres 7.4.5, on a dual 2.4Ghz Athlon, 1Gig RAM and
   an 3Ware SATA raid. 
  
  2 drives?
  4 drives?
  8 drives?
 
   3 drives raid 5. I don't believe it's the raid. I've tested this by
 moving the database to the mirrors software raid where the root is
 found and onto the the SATA raid. Neither relieved the IO problems.

Hard or soft RAID?  Which controller?  Many of the 3Ware controllers
(85xx and 95xx) have extremely bad RAID 5 performance.

Did you take any pgbench or other benchmark figures before you started
using the DB?

-jwb

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Really bad diskio

2005-07-15 Thread Ron Wills
At Fri, 15 Jul 2005 14:00:07 -0700,
Jeffrey W. Baker wrote:
 
 On Fri, 2005-07-15 at 14:39 -0600, Ron Wills wrote:
  Hello all
  
I'm running a postgres 7.4.5, on a dual 2.4Ghz Athlon, 1Gig RAM and
  an 3Ware SATA raid. Currently the database is only 16G with about 2
  tables with 50+ row, one table 20+ row and a few small
  tables. The larger tables get updated about every two hours. The
  problem I having with this server (which is in production) is the disk
  IO. On the larger tables I'm getting disk IO wait averages of
  ~70-90%. I've been tweaking the linux kernel as specified in the
  PostgreSQL documentations and switched to the deadline
  scheduler. Nothing seems to be fixing this. The queries are as
  optimized as I can get them. fsync is off in an attempt to help
  preformance still nothing. Are there any setting I should be look at
  the could improve on this???
 
 Can you please characterize this a bit better?  Send the output of
 vmstat or iostat over several minutes, or similar diagnostic
 information.
 
 Also please describe your hardware more.

Here's a bit of a dump of the system that should be useful.

Processors x2:

vendor_id   : AuthenticAMD
cpu family  : 6
model   : 8
model name  : AMD Athlon(tm) MP 2400+
stepping: 1
cpu MHz : 2000.474
cache size  : 256 KB

MemTotal:   903804 kB

Mandrake 10.0 Linux kernel 2.6.3-19mdk

The raid controller, which is using the hardware raid configuration:

3ware 9000 Storage Controller device driver for Linux v2.26.02.001.
scsi0 : 3ware 9000 Storage Controller
3w-9xxx: scsi0: Found a 3ware 9000 Storage Controller at 0xe802, IRQ: 17.
3w-9xxx: scsi0: Firmware FE9X 2.02.00.011, BIOS BE9X 2.02.01.037, Ports: 4.
  Vendor: 3ware Model: Logical Disk 00   Rev: 1.00
  Type:   Direct-Access  ANSI SCSI revision: 00
SCSI device sda: 624955392 512-byte hdwr sectors (319977 MB)
SCSI device sda: drive cache: write back, no read (daft)

This is also on a 3.6 reiser filesystem.

Here's the iostat for 10mins every 10secs. I've removed the stats from
the idle drives to reduce the size of this email.

Linux 2.6.3-19mdksmp (photo_server) 07/15/2005

avg-cpu:  %user   %nice%sys %iowait   %idle
   2.851.532.15   39.52   53.95

Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda  82.49  4501.73   188.38 1818836580   76110154

avg-cpu:  %user   %nice%sys %iowait   %idle
   0.300.001.00   96.302.40

Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda  87.80  6159.20   340.00  61592   3400

avg-cpu:  %user   %nice%sys %iowait   %idle
   2.500.001.45   94.351.70

Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda  89.60  5402.40   320.80  54024   3208

avg-cpu:  %user   %nice%sys %iowait   %idle
   1.000.101.35   97.550.00

Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda 105.20  5626.40   332.80  56264   3328

avg-cpu:  %user   %nice%sys %iowait   %idle
   0.400.001.00   87.40   11.20

Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda  92.61  4484.32   515.48  44888   5160

avg-cpu:  %user   %nice%sys %iowait   %idle
   0.450.001.00   92.665.89

Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda  89.10  4596.00   225.60  45960   2256

avg-cpu:  %user   %nice%sys %iowait   %idle
   0.300.000.80   96.302.60

Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda  86.49  3877.48   414.01  38736   4136

avg-cpu:  %user   %nice%sys %iowait   %idle
   0.500.001.00   98.150.35

Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda  97.10  4710.49   405.19  47152   4056

avg-cpu:  %user   %nice%sys %iowait   %idle
   0.350.001.00   98.650.00

Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda  93.30  5324.80   186.40  53248   1864

avg-cpu:  %user   %nice%sys %iowait   %idle
   0.400.001.10   96.701.80

Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda 117.88  5481.72   402.80  54872   4032

avg-cpu:  %user   %nice%sys %iowait   %idle
   0.500.001.05   98.300.15

Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda 124.00  6081.60   403.20  60816   4032

avg-cpu:  %user   %nice%sys %iowait   %idle
   8.750.002.55   84.464.25

Device:tps   Blk_read/s   Blk_wrtn/s   

Re: [PERFORM] Really bad diskio

2005-07-15 Thread Ron Wills
At Fri, 15 Jul 2005 14:17:34 -0700,
Jeffrey W. Baker wrote:
 
 On Fri, 2005-07-15 at 15:04 -0600, Ron Wills wrote:
  At Fri, 15 Jul 2005 13:45:07 -0700,
  Joshua D. Drake wrote:
   
   Ron Wills wrote:
Hello all

  I'm running a postgres 7.4.5, on a dual 2.4Ghz Athlon, 1Gig RAM and
an 3Ware SATA raid. 
   
   2 drives?
   4 drives?
   8 drives?
  
3 drives raid 5. I don't believe it's the raid. I've tested this by
  moving the database to the mirrors software raid where the root is
  found and onto the the SATA raid. Neither relieved the IO problems.
 
 Hard or soft RAID?  Which controller?  Many of the 3Ware controllers
 (85xx and 95xx) have extremely bad RAID 5 performance.
 
 Did you take any pgbench or other benchmark figures before you started
 using the DB?

  No, unfortunatly, I'm more or less just the developer for the
automation systems and admin the system to keep everything going. I
have very little say in the hardware used and I don't have any
physical access to the machine, it's found a province over :P.
  But, for what the system, this IO seems unreasonable. I run
development on a 1.4Ghz Athlon, Gentoo system, with no raid and I
can't reproduce this kind of IO :(.

 -jwb

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Really bad diskio

2005-07-15 Thread Jeffrey W. Baker
On Fri, 2005-07-15 at 15:29 -0600, Ron Wills wrote:
 Here's a bit of a dump of the system that should be useful.
 
 Processors x2:
 
 vendor_id   : AuthenticAMD
 cpu family  : 6
 model   : 8
 model name  : AMD Athlon(tm) MP 2400+
 stepping: 1
 cpu MHz : 2000.474
 cache size  : 256 KB
 
 MemTotal:   903804 kB
 
 Mandrake 10.0 Linux kernel 2.6.3-19mdk
 
 The raid controller, which is using the hardware raid configuration:
 
 3ware 9000 Storage Controller device driver for Linux v2.26.02.001.
 scsi0 : 3ware 9000 Storage Controller
 3w-9xxx: scsi0: Found a 3ware 9000 Storage Controller at 0xe802, IRQ: 17.
 3w-9xxx: scsi0: Firmware FE9X 2.02.00.011, BIOS BE9X 2.02.01.037, Ports: 4.
   Vendor: 3ware Model: Logical Disk 00   Rev: 1.00
   Type:   Direct-Access  ANSI SCSI revision: 00
 SCSI device sda: 624955392 512-byte hdwr sectors (319977 MB)
 SCSI device sda: drive cache: write back, no read (daft)
 
 This is also on a 3.6 reiser filesystem.
 
 Here's the iostat for 10mins every 10secs. I've removed the stats from
 the idle drives to reduce the size of this email.
 
 Linux 2.6.3-19mdksmp (photo_server)   07/15/2005
 
 avg-cpu:  %user   %nice%sys %iowait   %idle
2.851.532.15   39.52   53.95
 
 Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
 sda  82.49  4501.73   188.38 1818836580   76110154
 
 avg-cpu:  %user   %nice%sys %iowait   %idle
0.300.001.00   96.302.40
 
 Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
 sda  87.80  6159.20   340.00  61592   3400

These I/O numbers are not so horrible, really.  100% iowait is not
necessarily a symptom of misconfiguration.  It just means you are disk
limited.  With a database 20 times larger than main memory, this is no
surprise.

If I had to speculate about the best way to improve your performance, I
would say:

1a) Get a better RAID controller.  The 3ware hardware RAID5 is very bad.
1b) Get more disks.
2) Get a (much) newer kernel.
3) Try XFS or JFS.  Reiser3 has never looked good in my pgbench runs

By the way, are you experiencing bad application performance, or are you
just unhappy with the iostat figures?

Regards,
jwb


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Really bad diskio

2005-07-15 Thread Ron Wills
At Fri, 15 Jul 2005 14:53:26 -0700,
Jeffrey W. Baker wrote:
 
 On Fri, 2005-07-15 at 15:29 -0600, Ron Wills wrote:
  Here's a bit of a dump of the system that should be useful.
  
  Processors x2:
  
  vendor_id   : AuthenticAMD
  cpu family  : 6
  model   : 8
  model name  : AMD Athlon(tm) MP 2400+
  stepping: 1
  cpu MHz : 2000.474
  cache size  : 256 KB
  
  MemTotal:   903804 kB
  
  Mandrake 10.0 Linux kernel 2.6.3-19mdk
  
  The raid controller, which is using the hardware raid configuration:
  
  3ware 9000 Storage Controller device driver for Linux v2.26.02.001.
  scsi0 : 3ware 9000 Storage Controller
  3w-9xxx: scsi0: Found a 3ware 9000 Storage Controller at 0xe802, IRQ: 
  17.
  3w-9xxx: scsi0: Firmware FE9X 2.02.00.011, BIOS BE9X 2.02.01.037, Ports: 4.
Vendor: 3ware Model: Logical Disk 00   Rev: 1.00
Type:   Direct-Access  ANSI SCSI revision: 00
  SCSI device sda: 624955392 512-byte hdwr sectors (319977 MB)
  SCSI device sda: drive cache: write back, no read (daft)
  
  This is also on a 3.6 reiser filesystem.
  
  Here's the iostat for 10mins every 10secs. I've removed the stats from
  the idle drives to reduce the size of this email.
  
  Linux 2.6.3-19mdksmp (photo_server) 07/15/2005
  
  avg-cpu:  %user   %nice%sys %iowait   %idle
 2.851.532.15   39.52   53.95
  
  Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
  sda  82.49  4501.73   188.38 1818836580   76110154
  
  avg-cpu:  %user   %nice%sys %iowait   %idle
 0.300.001.00   96.302.40
  
  Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
  sda  87.80  6159.20   340.00  61592   3400
 
 These I/O numbers are not so horrible, really.  100% iowait is not
 necessarily a symptom of misconfiguration.  It just means you are disk
 limited.  With a database 20 times larger than main memory, this is no
 surprise.
 
 If I had to speculate about the best way to improve your performance, I
 would say:
 
 1a) Get a better RAID controller.  The 3ware hardware RAID5 is very bad.
 1b) Get more disks.
 2) Get a (much) newer kernel.
 3) Try XFS or JFS.  Reiser3 has never looked good in my pgbench runs

Not good news :(. I can't change the hardware, hopefully a kernel
update and XFS of JFS will make an improvement. I was hoping for
software raid (always has worked well), but the client didn't feel
conforable with it :P.
 
 By the way, are you experiencing bad application performance, or are you
 just unhappy with the iostat figures?

  It's affecting the whole system. It is sending the load averages
through the roof (from 4 to 12) and processes that would take only a
few minutes starts going over an hour, until it clears up. Well, I
guess I'll have to drum up some more programming magic... and I'm
starting to run out of tricks... I love my job some day :$
 
 Regards,
 jwb
 

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings