People recommend LSI MegaRAID controllers on here regularly, but I have found that they do not work that well. I have bonnie++ numbers that show the controller is not performing anywhere near the disks' saturation level in a simple RAID 1 on RedHat Linux EL4 on two separate machines provided by two different hosting companies. In one case I asked them to replace the card, and the numbers got a bit better, but still not optimal.
LSI MegaRAID has proved to be a bit of a disappointment. I have seen better numbers from the HP SmartArray 6i, and from 3ware cards with 7200RPM SATA drives.

For the output, see http://www.infoconinc.com/test/bonnie++.html (the first line is a six-drive RAID 10 on a 3ware 9500S; the next three are all RAID 1s on LSI MegaRAID controllers, verified by lspci).

Alex.

On 12/4/06, Greg Smith <[EMAIL PROTECTED]> wrote:
On Thu, 30 Nov 2006, Carlos H. Reimer wrote:

> I would like to discover how much cache is present in
> the controller, how can I find this value from Linux?

As far as I know there is no cache on an Adaptec 39320. The write-back cache Linux was reporting on was the one in the drives, which is 8MB; see http://www.seagate.com/cda/products/discsales/enterprise/tech/1,1593,541,00.html

Be warned that running your database with the combination of an uncached controller plus disks with write caching is dangerous to your database integrity.

There is a common problem with the Linux driver for this card (aic7902) where it enters what they're calling an "Infinite Interrupt Loop". That seems to match your readings:

> Here is a typical iostat -x:
> Device:  rrqm/s  wrqm/s   r/s   w/s  rsec/s  wsec/s  rkB/s  wkB/s
> sda        0.00    7.80  0.40  6.40   41.60  113.60  20.80  56.80
> avgrq-sz   avgqu-sz  await   svctm   %util
>    22.82  570697.50  10.59  147.06  100.00

An avgqu-sz of 570697.50 is extremely large. That explains why the utilization is 100%: there's a massive number of I/O operations queued up that aren't getting flushed out. The read and write data says these drives are barely doing anything, as 20 kB/s and 57 kB/s are practically idle; they're not even remotely close to saturated.

See http://lkml.org/lkml/2005/10/1/47 for a suggested workaround that may reduce the magnitude of this issue; lowering the card's speed to U160 in the BIOS was also listed as a useful workaround. You might get better results by upgrading to a newer Linux kernel, and just rebooting to clear out the garbage might help if you haven't tried that yet.
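For anyone who wants to check the iostat sample above against itself, here is a rough Python sketch. It only uses the numbers from the quoted output. The dictionary keys mirror the iostat column names; the 512-byte sector size and the helper name cross_check are assumptions made for illustration. The Little's-law estimate (request rate times await) is a rule of thumb, not how iostat computes avgqu-sz, so treat the comparison as a hint rather than proof. With these inputs the estimate comes out well under one queued request, which is hard to square with a reported queue in the hundreds of thousands and suggests the counter itself may not be trustworthy.

```python
# Cross-check the quoted iostat -x sample using only its own columns.
# Keys mirror the iostat column headers. SECTOR_BYTES = 512 is an
# assumption about how rsec/s and wsec/s are counted.

SAMPLE = {
    "r/s": 0.40, "w/s": 6.40,
    "rsec/s": 41.60, "wsec/s": 113.60,
    "rkB/s": 20.80, "wkB/s": 56.80,
    "avgrq-sz": 22.82, "avgqu-sz": 570697.50,
    "await": 10.59, "svctm": 147.06, "%util": 100.00,
}

SECTOR_BYTES = 512


def cross_check(s):
    """Derive figures that should roughly agree with the reported columns."""
    iops = s["r/s"] + s["w/s"]  # requests completed per second
    return {
        # sectors/s converted to kB/s
        "read_kB_s": s["rsec/s"] * SECTOR_BYTES / 1024,
        "write_kB_s": s["wsec/s"] * SECTOR_BYTES / 1024,
        # average request size in sectors
        "avg_request_sectors": (s["rsec/s"] + s["wsec/s"]) / iops,
        # busy milliseconds per second, expressed as a percentage
        "util_percent": iops * s["svctm"] / 10.0,
        # Little's law: queue length ~ arrival rate * time in system
        "expected_queue": iops * s["await"] / 1000.0,
    }


if __name__ == "__main__":
    derived = cross_check(SAMPLE)
    for name, value in derived.items():
        print(f"{name:>20}: {value:.3f}")
    ratio = SAMPLE["avgqu-sz"] / derived["expected_queue"]
    print(f"reported avgqu-sz is about {ratio:.0f}x the estimate")
```

The throughput, request-size, and utilization figures all reproduce the reported columns, so the sample is internally consistent everywhere except the queue size.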
On the pessimistic side, other people reporting issues with this controller are:

http://lkml.org/lkml/2005/12/17/55
http://www.ussg.iu.edu/hypermail/linux/kernel/0512.2/0390.html
http://www.linuxforums.org/forum/peripherals-hardware/59306-scsi-hangs-boot.html

and even under FreeBSD at http://lists.freebsd.org/pipermail/aic7xxx/2003-August/003973.html

This Adaptec card just barely works under Linux, which happens regularly with their controllers, and my guess is that you've run into one of the ways it goes crazy sometimes. I just chuckled when checking http://linux.adaptec.com/ again and noticing they can't even be bothered to keep that server up at all. According to

http://www.adaptec.com/en-US/downloads/linux_source/linux_source_code?productId=ASC-39320-R&dn=Adaptec+SCSI+Card+39320-R

the driver for your card is "*minimally tested* for Linux Kernel v2.6 on all platforms." Adaptec doesn't care about Linux support on their products; if you want a SCSI controller that actually works under Linux, get an LSI MegaRAID.

If this were really a Postgres problem, I wouldn't expect %iowait=1.10. Were the database engine waiting to read/write data, that number would be dramatically higher. Whatever is generating all these I/O requests, it's not waiting for them to complete like the database would be. Besides the driver problems that I'm very suspicious of, I'd suspect a runaway process writing garbage to the disks might also cause this behavior.

> Ive taken a look in the /var/log/messages and found some temperature
> messages about the disk drives:
> Nov 30 11:08:07 totall smartd[1620]: Device: /dev/sda, Temperature changed 2
> Celsius to 51 Celsius since last report
> Can this temperature influence in the performance?

That's close to the upper tolerance for this drive (55 degrees), which means the drive is being cooked and will likely wear out quickly. But that won't slow it down, and you'd get much scarier messages out of smartd if the drives had a real problem.
You should improve cooling in this case if you want the drives to have a healthy life; odds are low this is relevant to your performance issue, though.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD