On Tue, Apr 24, 2012 at 1:32 PM, Shaun Thomas <stho...@peak6.com> wrote:
> On 04/23/2012 09:56 PM, Jan Nielsen wrote:
>
>> The new hardware for the 50GB PG 9.0 machine is:
>>
>>  * 24 cores across 2 sockets
>>  * 64 GB RAM
>>  * 10 x 15k SAS drives on SAN
>>  * 1 x 15k SAS drive local
>>  * CentOS 6.2 (2.6.32 kernel)
>
> This is a pretty good build. Nice and middle-of-the-road for current
> hardware. I think it's probably relevant what your "24 cores across 2
> sockets" are, though. Then again, based on the 24 cores, I have to
> assume you've got hex-core Xeons of some sort, with hyperthreading.
> That suggests a higher-end Westmere Xeon, like the X5645 or better.
> If that's the case, you're in good hands.

The processors are Intel(R) Xeon(R) CPU X5650 @ 2.67GHz.

> As a note, though... make sure you enable Turbo and other performance
> settings (disable power-down of unused CPUs, etc.) in the BIOS when
> setting this up. We found that the defaults for the CPUs did not allow
> processor scaling, and were far too aggressive in cycling down cores,
> such that cycling them back up had a non-zero cost. We saw roughly a
> 20% improvement by forcing the CPUs into full online performance mode.

Is there a way to tell what the BIOS power-down settings are for the
cores from the CLI?

>> We are considering the following drive allocations:
>>
>>  * 4 x 15k SAS drives, XFS, RAID 10 on SAN for PG data
>>  * 4 x 15k SAS drives, XFS, RAID 10 on SAN for PG indexes
>>  * 2 x 15k SAS drives, XFS, RAID 1 on SAN for PG xlog
>>  * 1 x 15k SAS drive, XFS, on local storage for OS
>
> Please don't do this. If you have the system you just described, give
> yourself an 8x RAID-10, and the 2x RAID-1. I've found that your
> indexes will generally be about 1/3 to 1/2 the total size of your
> database. So, not only does your data partition lose read spindles,
> but you've wasted 1/2 to 2/3 of your active drive space. This may not
> be a concern based on your data growth curves, but it could be.

After reading Richard Foote's articles that Robert Klemme referenced in
the previous post, I'm convinced.

> In addition, add another OS drive and put it into a RAID-1. If you
> have server-class hardware, you'll want that extra drive. I'm frankly
> surprised you were even able to acquire a dual-Xeon-class server
> without a RAID-1 for OS data by default.

Agreed.

> I'm not sure if you've done metrics or not, but XFS performance is
> highly dependent on your init and mount options. I can give you some
> guidelines there, but one of the major changes is that the Linux 3.x
> kernels have some impressive performance improvements you won't see
> using CentOS 6.2. Metadata handling in particular has undergone a
> massive upgrade that drastically enhances its parallel scalability on
> metadata modifications.

Alas, a 3.x Linux kernel would be nice, but I'm stuck with CentOS 6.2
on 2.6.32. I would very much appreciate any guidelines you can provide.

> If possible, you might consider the new Ubuntu 12.04 LTS that's coming
> out soon. It should have the newer XFS performance. If not, consider
> injecting a newer kernel into the CentOS 6.2 install. And again,
> testing is the only way to know for sure.
>
> And test with pgbench, if possible. I used this to get our XFS init
> and mount options, along with other OS/kernel settings.

Yes; that does seem important. I found this:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/filesystems/xfs.txt;hb=HEAD

and while I was planning to set 'noatime', I'm a bit stumped on most of
the rest. Anyone with comparable hardware willing to share their
settings as a starting point for my testing?
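For what it's worth, here is the fstab entry I was planning to start
from, based on my reading of xfs.txt. The device path and mount point
are placeholders for whatever our SAN volume ends up being named, and
the option values are first guesses to be validated with pgbench, not
anything I've measured:

  # first-guess XFS mount options for the PG data volume; needs testing
  /dev/mapper/san-pgdata /var/lib/pgsql/9.0/data xfs noatime,nodiratime,logbufs=8,logbsize=256k,allocsize=16m 0 0

And roughly the pgbench run I had in mind for comparing option sets,
sized so the data set is larger than the 64GB of RAM (scale 5000 is
around 75GB):

  pgbench -i -s 5000 bench             # build the test database
  pgbench -c 24 -j 4 -T 600 bench      # 24 clients, 10-minute run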
> You can have very different performance metrics from dd/bonnie than
> an actual use pattern from real DB usage. As a hint, before you run
> any of these tests, both write a '3' to /proc/sys/vm/drop_caches, and
> restart your PG instance. You want to test your drives, not your
> memory. :)
>
>> kernel.shmall = 4,294,967,296 (commas added for clarity)
>> kernel.shmmax = 68,719,476,736 (commas added for clarity)
>> kernel.sem = 250 32000 32 128
>> vm.swappiness = 0
>> vm.dirty_ratio = 10
>> vm.dirty_background_ratio = 5
>
> Good. Though you might consider lowering dirty_background_ratio. At
> that setting, it won't even try to write out data until you have about
> 3GB of dirty pages. Even high-end disk controllers only have 1GB of
> local capacitor-backed cache. If you really do have a good SAN, it
> probably has more than that, but try to induce a high-turnover
> database test to see what happens during heavy IO. A heavy,
> long-running pgbench test should trigger several checkpoints and also
> flood the local write cache. When that happens, monitor /proc/meminfo.
> Like this:
>
>   grep -A1 Dirty /proc/meminfo
>
> That will tell you how much of your memory is dirty, but the
> 'Writeback' entry is what you care about. If you see that as a
> non-zero value for more than one consecutive check, you've saturated
> your write bandwidth to the point performance will suffer. But the
> only way you can really know any of this is with testing. Some SANs
> scale incredibly well to large pool flushes, and others don't.
>
> Also, make iostat your friend. Particularly with the -x option.
> During your testing, keep one of these running in the background for
> the devices on your SAN. Watch your %util column in particular. Graph
> it, if you can. You can almost build a complete performance profile
> for different workloads before you put a single byte of real data on
> this hardware.
>
>> If there are "obviously correct" choices in PG configuration, this
>> would be tremendously helpful information to me. I'm planning on
>> using pgbench to test the configuration options.
>
> You sound like you've read up on this quite a bit. Greg's book is a
> very good thing to have and learn from. It'll cover all the basics
> about the postgresql.conf file. I don't see how I could add much to
> that, so just pay attention to what he says. :)

I'm doing my best but the numbers will tell the story. :-)

Thanks for your review and feedback, Shaun.

Cheers,

Jan

> --
> Shaun Thomas
> OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
> 312-444-8534
> stho...@peak6.com
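P.S. Partially answering my own BIOS question above: assuming the
cpufreq sysfs interface is exposed on this kernel, the OS-visible
scaling governor can at least be checked from the CLI,

  cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

though I expect the actual C-state and Turbo settings still have to be
inspected in the BIOS itself. Also, for my own notes, the reset
sequence I plan to run as root before each test, per the drop_caches
hint, with iostat capturing in the background (the init-script name is
my guess from the PGDG packaging; ours may differ):

  sync                                  # flush dirty pages first
  echo 3 > /proc/sys/vm/drop_caches     # drop page/dentry/inode caches
  service postgresql-9.0 restart        # empty shared_buffers too
  iostat -x 5 > /tmp/iostat-run.log &   # per-device stats during run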