On Tue, Apr 24, 2012 at 1:32 PM, Shaun Thomas <stho...@peak6.com> wrote:
> On 04/23/2012 09:56 PM, Jan Nielsen wrote:
>
>> The new hardware for the 50GB PG 9.0 machine is:
>>
>>  * 24 cores across 2 sockets
>>  * 64 GB RAM
>>  * 10 x 15k SAS drives on SAN
>>  * 1 x 15k SAS drive local
>>  * CentOS 6.2 (2.6.32 kernel)
>
> This is a pretty good build. Nice and middle-of-the-road for current
> hardware. I think it's probably relevant what your "24 cores across 2
> sockets" are, though. Then again, based on the 24 cores, I have to
> assume you've got hex-core Xeons of some sort, with hyperthreading.
> That suggests a higher-end Westmere Xeon, like the X5645 or better.
> If that's the case, you're in good hands.

The processors are Intel(R) Xeon(R) CPU X5650 @ 2.67GHz.

> As a note, though... make sure you enable Turbo and other performance
> settings (disable power-down of unused CPUs, etc.) in the BIOS when
> setting this up. We found that the defaults for the CPUs did not allow
> processor scaling, and were far too aggressive in cycling down cores,
> such that cycling them back up had a non-zero cost. We saw roughly a
> 20% improvement by forcing the CPUs into full online performance mode.

Is there a way to tell what the BIOS power-down settings are for the
cores from the CLI?

>> We are considering the following drive allocations:
>>
>>  * 4 x 15k SAS drives, XFS, RAID 10 on SAN for PG data
>>  * 4 x 15k SAS drives, XFS, RAID 10 on SAN for PG indexes
>>  * 2 x 15k SAS drives, XFS, RAID 1 on SAN for PG xlog
>>  * 1 x 15k SAS drive, XFS, on local storage for OS
>
> Please don't do this. If you have the system you just described, give
> yourself an 8x RAID-10, and the 2x RAID-1. I've found that your
> indexes will generally be about 1/3 to 1/2 the total size of your
> database. So, not only does your data partition lose read spindles,
> but you've wasted 1/2 to 2/3 of your active drive space. This may not
> be a concern based on your data growth curves, but it could be.

After reading Richard Foote's articles that Robert Klemme referenced in
the previous post, I'm convinced.

> In addition, add another OS drive and put it into a RAID-1. If you
> have server-class hardware, you'll want that extra drive. I'm frankly
> surprised you were even able to acquire a dual-Xeon-class server
> without a RAID-1 for OS data by default.

Agreed.

> I'm not sure if you've done metrics or not, but XFS performance is
> highly dependent on your init and mount options. I can give you some
> guidelines there, but one of the major changes is that the Linux 3.x
> kernels have some impressive performance improvements you won't see
> using CentOS 6.2. Metadata handling in particular has undergone a
> massive upgrade that drastically enhances its parallel scalability on
> metadata modifications.

Alas, a 3.x Linux kernel would be nice, but I'm stuck with CentOS 6.2
on 2.6.32. I would very much appreciate any guidelines you can provide.

> If possible, you might consider the new Ubuntu 12.04 LTS that's coming
> out soon. It should have the newer XFS performance. If not, consider
> injecting a newer kernel into the CentOS 6.2 install. And again,
> testing is the only way to know for sure.
>
> And test with pgbench, if possible. I used this to get our XFS init
> and mount options, along with other OS/kernel settings.

Yes; that does seem important. I found this:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/filesystems/xfs.txt;hb=HEAD

and while I was planning to set 'noatime', I'm a bit stumped on most of
the rest. Anyone with comparable hardware willing to share their
settings as a starting point for my testing?
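For what it's worth, here is the fstab entry I was planning to start
from, based on my reading of xfs.txt. The device path and mount point
are placeholders for whatever our SAN volume ends up being named, and
the option values are first guesses to be validated with pgbench, not
anything I've measured:

  # first-guess XFS mount options for the PG data volume; needs testing
  /dev/mapper/san-pgdata /var/lib/pgsql/9.0/data xfs noatime,nodiratime,logbufs=8,logbsize=256k,allocsize=16m 0 0

And roughly the pgbench run I had in mind for comparing option sets,
sized so the data set is larger than the 64GB of RAM (scale 5000 is
around 75GB):

  pgbench -i -s 5000 bench             # build the test database
  pgbench -c 24 -j 4 -T 600 bench      # 24 clients, 10-minute run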
> You can have very different performance metrics from dd/bonnie than
> an actual use pattern from real DB usage. As a hint, before you run
> any of these tests, both write a '3' to /proc/sys/vm/drop_caches, and
> restart your PG instance. You want to test your drives, not your
> memory. :)
>
>> kernel.shmall = 4,294,967,296 (commas added for clarity)
>> kernel.shmmax = 68,719,476,736 (commas added for clarity)
>> kernel.sem = 250 32000 32 128
>> vm.swappiness = 0
>> vm.dirty_ratio = 10
>> vm.dirty_background_ratio = 5
>
> Good. Though you might consider lowering dirty_background_ratio. At
> that setting, it won't even try to write out data until you have about
> 3GB of dirty pages. Even high-end disk controllers only have 1GB of
> local capacitor-backed cache. If you really do have a good SAN, it
> probably has more than that, but try to induce a high-turnover
> database test to see what happens during heavy IO. A heavy,
> long-running pgbench test should trigger several checkpoints and also
> flood the local write cache. When that happens, monitor /proc/meminfo.
> Like this:
>
>   grep -A1 Dirty /proc/meminfo
>
> That will tell you how much of your memory is dirty, but the
> 'Writeback' entry is what you care about. If you see that as a
> non-zero value for more than one consecutive check, you've saturated
> your write bandwidth to the point performance will suffer. But the
> only way you can really know any of this is with testing. Some SANs
> scale incredibly well to large pool flushes, and others don't.
>
> Also, make iostat your friend. Particularly with the -x option.
> During your testing, keep one of these running in the background for
> the devices on your SAN. Watch your %util column in particular. Graph
> it, if you can. You can almost build a complete performance profile
> for different workloads before you put a single byte of real data on
> this hardware.
>
>> If there are "obviously correct" choices in PG configuration, this
>> would be tremendously helpful information to me. I'm planning on
>> using pgbench to test the configuration options.
>
> You sound like you've read up on this quite a bit. Greg's book is a
> very good thing to have and learn from. It'll cover all the basics
> about the postgresql.conf file. I don't see how I could add much to
> that, so just pay attention to what he says. :)

I'm doing my best but the numbers will tell the story. :-)

Thanks for your review and feedback, Shaun.

Cheers,

Jan

> --
> Shaun Thomas
> OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
> 312-444-8534
> stho...@peak6.com
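P.S. Partially answering my own BIOS question above: assuming the
cpufreq sysfs interface is exposed on this kernel, the OS-visible
scaling governor can at least be checked from the CLI,

  cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

though I expect the actual C-state and Turbo settings still have to be
inspected in the BIOS itself. Also, for my own notes, the reset
sequence I plan to run as root before each test, per the drop_caches
hint, with iostat capturing in the background (the init-script name is
my guess from the PGDG packaging; ours may differ):

  sync                                  # flush dirty pages first
  echo 3 > /proc/sys/vm/drop_caches     # drop page/dentry/inode caches
  service postgresql-9.0 restart        # empty shared_buffers too
  iostat -x 5 > /tmp/iostat-run.log &   # per-device stats during run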