Re: [PERFORM] Benchmarking a large server

2011-05-10 Thread Yeb Havinga

On 2011-05-09 22:32, Chris Hoover wrote:


The issue we are running into is how do we benchmark this server, 
specifically, how do we get valid benchmarks for the Fusion IO card? 
 Normally to eliminate the cache effect, you run iozone and other 
benchmark suites at 2x the ram.  However, we can't do that due to
2TB > 1.3TB.


So, does anyone have any suggestions/experiences in benchmarking 
storage when the storage is smaller than 2x memory?


Oracle's Orion test tool has a configurable cache size parameter. It's
a separate download, written specifically to benchmark database OLTP-
and OLAP-like I/O patterns; see
http://www.oracle.com/technetwork/topics/index-089595.html


--
Yeb Havinga
http://www.mgrid.net/
Mastering Medical Data




Re: [PERFORM] Benchmarking a large server

2011-05-10 Thread Claudio Freire
On Mon, May 9, 2011 at 10:32 PM, Chris Hoover revo...@gmail.com wrote:
 So, does anyone have any suggestions/experiences in benchmarking storage
 when the storage is smaller than 2x memory?

Try writing a small Python script (or C program) that mmaps a large chunk
of memory with MAP_LOCKED; this will keep it in RAM and prevent that
RAM from being used for caching.
The script should touch the memory at least once so overcommit
doesn't get smart on you.

I think only root can lock memory, so that small program would have to
run as root.
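
A rough sketch of such a C program (an untested illustration; the amount
of memory to lock is only a placeholder and should be sized to however
much RAM you want to hide from the page cache):

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void)
{
    /* placeholder: lock 512 GB of the 1 TB; adjust to taste */
    size_t size = 512UL * 1024 * 1024 * 1024;

    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_LOCKED, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* touch every page so overcommit can't get smart on you */
    memset(p, 1, size);

    puts("memory locked; leave this running while benchmarking");
    pause();   /* hold the lock until the process is killed */
    return 0;
}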



Re: [PERFORM] Benchmarking a large server

2011-05-10 Thread Jeff


On May 9, 2011, at 4:50 PM, Merlin Moncure wrote:


hm, if it was me, I'd write a small C program that just jumped
directly on the device around and did random writes assuming it wasn't
formatted.  For sequential read, just flush caches and dd the device
to /dev/null.  Probably someone will suggest better tools though.

merlin



shameless plug
http://pgfoundry.org/projects/pgiosim

It is a small program we use to beat the [bad word] out of I/O systems.
It randomly seeks, does an 8kB read, optionally writes it back out (and
optionally fsyncs) and reports how fast it is going (you need to
watch iostat output as well so you can see actual physical tps without
the OS cache interfering).


It goes through regular read & write calls like PG (I didn't want to
bother with junk like O_DIRECT & friends).


It is also now multithreaded, so you can fire up a bunch of random read
threads (rather than firing up a bunch of pgiosims in parallel) and
see how things scale up.
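
For reference, the core access pattern boils down to roughly the
following (a simplified, untested sketch, not the actual pgiosim source;
the test file path and iteration count are placeholders):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/stat.h>

#define BLOCK 8192

int main(int argc, char **argv)
{
    const char *path = argc > 1 ? argv[1] : "testfile";   /* placeholder */
    int fd = open(path, O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size < BLOCK) {
        fprintf(stderr, "need a pre-created test file\n");
        return 1;
    }
    off_t nblocks = st.st_size / BLOCK;

    char buf[BLOCK];
    srandom(getpid());

    for (long i = 0; i < 100000; i++) {        /* placeholder count */
        off_t blk = random() % nblocks;
        if (pread(fd, buf, BLOCK, blk * BLOCK) != BLOCK) { perror("pread"); break; }
        /* optionally write the block back and fsync, as pgiosim can:
         *   pwrite(fd, buf, BLOCK, blk * BLOCK); fsync(fd);
         */
    }
    close(fd);
    return 0;
}

As noted above, watch iostat while something like this runs to see the
actual physical tps.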



--
Jeff Trout j...@jefftrout.com
http://www.stuarthamm.net/
http://www.dellsmartexitin.com/






Re: [PERFORM] Benchmarking a large server

2011-05-10 Thread Cédric Villemain
2011/5/10 Greg Smith g...@2ndquadrant.com:
 On 05/09/2011 11:13 PM, Shaun Thomas wrote:

 Take a look at /proc/sys/vm/dirty_ratio and
 /proc/sys/vm/dirty_background_ratio if you have an older Linux system, or
 /proc/sys/vm/dirty_bytes, and /proc/sys/vm/dirty_background_bytes with a
 newer one.
 On older systems for instance, those are set to 40 and 20 respectively
 (recent kernels cut these in half).

 1/4 actually; 10% and 5% starting in kernel 2.6.22.  The main sources of
 this on otherwise new servers I see are RedHat Linux RHEL5 systems  running
 2.6.18.  But as you say, even the lower defaults of the newer kernels can be
 way too much on a system with lots of RAM.

One can experiment with writeback storms using this program from Chris
Mason, under GPLv2:
http://oss.oracle.com/~mason/fsync-tester.c

You need to tweak it a bit; AFAIR, the #define SIZE (32768*32) must
be reduced to 8kB blocks if you want a write pattern similar to
PostgreSQL's.
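
For reference, that tweak amounts to changing the one define (the
original value works out to 1MB per write; 8192 gives the single 8kB
blocks described above):

/* original in fsync-tester.c, per the above: 32768*32 bytes = 1 MB per write */
/* #define SIZE (32768*32) */
/* reduced to a single 8 kB block to approximate PostgreSQL's write size: */
#define SIZE 8192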

The program uses one big file and many small fsyncs, writing to both.
Please see http://www.spinics.net/lists/linux-ext4/msg24308.html

It is used as a torture program by some Linux filesystem hackers and may
be useful for the OP to validate hardware and kernel on his large server.



 The main downside I've seen of addressing this by using a kernel with
 dirty_bytes and dirty_background_bytes is that VACUUM can slow down
 considerably.  It really relies on the filesystem having a lot of write
 cache to perform well.  In many cases people are happy with VACUUM
 throttling if it means nasty I/O spikes go away, but the trade-offs here are
 still painful at times.

 --
 Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
 PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
 PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books






-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support



Re: [PERFORM] Benchmarking a large server

2011-05-10 Thread Greg Smith

Greg Smith wrote:

On 05/09/2011 11:13 PM, Shaun Thomas wrote:
Take a look at /proc/sys/vm/dirty_ratio and 
/proc/sys/vm/dirty_background_ratio if you have an older Linux 
system, or /proc/sys/vm/dirty_bytes, and 
/proc/sys/vm/dirty_background_bytes with a newer one.
On older systems for instance, those are set to 40 and 20 
respectively (recent kernels cut these in half).


1/4 actually; 10% and 5% starting in kernel 2.6.22.  The main sources 
of this on otherwise new servers I see are RedHat Linux RHEL5 systems  
running 2.6.18.  But as you say, even the lower defaults of the newer 
kernels can be way too much on a system with lots of RAM.


Ugh...we're both right, sort of.  2.6.22 dropped them to 5/10:  
http://kernelnewbies.org/Linux_2_6_22 as I said.  But on the new 
Scientific Linux 6 box I installed yesterday, they're at 10/20--as you 
suggested.


Can't believe I'm going to need a table by kernel version and possibly 
distribution to keep this all straight now, what a mess.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books




[PERFORM] Benchmarking a large server

2011-05-09 Thread Chris Hoover
I've got a fun problem.

My employer just purchased some new db servers that are very large.  The
specs on them are:

4 Intel X7550 CPUs (32 physical cores, HT turned off)
1 TB RAM
1.3 TB Fusion IO (2 1.3 TB Fusion IO Duo cards in a RAID 10)
3 TB SAS array (48 15K 146 GB spindles)

The issue we are running into is how do we benchmark this server;
specifically, how do we get valid benchmarks for the Fusion IO card?
Normally, to eliminate the cache effect, you run iozone and other benchmark
suites at 2x the RAM.  However, we can't do that, since 2TB > 1.3TB.

So, does anyone have any suggestions/experiences in benchmarking storage
when the storage is smaller than 2x memory?

Thanks,

Chris


Re: [PERFORM] Benchmarking a large server

2011-05-09 Thread Merlin Moncure
On Mon, May 9, 2011 at 3:32 PM, Chris Hoover revo...@gmail.com wrote:
 I've got a fun problem.
 My employer just purchased some new db servers that are very large.  The
 specs on them are:
 4 Intel X7550 CPU's (32 physical cores, HT turned off)
 1 TB Ram
 1.3 TB Fusion IO (2 1.3 TB Fusion IO Duo cards in a raid 10)
 3TB Sas Array (48 15K 146GB spindles)

my GOODNESS!  :-D.  I mean, just, wow.

 The issue we are running into is how do we benchmark this server,
 specifically, how do we get valid benchmarks for the Fusion IO card?
  Normally to eliminate the cache effect, you run iozone and other benchmark
 suites at 2x the ram.  However, we can't do that due to 2TB > 1.3TB.
 So, does anyone have any suggestions/experiences in benchmarking storage
 when the storage is smaller than 2x memory?

Hm, if it was me, I'd write a small C program that just jumped around
directly on the device and did random writes, assuming it isn't
formatted.  For sequential reads, just flush caches and dd the device
to /dev/null.  Probably someone will suggest better tools, though.
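
A bare-bones sketch of that kind of raw-device random-write test
(untested, and destructive to whatever is on the device; the device path,
size, and iteration count are placeholders only):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLOCK 8192

int main(void)
{
    /* DESTRUCTIVE: overwrites random 8 kB blocks on the raw device.
       Only run against a device with nothing on it. */
    const char *dev = "/dev/fioa";                     /* placeholder */
    off_t dev_bytes = 1300LL * 1024 * 1024 * 1024;     /* placeholder size */
    int fd = open(dev, O_WRONLY);
    if (fd < 0) { perror("open"); return 1; }

    char buf[BLOCK];
    memset(buf, 0xAA, BLOCK);
    srandom(getpid());

    for (long i = 0; i < 100000; i++) {                /* placeholder count */
        off_t blk = random() % (dev_bytes / BLOCK);
        if (pwrite(fd, buf, BLOCK, blk * BLOCK) != BLOCK) { perror("pwrite"); break; }
    }
    fsync(fd);
    close(fd);
    return 0;
}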

merlin



Re: [PERFORM] Benchmarking a large server

2011-05-09 Thread David Boreham



hm, if it was me, I'd write a small C program that just jumped
directly on the device around and did random writes assuming it wasn't
formatted.  For sequential read, just flush caches and dd the device
to /dev/null.  Probably someone will suggest better tools though.
I have a program I wrote years ago for a purpose like this. One of the
things it can do is write to the filesystem at the same time as dirtying
pages in a large shared or non-shared memory region. The idea was to
emulate the behavior of a database reasonably accurately. Something like
bonnie++ would probably be a good starting point these days though.





Re: [PERFORM] Benchmarking a large server

2011-05-09 Thread Ben Chobot
On May 9, 2011, at 1:32 PM, Chris Hoover wrote:

 1.3 TB Fusion IO (2 1.3 TB Fusion IO Duo cards in a raid 10)

Be careful here. What if the entire card hiccups, instead of just a device on 
it? (We've had that happen to us before.) Depending on how you've done your 
raid 10, either all your parity is gone or your data is.


Re: [PERFORM] Benchmarking a large server

2011-05-09 Thread Shaun Thomas

On 05/09/2011 03:32 PM, Chris Hoover wrote:


So, does anyone have any suggestions/experiences in benchmarking storage
when the storage is smaller than 2x memory?


We had a similar problem when benching our FusionIO setup. What I did 
was write a script that cleared out the Linux system cache before every 
iteration of our pgbench tests. You can do that easily with:


echo 3 > /proc/sys/vm/drop_caches

Executed as root.

Then we ran short (10, 20, 30, 40 clients, 10,000 transactions each) 
pgbench tests, resetting the cache and the DB after every iteration. It 
was all automated in a script, so it wasn't too much work.


We got (roughly) a 15x speed improvement over a 6x15k RPM RAID-10 setup 
on the same server, with no other changes. This was definitely 
corroborated after deployment, when our frequent periods of 100% disk IO 
utilization vanished and were replaced by occasional 20-30% spikes. Even 
that's an unfair comparison in favor of the RAID, because we added DRBD
to the mix, since you can't share a PCI card between two servers.


If you do have two 1.3TB Duo cards in a 4x640GB RAID-10, you should get 
even better read times than we did.


--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
stho...@peak6.com




Re: [PERFORM] Benchmarking a large server

2011-05-09 Thread Merlin Moncure
On Mon, May 9, 2011 at 3:59 PM, David Boreham david_l...@boreham.org wrote:

 hm, if it was me, I'd write a small C program that just jumped
 directly on the device around and did random writes assuming it wasn't
 formatted.  For sequential read, just flush caches and dd the device
 to /dev/null.  Probably someone will suggest better tools though.

 I have a program I wrote years ago for a purpose like this. One of the
 things it can
 do is write to the filesystem at the same time as dirtying pages in a large
 shared
 or non-shared memory region. The idea was to emulate the behavior of a
 database
 reasonably accurately. Something like bonnie++ would probably be a good
 starting
 point these days though.

The problem with bonnie++ is that the results aren't valid, especially
the read tests.  I think it refuses to even run unless you set special
switches.

merlin



Re: [PERFORM] Benchmarking a large server

2011-05-09 Thread David Boreham

On 5/9/2011 3:11 PM, Merlin Moncure wrote:

The problem with bonnie++ is that the results aren't valid, especially
the read tests.  I think it refuses to even run unless you set special
switches.


I only care about writes ;)

But definitely, be careful with the tools. I tend to prefer small
programs written in house myself, and of course simply running your
application under a synthesized load.







Re: [PERFORM] Benchmarking a large server

2011-05-09 Thread Greg Smith

On 05/09/2011 04:32 PM, Chris Hoover wrote:
So, does anyone have any suggestions/experiences in benchmarking 
storage when the storage is smaller than 2x memory?


If you do the Linux cache-dropping trick already mentioned, you can
start a database test with zero information in memory.  In that
situation, whether or not everything could fit in RAM doesn't matter as
much; you're starting with none of it in there.  In that case, you can
benchmark things without having twice as much disk space.  You just have
to recognize that the test becomes less useful the longer you run it, and
measure the results accordingly.


A test starting from that state will start out showing you random I/O
speed on the device, slowly moving toward in-memory cached speeds as
the benchmark runs for a while.  You really need to capture the latency
data for every transaction and graph it over time to make any sense of 
it.  If you look at "Using and Abusing pgbench" at
http://projects.2ndquadrant.com/talks, starting on P33 I have several
slides showing such a test, done with pgbench and pgbench-tools.  I 
added a quick hack to pgbench-tools around then to make it easier to run 
this specific type of test, but to my knowledge no one else has ever 
used it.  (I've had talks about PostgreSQL in my yard that were better 
attended than that session, for which I blame Jonah Harris for doing a 
great talk in the room next door concurrent with it.)


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books




Re: [PERFORM] Benchmarking a large server

2011-05-09 Thread Cédric Villemain
2011/5/9 Chris Hoover revo...@gmail.com:
 I've got a fun problem.
 My employer just purchased some new db servers that are very large.  The
 specs on them are:
 4 Intel X7550 CPU's (32 physical cores, HT turned off)
 1 TB Ram
 1.3 TB Fusion IO (2 1.3 TB Fusion IO Duo cards in a raid 10)
 3TB Sas Array (48 15K 146GB spindles)
 The issue we are running into is how do we benchmark this server,
 specifically, how do we get valid benchmarks for the Fusion IO card?
  Normally to eliminate the cache effect, you run iozone and other benchmark
 suites at 2x the ram.  However, we can't do that due to 2TB > 1.3TB.
 So, does anyone have any suggestions/experiences in benchmarking storage
 when the storage is smaller than 2x memory?

You can reduce the memory size at server boot.
If you use Linux, you can add 'mem=512G' to your boot-time
parameters (it may only support K or M suffixes, so 512*1024M...).

 Thanks,
 Chris



-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support



Re: [PERFORM] Benchmarking a large server

2011-05-09 Thread Craig James


2011/5/9 Chris Hoover revo...@gmail.com:


I've got a fun problem.
My employer just purchased some new db servers that are very large.  The
specs on them are:
4 Intel X7550 CPU's (32 physical cores, HT turned off)
1 TB Ram
1.3 TB Fusion IO (2 1.3 TB Fusion IO Duo cards in a raid 10)
3TB Sas Array (48 15K 146GB spindles)
The issue we are running into is how do we benchmark this server,
specifically, how do we get valid benchmarks for the Fusion IO card?
  Normally to eliminate the cache effect, you run iozone and other benchmark
suites at 2x the ram.  However, we can't do that due to 2TB > 1.3TB.
So, does anyone have any suggestions/experiences in benchmarking storage
when the storage is smaller than 2x memory?

Maybe this is a dumb question, but why do you care?  If you have 1TB RAM and just a
little more actual disk space, it seems like your database will always be cached in
memory anyway.  If you eliminate the cache effect, won't the benchmark
actually give you the wrong real-life results?

Craig




Re: [PERFORM] Benchmarking a large server

2011-05-09 Thread david

On Mon, 9 May 2011, David Boreham wrote:


On 5/9/2011 6:32 PM, Craig James wrote:
Maybe this is a dumb question, but why do you care?  If you have 1TB RAM 
and just a little more actual disk space, it seems like your database will 
always be cached in memory anyway.  If you eliminate the cache effect,
won't the benchmark actually give you the wrong real-life results?


The time it takes to populate the cache from a cold start might be important.


You may also have other processes that will be contending with the disk
buffers for memory (for that matter, postgres may use a significant amount
of that memory as it's producing its results).


David Lang

Also, if it were me, I'd be wanting to check for weird performance
behavior at this memory scale. I've seen cases in the past where the VM
subsystem went bananas because the designers and testers of its
algorithms never considered the physical memory size we deployed.


How many times was the kernel tested with this much memory, for example?
(Never??)









Re: [PERFORM] Benchmarking a large server

2011-05-09 Thread Shaun Thomas
 How many times was the kernel tested with this much memory, for example
 ? (never??)

This is actually *extremely* relevant.

Take a look at /proc/sys/vm/dirty_ratio and /proc/sys/vm/dirty_background_ratio 
if you have an older Linux system, or /proc/sys/vm/dirty_bytes, and 
/proc/sys/vm/dirty_background_bytes with a newer one.

On older systems for instance, those are set to 40 and 20 respectively (recent 
kernels cut these in half). That's significant because ratio is the 
*percentage* of memory that can remain dirty before causing async, and 
background_ratio tells it when it should start writing in the background to 
avoid hitting that higher and much more disruptive number. This is another 
source of IO that can be completely independent of the checkpoint spikes that 
long plagued PostgreSQL versions prior to 8.3.

With that much memory (1TB!), that's over 100GB of dirty memory before it 
starts writing that out to disk even with the newer conservative settings. We 
had to tweak and test for days to find good settings for these, and our servers 
only have 96GB of RAM. You also have to consider, as fast as the FusionIO
drives are, they're still NAND flash, which has write-amplification issues. How fast
do you think it can commit 100GB of dirty memory to disk? Even with a
background setting of 1%, that's 10GB on your system.

That means you'd need to use a very new kernel so you can utilize the
dirty_bytes and dirty_background_bytes settings to force those limits
down to more sane levels and avoid unpredictable, several-minute-long flushes. I'm
not sure how much testing Linux sees on massive hardware like that, but that's
just one hidden danger of not properly benchmarking the server and just
assuming that 1TB of memory caching the entire dataset can only be an improvement.

--
Shaun Thomas
Peak6 | 141 W. Jackson Blvd. | Suite 800 | Chicago, IL 60604
312-676-8870
stho...@peak6.com




Re: [PERFORM] Benchmarking a large server

2011-05-09 Thread Greg Smith

On 05/09/2011 11:13 PM, Shaun Thomas wrote:
Take a look at /proc/sys/vm/dirty_ratio and 
/proc/sys/vm/dirty_background_ratio if you have an older Linux system, 
or /proc/sys/vm/dirty_bytes, and /proc/sys/vm/dirty_background_bytes 
with a newer one.

On older systems for instance, those are set to 40 and 20 respectively (recent 
kernels cut these in half).


1/4 actually; 10% and 5% starting in kernel 2.6.22.  The main sources of 
this on otherwise new servers I see are RedHat Linux RHEL5 systems  
running 2.6.18.  But as you say, even the lower defaults of the newer 
kernels can be way too much on a system with lots of RAM.


The main downside I've seen of addressing this by using a kernel with 
dirty_bytes and dirty_background_bytes is that VACUUM can slow down 
considerably.  It really relies on the filesystem having a lot of write 
cache to perform well.  In many cases people are happy with VACUUM 
throttling if it means nasty I/O spikes go away, but the trade-offs here 
are still painful at times.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books

