On Tue, May 1, 2012 at 1:45 PM, Gary <gdri...@gmail.com> wrote:

> The idea of benchmarking -- IMHO -- is to vaguely attempt to reproduce
> real world loads. Obviously, this is an imperfect science but if
> you're going to be writing a lot of small files (e.g. NNTP or email
> servers used to be a good real world example) then you're going to
> want to benchmark for that. If you're going to want to write a bunch
> of huge files (are you writing a lot of 16GB files?) then you'll want
> to test for that. Caching anywhere in the pipeline is important for
> benchmarks because you aren't going to turn off a cache or remove RAM
> in production are you?

    It also depends on what you are going to be tuning. When I needed
to decide on a zpool configuration (# of vdevs, type of vdev, etc.)
I did not want the effect of the cache "hiding" the underlying
performance limitations of the physical drive configuration. In that
case I either needed to use a very large test data set or reduce the
size (effect) of the RAM. By limiting the ARC to 2 GB for my test, I
was able to relatively easily quantify the performance differences
between the various configurations. Once we picked a configuration, we
let the ARC take as much RAM as it wanted and re-ran the benchmark to
see what kind of real world performance we would get. Unfortunately,
we could not easily simulate 400 real world people sitting at desktops
accessing the data. So our ARC limited benchmark was effectively a
"worst case" number and the full ARC the "best case". The real world,
as usual, fell somewhere in between.
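For anyone wanting to try the same approach, a minimal sketch of how a 2 GB
ARC cap can be applied. The tunable is zfs_arc_max on both Solaris-derived
systems and ZFS on Linux, but the exact mechanism and whether a reboot is
required varies by platform and release, so treat this as illustrative:

```shell
# Cap the ZFS ARC at 2 GB so the cache does not mask the raw performance
# of the underlying vdev layout during benchmarking.
#
# On Solaris/illumos, add this line to /etc/system and reboot
# (2 GB = 0x80000000 bytes):
#   set zfs:zfs_arc_max = 0x80000000
#
# On ZFS on Linux, the equivalent module parameter can be changed at
# runtime (as root):
echo 2147483648 > /sys/module/zfs/parameters/zfs_arc_max
```

Once the constrained runs are done, remove the cap (or reboot without the
/etc/system entry) and re-run the benchmark with the full ARC to get the
"best case" numbers described above.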

   Finding a benchmark tool that matches _my_ workload is why I have
started kludging together my own.

Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Sound Coordinator, Schenectady Light Opera Company (
http://www.sloctheater.org/ )
-> Technical Advisor, RPI Players
zfs-discuss mailing list