> Without knowing the I/O pattern, saying 500 MB/sec. is meaningless.
> Achieving 500MB/sec. with 8KB files and lots of random accesses is really
> hard, even with 20 HDDs. Achieving 500MB/sec. of sequential streaming of
> 100MB+ files is much easier.
> . . .
> For ZFS, performance is proportional to the number of vdevs NOT the
> number of drives or the number of drives per vdev. See https://
> Xc for some testing I did a while back. I did not test sequential read as
> that is not part of our workload.
> . . .
> I understand why the read performance scales with the number of vdevs,
> but I have never really understood _why_ it does not also scale with the
> number of drives in each vdev. When I did my testing with 40 drives, I
> expected similar READ performance regardless of the layout, but that was NOT
> the case.
In your first paragraph you make the important point that "performance"
is too ambiguous a term for this discussion. Yet in the second and third
paragraphs above you go back to using "performance" in that ambiguous sense.
I assume that by "performance" you mostly mean random-read performance.
My experience is that sequential read performance _does_ scale with the number
of drives in each vdev. Sequential and random write performance also scale
in this manner (note that ZFS tends to save up small, random writes and
flush them out in a sequential batch).
Small, random read performance does not scale with the number of drives in
each raidz vdev because of the dynamic striping. To read a single logical
block, ZFS has to read all the segments of that block, which have been spread
out across multiple drives, so that it can validate the checksum before
returning the block to the application. Every small read therefore keeps all
of the vdev's data drives busy at once, which is why a single raidz vdev's
random-read performance is equivalent to the random-read performance of a
single drive.
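
As a rough illustration of the two scaling rules above, here is a minimal
sketch in Python. It is an idealized rule of thumb, not a measurement: the
function name and the per-drive figures (100 IOPS, 75 MB/s) are assumptions
chosen for the example, and it ignores caching, prefetch, and controller
limits.

```python
def pool_estimates(vdevs, drives_per_vdev, parity, drive_iops, drive_mb_s):
    """Idealized rule-of-thumb estimates for a pool of identical raidz vdevs."""
    data_drives = drives_per_vdev - parity
    return {
        # Each raidz vdev delivers roughly one drive's worth of small random reads.
        "random_read_iops": vdevs * drive_iops,
        # Streaming reads can draw on every data drive in every vdev.
        "sequential_read_mb_s": vdevs * data_drives * drive_mb_s,
    }

# Hypothetical per-drive figures, assumed only for this example.
DRIVE_IOPS = 100
DRIVE_MB_S = 75

# The same 40 drives laid out two ways.
one_wide = pool_estimates(1, 40, 2, DRIVE_IOPS, DRIVE_MB_S)    # 1 x 40-drive raidz2
many_narrow = pool_estimates(5, 8, 2, DRIVE_IOPS, DRIVE_MB_S)  # 5 x 8-drive raidz2

print("1 x 40:", one_wide)
print("5 x 8: ", many_narrow)
```

Under these assumptions the five-vdev layout gets about five times the
random-read IOPS of the single wide vdev, while the wide vdev keeps a
somewhat higher sequential estimate because it spends fewer drives on parity.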
> The recommendation is to not go over 8 or so drives per vdev, but that is
> a performance issue NOT a reliability one. I have also not been able to
> duplicate others' observations that 2^N drives per vdev is a magic number (4,
> 8, 16, etc). As you can see from the above, even a 40 drive vdev works and is
> reliable, just (relatively) slow :-)
Again, the "performance issue" you describe above is for the random-read
case, not sequential. If you rarely experience small-random-read workloads,
then raidz* will perform just fine. We often see 2000 MBytes/sec sequential
read (and write) performance on a raidz3 pool consisting of three 12-disk
vdevs (using 2TB drives).
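
To put that figure in per-drive terms, here is a small back-of-the-envelope
calculation in Python. The function name is made up for the example, and it
assumes the throughput is spread evenly over the data-drive equivalents
(drives minus parity); in practice raidz rotates parity, so all 36 spindles
take part.

```python
def per_data_drive_mb_s(total_mb_s, vdevs, drives_per_vdev, parity):
    """Average streaming rate per data-drive equivalent (hypothetical helper)."""
    data_drives = vdevs * (drives_per_vdev - parity)
    return total_mb_s / data_drives

# Three 12-disk raidz3 vdevs: 3 * (12 - 3) = 27 data-drive equivalents.
rate = per_data_drive_mb_s(2000, 3, 12, 3)
print(f"{rate:.1f} MB/s per data drive")  # prints "74.1 MB/s per data drive"
```

Roughly 74 MB/s per data drive is an ordinary streaming rate for a single
disk, so the pool-wide number is what you would expect if sequential
throughput scales with the number of data drives.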
However, when a disk fails and must be resilvered, that's when you will
run into the slow performance of the small, random read workload. This
is why I use raidz2 or raidz3 on vdevs consisting of more than 6-7 drives,
especially with drives of 1TB or larger. That way, if it takes 200 hours to
resilver, you've still got a lot of redundancy in place.
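
To see why a resilver can run that long, here is a deliberately crude sketch
in Python. It assumes the resilver is bounded purely by random reads, one I/O
per block, at a fixed rate; the function name, the 100 IOPS figure, and the
block sizes are all assumptions for illustration, not how ZFS actually
schedules a resilver.

```python
def resilver_hours(data_bytes, avg_block_bytes, random_iops):
    """Crude estimate: one random I/O per block at a fixed rate (assumption)."""
    blocks = data_bytes / avg_block_bytes
    seconds = blocks / random_iops
    return seconds / 3600

ONE_TB = 10**12  # 1 TB of allocated data to rebuild (hypothetical)

small_blocks = resilver_hours(ONE_TB, 8 * 1024, 100)    # 8 KiB average block
large_blocks = resilver_hours(ONE_TB, 128 * 1024, 100)  # 128 KiB average block

print(f"8 KiB blocks:   {small_blocks:.0f} hours")
print(f"128 KiB blocks: {large_blocks:.0f} hours")
```

Even in this simplified model, a pool full of small blocks lands in the
hundreds of hours, which is the window during which the extra parity of
raidz2 or raidz3 matters.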
zfs-discuss mailing list