> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of sensille
> 
> The basic idea: the main problem when using a HDD as a ZIL device
> are the cache flushes in combination with the linear write pattern
> of the ZIL. This leads to a whole rotation of the platter after
> each write, because after the first write returns, the head is
> already past the sector that will be written next.
> My idea goes as follows: don't write linearly. Track the rotation
> and write to the position the head will hit next. This might be done
> by a re-mapping layer or integrated into ZFS. This works only because
> ZIL device are basically write-only. Reads from this device will be
> horribly slow.

This is a really interesting idea, but I think you've hurt your case with the
way you described the problem. Also, I was recently corrected for misusing
the same terms you just misused (saying "ZIL" is not the same as saying "ZIL
on a dedicated log device").  So I'll try to clarify what you just said:

Hard drives are less effective as ZIL dedicated log devices than SSDs because
of rotational latency: the physical time to seek to a random block.  There
may be a way to use hard drives as dedicated log devices, cheaper than SSDs
and with possibly comparable latency, if you can intelligently eliminate the
random seek.  That is, if you have a way to tell the hard drive "write this
data to whatever block happens to be available at minimum seek time."

For rough estimates:  Assume the drive is using Zone Density Recording, like
this:
http://www.dewassoc.com/kbase/hard_drives/hard_disk_sector_structures.htm
Suppose you're able to keep your hard drive head on the outer tracks.
Suppose 1000 sectors per track (I have no idea if that's accurate, but at
least according to the above article it was ballpark realistic in the year
2000).  Suppose 10k rpm.  One rotation then takes 6 ms, so a single sector
passes under the head in about 6 microseconds, and in theory the wait before
a write could be brought down to that order of magnitude.  Of course, that's
not fully realistic: some sectors may already be used, and the electronics
themselves could be a factor.  But the point remains that the rotational
delay can be effectively eliminated, at least in theory.  And that was with
year-2000 densities.
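The arithmetic above can be sketched in a few lines. The spindle speed and
sectors-per-track figures are the assumed ballpark numbers from the estimate,
not measurements of any particular drive:

```python
# Back-of-envelope latency math for an assumed 10k rpm drive with an
# assumed 1000 sectors per outer track.

RPM = 10_000
SECTORS_PER_TRACK = 1000

rotation_s = 60.0 / RPM                    # one full platter rotation: 6 ms
sector_s = rotation_s / SECTORS_PER_TRACK  # one sector passing the head: ~6 us

# Naive linear ZIL writes wait a full rotation per write.
naive_iops = 1.0 / rotation_s              # ~166 writes/s

# An ideal "write wherever the head is" scheme approaches one sector time.
print(int(naive_iops), sector_s)
```

Note that the naive figure lands right at the ~166 writes/s measured for the
Ultrastar below, which suggests the full-rotation-per-write model is about
right.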


> I have done some testing and am quite enthusiastic. If I take a
> decent SAS disk (like the Hitachi Ultrastar C10K300), I can raise
> the synchronous write performance from 166 writes/s to about
> 2000 writes/s (!). 2000 IOPS is more than sufficient for our
> production environment.

Um ... careful there.  There are many apples, oranges, and bananas to be
compared inaccurately against each other.  When I measure IOPS of physical
disks with all the caches disabled, I get anywhere from 200 to 2400 for a
single-spindle disk (SAS 10k), and anywhere from 2000 to 6000 with an SSD
(SATA), depending on the benchmark configuration.  ZFS does all sorts of
acceleration behind the scenes, which makes the results vary *immensely*
from whatever IOPS number you look up online.

You've got to be sure you measure something, then change *only one thing*
and measure again, to get a good measurement.  You've got to toggle back and
forth a few times, and see that the results are repeatable.  And *only* then
do you have a solid result.
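As a minimal sketch of what "measure something" means here: time synchronous
writes (write followed by fsync) and report IOPS. The function name and
parameters are illustrative, not anything from the original post, and a real
comparison would toggle exactly one variable (e.g. the log device) between
repeated runs of something like this:

```python
# Minimal synchronous-write IOPS probe: fsync() after every write so each
# write must reach stable storage before the next one starts.
import os
import time

def sync_write_iops(path, writes=1000, size=4096):
    """Measure synchronous-write IOPS by fsync()ing after every write."""
    buf = b"\0" * size
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        start = time.monotonic()
        for _ in range(writes):
            os.write(fd, buf)
            os.fsync(fd)  # block until the data is on stable storage
        elapsed = time.monotonic() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return writes / elapsed
```

Run it several times per configuration and only trust a difference that
survives toggling back and forth.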


> Currently I'm implementing a re-mapping driver for this. The
> reason I'm writing to this list is that I'd like to find support
> from the zfs team, find sparring partners to discuss implementation
> details and algorithms and, most important, find testers!

So you believe you can know the drive geometry, the instantaneous head
position, and the next available physical block address in software?  No
need for special hardware?  That's cool.  I hope there aren't any "gotchas"
as-yet undiscovered.
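One conceivable software-only approach (purely my speculation, not the
poster's actual driver design) is to calibrate the head's angular position
from the completion time of a previous write, since the platter advances at
a known rate. All constants and names here are assumptions for illustration:

```python
# Speculative sketch: predict which sector the head will reach next,
# given when the last write to a known sector completed. Assumes a fixed
# 10k rpm spindle and 1000 sectors on the track being used.
RPM = 10_000
SECTORS_PER_TRACK = 1000
ROTATION_S = 60.0 / RPM

def predict_next_sector(last_sector, last_done_s, now_s, margin=5):
    """Estimate the next reachable sector, plus a safety margin of
    sectors to absorb command-processing overhead."""
    elapsed = now_s - last_done_s
    advanced = (elapsed / ROTATION_S) * SECTORS_PER_TRACK
    return int(last_sector + advanced + margin) % SECTORS_PER_TRACK
```

Whether the drive's internal remapping, caching, and command queuing leave
such a prediction accurate enough in practice is exactly the kind of
"gotcha" that testing would have to uncover.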

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss