> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of sensille
>
> The basic idea: the main problem when using a HDD as a ZIL device
> is the cache flushes in combination with the linear write pattern
> of the ZIL. This leads to a whole rotation of the platter after
> each write, because after the first write returns, the head is
> already past the sector that will be written next.
> My idea goes as follows: don't write linearly. Track the rotation
> and write to the position the head will hit next. This might be done
> by a re-mapping layer or integrated into ZFS. This works only because
> ZIL devices are basically write-only. Reads from this device will be
> horribly slow.
This is a really interesting idea, but I think you've hurt yourself in the
way you described the problem - and, additionally, I was recently corrected
for misusing the same terms you just misused (saying "ZIL" != saying "ZIL
on a dedicated log device"). So I'll try to clarify what you said:

The reason hard drives are less effective than SSDs as dedicated log
devices is the rotation of the platter: the physical time to reach a random
block. There may be a way to use hard drives as dedicated log devices,
cheaper than SSDs and with possibly comparable latency, if you can
intelligently eliminate that rotational delay - that is, if you have a way
to tell the drive "write this data to whatever block happens to be
reachable with minimum delay."

For rough estimates, assume the drive uses Zone Density Recording, like this:
http://www.dewassoc.com/kbase/hard_drives/hard_disk_sector_structures.htm
Suppose you can keep the head on the outer tracks, with 1000 sectors per
track (I have no idea if that's accurate, but according to the above
article it was ballpark realistic in the year 2000), and suppose 10k rpm.
One rotation then takes 6 ms, and a sector passes under the head every
6 microseconds, so the physical positioning delay could theoretically be
brought down to the order of 10^-6 seconds. Of course that's not realistic
- some sectors may already be used, and the electronics themselves could be
a factor - but the point remains: the physical seek time can be effectively
eliminated, at least in theory. And that was with year-2000 densities.

> I have done some testing and am quite enthusiastic. If I take a
> decent SAS disk (like the Hitachi Ultrastar C10K300), I can raise
> the synchronous write performance from 166 writes/s to about
> 2000 writes/s (!). 2000 IOPS is more than sufficient for our
> production environment.

Um ... careful there. There are many apples, oranges, and bananas that can
be compared inaccurately against each other.
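To put the estimate in code form, here is a minimal Python sketch of the
arithmetic under the same assumed geometry (10k rpm, 1000 sectors per
track); the `next_sector()` helper and its two-sector command-overhead lead
are purely hypothetical illustrations of the rotation-tracking idea, not a
real driver interface:

```python
# Back-of-the-envelope numbers for the "write wherever the head is" idea.
# The figures (10k rpm, 1000 sectors/track, head parked on one outer track)
# are the assumptions from this thread, not measurements of a real drive.

RPM = 10_000
SECTORS_PER_TRACK = 1_000

rotation_s = 60.0 / RPM                    # one full rotation: 6 ms
sector_s = rotation_s / SECTORS_PER_TRACK  # one sector passing the head: 6 us

# Linear ZIL writes on a bare disk: each sync write waits ~a full rotation.
naive_writes_per_s = 1.0 / rotation_s      # ~166/s, matching the figure above

# Theoretical "next sector under the head" ceiling: one sector pass time.
ideal_writes_per_s = 1.0 / sector_s

def next_sector(now_s, lead_sectors=2):
    """Hypothetical remapping helper: given the current time, return the
    sector index the head will reach next, plus a small safety lead for
    command overhead.  Assumes the head stays on one track and that sector
    0 passes under the head at t=0 -- a thought experiment, not a driver."""
    fraction = (now_s % rotation_s) / rotation_s
    return (int(fraction * SECTORS_PER_TRACK) + lead_sectors) % SECTORS_PER_TRACK

print(f"rotation {rotation_s * 1e3:.0f} ms, sector {sector_s * 1e6:.0f} us")
print(f"linear ZIL ceiling: ~{naive_writes_per_s:.0f} writes/s")
print(f"next-sector ceiling: ~{ideal_writes_per_s:.0f} writes/s")
```

Note the gap between the ~166/s ceiling of a linear log and the per-sector
ceiling; real results will land well below the latter once command
overhead, already-used sectors, and cache-flush semantics are accounted for.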
When I measure IOPS of physical disks with all caches disabled, I get
anywhere from 200 to 2400 for a single-spindle disk (SAS 10k), and anywhere
from 2000 to 6000 with an SSD (SATA), depending on the benchmark
configuration. ZFS does all sorts of acceleration behind the scenes, which
makes the results vary *immensely* from any IOPS number you look up online.
You have to measure something, then change *only one thing* and measure
again, to get a good measurement. You have to toggle back and forth a few
times and see that the results are repeatable. Only then do you have a
solid result.

> Currently I'm implementing a re-mapping driver for this. The
> reason I'm writing to this list is that I'd like to find support
> from the zfs team, find sparring partners to discuss implementation
> details and algorithms and, most important, find testers!

So you believe you can know the drive geometry, the instantaneous head
position, and the next available physical block address, all in software,
with no special hardware? That's cool. I hope there aren't any as-yet
undiscovered "gotchas."

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss