On 07/11/2012 05:20 AM, Edward Ned Harvey wrote:
>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
>> boun...@opensolaris.org] On Behalf Of Sašo Kiselkov
>>
>> I'm contemplating implementing a new fast hash algorithm in Illumos' ZFS
>> implementation to supplant the currently utilized sha256. On modern
>> 64-bit CPUs SHA-256 is actually much slower than SHA-512 and indeed much
>> slower than many of the SHA-3 candidates, so I went out and did some
>> testing (details attached) on a possible new hash algorithm that might
>> improve on this situation.
>
> As coincidence would have it, I recently benchmarked md5 hashing and AES
> encryption on systems with and without AES-NI. Theoretically, hashing should
> be much faster because it's asymmetric, while symmetric encryption has much
> less speed potential. I found md5 could hash at most several hundred MB/sec,
> and AES was about half to quarter of that speed ... Which is consistent with
> the theory. But if I had AES-NI, then AES was about 1.1 GB/sec. Which means
> we have much *much* more speed potential available untapped in terms of
> hashing.

MD5 is a painfully slow hash compared to the SHA-3 candidates, or even
SHA-512. The candidates I tested produced the following throughputs (a
simple reversal of the cycles/byte metric for each CPU):

Opteron 4234 (3.1 GHz):
  Skein-512: 355 MB/s
  Edon-R:    643 MB/s

AMD Athlon II Neo N36L (1.3 GHz):
  Skein-512: 213 MB/s
  Edon-R:    364 MB/s

Intel Xeon E5645 (2.4 GHz):
  Skein-512: 357 MB/s
  Edon-R:    734 MB/s

Intel Xeon E5405 (2.0 GHz):
  Skein-512: 280 MB/s
  Edon-R:    531 MB/s

Intel Xeon E5450 (3.0 GHz):
  Skein-512: 416 MB/s
  Edon-R:    792 MB/s

Keep in mind that this is single-threaded on a pure-C implementation.
During my tests I used GCC 3.4.3 in order to be able to assess speed
improvements should the code be folded into Illumos (since that's one
compiler Illumos can be built with), but GCC 3.4.3 is seriously
stone-age. Compiling with GCC 4.6.2 I got a further speed boost of
around 10-20%, so even in pure C, Edon-R is probably able to breathe
down the neck of the AES-NI optimized implementation you mentioned.

> Now, when you consider that a single disk typically is able to sustain
> 1.0Gbit (128 MB) per second, it means, very quickly the CPU can become the
> bottleneck for sustained disk reads in a large raid system. I think a lot of
> the time, people with a bunch of disks in a raid configuration are able to
> neglect the CPU load, just because they're using fletcher.

Yes, that's exactly what I'm getting at. It would be great to have a
hash that you could enable with significantly more peace of mind than
sha256 - with sha256, you always need to keep in mind that the hashing
is going to be super-expensive (even for reads). My testing with a
"small" JBOD from Supermicro showed that I was easily able to exceed
2 GB/s of reads off of just 45 7k2 SAS drives.

> Of the SHA3 finalists, I was pleased to see that only one of them was based
> on AES-NI, and the others are actually faster.
> So in vague hand-waving
> terms, yes I believe the stronger & faster hash algorithm, in time will be a
> big win for zfs performance. But only in situations where people have a
> sufficiently large number of disks and sufficiently high expectation for IO
> performance.

My whole reason for starting this exercise is that RAM is getting dirt
cheap nowadays, so a reasonably large and fast dedup setup can be had
for relatively little money. However, I think that the sha256 hash is
really ill-suited to this application, and even if it isn't a critical
issue, I think it's really inexcusable that we are using something
worse than the best of breed here (especially considering how ZFS was
always designed to be easily extensible with new algorithms).

> CPU's are not getting much faster. But IO is definitely getting faster.
> It's best to keep ahead of that curve.

As I said above, RAM is getting cheaper much faster than CPU
performance is improving. Nowadays you can get 128 GB of server-grade
RAM for around $1000, so equipping a machine with half a terabyte of
RAM or more is becoming commonplace. To a first-order approximation
(assuming 200 B per block in the DDT) this would allow one to store
upward of 80 TB of unique 128K blocks in 128 GB of RAM - that's easily
above 100 TB of attached raw storage, so we're looking at things like
this:

http://dataonstorage.com/dataon-products/6g-sas-jbod/dns-1660-4u-60-bay-6g-35inch-sassata-jbod.html

These things come with eight 4-wide SAS 2.0 ports and enough bandwidth
to saturate a QDR InfiniBand port.

Another thing to consider is SSDs. A single 2U server with eight or
even sixteen 2.5'' SATA-III SSDs can achieve even higher throughput. So
I fully agree with you that we need to stay ahead of the curve;
however, I think the curve is much closer than we think!

Cheers,
--
Saso
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
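
Two of the estimates in the message above are plain arithmetic: the
throughput figures are clock rate divided by cycles/byte, and the dedup
estimate is RAM size divided by DDT entry size, times block size. The
following is only a rough sketch for checking them, not code from the
original post. The function names are made up for illustration, and it
assumes 1 MB = 10**6 bytes for throughput and binary units (GiB, KiB)
for the 128 GB of RAM and the 128K blocks; the 200 B per DDT entry is
the approximation stated in the message.

```python
# Back-of-the-envelope checks for two estimates in the message.
# Assumptions (not from the original post): MB means 10**6 bytes,
# "128 GB" of RAM means 128 * 2**30 bytes, "128K" means 128 * 1024 bytes.
# All function names here are hypothetical helpers.

def throughput_mb_s(clock_hz, cycles_per_byte):
    """Single-threaded throughput in MB/s from a cycles/byte figure."""
    return clock_hz / cycles_per_byte / 1e6


def implied_cycles_per_byte(clock_hz, mb_per_s):
    """Inverse of the above: cycles/byte implied by a measured MB/s."""
    return clock_hz / (mb_per_s * 1e6)


def dedup_capacity_bytes(ram_bytes, ddt_entry_bytes=200,
                         block_bytes=128 * 1024):
    """Unique data addressable if the whole DDT must fit in ram_bytes."""
    entries = ram_bytes // ddt_entry_bytes  # one DDT entry per unique block
    return entries * block_bytes


if __name__ == "__main__":
    # Edon-R on the 3.1 GHz Opteron 4234 was reported at 643 MB/s.
    cpb = implied_cycles_per_byte(3.1e9, 643)
    print(f"implied Edon-R cost: {cpb:.2f} cycles/byte")

    cap = dedup_capacity_bytes(128 * 2**30)
    print(f"unique data per 128 GiB of DDT: {cap / 2**40:.1f} TiB")
```

Under these assumptions the 128 GB case comes out at a little over
80 TiB of unique 128K blocks, in line with the "upward of 80TB" figure,
and a real DDT entry size other than 200 B would scale the result
proportionally.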