On Wednesday, 25 February 2009, OGAWA Hirofumi wrote:
> OGAWA Hirofumi <hirof...@mail.parknet.co.jp> writes:
> > In kernel, crypto subsystem has sha1 (and some other hashes). And
> > some systems can use hardware for it (and IIRC, it can calc the hash
> > asynchronously, if you want). And the algorithms would be selectable
> > without changing interface for it. So, crypto stuff may be good one.
>
> BTW, IIRC, asynchronous stuff on hardware was the good optimization when
> I was playing with IPSEC.

Well, doing that asynchronously in hardware would probably mean some
latency, e.g. if a MB of data has to be hashed in 4kB blocks.
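
For illustration only, here is a minimal user-space sketch in Python (not
kernel code) of per-block hashing for de-duplication. It assumes SHA-1 from
`hashlib` as a stand-in for the kernel crypto API; the names `dedup_blocks`,
`BLOCK_SIZE` and `verify` are made up for this example. With `verify=True`
a hash match is confirmed by a direct byte comparison; with `verify=False`
the hash alone is trusted.

```python
import hashlib

BLOCK_SIZE = 4096  # 4 kB blocks, as in the discussion (assumed value)


def dedup_blocks(data, verify=True):
    """Split data into blocks and de-duplicate them by SHA-1 digest.

    Returns (unique, refs): unique is the list of distinct blocks,
    refs[i] is the index into unique for the i-th input block.
    """
    index = {}   # digest -> indices into unique that share this digest
    unique = []
    refs = []
    for off in range(0, len(data), BLOCK_SIZE):
        block = data[off:off + BLOCK_SIZE]
        digest = hashlib.sha1(block).digest()
        candidates = index.setdefault(digest, [])
        for i in candidates:
            # Either trust the hash, or confirm with a byte comparison.
            if not verify or unique[i] == block:
                refs.append(i)
                break
        else:
            # No stored block matched: keep this one as a new unique block.
            unique.append(block)
            candidates.append(len(unique) - 1)
            refs.append(len(unique) - 1)
    return unique, refs


if __name__ == "__main__":
    one_mb = bytes(1024 * 1024)          # 1 MB of zeros -> 256 blocks
    unique, refs = dedup_blocks(one_mb)
    print(len(refs), "blocks hashed,", len(unique), "unique")
```

Hashing 1 MB this way means 256 separate digest computations, which is where
per-request latency of an asynchronous hardware engine would add up.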

I'm not sure what the best way is, performance-wise. I could see a benefit
in fast (in-CPU) hash calculation (with something like what I mentioned
earlier), *if* the data is still in the CPU cache later when the comparison
is done ... but that's possibly some 0.03 seconds later, so the cache
would have been evicted by then anyway.

Should the de-duplication be *fully* asynchronous, i.e. done while the rest
of the (IO) system is idle? Or would you bet the data on the hash being
collision-free, so that no direct comparison is necessary?
(Of course, using a 32kBit hash key for a 4kB block would work, but be
meaningless ;-)

Regards,

Phil

_______________________________________________
Tux3 mailing list
Tux3@tux3.org
http://mailman.tux3.org/cgi-bin/mailman/listinfo/tux3