On 2016-01-02 05:07, Bruce Evans wrote:
> On Sat, 2 Jan 2016, Allan Jude wrote:
>
>> On 2015-12-31 13:50, Allan Jude wrote:
>>> On 2015-12-31 13:32, Jonathan T. Looney wrote:
>>>> On 12/31/15, 2:15 AM, "Allan Jude" <[email protected]> wrote:
>>>>
>>>>> It seems these problems also slow things down, a lot:
>>>>>
>>>>> # time md5 /media/md5test/bigdata
>>>>> MD5 (/media/md5test/bigdata) = 6afad0bf5d8318093e943229be05be67
>>>>> 4.310u 3.476s 0:07.79 99.8% 20+167k 0+0io 0pf+0w
>>>>> # time env LD_PRELOAD=/usr/obj/media/svn/md5/head/tmp/lib/libmd.so
>>>>> /usr/obj/media/svn/md5/head/sbin/md5/md5 /media/md5test/bigdata
>>>>> MD5 (/media/md5test/bigdata) = 6afad0bf5d8318093e943229be05be67
>>>>> 4.133u 0.354s 0:04.49 99.7% 20+167k 1+0io 0pf+0w
>>>>>
>>>>> (file is fully cached in ZFS ARC, dd reads it at 11GB/s)
>>>>>
>>>>> Will investigate more tomorrow.
>>>>
>>>> md5 will be slower than dd due to the extra processing it needs to
>>>> do to generate the hash. I suspect that explains the difference
>>>> you're seeing between those utilities.
>>>
>>> Sorry, you missed my point here.
>>>
>>> I replaced MDXFile() with the implementation included in my earlier
>>> email. Using the newer libmd with that code, cut the time to md5 the
>>> SAME data down a lot. I need to do a more scientific test on a box that
>>> isn't doing other stuff still though.
>>>
>>> The comment about dd doing 11GB/s, was just to clarify that I wasn't
>>> reading the file from disk, which would introduce other variables.
>>
>> I found the cause of my bogus benchmark, the world on my test machine
>> was just old enough to be missing jmg@'s bufsize patch.
>>
>> Now the difference is about 1 second on a 2GB file, so ignore my
>> foolishness.
>
> That patch is surprisingly new.
>
> The main slowness that I complained about was for the other path in md5
> that must be used for special files. That uses stdio so it suffers from
> stdio trusting st_blksize. But st_blksize is rarely as small as the old
> size BUFSIZ in MDXFile.
>
> Bruce
>

I did some experiments on MDXFilter, adjusting the buffer size to 16 KB
and calling setvbuf() on stdin before reading from it. It improved
things, but only marginally.

dd if=/mnt/bigzerofile bs=1m | md5

10 GB took 80 seconds with unmodified md5 and 73.5 seconds with the
bigger buffer size, about 8% less time.

I will try to set up a flame graph of it and see if we can determine
what can be done to make it faster.

-- 
Allan Jude
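
P.S. For anyone who wants to reproduce the experiment, the change
amounts to roughly the following. This is only a minimal sketch, not
the actual patch: filter_stream(), update_fn, sum_update() and
FILTER_BUFSIZE are made-up names for illustration, and the byte-sum
callback merely stands in for the real MD5 update step. The only
library calls it relies on are the standard setvbuf() and fread().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Assumed buffer size for the experiment: 16 KB. */
#define FILTER_BUFSIZE	(16 * 1024)

/* Hypothetical callback type standing in for the hash update step. */
typedef void (*update_fn)(void *ctx, const unsigned char *data, size_t len);

/*
 * Read fp to EOF in FILTER_BUFSIZE chunks, feeding each chunk to update().
 * Returns the number of bytes consumed, or -1 on error.
 */
long long
filter_stream(FILE *fp, update_fn update, void *ctx)
{
	unsigned char chunk[FILTER_BUFSIZE];
	long long total = 0;
	size_t n;

	assert(fp != NULL && update != NULL);

	/*
	 * Ask stdio for a fully buffered stream of our chosen size instead
	 * of whatever it would pick by default. This must happen before any
	 * other operation on the stream. Passing NULL lets stdio allocate
	 * the buffer itself.
	 */
	if (setvbuf(fp, NULL, _IOFBF, FILTER_BUFSIZE) != 0)
		return (-1);

	while ((n = fread(chunk, 1, sizeof(chunk), fp)) > 0) {
		update(ctx, chunk, n);
		total += (long long)n;
	}
	return (ferror(fp) ? -1 : total);
}

/* Toy stand-in for the hash update: accumulates a plain byte sum. */
void
sum_update(void *ctx, const unsigned char *data, size_t len)
{
	unsigned long long *sum = ctx;
	size_t i;

	for (i = 0; i < len; i++)
		*sum += data[i];
}
```

Calling filter_stream(stdin, sum_update, &sum) from main() and piping
dd into the resulting program should give a rough idea of how much of
the time is stdio overhead rather than hashing, though I have not
measured it that way.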
