This was just a comparison script that I wrote.

Checking through the Ruby source, it definitely looks like the digest
methods just pull the entire "string" into memory. If this is a file,
then we're already taking the memory hit that you would take by just
comparing the two files.

This makes complete sense, since Digest doesn't know what you're passing.

I will note that it looks like chunking a file and performing the
checksum might take twice as long.

I'm thinking that size+time (similar to rsync) might be enough for most
files on a system. There will be relatively few files on a system that
you'll want to do a full checksum on.

Thanks,

Trevor

On 12/26/2010 12:40 AM, Luke Kanies wrote:
> On Dec 23, 2010, at 4:58, Trevor Vaughan <[email protected]> wrote:
>
>> Brice,
>>
>> Thanks for the feedback, this is good stuff!
>>
>>> That's more or less what rsync does. For sourced files we could even use
>>> HTTP If-Modified-Since and/or If-None-Match to perform the check (and
>>> thus the check would be done server side).
>>
>> Yes, I briefly looked at the Rsync algorithm papers to see if I could
>> figure out how to re-implement it in Ruby but just using the native
>> Rsync libraries might be a better call. However, that would introduce an
>> external dependency.
>>
>>>> 4) For ultimate speed, a direct comparison should be an option as a
>>>> checksum type. Directly comparing the content of the in-memory file
>>>> and the target file appears to be twice as fast as an MD5 checksum.
>>>> This would not be feasible for a 'source'.
>>>
>>> That might be faster, but please don't re-introduce the slurp the whole
>>> file in memory syndrome.
>>
>> It seems that MD5 might be doing it anyway. When I tried a block-wise
>> 'comp', it was *much* slower and I think it was even slower than MD5 (or
>> close anyway) which means that MD5 is reading the whole blob into memory
>> to work on it anyway! If we're going to take the memory hit, let's just
>> take it and compare the two items.
>
> Is this an md5 script you wrote, or are you using the Puppet code?
> We've worked to add 'stream' checksum types that checksum the file a
> bit at a time.
>
> I expect that most of those are actually a good bit slower than just
> reading the whole thing in and checksumming, but they're faster by
> being less ram-efficient.
>
>>> That's really something I'd like to work on. Unfortunately this is
>>> really complex stuff. The file type is one of the biggest types and even
>>> though I already worked on it, I'm not sure I grasped enough to be able
>>> to fully refactor it for a different inner working.
>>
>> Completely agreed. I'll do what I can to help, but my outside time is
>> severely limited.

-- 
Trevor Vaughan
Vice President, Onyx Point, Inc.
email: [email protected]
phone: 410-541-ONYX (6699)
pgp: 0x6C701E94

-- This account not approved for unencrypted sensitive information --
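P.S. For anyone who wants to reproduce the chunked versus whole-file checksum comparison discussed above, here is a minimal sketch. It is not the script from this thread and not the Puppet code; the helper names `chunked_md5` and `slurped_md5` and the 64 KiB chunk size are assumptions made for illustration.

```ruby
require 'digest'

# Hypothetical helper: feed the file to the digest one fixed-size chunk
# at a time, so only a single chunk is held in memory at once.
def chunked_md5(path, chunk_size = 64 * 1024)
  digest = Digest::MD5.new
  File.open(path, 'rb') do |io|
    # IO#read with a length returns nil at end of file, ending the loop.
    while (chunk = io.read(chunk_size))
      digest.update(chunk)
    end
  end
  digest.hexdigest
end

# Hypothetical helper: read the whole file into one string first, then
# hash it. Memory use grows with the size of the file.
def slurped_md5(path)
  Digest::MD5.hexdigest(File.binread(path))
end
```

Both helpers return the same hex digest for the same file; they differ only in how much of the file sits in memory at once. Wrapping each call in `Benchmark.realtime` on a large file would be one way to check the relative timings on a given system.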
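The size+time idea and the direct comparison mentioned in this thread could look roughly like the following. This is a sketch under stated assumptions: `quick_same?` and `content_same?` are hypothetical names, and both compare two files on disk, whereas the case discussed in the thread compares in-memory content against a target file.

```ruby
require 'fileutils'

# Hypothetical helper: rsync-style quick check. Two files are treated as
# "probably identical" when size and modification time both match.
# No file content is read at all.
def quick_same?(path_a, path_b)
  stat_a = File.stat(path_a)
  stat_b = File.stat(path_b)
  stat_a.size == stat_b.size && stat_a.mtime == stat_b.mtime
end

# Hypothetical helper: direct content comparison with no checksum.
# A size mismatch short-circuits before any content is read;
# FileUtils.compare_file then compares the contents of the two files.
def content_same?(path_a, path_b)
  return false unless File.size(path_a) == File.size(path_b)
  FileUtils.compare_file(path_a, path_b)
end
```

The quick check can be fooled by a file whose content changed while size and mtime stayed the same, which is why a full comparison or checksum would still be wanted for the few files where that matters.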
