Hi Trevor,

Not sure about the git revs, but in HEAD of master, lib/puppet/util/checksums.rb has stream methods for most of the types.
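
For anyone following along, here is a minimal sketch of what stream summing means in general, using only Ruby's standard Digest library. It is not the code from checksums.rb; the method name `stream_md5` and the default chunk size are assumptions made up for this illustration. The point is that the digest is updated one chunk at a time, so only a small buffer is held in memory regardless of file size.

```ruby
require 'digest/md5'

# Hypothetical helper for illustration only; not Puppet's actual API.
# Feeds the file to the digest in fixed-size chunks instead of
# reading the whole file into a single string.
def stream_md5(path, chunk_size = 4096)
  digest = Digest::MD5.new
  File.open(path, 'rb') do |file|
    # IO#read(n) returns nil at end of file, which ends the loop.
    while (chunk = file.read(chunk_size))
      digest << chunk
    end
  end
  digest.hexdigest
end
```

Two files can then be compared by summing each one this way and comparing the resulting hex strings, at the cost of the extra read calls.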

On Jan 5, 2011, at 11:34 AM, Trevor Vaughan wrote:
> Luke,
>
> Thanks for the clarification. I think I was looking at the 0.24
> codebase so this may have thrown me off a bit.
>
> Could you point me to the Git rev(s) where the stream summing is described?
>
> Thanks!
>
> Trevor
>
> On Wed, Jan 5, 2011 at 2:00 PM, Luke Kanies <[email protected]> wrote:
>> On Dec 29, 2010, at 3:41 AM, Trevor Vaughan wrote:
>>
>>> -----BEGIN PGP SIGNED MESSAGE-----
>>> Hash: SHA1
>>>
>>> This was just a comparison script that I wrote.
>>>
>>> Checking through the Ruby source, it definitely looks like the digest
>>> methods just pull the entire "string" into memory.
>>
>> Yep, that's why we had to go through the effort of adding stream summing.
>>
>>> If this is a file, then we're already taking the memory hit that you
>>> would take by just comparing the two files.
>>
>> Not with Puppet we're not - we do stream summing for both files, and then
>> compare the sums.
>>
>>> This makes complete sense since Digest doesn't know what you're passing.
>>>
>>> I will note that it looks like chunking a file and performing the
>>> checksum might take twice as long.
>>
>> This is the big thing - I agree that this could all be much faster via the
>> mechanisms you've proposed.
>>
>> It's just important to know how Puppet works internally right now and why.
>> The stream summing is very important for us because if Ruby loads 100mb
>> files into memory, it basically never frees that memory again. Thus, we
>> give up speed for drastically better ram efficiency, and we don't want to go
>> back to the bad old ram days.
>>
>>> I'm thinking that size+time (similar to rsync) might be enough for most
>>> files on a system. There will be relatively few files on a system that
>>> you'll want to do a full checksum on.
>>
>> I agree.
>>
>>> Thanks,
>>>
>>> Trevor
>>>
>>> On 12/26/2010 12:40 AM, Luke Kanies wrote:
>>>> On Dec 23, 2010, at 4:58, Trevor Vaughan <[email protected]> wrote:
>>>>
>>>>> -----BEGIN PGP SIGNED MESSAGE-----
>>>>> Hash: SHA1
>>>>>
>>>>> Brice,
>>>>>
>>>>> Thanks for the feedback, this is good stuff!
>>>>>
>>>>>>
>>>>>> That's more or less what rsync does. For sourced files we could even use
>>>>>> HTTP If-Modified-Since and/or If-None-Match to perform the check (and
>>>>>> thus the check would be done server side).
>>>>>
>>>>> Yes, I briefly looked at the Rsync algorithm papers to see if I could
>>>>> figure out how to re-implement it in Ruby but just using the native
>>>>> Rsync libraries might be a better call. However, that would introduce an
>>>>> external dependency.
>>>>>
>>>>>>
>>>>>>> 4) For ultimate speed, a direct comparison should be an option as a
>>>>>>> checksum type. Directly comparing the content of the in-memory file
>>>>>>> and the target file appears to be twice as fast as an MD5 checksum.
>>>>>>> This would not be feasible for a 'source'.
>>>>>>
>>>>>> That might be faster, but please don't re-introduce the slurp the whole
>>>>>> file in memory syndrome.
>>>>>
>>>>> It seems that MD5 might be doing it anyway. When I tried a block-wise
>>>>> 'comp', it was *much* slower and I think it was even slower than MD5 (or
>>>>> close anyway) which means that MD5 is reading the whole blob into memory
>>>>> to work on it anyway! If we're going to take the memory hit, let's just
>>>>> take it and compare the two items.
>>>>
>>>> Is this an md5 script you wrote, or are you using the Puppet code?
>>>> We've worked to add 'stream' checksum types that checksum the file a
>>>> bit at a time.
>>>>
>>>> I expect that most of those are actually a good bit slower than just
>>>> reading the whole thing in and checksumming, but they make up for it
>>>> by being more ram-efficient.
>>>>
>>>>>> That's really something I'd like to work on.
>>>>>> Unfortunately this is
>>>>>> really complex stuff. The file type is one of the biggest types and even
>>>>>> though I already worked on it, I'm not sure I grasped enough to be able
>>>>>> to fully refactor it for a different inner working.
>>>>>
>>>>> Completely agreed. I'll do what I can to help, but my outside time is
>>>>> severely limited.
>>>>
>>>>
>>>
>>> - --
>>> Trevor Vaughan
>>> Vice President, Onyx Point, Inc.
>>> email: [email protected]
>>> phone: 410-541-ONYX (6699)
>>> pgp: 0x6C701E94
>>>
>>> - -- This account not approved for unencrypted sensitive information --
>>> -----BEGIN PGP SIGNATURE-----
>>> Version: GnuPG v1.4.11 (GNU/Linux)
>>>
>>> iQEcBAEBAgAGBQJNGx5LAAoJECNCGV1OLcypUrQH/RjbY56VfBunWk5rV1cgMCSO
>>> VMzXjqY0HhyAvOtYpcesYDpvPHNsnSBx3684TCX1+VYfy8vh9lFy6CxEqB3ohwN5
>>> gHjIBs2c6ZpT8UloywkwbMwAkFnqFXMfQ2/ELOfGvKsHwWq+Z9uVxW/vPxmswPJ0
>>> U6qiDnmk762OfRyD0/sBNsYljnUXwDBidWC9up9WO+hEz9bSr+NLSxMc+5PsVjyl
>>> kRtGtnBNqnE8Sw8VEjGKjrHkuoCR9pqAiGU2KM4h827zkog5oy0ghPolnEJXMD82
>>> ErEXo+Y6C7xZc7U62+0eS96Zb0LZi9B412c5PpB08TEP18lJwCwSWWtY47dSJgQ=
>>> =FPbW
>>> -----END PGP SIGNATURE-----
>>> <tvaughan.vcf>
>>
>>
>> --
>> On Bureaucracy....
>> The Pythagorean theorem contains 24 words. Archimedes
>> Principle, 67. The Ten Commandments, 179. The American Declaration of
>> Independence, 300. And recent legislation in Europe concerning when
>> and where to smoke, 23,942. -- The European, June 23-29, 1995
>> ---------------------------------------------------------------------
>> Luke Kanies -|- http://puppetlabs.com -|- +1(615)594-8199
>>
>>
>>
>>
>>
>
>
>
> --
> Trevor Vaughan
> Vice President, Onyx Point, Inc
> (410) 541-6699
> [email protected]
>
> -- This account not approved for unencrypted proprietary information --
>
> --
> You received this message because you are subscribed to the Google Groups
> "Puppet Developers" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/puppet-dev?hl=en.
>

--
I have an answering machine in my car. It says, "I'm home now. But
leave a message and I'll call when I'm out." -- Stephen Wright
---------------------------------------------------------------------
Luke Kanies -|- http://puppetlabs.com -|- +1(615)594-8199
