Hi Trevor,

Not sure about the git revs, but in HEAD of master, 
lib/puppet/util/checksums.rb has stream methods for most of the types.
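
For illustration only, here is a rough sketch of what stream summing looks 
like in plain Ruby. This is not the code from checksums.rb: the names 
stream_md5, same_content? and CHUNK_SIZE are made up for this example, and 
the chunk size is an assumption. The idea is just that the digest is fed 
one chunk at a time, so only one chunk is ever held in memory.

```ruby
require 'digest/md5'

# Assumed buffer size, chosen for illustration only.
CHUNK_SIZE = 4096

# Hypothetical helper (not Puppet's actual method): checksum a file by
# feeding fixed-size chunks to the digest instead of reading it all at once.
def stream_md5(path, chunk_size = CHUNK_SIZE)
  digest = Digest::MD5.new
  File.open(path, 'rb') do |file|
    # IO#read(length) returns nil at end of file, which ends the loop.
    while (chunk = file.read(chunk_size))
      digest.update(chunk)
    end
  end
  digest.hexdigest
end

# Compare two files by their streamed sums; neither file is loaded whole.
def same_content?(path_a, path_b)
  stream_md5(path_a) == stream_md5(path_b)
end
```

Calling same_content?('/path/to/a', '/path/to/b') would then compare two 
files while holding at most one chunk of each in memory at a time.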

On Jan 5, 2011, at 11:34 AM, Trevor Vaughan wrote:

> Luke,
> 
> Thanks for the clarification. I think I was looking at the 0.24
> codebase so this may have thrown me off a bit.
> 
> Could you point me to the Git rev(s) where the stream summing is described?
> 
> Thanks!
> 
> Trevor
> 
> On Wed, Jan 5, 2011 at 2:00 PM, Luke Kanies <[email protected]> wrote:
>> On Dec 29, 2010, at 3:41 AM, Trevor Vaughan wrote:
>> 
>>> -----BEGIN PGP SIGNED MESSAGE-----
>>> Hash: SHA1
>>> 
>>> This was just a comparison script that I wrote.
>>> 
>>> Checking through the Ruby source, it definitely looks like the digest
>>> methods just pull the entire "string" into memory.
>> 
>> Yep, that's why we had to go through the effort of adding stream summing.
>> 
>>> If this is a file, then we're already taking the memory hit that you
>>> would take by just comparing the two files.
>> 
>> Not with Puppet we're not - we do stream summing for both files, and then 
>> compare the sums.
>> 
>>> This makes complete sense since Digest doesn't know what you're passing.
>>> 
>>> I will note that it looks like chunking a file and performing the
>>> checksum might take twice as long.
>> 
>> This is the big thing - I agree that this could all be much faster via the 
>> mechanisms you've proposed.
>> 
>> It's just important to know how Puppet works internally right now and why.  
>> The stream summing is very important for us because if Ruby loads 100 MB 
>> files into memory, it basically never frees that memory again.  Thus, we 
>> give up speed for drastically better RAM efficiency, and we don't want to go 
>> back to the bad old RAM days.
>> 
>>> I'm thinking that size+time (similar to rsync) might be enough for most
>>> files on a system. There will be relatively few files on a system that
>>> you'll want to do a full checksum on.
>> 
>> I agree.
>> 
>>> Thanks,
>>> 
>>> Trevor
>>> 
>>> On 12/26/2010 12:40 AM, Luke Kanies wrote:
>>>> On Dec 23, 2010, at 4:58, Trevor Vaughan <[email protected]> wrote:
>>>> 
>>>>> -----BEGIN PGP SIGNED MESSAGE-----
>>>>> Hash: SHA1
>>>>> 
>>>>> Brice,
>>>>> 
>>>>> Thanks for the feedback, this is good stuff!
>>>>> 
>>>>>> 
>>>>>> That's more or less what rsync does. For sourced files we could even use
>>>>>> HTTP If-Modified-Since and/or If-None-Match to perform the check (and
>>>>>> thus the check would be done server side).
>>>>> 
>>>>> Yes, I briefly looked at the Rsync algorithm papers to see if I could
>>>>> figure out how to re-implement it in Ruby but just using the native
>>>>> Rsync libraries might be a better call. However, that would introduce an
>>>>> external dependency.
>>>>> 
>>>>>> 
>>>>>>> 4) For ultimate speed, a direct comparison should be an option as a
>>>>>>> checksum type. Directly comparing the content of the in-memory file
>>>>>>> and the target file appears to be twice as fast as an MD5 checksum.
>>>>>>> This would not be feasible for a 'source'.
>>>>>> 
>>>>>> That might be faster, but please don't re-introduce the
>>>>>> slurp-the-whole-file-into-memory syndrome.
>>>>> 
>>>>> It seems that MD5 might be doing it anyway. When I tried a block-wise
>>>>> 'comp', it was *much* slower and I think it was even slower than MD5 (or
>>>>> close anyway) which means that MD5 is reading the whole blob into memory
>>>>> to work on it anyway! If we're going to take the memory hit, let's just
>>>>> take it and compare the two items.
>>>> 
>>>> Is this an md5 script you wrote, or are you using the Puppet code?
>>>> We've worked to add 'stream' checksum types that checksum the file a
>>>> bit at a time.
>>>> 
>>>> I expect that most of those are actually a good bit slower than just
>>>> reading the whole thing in and checksumming, but reading it all in is
>>>> only faster by being less RAM-efficient.
>>>> 
>>>>>> That's really something I'd like to work on. Unfortunately this is
>>>>>> really complex stuff. The file type is one of the biggest types and even
>>>>>> though I already worked on it, I'm not sure I grasped enough to be able
>>>>>> to fully refactor it for a different inner working.
>>>>> 
>>>>> Completely agreed. I'll do what I can to help, but my outside time is
>>>>> severely limited.
>>>> 
>>>> 
>>> 
>>> - --
>>> Trevor Vaughan
>>> Vice President, Onyx Point, Inc.
>>> email: [email protected]
>>> phone: 410-541-ONYX (6699)
>>> pgp: 0x6C701E94
>>> 
>>> - -- This account not approved for unencrypted sensitive information --
>>> -----BEGIN PGP SIGNATURE-----
>>> Version: GnuPG v1.4.11 (GNU/Linux)
>>> 
>>> iQEcBAEBAgAGBQJNGx5LAAoJECNCGV1OLcypUrQH/RjbY56VfBunWk5rV1cgMCSO
>>> VMzXjqY0HhyAvOtYpcesYDpvPHNsnSBx3684TCX1+VYfy8vh9lFy6CxEqB3ohwN5
>>> gHjIBs2c6ZpT8UloywkwbMwAkFnqFXMfQ2/ELOfGvKsHwWq+Z9uVxW/vPxmswPJ0
>>> U6qiDnmk762OfRyD0/sBNsYljnUXwDBidWC9up9WO+hEz9bSr+NLSxMc+5PsVjyl
>>> kRtGtnBNqnE8Sw8VEjGKjrHkuoCR9pqAiGU2KM4h827zkog5oy0ghPolnEJXMD82
>>> ErEXo+Y6C7xZc7U62+0eS96Zb0LZi9B412c5PpB08TEP18lJwCwSWWtY47dSJgQ=
>>> =FPbW
>>> -----END PGP SIGNATURE-----
>> 
>> 
>> --
>> On Bureaucracy....
>>        The Pythagorean theorem contains 24 words. Archimedes
>> Principle, 67.  The Ten Commandments, 179. The American Declaration of
>> Independence, 300. And recent legislation in Europe concerning when
>> and where to smoke, 23,942.      -- The European, June 23-29, 1995
>> ---------------------------------------------------------------------
>> Luke Kanies  -|-   http://puppetlabs.com   -|-   +1(615)594-8199
>> 
>> 
>> 
>> 
>> 
> 
> 
> 
> -- 
> Trevor Vaughan
> Vice President, Onyx Point, Inc
> (410) 541-6699
> [email protected]
> 
> -- This account not approved for unencrypted proprietary information --
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Puppet Developers" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to 
> [email protected].
> For more options, visit this group at 
> http://groups.google.com/group/puppet-dev?hl=en.
> 


-- 
I have an answering machine in my car.  It says, "I'm home now. But
leave a message and I'll call when I'm out."  -- Stephen Wright
---------------------------------------------------------------------
Luke Kanies  -|-   http://puppetlabs.com   -|-   +1(615)594-8199



