I've been looking at the usage of MD5 checksums by Puppet and I think
that there may be room for quite a bit of optimization.

The clients seem to compute the MD5 checksum of every file and of all
in-catalog content each time they compare two files. What if:

1) The size of any known content is used as a first level comparison.
Obviously, if the sizes differ, the files differ. I don't see this in
0.24.X, but I haven't checked 2.6.X.

2) The *server* pre-computes checksums for all content items in File
resources and passes those in the catalog, so that only one MD5 sum
needs to be calculated on the client.

3) When using the puppet server in a 'source' element, the server
passes the checksum of the file on the server. If it differs from the
checksum of the client's copy, then the file is passed across to the
client.

4) For ultimate speed, a direct comparison should be an option as a
checksum type. Directly comparing the content of the in-memory file
and the target file appears to be twice as fast as an MD5 checksum.
This would not be feasible for a 'source'.
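
To make the ideas above concrete, here is a rough sketch in plain Ruby.
It is not Puppet code and does not touch Puppet's internals; the method
names matches_precomputed? and same_content? are made up for
illustration. The first covers points 1 to 3 (size check first, then a
single MD5 against a checksum supplied by the server), the second covers
point 4 (size check, then a direct comparison against in-memory content).

```ruby
require "digest/md5"

# Hypothetical helper for points 1-3: compare a local file against a size
# and MD5 hex digest that were computed elsewhere (e.g. by the server).
def matches_precomputed?(path, expected_size, expected_md5)
  return false unless File.file?(path)
  # Cheap first-level check: different sizes mean different content.
  return false unless File.size(path) == expected_size
  # Only one MD5 is computed here; the other side arrived pre-computed.
  Digest::MD5.file(path).hexdigest == expected_md5
end

# Hypothetical helper for point 4: compare a local file directly against
# content already held in memory, with no checksum at all.
def same_content?(path, content)
  return false unless File.file?(path)
  return false unless File.size(path) == content.bytesize
  # Compare raw bytes so that string encodings do not affect the result.
  File.binread(path) == content.b
end
```

Note that same_content? reads the whole target file into memory, which
is simple but would need chunked reads for very large files.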

These techniques will place more burden on the server but, based on
some preliminary testing, may cut the CPU time needed on the client by
as much as half.

            user     system      total        real
MD5:    0.810000   0.230000   1.040000 (  1.050886)
MD52:   0.400000   0.120000   0.520000 (  0.525936)
Hash:   0.550000   0.270000   0.820000 (  0.821033)
Comp:   0.290000   0.120000   0.410000 (  0.407351)

MD5 -> MD5 comparison of two 100M files
MD52 -> MD5 comparison where one file has been pre-computed
Hash -> Using String.hash to do the comparison
Comp -> Direct comparison of the files
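
For anyone who wants to try a similar measurement, the following is a
minimal sketch of how the four cases could be timed with Ruby's standard
Benchmark module. It is a reconstruction, not the script that produced
the numbers above; the method name compare_benchmark and the choice to
read whole files with File.binread are assumptions.

```ruby
require "benchmark"
require "digest/md5"

# Hypothetical harness: time four ways of deciding whether two files match.
# Returns the array of Benchmark::Tms results, one per report.
def compare_benchmark(path_a, path_b)
  # Stand-in for a checksum that was computed ahead of time (the MD52 case).
  precomputed = Digest::MD5.file(path_a).hexdigest

  Benchmark.bm(6) do |bm|
    # MD5: checksum both files, then compare the digests.
    bm.report("MD5:")  { Digest::MD5.file(path_a).hexdigest == Digest::MD5.file(path_b).hexdigest }
    # MD52: only one checksum is computed; the other is pre-computed.
    bm.report("MD52:") { precomputed == Digest::MD5.file(path_b).hexdigest }
    # Hash: compare String#hash of the two files' contents.
    bm.report("Hash:") { File.binread(path_a).hash == File.binread(path_b).hash }
    # Comp: compare the contents directly.
    bm.report("Comp:") { File.binread(path_a) == File.binread(path_b) }
  end
end
```

Calling compare_benchmark with the paths of two large files prints a
table in the same layout as the one above; the actual timings will
depend on the machine and on file caching.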

If anyone can provide a quick and dirty hack to get these into Puppet,
I'll be happy to test them.

Thanks,

Trevor



-- 
Trevor Vaughan
Vice President, Onyx Point, Inc
(410) 541-6699
[email protected]