On Dec 17, 2010, at 7:44 AM, Trevor Vaughan wrote:

> I've been looking at the usage of MD5 checksums by Puppet and I think
> that there may be room for quite a bit of optimization.
>
> The clients seem to compute the MD5 checksum of all files and in
> catalog content every time they compare two files. What if:
>
> 1) The size of any known content is used as a first level comparison.
> Obviously, if the sizes differ, the files differ. I don't see this in
> 0.24.X, but I haven't checked 2.6.X.
>
> 2) The *server* pre-computes checksums for all content items in File
> resources and passes those in the catalog, then only one MD5 sum needs
> to be calculated.
>
> 3) When using the puppet server in a 'source' element, the server
> passes the checksum of the file on the server. If they differ, then
> the file is passed across to the client.
>
> 4) For ultimate speed, a direct comparison should be an option as a
> checksum type. Directly comparing the content of the in-memory file
> and the target file appears to be twice as fast as an MD5 checksum.
> This would not be feasible for a 'source'.
>
> These techniques will place more burden on the server, but may cut the
> CPU resources needed on the client by as much as half from some
> preliminary testing.
>
>              user     system      total        real
> MD5:     0.810000   0.230000   1.040000 (  1.050886)
> MD52:    0.400000   0.120000   0.520000 (  0.525936)
> Hash:    0.550000   0.270000   0.820000 (  0.821033)
> Comp:    0.290000   0.120000   0.410000 (  0.407351)
>
> MD5  -> MD5 comparison of two 100M files
> MD52 -> MD5 comparison where one file has been pre-computed
> Hash -> Using String.hash to do the comparison
> Comp -> Direct comparison of the files
>
> If anyone can provide a quick and dirty hack to get these into Puppet,
> I'll be happy to test them.
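For illustration, ideas 1, 2 and 4 from the quoted message could be sketched in plain Ruby roughly as follows. This is not Puppet code and not the script behind the timings above: the helper names same_content? and matches_digest? and the CHUNK_SIZE constant are hypothetical, and only Ruby's standard File and Digest::MD5 are used.

```ruby
require 'digest'

# Hypothetical read size for the direct comparison; not taken from Puppet.
CHUNK_SIZE = 64 * 1024

# Hypothetical helper: true when both files hold identical bytes.
def same_content?(path_a, path_b)
  # Idea 1: if the sizes differ, the files differ, so nothing is read.
  return false unless File.size(path_a) == File.size(path_b)

  # Idea 4: compare the bytes directly, one chunk at a time,
  # stopping at the first mismatch. No checksum is computed.
  File.open(path_a, 'rb') do |a|
    File.open(path_b, 'rb') do |b|
      until a.eof?
        return false unless a.read(CHUNK_SIZE) == b.read(CHUNK_SIZE)
      end
    end
  end
  true
end

# Hypothetical helper for idea 2: the other side has already supplied
# an MD5 hex digest (and optionally a size), so only one digest is
# computed locally.
def matches_digest?(path, expected_md5, expected_size = nil)
  return false if expected_size && File.size(path) != expected_size

  Digest::MD5.file(path).hexdigest == expected_md5
end
```

The direct comparison can return at the first differing chunk, whereas an MD5 comparison always reads both inputs in full. The trade-off is that a direct comparison leaves no checksum to report when the files differ.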
This seems like a good idea to me. Interesting that the direct comparison is so much faster. How would you log that, if they're different?

Could you open a ticket on this? I can't promise that we'll spend dev time on it right now, but it'd be great to capture it to start.

-- 
Always read stuff that will make you look good if you die in the middle of it. -- P. J. O'Rourke
---------------------------------------------------------------------
Luke Kanies -|- http://puppetlabs.com -|- +1(615)594-8199
