Issue #5650 has been updated by Brice Figureau.

Daniel Pittman wrote:
> None of that would solve the problem that we checksum when we are not
> managing content, though.

That shouldn't happen. The last time I checked (about a year ago) we weren't doing that (or at least we could use checksum => none to prevent such issue).

----------------------------------------
Feature #5650: Potentially able to save a great deal of time on File comparisons.
https://projects.puppetlabs.com/issues/5650

Author: Trevor Vaughan
Status: Accepted
Priority: Normal
Assignee:
Category: file
Target version:
Affected Puppet version:
Keywords:
Branch:

I've been looking at the usage of MD5 checksums by Puppet and I think that there may be room for quite a bit of optimization.

The clients seem to compute the MD5 checksum of all files and in catalog content every time they compare two files.

What if:

1) The size of any known content is used as a first level comparison. Obviously, if the sizes differ, the files differ. I don't see this in 0.24.X, but I haven't checked 2.6.X.

2) The *server* pre-computes checksums for all content items in File resources and passes those in the catalog, then only one MD5 sum needs to be calculated.

3) When using the puppet server in a 'source' element, the server passes the checksum of the file on the server. If they differ, then the file is passed across to the client.

4) For ultimate speed, a direct comparison should be an option as a checksum type. Directly comparing the content of the in-memory file and the target file appears to be twice as fast as an MD5 checksum. This would not be feasible for a 'source'.

These techniques will place more burden on the server, but may cut the CPU resources needed on the client by as much as half from some preliminary testing.
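
To make the numbered proposals above more concrete, here is a minimal Ruby sketch of the size-first check (1), the direct content comparison (4), and the single-checksum comparison against a server-supplied value (2 and 3). The method names `content_in_sync?` and `checksum_in_sync?` are hypothetical and are not part of Puppet; this is an illustration under those assumptions, not how the file type is actually implemented.

```ruby
require 'digest/md5'

# Hypothetical helper (not a Puppet API): does the file at `path` already
# hold `desired`, the in-memory content taken from the catalog?
def content_in_sync?(path, desired)
  return false unless File.file?(path)
  # Proposal 1: a size mismatch proves the contents differ without reading.
  return false unless File.size(path) == desired.bytesize
  # Proposal 4: compare raw bytes directly instead of hashing both sides.
  File.binread(path) == desired.b
end

# Proposals 2 and 3: `expected_md5` is assumed to arrive precomputed from
# the server, so only the local file has to be hashed on the client.
def checksum_in_sync?(path, expected_md5)
  File.file?(path) && Digest::MD5.file(path).hexdigest == expected_md5
end
```

With helpers like these, a client would only fall back to transferring or rewriting a file when the cheap checks report a difference.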

           user     system      total        real
MD5:   0.810000   0.230000   1.040000 (  1.050886)
MD52:  0.400000   0.120000   0.520000 (  0.525936)
Hash:  0.550000   0.270000   0.820000 (  0.821033)
Comp:  0.290000   0.120000   0.410000 (  0.407351)

MD5  -> MD5 comparison of two 100M files
MD52 -> MD5 comparison where one file has been pre-computed
Hash -> Using String.hash to do the comparison
Comp -> Direct comparison of the files

For any technique that does not compute a checksum of the file, I would think that a good item to record to note the change difference would be a combination of the latest modified time and the size of the file. This would make for an easy numeric comparison string and the time can be set at the time that you update the file.
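
The original benchmark script is not included in this message, so the following is only a guess at how the four timings might be reproduced with Ruby's standard `Benchmark` module. The method name `compare_techniques`, the scratch files, and the default 1 MiB size are assumptions made for illustration (the figures above used 100M files).

```ruby
require 'benchmark'
require 'digest/md5'
require 'tempfile'

# Time the four comparison techniques on two identical scratch files of
# `size` bytes. Returns label => whether the technique judged them equal.
def compare_techniques(size = 1024 * 1024)
  results = {}
  Tempfile.create('bench_a') do |a|
    Tempfile.create('bench_b') do |b|
      data = 'x' * size
      [a, b].each { |f| f.binmode; f.write(data); f.flush }
      # Stands in for a checksum computed ahead of time on the server.
      precomputed = Digest::MD5.hexdigest(data)

      Benchmark.bm(5) do |x|
        x.report('MD5:') do
          results[:md5] = Digest::MD5.file(a.path).hexdigest ==
                          Digest::MD5.file(b.path).hexdigest
        end
        x.report('MD52:') do
          results[:md52] = Digest::MD5.file(a.path).hexdigest == precomputed
        end
        x.report('Hash:') do
          results[:hash] = File.binread(a.path).hash == File.binread(b.path).hash
        end
        x.report('Comp:') do
          results[:comp] = File.binread(a.path) == File.binread(b.path)
        end
      end
    end
  end
  results
end
```

Calling `compare_techniques(100 * 1024 * 1024)` would approximate the 100M case; the absolute timings depend on the machine, disk cache, and Ruby version, so they will not match the table above.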
