On Wed, Apr 16, 2008 at 07:35:04PM -0700, Dan Price wrote:
> We could certainly do the same to generate a hash, but it's expensive;
> in search of a slightly better way, I put this together as a jumping
> off point for further discussion.  I wonder what people think of it?
> I wasn't sure of the security implications of relying on a sha1 hash
> in turn derived from metadata which is using CRC32.
They're not terribly good. :)  We should be hashing the actual data in
the file instead.  It'll probably be faster than the whole unjarring
rigamarole, but not terribly fast.  Now, if we could get back the data
of each file without it being decompressed, that'd be really nice,
since that wouldn't change the safety of the hash (though if the
compression changes from one version to another without the underlying
data changing, then we'd hit a false positive).

I dunno what's going on with skipping MANIFEST.MF.  That seems bogus
to me.  I also don't know whether we should be explicitly hashing the
zip directory, or whether that's implicit in hashing the rest of the
data.

Looks like we should ignore the following members from ZipInfo:

- self.date_time
- self.create_system
- self.create_version

and we can ignore:

- self.CRC
- self.compress_size
- self.file_size

(because they're implicit in hashing the file contents), and possibly
skip:

- self.header_offset
- self.file_offset
- self.compress_type

because we probably don't really care about re-ordering in the archive
or what kind of compression we actually use.  Or maybe we do?

Danek
_______________________________________________
pkg-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/pkg-discuss
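For concreteness, here's a minimal sketch of the content-hashing idea
being discussed (hash the decompressed data of each member, sorted by
name, and skip all the volatile ZipInfo metadata).  `hash_jar` is a
hypothetical helper for discussion, not pkg(5) code:

```python
import hashlib
import zipfile

def hash_jar(path):
    # Hypothetical helper: hash member names plus their *decompressed*
    # contents, iterating in sorted-name order so that neither
    # re-ordering within the archive nor the compression method used
    # (ZipInfo.compress_type, compress_size, CRC, offsets, timestamps)
    # affects the resulting digest.
    h = hashlib.sha1()
    with zipfile.ZipFile(path) as zf:
        for name in sorted(zf.namelist()):
            h.update(name.encode("utf-8"))
            h.update(zf.read(name))  # decompressed file data
    return h.hexdigest()
```

Because it hashes the decompressed bytes, repacking the same contents
at a different compression setting produces the same digest, which
avoids the false-positive case above.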
