On Wed, Apr 16, 2008 at 07:35:04PM -0700, Dan Price wrote:

> We could certainly do the same to generate a hash, but it's expensive;
> in search of a slightly better way, I put this together as a jumping-off
> point for further discussion.  I wonder what people think of it?  I wasn't
> sure of the security implications of relying on a sha1 hash in turn derived
> from metadata which itself uses CRC32.

They're not terribly good.  :)  We should be hashing the actual data in the
files instead.  It'll probably be faster than the whole unjarring
rigamarole, but not terribly fast.  Now, if we could get back each file's
data without decompressing it, that'd be really nice, since that
wouldn't weaken the hash (though if the compression changed from one
version to another without the underlying data changing, we'd hit a false
positive: the hash would differ even though the contents hadn't).
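A rough sketch of hashing the decompressed member data (the function name
and structure here are mine, not anything in pkg; modern Python, using the
zipfile module):

```python
import hashlib
import zipfile

def content_hash(archive):
    """SHA-1 over each member's decompressed bytes, in name order."""
    h = hashlib.sha1()
    with zipfile.ZipFile(archive) as zf:
        # Sort by name so that re-ordering members in the archive
        # doesn't change the hash; fold in the name itself so that
        # renames are detected too.
        for name in sorted(zf.namelist()):
            h.update(name.encode("utf-8"))
            h.update(zf.read(name))  # read() returns decompressed data
    return h.hexdigest()
```

Because this only ever sees the decompressed bytes, repacking with a
different compressor wouldn't produce the false positive described above.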

I dunno what's going on with skipping MANIFEST.MF.  That seems bogus to me.

I also don't know whether we should be explicitly hashing the zip
directory, or whether that's implicit in hashing the rest of the data.

Looks like we should ignore the following members from ZipInfo:

  - self.date_time
  - self.create_system
  - self.create_version

and we can ignore:

  - self.CRC
  - self.compress_size
  - self.file_size

(because they're implicit in hashing the file contents), and possibly skip

  - self.header_offset
  - self.file_offset
  - self.compress_type

because we probably don't really care about re-ordering in the archive or
what kind of compression we actually use.  Or maybe we do?
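As a sketch of that selective approach (again my names, and my guess at
which fields matter; keeping external_attr for the Unix mode bits is just
an illustration, not a decision):

```python
import hashlib
import zipfile

def archive_hash(archive):
    """SHA-1 over the name, external_attr, and contents of each member,
    skipping date_time, create_system, create_version, CRC, the sizes,
    the offsets, and compress_type, per the list above."""
    h = hashlib.sha1()
    with zipfile.ZipFile(archive) as zf:
        for info in sorted(zf.infolist(), key=lambda i: i.filename):
            h.update(info.filename.encode("utf-8"))
            # external_attr carries the Unix mode bits in its high word
            h.update(("%d" % info.external_attr).encode("ascii"))
            h.update(zf.read(info.filename))
    return h.hexdigest()
```

With that, two archives that differ only in timestamps or compression
method hash identically, which is the behavior the field list above is
aiming for.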

Danek
_______________________________________________
pkg-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/pkg-discuss