On 07/11/2017 12:27 PM, Dennis Kliban wrote: > On Tue, Jul 11, 2017 at 1:20 PM, Brian Bouterse <[email protected] > <mailto:[email protected]>> wrote: > > We should not raise a validation exception because due to semver we can't > stop raising that exception > later. Specifically if we want to ever allow double checksum being > specified in the future. > > For the MVP, I think the choices are: only respect sha256 and ignore the > rest OR do as many standard > validators as they specify. I'm ok with either of these with a slight > preference for validating as many of > them as are specified would be the best. Since we're already handling all > of the data at save time, > pushing it through additional digest validators will cost a bit of cpu > but not much additional I/O. Also > if the user asked us to do that validation then having it demand some > additional resources is OK. Also > having feature parity with the changesets is good. The changesets can > validate all standard digest > validators so having uploads do the same would be consistent. > > > I am reversing what I had previously said, and I agree that we should not > raise an exception if a user > provides more than one checksum at upload time. My current implementation > checks every checksum specified by > the user. > >
I don't think I understand the purpose of an upload user specifying more than 1 algorithm/digest for the platform to validate the integrity of an uploaded file. Why would the user specify an algorithm/digest they don't trust? > > > On Mon, Jul 10, 2017 at 5:11 PM, Jeff Ortel <[email protected] > <mailto:[email protected]>> wrote: > > > > On 07/10/2017 02:36 PM, Dennis Kliban wrote: > > On Mon, Jul 10, 2017 at 3:26 PM, Michael Hrivnak > <[email protected] <mailto:[email protected]> > <mailto:[email protected] <mailto:[email protected]>>> wrote: > > > > > > > > On Mon, Jul 10, 2017 at 3:06 PM, Dennis Kliban > <[email protected] <mailto:[email protected]> <mailto:[email protected] > <mailto:[email protected]>>> wrote: > > > > The upload API for Artifacts is going to allow users to > specify the artifact size and a digest. The > > Artifact model currently supports 'md5', 'sha1', 'sha224', > 'sha256', 'sha384', and 'sha512' digests. > > > > Do we want to let users specify more than one digest per > upload? e.g. md5 and sha256? > > > > > > There may be no harm in this, but it would add complexity to > the verification and not add much value. I'd > > stick with just one unless there's a compelling reason for > multiple. > > > > > > I agree. The API is going to raise a validation exception when more > than 1 digest is provided. > > +1 > > > > > > > > > > > > > Do we want to store all 6 digests for each Artifact? > > > > > > The expensive part of calculating the digests is reading the > file. As long as you're already reading the > > entire file, which we will during verification, you may as well > stuff the bits through multiple hashers > > (digesters?) and get all the digests. Pulp 2 has a function > that does this: > > > > > https://github.com/pulp/pulp/blob/2.13-release/server/pulp/server/util.py#L327-L353 > > <https://github.com/pulp/pulp/blob/2.13-release/server/pulp/server/util.py#L327-L353> > > > <https://github.com/pulp/pulp/blob/2.13-release/server/pulp/server/util.py#L327-L353 > > <https://github.com/pulp/pulp/blob/2.13-release/server/pulp/server/util.py#L327-L353>> > > > > But we can't always guarantee that we'll have all the checksums > available, for at least two reasons. 1) If > > in the future if we want to use yet another algorithm, we > probably won't want to run a migration that > > re-reads every file and calculates the additional digest. 2) > For on-demand content, we don't have it > > locally, so we can't calculate any additional checksums until > it gets fetched. > > > > So this may be one of those times where we use a > good-ole-fashioned getter method that returns the > > requested digest if it's on the artifact, calculates it if not, > or raises an exception if the value isn't > > available and can't be calculated. > > > > > > For uploaded Artifacts, all of the digests will be calculated as > the file is being processed during the > > upload. So I don't think calculating all of them should incur > significantly more cost than just one. The code > > snippet from Pulp 2 looks similar to what I am doing. > > This functionality should be a method on the Artifact and not a util > function somewhere. > > > > > I haven't given much thought to the getter, but your idea sounds > fine to me. > > Thanks, > > Dennis > > > > > > > > > > -- > > > > Michael Hrivnak > > > > Principal Software Engineer, RHCE > > > > Red Hat > > > > > > > > > > _______________________________________________ > > Pulp-dev mailing list > > [email protected] <mailto:[email protected]> > > https://www.redhat.com/mailman/listinfo/pulp-dev > <https://www.redhat.com/mailman/listinfo/pulp-dev> > > > > > _______________________________________________ > Pulp-dev mailing list > [email protected] <mailto:[email protected]> > https://www.redhat.com/mailman/listinfo/pulp-dev > <https://www.redhat.com/mailman/listinfo/pulp-dev> > > > > _______________________________________________ > Pulp-dev mailing list > [email protected] <mailto:[email protected]> > https://www.redhat.com/mailman/listinfo/pulp-dev > <https://www.redhat.com/mailman/listinfo/pulp-dev> > >
signature.asc
Description: OpenPGP digital signature
_______________________________________________ Pulp-dev mailing list [email protected] https://www.redhat.com/mailman/listinfo/pulp-dev
