Sebastian,

SPDX documents that list files are usually produced by tools that have scanned 
the entire file contents of the project. These tools may not always scan Git 
checkouts, because they’d also want to include dependencies pulled in by build 
tools.

Making the existing SHA1 non-mandatory would be a breaking change: consumers 
of prior versions of documents may rely on SHA1 being present.

It should be pointed out that in SPDX 2.1, files themselves are not required, 
so if you’re a developer building up a bill of materials by hand or using an 
“SPDX Editor” rather than a file scanner, chances are, you won’t be including 
files in the first place.

Do you have a particular use case in which using SHA1 checksums to identify 
files would be especially difficult?

Yev

On 5/18/16, 7:29 AM, "[email protected] on behalf of Schuberth, 
Sebastian" <[email protected] on behalf of 
[email protected]> wrote:

>Hi,
>
>Nowadays most source code is stored in some sort of VCS. Particularly popular 
>in the OSS world, but also in commercial software development, is Git as a 
>DVCS. Git's internal data structures are based on simple hierarchies of SHA-1 
>hashes: contents of files ("blobs") are hashed, entries of blobs are hashed to 
>"trees", trees are hashed to "commits" etc.
>
>So basically Git already knows the hashes of all its files, and there's 
>usually no need to recalculate the hashes for the purpose of creating SPDX 
>File Checksum entries. The only hitch is that Git's SHA1 of a blob is 
>*slightly* different from the SHA1 of purely the file contents: Git prefixes 
>the file contents with "blob <size>\0" where <size> is the size of the file. 
>The "git hash-object <file>" command calculates this SHA1 on the contents of 
><file> with the prefix added, and the script at [1] illustrates how Git 
>internally performs the calculation.
>
>In order to reuse Git's SHA1 of blobs when creating an SPDX file for files 
>stored in Git, I'd like to propose a new "SHA1GIT" algorithm. The hash value 
>for that algorithm must match the output of "git hash-object <file>". Having 
>the Git-style SHA1 also allows easier matching of a given SPDX File Checksum 
>to Git repositories by doing something like "git rev-list --objects --all | 
>grep <sha1git>".
>
>Benefitting the most from the new SHA1GIT algorithm would also require making 
>the existing SHA1 algorithm non-mandatory. From a file consistency point of 
>view it does not really make sense to compute both ("git hash-object <file>" 
>also works on files not committed to Git), and neither does it from a 
>performance point of view.
>
>Please let me know what you think about this proposal.
>
>[1] https://github.com/sschuberth/dev-scripts/blob/master/git/git-hash-blob.sh
>
>Regards,
>Sebastian
>
>
>_______________________________________________
>Spdx-tech mailing list
>[email protected]
>https://lists.spdx.org/mailman/listinfo/spdx-tech
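
For readers following the thread, the difference between the two checksums 
discussed above can be sketched in a few lines of Python. This is only an 
illustration of the "blob <size>\0" prefix described in the quoted message, 
not code from SPDX or from the script at [1]; the function names sha1_plain 
and sha1_git_blob are made up for this example.

```python
import hashlib


def sha1_plain(data: bytes) -> str:
    """SHA1 over the raw file contents only (the existing SPDX SHA1)."""
    return hashlib.sha1(data).hexdigest()


def sha1_git_blob(data: bytes) -> str:
    """SHA1 as Git computes it for a blob.

    Git hashes the header "blob <size>\\0" followed by the contents,
    where <size> is the content length in bytes, written in decimal.
    (Hypothetical helper name; the proposed algorithm is "SHA1GIT".)
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


if __name__ == "__main__":
    contents = b"hello world\n"
    print("plain SHA1:", sha1_plain(contents))
    print("Git blob  :", sha1_git_blob(contents))
```

For the same bytes, the second value should match what "git hash-object 
<file>" prints, while the first is what a conventional SHA1 tool reports. 
Note that the sketch hashes the bytes exactly as given; any line-ending 
conversion or filters configured in a repository are outside its scope.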

