>> So, instead of the hash processing a file with this text
>>
>> $Id: foo.c 123456 2015-01-31 12:34:56 mdb $
>>
>> as is found in a file, it would instead process the above text as if
>> it were written $Id$
>>
>> This would allow two files that are identical other than RCS Keyword
>> values to have the same 'hash' for an SPDX report.
>
>Mark:
>this is eventually a problem with no simple answer. Luckily this is going away
>in the future, since as far as I know Git does not support keyword
>expansion (IMHO for the better).
While Git does not support keyword expansion directly, it can be achieved using
the more general clean / smudge filter approach.
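As a rough sketch of that approach (not something Git ships with), a clean filter could collapse expanded keywords back to their bare form before the content is stored. The function name, the filter name "rcs-keywords" and the keyword list below are my own assumptions:

```shell
#!/bin/sh
# Hypothetical clean-filter body: reads file content on stdin and writes it
# to stdout with expanded RCS keywords collapsed, e.g.
# "$Id: foo.c 123456 ... $" becomes "$Id$".
# The keyword list is an assumption; extend it to whatever your VCS expands.
collapse_keywords() {
    sed -E 's/\$(Id|Revision|Date|Author|Header)(:[^$]*)?\$/$\1$/g'
}

# Possible wiring ("rcs-keywords" is a made-up filter name):
#   git config filter.rcs-keywords.clean "<command that runs the sed above>"
#   echo '*.c filter=rcs-keywords' >> .gitattributes
```

Piping a file through collapse_keywords before hashing it would also give the keyword-independent hash Mark asked about, without involving Git at all.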
>That said, there are various ways I have handled this practically:
Note that the standard currently requires plain "SHA1" to be present. Omitting
that in favor of any other / custom hash would render your file
non-spec-compliant :-(
>3. You use a non-crypto, "locality sensitive" checksum hash that you use for
>approximate file comparison.
That option is very useful in general to have an indication about similarity of
files.
A fourth option: simply use the hash under which the file is stored internally
by the VCS. In a way that is similar to Philippe's option 2, as it refers to the
file before keyword expansion. But instead of actually checking out the file
without keyword expansion, you query the VCS for its internal hash of the
file. Taking Git and the AUTHORS.rst file of ScanCode [1] as an example, that
would work like this:
$ ARRAY=( $(git ls-tree HEAD AUTHORS.rst) ) ; echo ${ARRAY[2]}
d89c7ba9918d7fe249875ac44b8c61cb11cac4ac
So, this way you not only get the hash before keyword expansion is done, you
also get the hash for free since it's already known by the VCS.
The downside is that this internal hash is specific to the VCS, so it only
helps to identify the same file in other repos of the same VCS. But for other
VCS you could go with Philippe's option 2 and calculate the file hash like Git
does internally [2].
[1]
https://github.com/nexB/scancode-toolkit/blob/bd424eae1dcdbb3f873169bbc01d252e4e20e4f4/AUTHORS.rst
[2] https://github.com/sschuberth/dev-scripts/blob/master/git/git-hash-blob.sh
Regards,
Sebastian