>> So, instead of the hash processing a file with this text
>>
>>   $Id: foo.c 123456 2015-01-31 12:34:56 mdb $
>>
>> as is found in a file, it would instead process the above text as if 
>> it were written $Id$
>>
>> This would allow two files that are identical other than RCS Keyword 
>> values to have the same 'hash' for an SPDX report.
>
>Mark:
>this is a problem with no simple answer. Luckily it will eventually go away,
>since as far as I know git does not support keyword expansion (IMHO for the
>better).

Apart from the limited "ident" attribute (which only expands $Id$ to the blob
hash), Git does not support keyword expansion directly, but it can be achieved
using the more general clean / smudge filter approach.
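
A minimal sketch of Mark's idea, assuming a POSIX shell with "sed -E" and
GNU coreutils' "sha1sum" (macOS would need "shasum" instead): expanded RCS
keywords are collapsed back to their unexpanded form before hashing. The
function names are made up for illustration; the keyword list is the
standard RCS one.

```shell
# Hypothetical helper: collapse expanded RCS/CVS keywords such as
# "$Id: foo.c 123456 2015-01-31 12:34:56 mdb $" back to "$Id$".
# Reads stdin, writes stdout. Caveat: $Log$ also appends history lines
# after the keyword, which this simple substitution does not remove.
normalize_keywords() {
    sed -E 's/\$(Author|Date|Header|Id|Locker|Log|Name|RCSfile|Revision|Source|State)(:[^$]*)?\$/$\1$/g'
}

# Hypothetical helper: SHA1 of a file as if its keywords were unexpanded.
normalized_sha1() {
    normalize_keywords < "$1" | sha1sum | cut -d' ' -f1
}
```

The same sed expression could also serve as the clean command of a Git
filter ("git config filter.<name>.clean", with "filter=<name>" set for the
relevant paths in .gitattributes), so that Git itself stores and hashes the
normalized content.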

>That said, there are various ways I have handled this practically:

Note that the standard currently requires a plain "SHA1" checksum to be
present. Omitting it in favor of any other / custom hash would render your
file non-spec-compliant :-(

>3. You use a non-crypto, "locality sensitive" checksum hash that you use for 
>approximate file comparison.

That option is very useful in general to get an indication of how similar
files are.

4. You simply use the hash of the file as it is stored internally by the VCS.
In a way that is similar to Philippe's option 2, as it refers to the file
before keyword expansion. But instead of actually checking out the file
without keyword expansion, you simply query the VCS for its internal hash of
the file. Taking Git and ScanCode's AUTHORS.rst file [1] as an example, that
would work like this:

$ ARRAY=( $(git ls-tree HEAD AUTHORS.rst) ) ; echo ${ARRAY[2]}
d89c7ba9918d7fe249875ac44b8c61cb11cac4ac

So this way you not only get the hash from before keyword expansion, you
also get it for free, since the VCS already knows it.
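
Instead of splitting the "git ls-tree" output into a Bash array, the same
blob hash can also be asked for directly with "git rev-parse" and the
<rev>:<path> syntax (the path is relative to the top of the repository).
A small sketch, with a made-up function name:

```shell
# Hypothetical helper: print Git's internal (blob) hash of a file as
# committed in HEAD, i.e. the hash Git already stores, without hashing
# anything ourselves. The path is relative to the repository root.
head_blob_hash() {
    git rev-parse "HEAD:$1"
}

# Equivalent to the ls-tree variant above, without the Bash array:
#   git ls-tree HEAD AUTHORS.rst | awk '{ print $3 }'
```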

The downside is that this internal hash is specific to the VCS, so it only
helps to identify the same file in other repos of the same VCS. For files
from other VCSes, though, you could go with Philippe's option 2 and calculate
the file hash the way Git does internally [2].
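
For reference, Git's blob hash is simply the SHA1 of a "blob <size>" header,
a NUL byte and the raw file content, so it can be computed without Git at
all, which is presumably what [2] does. A rough sketch, again assuming
"sha1sum" and a made-up function name:

```shell
# Hypothetical helper: compute Git's blob hash of a file without Git,
# i.e. SHA1("blob " + <size in bytes> + "\0" + <content>).
git_blob_hash() {
    # wc -c pads its output with spaces on some systems, so strip them.
    size=$(wc -c < "$1" | tr -d ' ')
    { printf 'blob %s\0' "$size"; cat "$1"; } | sha1sum | cut -d' ' -f1
}
```

For a checkout without clean filters or line-ending conversion, this should
match the hash reported by "git ls-tree"; with a clean filter, Git hashes the
content after filtering.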

[1] https://github.com/nexB/scancode-toolkit/blob/bd424eae1dcdbb3f873169bbc01d252e4e20e4f4/AUTHORS.rst
[2] https://github.com/sschuberth/dev-scripts/blob/master/git/git-hash-blob.sh

Regards,
Sebastian
_______________________________________________
Spdx-tech mailing list
[email protected]
https://lists.spdx.org/mailman/listinfo/spdx-tech
