Hi Kate,

Thanks for clarifying. From my perspective, I still see SHA-1 as a good option for file-level signatures, especially for preserving compatibility with older SPDX documents.

> However, we may want to permit SHA-256 or something else to be used instead or as an option. Not sure right now, and interested in thoughts. Downside is its size and
> whether it's really worth it for file level.

Ok, sounds good. Disadvantages of size or excess data/bloat aside, this is an example of what I currently index automatically at file level:

## File

FileName: pom.xml
FilePath: ./ant/
FileType: OTHER
FileChecksum: SHA1: 5d4e9512257c7d2a6211eff641e70b9d14a5bdc2
FileChecksum: SHA256: 3cdc1d9684bf787aaff380775b4c3966e444265f528b5d384b79b40e8650038a
FileChecksum: MD5: 0da4b1ba4890806aac2dd2a35d29b8c3
FileChecksum: SSDEEP: 48:cFxKhERLT0eGyH31pAp88mHUW51d+u4CpO9FXCaOiKBy:YQhER/cyHFp2mHU+4jc3I
FileSize: 1 Kb (1701 bytes)
FileLOC: 40
LicenseInfoInFile: Apache-2.0

The path is split from the file name. In my case, when doing a name search, the path gets in the way and surfaces too many results. For FileType, I'd prefer to label this file "MARKUP", but that would break compatibility with the standard.
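A minimal sketch of the path/name split described above, in Python. The function name `split_spdx_path` is hypothetical, and the trailing-slash convention for the directory part is an assumption taken from the `FilePath: ./ant/` line in the example, not from the SPDX specification.

```python
import posixpath


def split_spdx_path(full_path):
    """Split a relative path into (file name, directory with trailing slash).

    Hypothetical helper: the output shape mirrors the FileName/FilePath
    pair in the example record above.
    """
    directory, name = posixpath.split(full_path)
    if not directory:
        # A bare file name is assumed to live in the current directory.
        directory = "."
    return name, directory.rstrip("/") + "/"


name, path = split_spdx_path("./ant/pom.xml")
print("FileName: " + name)
print("FilePath: " + path)
```

Keeping the name in its own field means a name search can compare against `pom.xml` alone, without matching every file that merely sits under a directory with a similar name.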

The four file signatures are useful for different purposes. MD5 is mostly good for retrieving older information. SHA-256, quite honestly, I use only to go directly to the VirusTotal site and cross-check security information against their virus engine database. This is just an XML file, not a compiled one, but the link would be https://www.virustotal.com/en/file/3cdc1d9684bf787aaff380775b4c3966e444265f528b5d384b79b40e8650038a/analysis/
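As a rough illustration of how the SHA1, SHA256 and MD5 lines above could be produced in one pass, here is a sketch using Python's standard `hashlib`. The function names are hypothetical, SSDEEP is left out because it is not in the standard library, and the URL pattern is simply copied from the link above rather than taken from any VirusTotal documentation.

```python
import hashlib


def file_checksums(path, chunk_size=65536):
    """Return hex digests for SHA1, SHA256 and MD5 of one file."""
    hashers = {
        "SHA1": hashlib.sha1(),
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
    }
    with open(path, "rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def checksum_lines(checksums):
    """Format digests the way the example record above writes them."""
    return ["FileChecksum: %s: %s" % (name, digest)
            for name, digest in checksums.items()]


def virustotal_url(sha256_hex):
    """Build the lookup link; the pattern is assumed from the link above."""
    return "https://www.virustotal.com/en/file/%s/analysis/" % sha256_hex
```

Calling `checksum_lines(file_checksums("pom.xml"))` would give the three lines to place in the record, and `virustotal_url` takes the SHA256 value from the same dictionary.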

SSDEEP is (sometimes) useful for finding similar text files, expressed as a percentage of similarity. Better similarity hashes exist, but this one is popular and simple to implement in Java. Perhaps sdhash would be a better option; I'd need to try it out and compare one day.

LOC (lines of code) is added for (some) text files to give an idea of project size when estimating the audit effort/time budget without direct access to the code being audited; it is likely useful only to me. File size is useful for narrowing search results. From a provenance point of view, keeping the time stamp of the last file modification might help during a deeper copyright investigation. I have very rarely needed this kind of information, and on those occasions it could be gathered manually.
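The FileSize and FileLOC values could be gathered along these lines. This is only a sketch: the function names are hypothetical, counting every line (blank or not) is an assumption about what LOC means here, and the integer division is chosen only because it reproduces the "1 Kb (1701 bytes)" example above.

```python
import os


def count_lines(path):
    """Count lines in a file, blank lines included (an assumed definition)."""
    count = 0
    with open(path, "rb") as handle:
        for _ in handle:
            count += 1
    return count


def format_file_size(num_bytes):
    """Format a byte count in the style of the example record."""
    # Integer division is an assumption; it matches 1701 bytes -> 1 Kb.
    return "%d Kb (%d bytes)" % (num_bytes // 1024, num_bytes)


def size_and_loc_lines(path):
    """Return the FileSize and FileLOC lines for one file."""
    return [
        "FileSize: " + format_file_size(os.path.getsize(path)),
        "FileLOC: %d" % count_lines(path),
    ]
```

If the modification time stamp were wanted as well, `os.path.getmtime(path)` returns it as seconds since the epoch.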

I'm now rebuilding the information archived in previous years in another format into the SPDX format, and experimenting with the information that is collected.


> 2.0 is just kicking off, and we're working on it on the WIKI at this point, and through the meeting
> minutes, etc. I'll start the document as soon as we have a clear
> direction on the model (subject of current discussion).
>
> Feel free to chime in with other questions or concerns here on the list. :-)

Ok, no rush. So far the previous specifications have been very useful (my only complaint is the limited choice of values for FileType). I found the documentation quite straightforward to read and to implement in practical terms.


Hope this feedback helps.

With kind regards,
Nuno Brito

---
http://triplecheck.de

On 2013-11-14 22:25, [email protected] wrote:
Hi Nuno,
 For 2.0 I think we need to decide what makes sense from a space and
risk perspective. From an individual file perspective, SHA-1 is
probably fine since it's just meant to ensure that the file being
looked at matches the information recorded, rather than keep something
secret. However, we may want to permit SHA-256 or something else to be
used instead or as an option. Not sure right now, and interested in
thoughts. Downside is its size and whether it's really worth it for
file level.

 2.0 is just kicking off, and we're working on it on the WIKI at this
point, and through the meeting minutes, etc. I'll start the document
as soon as we have a clear direction on the model (subject of current
discussion).

 Feel free to chime in with other questions or concerns here on the
list. :-)

Kate

-------------------------
 FROM: Nuno Brito <[email protected]>
 TO: [email protected]
CC: [email protected]
 SENT: Thursday, November 14, 2013 11:54 AM
 SUBJECT: Re: SPDX 2.0 - update the checksum?

Dear Kate,

Would each file still be described with an SHA-1 signature in version
2.0 as default?

Sorry if I misunderstood something; I don't seem to be able to find
a draft for version 2.0 on the SPDX site and can't read the content
of the mentioned sections.

Perhaps it would be possible to provide a link where the draft can be
read?

My thanks in advance.

With kind regards,
Nuno Brito

---
http://triplecheck.de [1]

Date: Wed, 13 Nov 2013 13:19:24 -0800 (PST)
From: [email protected]
To: "[email protected]" <[email protected]>
Subject: SPDX 2.0 - update the checksum?
Message-ID:
<[email protected]>
Content-Type: text/plain; charset="iso-8859-1"



Noticed this, and thinking we may want to give an option for our
checksum algorithms to be SHA-256 in 2.0 for 4.7, 4.8, and 6.3.


see: http://it.slashdot.org/story/13/11/13/0154244/microsoft-warns-customers-away-from-rc4-and-sha-1 [2]

Kate
_______________________________________________
Spdx-tech mailing list
[email protected]
https://lists.spdx.org/mailman/listinfo/spdx-tech [3]



Links:
------
[1] http://triplecheck.de/
[2]
http://it.slashdot.org/story/13/11/13/0154244/microsoft-warns-customers-away-from-rc4-and-sha-1
[3] https://lists.spdx.org/mailman/listinfo/spdx-tech
