Hi Philippe,
As Yev points out, the current SPDX spec already depends on the file bytes not changing due to the file checksum and the package verification code.

Good point on the SVN or git checkouts changing the encodings or line endings. This would also cause a problem for the checksums and verification code. I wonder if we need to include the file encoding and line ending characters in the spec to take these into account. Perhaps as an optional field or as part of the SCM references that are being proposed.

We really wrestled with encoding issues when we discussed verification codes in the 1.0/1.1 timeframe. The conclusion we came to was to just use bytes since they are well defined and the snippet definitions are following the same practice.

There is a proposal to include line numbers as an optional field which we haven't discussed yet on the call.

Gary

From: [email protected] [mailto:[email protected]] On Behalf Of Yev Bronshteyn
Sent: Friday, July 10, 2015 9:01 AM
To: Philippe Ombredanne
Cc: [email protected]
Subject: Re: Follow-up on RDF byte range

Wouldn’t the definition of “a line” also vary across operating systems? Surely, we don’t want to depend on the existence of a VCS performing EOL replacement in any environment where the SPDX file is examined.

As someone pointed out in Tuesday’s call, the byte indices would refer to the version of the file described by the checksum that is already mandatory for each file.

On Jul 10, 2015, at 9:16 AM, Philippe Ombredanne <[email protected]> wrote:

On Fri, Jul 10, 2015 at 5:41 AM, Gary O'Neall <[email protected]> wrote:

Hi Yev,

Thanks for the pointer to the pointer vocabulary. Below are some of my thoughts - feel free to propose alternatives or provide more specific examples on how we may use the pointer class for Snippets.

- I do think using the pointer classes would work for our purposes and would have the advantage of using an already defined vocabulary. It is a bit more complex, but manageable.
- I noticed that the pointers RDF vocabulary defines byte offsets based on 1 for the first byte in the document (not zero). If we want to re-use these terms, we would need to define the byte ranges relative to 1 for both RDF and Tag/Value for compatibility.
- Pointers include a required property to reference the document the byte range applies to. We could use the URI for the SPDX file as the value for this property. This would be somewhat redundant with the SPDX File property. Not sure if we should retain both of these properties or not. I'm currently leaning toward retaining both properties.
- There are a few choices on how to represent the byte range. After looking through the doc, the ByteOffsetCompoundPointer uses an offset relative to the startPointer (the pointer to the beginning of the range). Based on the tag/value definition, where the start byte and end byte are relative to the beginning of the file, a StartEndPointer may be a better fit. The startPointer and endPointer would each be a ByteOffsetPointer representing a byte offset.
- If we want to include an optional line number offset, we could use the LineCharPointer class.

Below is an example based on my understanding of the pointer vocabulary:

[....]

I think using bytes is impractical. For instance, files from a simple git or svn checkout may be different byte for byte on different machines and different settings (end-of-line replacement, keyword substitution, etc.).

We are talking about line-oriented text source code. Why not use the simpler, natural and human-understandable start and end line? The compounded complexity of RDF and bytes is unlikely warranted here.

--
Cordially
Philippe Ombredanne

_______________________________________________
Spdx-tech mailing list
[email protected]
https://lists.spdx.org/mailman/listinfo/spdx-tech
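
As a rough illustration of the line-ending concern discussed in this thread, the Python sketch below compares an LF copy and a CRLF copy of the same three-line file. It is only a sketch under stated assumptions: the helper names (sha1_hex, byte_range_of, line_number_of) and the sample file contents are hypothetical, SHA-1 is used here as a stand-in for the file checksum, and byte offsets are counted from 1 to match the remark about the pointer vocabulary. None of this is taken from the SPDX spec or tools.

```python
import hashlib
from typing import Tuple


def sha1_hex(data: bytes) -> str:
    """Hex SHA-1 digest of the raw bytes (stand-in for a file checksum)."""
    return hashlib.sha1(data).hexdigest()


def byte_range_of(data: bytes, snippet: bytes) -> Tuple[int, int]:
    """1-based, inclusive byte range of the first occurrence of snippet."""
    start = data.index(snippet)  # 0-based; raises ValueError if absent
    return start + 1, start + len(snippet)


def line_number_of(data: bytes, snippet: bytes) -> int:
    """1-based number of the first line that contains snippet."""
    # bytes.splitlines() treats \n, \r\n and \r as line boundaries.
    for number, line in enumerate(data.splitlines(), start=1):
        if snippet in line:
            return number
    raise ValueError("snippet not found")


# Hypothetical sample: the same text as checked out with LF and with CRLF.
lf_text = b"int a;\nint b;\nint c;\n"
crlf_text = lf_text.replace(b"\n", b"\r\n")
marker = b"int c;"

if __name__ == "__main__":
    for label, data in (("LF", lf_text), ("CRLF", crlf_text)):
        print(label, sha1_hex(data), byte_range_of(data, marker),
              line_number_of(data, marker))
```

In this sample the checksum and the byte range differ between the two copies while the line number does not. That is consistent with both positions in the thread: byte ranges are only meaningful for the exact bytes the checksum describes, and line numbers survive end-of-line conversion but depend on agreeing what counts as a line.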
