RE: Follow-up on RDF byte range

Gary O'Neall Fri, 10 Jul 2015 17:07:14 -0700

Hi Philippe,


As Yev points out, the current SPDX spec already assumes that the file bytes do 
not change, since both the file checksum and the package verification code are 
computed over them.

 

Good point about SVN or git checkouts changing the encodings or line endings.  
This would also break the checksums and the verification code.  I wonder if we 
need to include the file encoding and line-ending characters in the spec to 
account for this, perhaps as an optional field or as part of the SCM references 
that are being proposed.
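A minimal Python illustration of the problem (the file content is made up): the 
same text checked out with LF and with CRLF line endings yields two different 
SHA-1 checksums, so any byte-based value shifts along with it.

```python
import hashlib

# The same two lines of source, as a checkout might store them with
# Unix (LF) or Windows (CRLF) line endings. Sample content is made up.
lf_bytes = b"int x = 1;\nint y = 2;\n"
crlf_bytes = lf_bytes.replace(b"\n", b"\r\n")


def sha1_hex(data: bytes) -> str:
    """Return the lowercase hex SHA-1 digest of the raw bytes."""
    return hashlib.sha1(data).hexdigest()


# Identical text to a human reader, but different bytes, so different checksums.
print(sha1_hex(lf_bytes) == sha1_hex(crlf_bytes))  # False
```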

 

We really wrestled with encoding issues when we discussed verification codes in 
the 1.0/1.1 timeframe.  We concluded that we should just use bytes, since they 
are well defined, and the snippet definitions follow the same practice.
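A rough Python sketch of the package verification code, assuming the 1.x 
algorithm: SHA-1 each file's bytes, sort the lowercase hex digests, concatenate 
them and SHA-1 the result. The helper name is hypothetical and exclusion handling 
is left to the caller, so the spec text remains the authority on the details.

```python
import hashlib
from typing import Iterable


def package_verification_code(file_contents: Iterable[bytes]) -> str:
    """Hypothetical helper sketching the SPDX package verification code.

    SHA-1 each file's raw bytes, sort the lowercase hex digests,
    concatenate them with no separator, and SHA-1 the result.
    Excluded files (such as the SPDX document itself) are assumed
    to have been filtered out by the caller.
    """
    digests = sorted(hashlib.sha1(data).hexdigest() for data in file_contents)
    return hashlib.sha1("".join(digests).encode("ascii")).hexdigest()


files = [b"int x = 1;\n", b"int y = 2;\n"]
code = package_verification_code(files)

# File order does not matter, because the digests are sorted first...
assert code == package_verification_code(reversed(files))
# ...but converting one file to CRLF line endings changes the code.
assert code != package_verification_code([b"int x = 1;\r\n", b"int y = 2;\n"])
```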

 

There is also a proposal, which we haven't discussed on the call yet, to 
include line numbers as an optional field.
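If line numbers were added as an optional field alongside byte offsets, they 
could be derived from the bytes themselves. A hypothetical Python sketch, 
assuming 1-based byte offsets and LF as the line terminator (one of the choices 
the spec would need to pin down):

```python
def line_number_at(data: bytes, byte_offset: int) -> int:
    """Return the 1-based line number containing a 1-based byte offset.

    Assumes LF terminates a line. CRLF files also work, since each CR sits
    on the same line as the LF that follows it; CR-only (classic Mac)
    files would be treated as a single line.
    """
    if not 1 <= byte_offset <= len(data):
        raise ValueError("byte offset out of range")
    # Count the line feeds that occur strictly before the target byte.
    return data.count(b"\n", 0, byte_offset - 1) + 1


source = b"int x = 1;\nint y = 2;\n"
print(line_number_at(source, 1))   # 1 (the "i" of "int x")
print(line_number_at(source, 12))  # 2 (the "i" of "int y")
```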

 

Gary

 

From: [email protected] 
[mailto:[email protected]] On Behalf Of Yev Bronshteyn
Sent: Friday, July 10, 2015 9:01 AM
To: Philippe Ombredanne
Cc: [email protected]
Subject: Re: Follow-up on RDF byte range

 

Wouldn’t the definition of “a line” also vary across operating systems? Surely 
we don’t want to depend on a VCS performing EOL replacement in every 
environment where the SPDX file is examined.

As someone pointed out on Tuesday’s call, the byte indices would refer to the 
version of the file described by the checksum, which is already mandatory for 
each file.
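A consumer could enforce this by verifying the file's SHA-1 before 
interpreting a byte range. A minimal Python sketch; the helper name and the 
1-based, inclusive range convention are assumptions for illustration:

```python
import hashlib


def extract_snippet(data: bytes, expected_sha1: str, start: int, end: int) -> bytes:
    """Hypothetical helper: return bytes start..end (1-based, inclusive),
    but only if the file matches the SHA-1 recorded in the SPDX document."""
    actual = hashlib.sha1(data).hexdigest()
    if actual != expected_sha1.lower():
        # Not the file the byte range was written against, e.g. because a
        # checkout converted line endings or expanded keywords.
        raise ValueError(f"checksum mismatch: expected {expected_sha1}, got {actual}")
    return data[start - 1:end]


original = b"int x = 1;\nint y = 2;\n"
recorded_sha1 = hashlib.sha1(original).hexdigest()
print(extract_snippet(original, recorded_sha1, 12, 21))  # b'int y = 2;'

converted = original.replace(b"\n", b"\r\n")
try:
    extract_snippet(converted, recorded_sha1, 12, 21)
except ValueError as err:
    print(err)  # reports the checksum mismatch
```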

 

 

On Jul 10, 2015, at 9:16 AM, Philippe Ombredanne <[email protected]> wrote:

 

On Fri, Jul 10, 2015 at 5:41 AM, Gary O'Neall <[email protected]> wrote:



Hi Yev,
Thanks for the pointer to the pointer vocabulary.
Below are some of my thoughts - feel free to propose alternatives or provide 
more specific examples on how we may use the pointer class for Snippets.
- I do think using the pointer classes would work for our purposes, and they 
would have the advantage of reusing an already defined vocabulary.  It is a bit 
more complex, but manageable.
- I noticed that the pointers RDF vocabulary defines byte offsets as 1-based: 
the first byte in the document is 1, not 0.  If we want to reuse these terms, 
we would need to define byte ranges as 1-based in both RDF and Tag/Value for 
compatibility.
- Pointers include a required property that references the document the byte 
range applies to.  We could use the URI for the SPDX file as the value for this 
property.  This would be somewhat redundant with the SPDX File property.  I'm 
not sure whether we should retain both of these properties, but I'm currently 
leaning toward retaining both.
- There are a few choices for representing the byte range.  After looking 
through the doc, the ByteOffsetCompoundPointer uses an offset relative to the 
startPointer (the pointer to the beginning of the range).  Given the tag/value 
definition, where the start and end bytes are relative to the beginning of the 
file, a StartEndPointer may be a better fit.  The startPointer and endPointer 
would each be a ByteOffsetPointer representing a byte offset.
- If we want to include optional line number offset, we could use the 
LineCharPointer class.
Below is an example based on my understanding of the pointer vocabulary:

[....]
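One way to picture the StartEndPointer option is with two small Python classes 
whose names loosely mirror the startPointer, endPointer and ByteOffsetPointer 
terms. This is a hypothetical model, not an RDF serialization, and it assumes 
both ends of the range are inclusive:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ByteOffsetPointer:
    """A byte position in a file; 1 is the first byte, as in the pointer vocabulary."""
    offset: int


@dataclass(frozen=True)
class StartEndPointer:
    """A snippet range whose ends are both measured from the start of the file
    (not relative to each other). Both ends are assumed inclusive."""
    start_pointer: ByteOffsetPointer
    end_pointer: ByteOffsetPointer

    def to_slice(self) -> slice:
        # 1-based inclusive offsets -> 0-based, half-open Python slice.
        return slice(self.start_pointer.offset - 1, self.end_pointer.offset)

    @classmethod
    def from_zero_based(cls, start: int, end_exclusive: int) -> "StartEndPointer":
        # Inverse conversion, e.g. for a scanner that reports 0-based offsets.
        return cls(ByteOffsetPointer(start + 1), ByteOffsetPointer(end_exclusive))


data = b"int x = 1;\nint y = 2;\n"
snippet = StartEndPointer(ByteOffsetPointer(12), ByteOffsetPointer(21))
print(data[snippet.to_slice()])                             # b'int y = 2;'
print(StartEndPointer.from_zero_based(11, 21) == snippet)  # True
```

Keeping both ends absolute matches the tag/value start and end bytes; a 
relative-offset shape would need an extra addition step on every read.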

I think using bytes is impractical.
For instance, files from a simple git or svn checkout may differ
byte for byte across machines and settings
(end-of-line replacement, keyword substitution, etc.).

We are talking about line-oriented text source code.
Why not use the simpler, natural and human-understandable start and end lines?
The compounded complexity of RDF and bytes is unlikely to be warranted here.
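A minimal Python illustration of this argument (the sample content and helper 
name are hypothetical): because bytes.splitlines() treats LF, CRLF and CR 
alike, a start/end line range selects the same text from an LF and a CRLF 
checkout, while the same byte range does not.

```python
from typing import List


def line_range(data: bytes, start_line: int, end_line: int) -> List[bytes]:
    """Return lines start_line..end_line (1-based, inclusive) without their
    terminators. bytes.splitlines() treats LF, CRLF and CR as line breaks."""
    return data.splitlines()[start_line - 1:end_line]


lf = b"int x = 1;\nint y = 2;\nint z = 3;\n"
crlf = lf.replace(b"\n", b"\r\n")

# A start/end line range selects the same text from both checkouts...
print(line_range(lf, 2, 3) == line_range(crlf, 2, 3))  # True
# ...while the same 1-based byte range (12..21) does not.
print(lf[11:21])    # b'int y = 2;'
print(crlf[11:21])  # b'\nint y = 2'
```

This only sidesteps EOL conversion: keyword substitution that rewrites a line's 
content would still change the extracted text, and the spec would still have to 
say which terminators count as a line break.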

-- 
Cordially
Philippe Ombredanne
_______________________________________________
Spdx-tech mailing list
[email protected]
https://lists.spdx.org/mailman/listinfo/spdx-tech
