Re: [spdx-tech] Summary of document serialization discussion

Sean Barnum Tue, 26 Jul 2022 07:41:47 -0700

I wanted to offer a few observations here.

First, we should be careful not to conflate the concept/term “node and property 
elements” in RDF with what we are calling Elements in our model. They are 
not the same thing. In the RDF graph, ALL nodes and edges (at a very granular 
level) are considered “elements” whereas in our model Elements are a certain 
subset of full objects that may include properties. Not all classes or nodes in 
the RDF graph are Elements. ALL nodes and edges in the RDF graph need to be 
serialized but they are not typically serialized at the same level of 
granularity that they exist in the RDF graph.
The issue of nesting in serializations is actually relatively straightforward.
In RDF (our model spec), properties are either DatatypeProperties, which contain 
literal values, or ObjectProperties, which contain IDs of other objects. The 
objects those IDs refer to are instances of either Element classes (subclasses 
of Element) or non-Element classes (e.g., IntegrityMethod). For serialization, 
if we nest/embed the referenced objects for all ObjectProperties that point to 
non-Elements, but do not nest/embed the referenced objects for ObjectProperties 
that point to Elements, we get a nice serialization at the granularity of 
Elements, which is what we are looking for.
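
To make the nesting rule concrete, here is a minimal Python sketch. It is only an illustration, not a proposed SPDX format: IntegrityMethod is the non-Element example named above, while the property names verified_using and related_to, the IDs, and the output layout are assumptions made up for the example.

```python
from dataclasses import dataclass, field, fields, is_dataclass


@dataclass
class IntegrityMethod:
    """A non-Element class: it has no ID of its own, so it gets embedded."""
    algorithm: str
    value: str


@dataclass
class Element:
    """An Element: it has an ID and is serialized as its own top-level unit."""
    id: str
    name: str
    # Hypothetical ObjectProperty pointing to non-Element objects.
    verified_using: list = field(default_factory=list)
    # Hypothetical ObjectProperty pointing to other Elements.
    related_to: list = field(default_factory=list)


def serialize_value(value):
    """Apply the nesting rule to a single property value."""
    if isinstance(value, Element):
        return value.id                  # Element: emit a reference only
    if is_dataclass(value):
        return serialize_object(value)   # non-Element object: embed it
    if isinstance(value, list):
        return [serialize_value(v) for v in value]
    return value                         # DatatypeProperty: literal value


def serialize_object(obj):
    """Serialize one object, recursing only into non-Element values."""
    out = {"type": type(obj).__name__}
    for f in fields(obj):
        out[f.name] = serialize_value(getattr(obj, f.name))
    return out


lib = Element(id="urn:example:lib", name="lib")
app = Element(
    id="urn:example:app",
    name="app",
    verified_using=[IntegrityMethod(algorithm="sha256", value="abc123")],
    related_to=[lib],
)
serialized_app = serialize_object(app)
```

In this sketch serialized_app carries the IntegrityMethod inline under verified_using, while related_to holds only the ID string of lib, so lib would be serialized separately as its own Element.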


Collections in a graph are NOT edges. They are object nodes that are Elements. 
They have particular ObjectProperties called “element” and another called 
“rootElement” that represent edges in the graph to other Element object nodes. 
I agree that we want to avoid complexities of serialization rules treating 
edges on different kinds of objects differently but there should be no issues 
with simply treating different kinds of edges differently regardless of the 
kinds of objects they are on. This is a fundamental part of any serialization.
Using the above approach for nesting avoids the issue of collections including 
collections, supports consistent canonicalization, and keeps the domain graph 
clean.
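
As a rough sketch of why collections containing collections stop being a problem under this approach, consider the case of A containing B containing C. Because every member is an Element, each “element” and “rootElement” value is just an ID, and the output is a flat list of Element records however deep the membership chain goes. The IDs, the dict layout, and the choice to emit every reachable Element are assumptions for illustration only.

```python
# Hypothetical graph: collection A has B as a member, which has C as a member.
# "element" and "rootElement" hold IDs only, never embedded objects.
graph = {
    "urn:example:A": {
        "type": "Collection",
        "element": ["urn:example:B"],
        "rootElement": ["urn:example:B"],
    },
    "urn:example:B": {
        "type": "Collection",
        "element": ["urn:example:C"],
        "rootElement": ["urn:example:C"],
    },
    "urn:example:C": {
        "type": "Collection",
        "element": [],
        "rootElement": [],
    },
}


def serialize_reachable(graph, root_id):
    """Emit one flat record per Element reachable via 'element' edges."""
    seen, order, stack = set(), [], [root_id]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # each Element is written once, even if referenced twice
        seen.add(current)
        order.append(current)
        # Reverse so members are visited in their listed order.
        stack.extend(reversed(graph[current]["element"]))
    return [{"id": element_id, **graph[element_id]} for element_id in order]


records = serialize_reachable(graph, "urn:example:A")
```

Each record in records sits at the same level. The A to B to C relationship survives purely as ID references, which is what keeps the result easy to canonicalize.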

I do not think this conflicts in any way with William’s post regarding 
serialization rules.

Sean


From: [email protected] <[email protected]> on behalf of David 
Kemp <[email protected]>
Date: Wednesday, July 20, 2022 at 12:08 PM
To: William Bartholomew (CELA) <[email protected]>
Cc: [email protected] <[email protected]>
Subject: [EXT] Re: [spdx-tech] Summary of document serialization discussion
This way of describing things is still conflating element values with 
serialization.  There are Elements (RDF - 
https://www.w3.org/TR/rdf-syntax-grammar/#section-Syntax-node-property-elements 
- calls them Node Elements), and properties (RDF calls them property elements) 
that are nodes and edges in the logical graph.   A subset of nodes in the 
logical graph needs to be serialized into files using various data formats 
including JSON, RDF, Tag-Value, ....
* The content of files (whether individual files, package tarfiles or transfer 
units) does not exist in the graph
* Nodes can be created to describe files

Collections in a graph are edges from one node to another (in our case, from 
Collection to Element).  All Elements have IRIs, and all edges between Elements 
are IRIs. The job of serialization is to losslessly represent nodes and edges.  
 Treating one kind of edge (those outbound from the Collection node) 
differently from all other edges can be done, but it is an unnecessary 
complication.  And as mentioned in the meeting, collections can include 
collections, which can  include collections in an acyclic (non-recursive) graph.

Please provide a serialized example of SBOM A having SBOM B as a member, which 
in turn has "Document" C as a member, using a nested serialization, to contrast 
with linear serialization of the same nodes.  The canonicalization group has 
been discussing JSON as the canonical data format, but we also need nested 
tag-value and RDF serializations, to compare with linear tag-value and RDF 
serializations of the same nodes, in order to evaluate any proposals.

Regards,
Dave

On Tue, Jul 19, 2022 at 2:00 PM William Bartholomew (CELA) via lists.spdx.org 
<[email protected]> wrote:
Proposed rules (and alternatives):

  1.  If you are transferring a single element and no additional context needs 
to be transferred, just transfer the single element. Root of serialization is a 
single element (the element being transferred).

  2.  If you are transferring one or more elements and additional context (such 
as the creator info of the anthology) needs to be transferred, place the 
elements in a collection and transfer the collection. Root of serialization is 
a single element (the collection element).

  3.  If you are transferring one or more elements and no additional context 
needs to be transferred, this was the sticking point and we had two options:

      a.  Allow the elements to be transferred as an array. Root of 
serialization is an array of elements.

      b.  Require the elements to be wrapped in a collection. Root of 
serialization is a single element (the collection element).

  4.  An alternative was proposed where the root of serialization is always an 
array of elements, even if it’s an array of one in scenario #1 and #2 above.

Serialization implications:
We end up with one of these three options for serialization:

  *   Root is always an element (#1, #2, #3b).
  *   Root is always an array (#4).

     *   One significant concern I have with this is that if we ever have to 
attach additional information to the root it is a breaking change. We can work 
around this by making the root an object with a single property “elements”, 
though at that stage I’d argue we’re just recreating “Collection” as a plain 
object instead of an “Element” (we originally had that design and moved away 
from it). We went to a lot of effort to make “Element” and “Collection” 
extremely lightweight so they could be used for scenarios like this.

  *   Root is sometimes an element (#1, #2) or sometimes an array (#3a).

Individual serializations may have constraints that require them to select a 
certain option or wrap an option in another option. For example, XML always has 
a single root, and JSON-LD is always a list.

When the root is an array, consumers lose any ability to “address” the root 
(e.g. if they wanted to attach annotations or other information to the root). 
Addressing the root requires the producer to anticipate that the consumer will 
want to do this and to decide to wrap a collection around the elements. While 
the consumer could do this post facto, there would be no shared identity with 
the producer. This was one of the reasons that SPDXID was required on all SPDX 
elements in 2.x: it gave the consumer options to attach information even if 
that was not the intent of the producer, because the producer does not know all 
of the consumers’ use cases for the information or future use cases they may 
want to apply.
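
The addressability point can be sketched in a few lines of Python. This is only an illustration under assumed names: the dict layout, the “Annotation” record shape, the helper functions, and all IDs are hypothetical and are not taken from any SPDX serialization.

```python
def root_as_array(elements):
    """Options #3a / #4: the root is a bare list and has no ID of its own."""
    return list(elements)


def root_as_collection(collection_id, elements):
    """Option #3b: the root is a Collection Element named by the producer."""
    return {
        "id": collection_id,
        "type": "Collection",
        "element": [e["id"] for e in elements],
    }


def annotate(root, statement):
    """Attach a consumer-side annotation to the root, if it is addressable."""
    if not isinstance(root, dict) or "id" not in root:
        raise ValueError("root has no ID, so there is nothing to point at")
    return {"type": "Annotation", "subject": root["id"], "statement": statement}


elements = [
    {"id": "urn:example:pkg1", "type": "Package"},
    {"id": "urn:example:pkg2", "type": "Package"},
]

collection_root = root_as_collection("urn:example:transfer-1", elements)
note = annotate(collection_root, "reviewed")
```

Calling annotate on the result of root_as_array(elements) raises ValueError in this sketch. A consumer could mint its own wrapper ID at that point, but that ID would not be one the producer also knows.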


Regards,

William Bartholomew (he/him) – Let’s 
chat<https://outlook.office.com/bookwithme/user/[email protected]/meetingtype/SVRwCe7HMUGxuT6WGxi68g2?anonymous&ep=mlink>
Principal Security Strategist
Global Cybersecurity Policy – Microsoft

My working day may not be your working day. Please don’t feel obliged to reply 
to this e-mail outside of your normal working hours.




