Oops. After hitting send I realized I forgot to include an important exception in something I said below. The missing statement is in red below.
For serialization, if we nest/embed the actual referenced objects for all ObjectProperties that reference non-Elements but do not nest/embed the actual referenced objects for all ObjectProperties that reference Elements, we get a nice serialization at the granularity of Elements, which is what we are looking for. There would be one specific exception to this rule: the “element” ObjectProperty on Collection, where the referenced objects would be nested/embedded only at a single level of depth.

Sorry for the oversight.

sean

From: Sean Barnum <[email protected]>
Date: Tuesday, July 26, 2022 at 10:41 AM
To: David Kemp <[email protected]>, William Bartholomew (CELA) <[email protected]>
Cc: [email protected] <[email protected]>
Subject: Re: [EXT] Re: [spdx-tech] Summary of document serialization discussion

I wanted to offer a few observations here.

First, we should be careful not to conflate the concept/term “node and property elements” in RDF with what we are calling Elements in our model. They are not the same thing. In the RDF graph, ALL nodes and edges (at a very granular level) are considered “elements”, whereas in our model Elements are a certain subset of full objects that may include properties. Not all classes or nodes in the RDF graph are Elements. ALL nodes and edges in the RDF graph need to be serialized, but they are not typically serialized at the same level of granularity at which they exist in the RDF graph.

The issue of nesting in serializations is actually relatively straightforward. In RDF (our model spec), properties are either DatatypeProperties, which contain literal values, or ObjectProperties, which contain IDs of other objects. The IDs contained in ObjectProperties refer to objects of either Element classes (subclasses of Element) or non-Element classes (e.g., IntegrityMethod).
For serialization, if we nest/embed the actual referenced objects for all ObjectProperties that reference non-Elements but do not nest/embed the actual referenced objects for all ObjectProperties that reference Elements, we get a nice serialization at the granularity of Elements, which is what we are looking for.

Collections in a graph are NOT edges. They are object nodes that are Elements. They have two particular ObjectProperties, “element” and “rootElement”, that represent edges in the graph to other Element object nodes.

I agree that we want to avoid the complexity of serialization rules that treat edges on different kinds of objects differently, but there should be no issue with simply treating different kinds of edges differently regardless of the kinds of objects they are on. This is a fundamental part of any serialization.

Using the above approach for nesting avoids the issue of collections including collections, supports consistent canonicalization, and keeps the domain graph clean.

I do not think this conflicts in any way with William’s post regarding serialization rules.

Sean

From: [email protected] <[email protected]> on behalf of David Kemp <[email protected]>
Date: Wednesday, July 20, 2022 at 12:08 PM
To: William Bartholomew (CELA) <[email protected]>
Cc: [email protected] <[email protected]>
Subject: [EXT] Re: [spdx-tech] Summary of document serialization discussion

This way of describing things is still conflating element values with serialization.

There are Elements (RDF - https://www.w3.org/TR/rdf-syntax-grammar/#section-Syntax-node-property-elements - calls them Node Elements), and properties (RDF calls them property elements) that are nodes and edges in the logical graph. A subset of nodes in the logical graph needs to be serialized into files using various data formats including JSON, RDF, Tag-Value, ....
* The content of files (whether individual files, package tarfiles or transfer units) does not exist in the graph
* Nodes can be created to describe files

Collections in a graph are edges from one node to another (in our case, from Collection to Element). All Elements have IRIs, and all edges between Elements are IRIs. The job of serialization is to losslessly represent nodes and edges. Treating one kind of edge (those outbound from the Collection node) differently from all other edges can be done, but it is an unnecessary complication.

And as mentioned in the meeting, collections can include collections, which can include collections in an acyclic (non-recursive) graph. Please provide a serialized example of SBOM A having SBOM B as a member, which in turn has "Document" C as a member, using a nested serialization, to contrast with linear serialization of the same nodes.

The canonicalization group has been discussing JSON as the canonical data format, but we also need nested tag-value and RDF serializations, to compare with linear tag-value and RDF serializations of the same nodes, in order to evaluate any proposals.

Regards,
Dave

On Tue, Jul 19, 2022 at 2:00 PM William Bartholomew (CELA) via lists.spdx.org <[email protected]> wrote:

Proposed rules (and alternatives):

1. If you are transferring a single element and no additional context needs to be transferred, just transfer the single element. Root of serialization is a single element (the element being transferred).
2. If you are transferring one or more elements and additional context (such as the creator info of the anthology) needs to be transferred, place the elements in a collection and transfer the collection. Root of serialization is a single element (the collection element).
3. If you are transferring one or more elements and no additional context needs to be transferred, this was the sticking point and we had two options:
   a. Allow the elements to be transferred as an array. Root of serialization is an array of elements.
   b. Require the elements to be wrapped in a collection. Root of serialization is a single element (the collection element).
4. An alternative was proposed where the root of serialization is always an array of elements, even if it’s an array of one in scenario #1 and #2 above.

Serialization implications:

We end up with one of these three options for serialization:

* Root is always an element (#1, #2, #3b).
* Root is always an array (#4).
  * One significant concern I have with this is that if we ever have to attach additional information to the root it is a breaking change. We can work around this by making the root an object with a single property “elements”, though at that stage I’d argue we’re just recreating “Collection” as a plain object instead of an “Element” (we originally had that design and moved away from it). We went to a lot of effort to make “Element” and “Collection” extremely lightweight so they could be used for scenarios like this.
* Root is sometimes an element (#1, #2) or sometimes an array (#3a).

Individual serializations may have constraints that require them to select a certain option or wrap an option in another option; for example, XML always has a single root, JSON-LD is always a list.

When the root is an array, consumers lose any ability to “address” the root (e.g. if they wanted to attach annotations or other information to the root). It requires the producer to intend the consumer to be able to do this and to make the decision to wrap a collection around the elements. While the consumer could do this post facto, there would be no shared identity with the producer.
This was one of the reasons that SPDXID was required on all SPDX elements in 2.x, because it gave the consumer options to attach information even if that was not the intent of the producer, because the producer does not know all of the consumers’ use cases for the information or future use cases they may want to apply.

Regards,

William Bartholomew (he/him)
Principal Security Strategist
Global Cybersecurity Policy – Microsoft
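For readers who want to see proposed rules 1 through 3 side by side, here is a minimal Python sketch of how a producer might pick the serialization root. It is only an illustration under assumptions: `choose_root`, `bare_array_ok`, `collection_id`, and the dictionary keys `id`, `type`, and `element` are hypothetical names chosen for this example, not names from the SPDX model or from any agreed serialization. Option #4 (always an array) is not modeled.

```python
from typing import Any, Dict, List, Optional, Union

# Hypothetical stand-in for a serialized Element: a plain dict with "id" and "type".
Element = Dict[str, Any]


def choose_root(
    elements: List[Element],
    context: Optional[Dict[str, Any]] = None,
    collection_id: Optional[str] = None,
    bare_array_ok: bool = False,
) -> Union[Element, List[Element]]:
    """Return the serialization root for a transfer, following proposed rules 1-3."""
    if not elements:
        raise ValueError("nothing to transfer")

    # Rule 1: a single element with no extra context is its own root.
    if len(elements) == 1 and context is None:
        return elements[0]

    # Rule 3a: several elements, no context, and bare arrays are permitted.
    if context is None and bare_array_ok:
        return list(elements)

    # Rules 2 and 3b: wrap the elements in a Collection so that the root is a
    # single element that consumers can address by its identifier.
    if collection_id is None:
        raise ValueError("a Collection root needs its own identifier")
    return {
        **(context or {}),
        "id": collection_id,
        "type": "Collection",
        "element": list(elements),
    }
```

Setting `bare_array_ok=False` corresponds to option 3b, where the root is always a single element; the array branch is the only path on which the root has no identifier of its own.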

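The nesting rule from Sean's message at the top of the thread (embed non-Element objects, reference Elements by ID, and embed the members of a Collection's “element” property one level deep only) can be sketched in Python as follows. This is an illustration under assumptions, not an agreed SPDX serialization: the type names in `ELEMENT_TYPES`, the property names in `OBJECT_PROPERTIES`, the `urn:example:` identifiers, and the in-memory graph layout are all hypothetical. The sample graph is the case David asks about, SBOM A containing SBOM B containing Document C.

```python
from typing import Any, Dict

Graph = Dict[str, Dict[str, Any]]

# Assumed for illustration: which types are Elements, and which properties
# are ObjectProperties (their values are IDs or lists of IDs).
ELEMENT_TYPES = {"Sbom", "Document", "Collection", "Package"}
OBJECT_PROPERTIES = {"element", "rootElement", "verifiedUsing"}


def render_ref(graph: Graph, prop: str, ref: str, expand_members: bool) -> Any:
    """Render one ObjectProperty value: either an embedded object or a bare ID."""
    target = graph[ref]
    if target["type"] not in ELEMENT_TYPES:
        # Non-Element objects are always nested/embedded.
        return serialize(graph, ref, expand_members)
    if prop == "element" and expand_members:
        # The exception: members are embedded, but only one level deep.
        return serialize(graph, ref, expand_members=False)
    # Any other reference to an Element stays an ID.
    return ref


def serialize(graph: Graph, obj_id: str, expand_members: bool = True) -> Dict[str, Any]:
    """Serialize the object with the given ID according to the nesting rule."""
    out: Dict[str, Any] = {}
    for key, value in graph[obj_id].items():
        if key not in OBJECT_PROPERTIES:
            out[key] = value  # DatatypeProperty: copy the literal value
        elif isinstance(value, list):
            out[key] = [render_ref(graph, key, ref, expand_members) for ref in value]
        else:
            out[key] = render_ref(graph, key, value, expand_members)
    return out


GRAPH: Graph = {
    "urn:example:sbom-a": {
        "id": "urn:example:sbom-a", "type": "Sbom", "name": "A",
        "element": ["urn:example:sbom-b"],
        "verifiedUsing": ["urn:example:hash-1"],
    },
    "urn:example:sbom-b": {
        "id": "urn:example:sbom-b", "type": "Sbom", "name": "B",
        "element": ["urn:example:doc-c"],
    },
    "urn:example:doc-c": {
        "id": "urn:example:doc-c", "type": "Document", "name": "C",
        "element": [],
    },
    "urn:example:hash-1": {
        "id": "urn:example:hash-1", "type": "Hash",
        "algorithm": "sha256", "hashValue": "abc123",
    },
}

nested_a = serialize(GRAPH, "urn:example:sbom-a")
```

In `nested_a`, SBOM B appears as an embedded object under A's “element” property, while B's own “element” list holds only the ID of Document C, so collections of collections do not nest without bound. The hash object, being a non-Element, is embedded in full.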