Re: [spdx-tech] No Array root

William Bartholomew (CELA) via lists.spdx.org Tue, 09 Aug 2022 13:23:15 -0700

The logical model is indifferent; this would live in the serialization layer, 
and I'd like to give serialization formats the flexibility since they may have 
technical constraints or conventions that make one option necessary or 
preferable.

We previously decided that nesting is allowed, preferably restricted to one 
level, but you do not have to use nesting; you can keep the serialization flat 
if that's your preference. Nested and non-nested forms are semantically 
equivalent.
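
As a minimal sketch of that equivalence (not anything defined by the model), 
the Python below turns a nested serialization into a flat one. The "id", 
"type", and "elements" keys and the sample values are assumptions made for 
illustration; the idea is only that an inlined child can be replaced by a 
reference to its identifier without losing information.

```python
def flatten(element):
    """Return a flat list: the element first, then every element inlined in it.

    Inlined children (dicts) are replaced by their "id" so the parent still
    refers to the same elements. The key names are hypothetical.
    """
    parent = dict(element)  # shallow copy, the input is left untouched
    refs = []
    flat_children = []
    for child in element.get("elements", []):
        if isinstance(child, dict):
            refs.append(child["id"])
            flat_children.extend(flatten(child))
        else:
            refs.append(child)  # already a reference
    if "elements" in element:
        parent["elements"] = refs
    return [parent] + flat_children


# Hypothetical nested document: one SBOM with two inlined elements.
nested = {
    "id": "sbom1",
    "type": "SBOM",
    "elements": [
        {"id": "person1", "type": "Person"},
        {"id": "pkg1", "type": "Package"},
    ],
}
flat = flatten(nested)
```

Flattening an already flat element returns it unchanged, which is one way to 
see that a consumer can treat both forms the same.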

The "root" will be dependent on the serialization format, again due to 
technical constraints. Because of this I'd like to keep the "root" as simple as 
possible. For example, I'm reluctant to add support for defaults and 
external/namespace maps to the "root" because this is already supported by 
collections; if you want to use those features, put your elements in a 
collection. Otherwise, we need to create a new type that looks like a 
collection in most ways but isn't a collection, which will lead to more 
confusion, not less ("if you want to collect multiple elements together put 
them in a collection, unless you want to put them in the root of a document in 
which case you can put them in an array, and then you put the maps over in 
these properties which are the same as the properties on collection but this 
isn't a collection it just looks like one" is not a sentence I want to have to 
write again 😊).

On the call I mentioned I wanted to talk a little more about optimizations a 
serializer might want to implement; it's relevant to this conversation, so I'll 
include it here. The "data types" (on the right-hand side of the model) have 
struct semantics; in other words, if the fields have the same values then the 
structs are the same. When serializing, a serializer could collapse all structs 
with the same value and replace them with a short identifier (that's only valid 
within that serialization), and on deserializing it would expand the structs 
back to individual instances. This is more flexible than defaults because it 
can support collapsing multiple repeated values into individual short 
identifiers, and some serializations already have this concept built in.
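
The collapse and expand steps could be sketched roughly as follows. This is an 
illustration only, assuming elements are plain dictionaries with a 
"creationInfo" field; the "structs" table name, the "s0"/"s1" identifiers, and 
the sample values are all hypothetical.

```python
import copy
import json


def collapse(elements, field="creationInfo"):
    """Replace equal struct values with short ids valid only in this output."""
    ids = {}      # canonical text of a struct -> short id
    structs = {}  # short id -> struct value
    out = []
    for el in elements:
        el = dict(el)
        if field in el:
            # Struct semantics: same field values means same struct.
            key = json.dumps(el[field], sort_keys=True)
            if key not in ids:
                ids[key] = f"s{len(ids)}"
                structs[ids[key]] = el[field]
            el[field] = ids[key]
        out.append(el)
    return {"structs": structs, "elements": out}


def expand(doc, field="creationInfo"):
    """Inverse of collapse: give every element its own struct instance again."""
    out = []
    for el in doc["elements"]:
        el = dict(el)
        if field in el:
            el[field] = copy.deepcopy(doc["structs"][el[field]])
        out.append(el)
    return out


# Hypothetical data: two products' elements plus a shared component.
word_info = {"created": "2022-08-01", "createdBy": "word-build"}
shared_info = {"created": "2022-07-15", "createdBy": "shared-build"}
elements = [
    {"id": "word", "creationInfo": dict(word_info)},
    {"id": "word-docs", "creationInfo": dict(word_info)},
    {"id": "shared-lib", "creationInfo": dict(shared_info)},
]
collapsed = collapse(elements)
restored = expand(collapsed)
```

Here the three elements share two distinct structs, so the table holds two 
entries, and expanding gives back elements equal to the originals.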

For a real-world example, let's say I create an Office SPDX document by 
combining the Word, Excel, Outlook, and PowerPoint SPDX documents. These 
products were built independently and there will be duplication in the creation 
information within each product but not across the products. They also depend 
on shared components that will have common creation information across the 
products. A serializer could collapse out all the common creation information 
into a file-level hash of identifier -> creation information without affecting 
the canonical or semantic meaning of the elements.

One interesting feature of YAML, for example, is that it allows you to attach an 
"anchor" to a value and then use aliases to re-use that value elsewhere in the 
document. So on the first instance of a creation information, you could attach 
an "anchor" (I'd use the canonical hash of the creation information as the 
anchor) and then every subsequent instance of that same creation information 
could be replaced with an "alias". See here for an example: 
https://yaml101.com/anchors-and-aliases/. This same concept exists, or can be 
easily implemented, in other serializations, for example, in JSON you could 
have a "@anchors": { "a1b2c3": { arbitrary_json }, ... } property that provides 
the same capability.
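
A rough sketch of the JSON variant, for illustration only: the "@anchors" 
property name comes from the paragraph above, but the "@alias" marker, the 
"elements" key, and the use of SHA-256 over sorted-key JSON as a stand-in for 
the canonical hash are assumptions, not anything the specification defines.

```python
import copy
import hashlib
import json


def canonical_hash(value):
    """Stand-in canonical hash: SHA-256 of sorted-key, compact JSON.

    Truncated to 16 hex characters for readability; a real serializer
    would likely keep the full digest to avoid collisions.
    """
    text = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]


def add_anchors(elements, field="creationInfo"):
    """Move each distinct struct into "@anchors" and leave an alias behind."""
    anchors = {}
    out = []
    for el in elements:
        el = dict(el)
        if field in el:
            anchor = canonical_hash(el[field])
            anchors.setdefault(anchor, el[field])
            el[field] = {"@alias": anchor}  # hypothetical alias marker
        out.append(el)
    return {"@anchors": anchors, "elements": out}


def resolve(node, anchors):
    """Recursively replace every {"@alias": id} with a copy of its anchor."""
    if isinstance(node, dict):
        if set(node) == {"@alias"}:
            return copy.deepcopy(anchors[node["@alias"]])
        return {k: resolve(v, anchors) for k, v in node.items()}
    if isinstance(node, list):
        return [resolve(v, anchors) for v in node]
    return node


def expand_anchors(doc):
    """Return the elements with all aliases expanded back to full values."""
    return resolve(doc["elements"], doc["@anchors"])
```

Because the anchor is derived from the value itself, two serializers that 
agree on the hash would produce the same anchors for the same creation 
information, which is the property the canonical-hash suggestion relies on.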

Regards,

William Bartholomew (he/him) – Let’s chat
Principal Security Strategist
Global Cybersecurity Policy – Microsoft

My working day may not be your working day. Please don’t feel obliged to reply 
to this e-mail outside of your normal working hours.

-----Original Message-----
From: [email protected] <[email protected]> On Behalf Of 
Sebastian Crane via lists.spdx.org
Sent: Tuesday, August 9, 2022 12:23 PM
To: [email protected]
Subject: [EXTERNAL] Re: [spdx-tech] No Array root

Dear David,

With my tech-hat on, I would greatly prefer your second, modified example. It's 
much easier to process with the programming languages I use, which are of the 
functional paradigm and thus get along really well with flat arrays or maps. 
Nesting means hard-coding extra logic to extract the individual Elements out 
of the extra structure.

Also, donning my outreach-hat now, I'd fully agree with your final statement; I 
think the communication for 3.0 (both for publicity and education) will be 
improved by prioritising the role of individual, atomic and discrete Elements.

Best wishes,

Sebastian

On Tue, Aug 09, 2022 at 02:34:47PM -0400, David Kemp wrote:
> William,
>
> I typed in the SBOM example from the model diagram.
> I then modified it to move the element of type SBOM from the beginning 
> to the array of elements.
>
> Neither the original nor the modified JSON-LD serialized file has an 
> element of type SpdxDocument containing statements about the 
> serialized file.  This is good :-).  But if in addition to the three 
> elements (SBOM, Person, Package) there were a fourth SpdxDocument 
> element, it would replace and eliminate the need for ExternalMap by 
> providing URL, elements, and verification information, simplifying the model.
>
> Neither the original nor the modified file has an array as root.  In
> example2 the root object still has creation/default properties, and it 
> has external elements, and it has element values.  The difference is 
> that the element values are all serialized together.  There is no need 
> for a special rule that you can nest values one level deep, because 
> there is no nesting at all.
>
> Question: Is the second file a valid serialization?  Is there any 
> reason to use a special nested JSON-LD serialization instead of 
> keeping all the elements together in an array?
>
> Original:
> {
>   SBOM: ...
>   creationInfo: ...
>   externalMap: ...
>   elements: [
>     Person: ...
>     Package: ...
>   ]
> }
>
> Modified, Not nested:
> {
>   creationInfo: ...
>   externalMap: ...
>   elements: [
>     SBOM: ...
>     Person: ...
>     Package: ...
>   ]
> }
>
> In my opinion, it is clearer to always say "this file contains these 3 
> elements", instead of saying "this file contains this element and two 
> other elements nested inside it".  When hashing the SBOM element the 
> hash doesn't cover other elements - this is more obvious when one SBOM 
> contains another SBOM where the second isn't nested two levels deep.
>
> Regards,
> David

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#4734): https://lists.spdx.org/g/Spdx-tech/message/4734
-=-=-=-=-=-=-=-=-=-=-=-
