The logical model is indifferent; this would live in the serialization layer,
and I'd like to give serialization formats the flexibility, since they may have
technical constraints or conventions that make one option necessary or
preferable.
We previously decided that nesting is allowed, preferably restricted to one
level, but you do not have to use nesting; you can keep things flat if that's
your preference. Nested and non-nested forms are semantically equivalent.
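To illustrate the equivalence, here is a minimal sketch that hoists elements nested one level deep into a flat list. The field names ("id", "type", "elements") and the sample ids are assumptions made for illustration; they are not taken from the model or from any agreed serialization.

```python
def flatten(elements):
    """Hoist elements nested one level deep into a flat list.

    Nested element objects are replaced by their id, so the flat form
    carries the same information as the nested one.
    """
    flat = []
    for element in elements:
        hoisted = []
        refs = []
        for child in element.get("elements", []):
            if isinstance(child, dict):   # nested element object
                hoisted.append(child)
                refs.append(child["id"])
            else:                         # already a reference by id
                refs.append(child)
        parent = dict(element)
        if "elements" in element:
            parent["elements"] = refs
        flat.append(parent)
        flat.extend(hoisted)
    return flat


# Hypothetical nested input: one SBOM containing two elements.
nested = [
    {"id": "sbom-1", "type": "SBOM", "elements": [
        {"id": "person-1", "type": "Person"},
        {"id": "package-1", "type": "Package"},
    ]},
]
flat = flatten(nested)
```

Flattening an already flat list returns it unchanged, which is one way to see that both forms describe the same set of elements.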
The "root" will depend on the serialization format, again due to
technical constraints. Because of this I'd like to keep the "root" as simple as
possible. For example, I'm reluctant to add support for defaults and
external/namespace maps to the "root" because these are already supported by
collections; if you want to use those features, put your elements in a
collection. Otherwise, we would need to create a new type that looks like a
collection in most ways but isn't a collection, and that will lead to more
confusion, not less ("if you want to collect multiple elements together put
them in a collection, unless you want to put them in the root of a document in
which case you can put them in an array, and then you put the maps over in
these properties which are the same as the properties on collection but this
isn't a collection it just looks like one" is not a sentence I want to have to
write again 😊).
On the call I mentioned I wanted to talk a little more about optimizations a
serializer might want to implement; it's relevant to this conversation so I'll
include it here. The "data types" (on the right-hand side of the model) have
struct semantics: if the fields have the same values then the structs are the
same. When serializing, a serializer could collapse all structs with the same
value and replace them with a short identifier (that's only valid within that
serialization), and on deserializing it would expand the identifiers back to
individual instances. This is more flexible than defaults because it can
support collapsing multiple repeated values into individual short identifiers,
and some serializations already have this concept built in.
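A rough sketch of that collapse/expand step, assuming a hypothetical CreationInfo struct with two made-up fields. The short identifiers ("c0", "c1", ...) are arbitrary and only meaningful within a single serialization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CreationInfo:
    # Hypothetical fields; frozen dataclasses compare and hash by value,
    # which matches the struct semantics described above.
    created_by: str
    created: str


def collapse(structs):
    """Replace equal structs with short ids valid only in this serialization."""
    ids = {}      # struct -> short id
    table = {}    # short id -> struct
    refs = []
    for s in structs:
        if s not in ids:
            ids[s] = f"c{len(ids)}"
            table[ids[s]] = s
        refs.append(ids[s])
    return table, refs


def expand(table, refs):
    """Expand short ids back into individual struct values."""
    return [table[r] for r in refs]


# Made-up sample data with repeated values.
infos = [
    CreationInfo("Word team", "2022-08-01"),
    CreationInfo("Word team", "2022-08-01"),
    CreationInfo("Shared components", "2022-07-15"),
    CreationInfo("Excel team", "2022-08-02"),
    CreationInfo("Shared components", "2022-07-15"),
]
table, refs = collapse(infos)
```

Unlike a single default, the table can hold any number of distinct repeated values, and expanding the references gives back a list equal to the original.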
For a real-world example, let's say I create an Office SPDX document by
combining the Word, Excel, Outlook, and PowerPoint SPDX documents. These
products were built independently and there will be duplication in the creation
information within each product but not across the products. They also depend
on shared components that will have common creation information across the
products. A serializer could collapse all the common creation information
into a file-level map of identifier -> creation information without affecting
the canonical form or semantic meaning of the elements.
One interesting feature of YAML, for example, is that it allows you to attach an
"anchor" to a value and then use aliases to re-use that value elsewhere in the
document. So on the first instance of a given creation information, you could
attach an "anchor" (I'd use the canonical hash of the creation information as
the anchor) and then every subsequent instance of that same creation
information could be replaced with an "alias". See here for an example:
https://yaml101.com/anchors-and-aliases/. This same concept exists, or can be
easily implemented, in other serializations. For example, in JSON you could
have a "@anchors": { "a1b2c3": { arbitrary_json }, ... } property that provides
the same capability.
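As a sketch of how the JSON variant might work: only the "@anchors" property name comes from the suggestion above. The {"@alias": ...} reference shape, the "elements" and "creationInfo" keys, the sample data, and the use of sorted-key JSON plus a truncated SHA-256 as a stand-in for the canonical hash are all assumptions made for illustration.

```python
import hashlib
import json


def canonical_hash(value):
    # Assumption: sorted-key compact JSON stands in for the real
    # canonicalization, and the digest is truncated for readability.
    text = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


def serialize(elements):
    """Move each creationInfo into "@anchors" and leave an alias behind."""
    anchors = {}
    out = []
    for element in elements:
        key = canonical_hash(element["creationInfo"])
        anchors[key] = element["creationInfo"]
        out.append({**element, "creationInfo": {"@alias": key}})
    return json.dumps({"@anchors": anchors, "elements": out}, indent=2)


def deserialize(text):
    """Replace each alias with the value stored under its anchor."""
    doc = json.loads(text)
    anchors = doc["@anchors"]
    return [
        {**e, "creationInfo": anchors[e["creationInfo"]["@alias"]]}
        for e in doc["elements"]
    ]


# Made-up sample elements; two of them share the same creation information.
shared = {"createdBy": "Shared components", "created": "2022-07-15"}
elements = [
    {"id": "pkg-a", "creationInfo": shared},
    {"id": "pkg-b", "creationInfo": {"createdBy": "Word team", "created": "2022-08-01"}},
    {"id": "pkg-c", "creationInfo": dict(shared)},
]
text = serialize(elements)
restored = deserialize(text)
```

Because the anchor key is derived from the value itself, identical creation information always lands under the same anchor, and the deserialized elements compare equal to the originals.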
Regards,
William Bartholomew (he/him) – Let’s chat
Principal Security Strategist
Global Cybersecurity Policy – Microsoft
My working day may not be your working day. Please don’t feel obliged to reply
to this e-mail outside of your normal working hours.
-----Original Message-----
From: [email protected] <[email protected]> On Behalf Of
Sebastian Crane via lists.spdx.org
Sent: Tuesday, August 9, 2022 12:23 PM
To: [email protected]
Subject: [EXTERNAL] Re: [spdx-tech] No Array root
Dear David,
With my tech-hat on, I would greatly prefer your second, modified example. It's
much easier to process with the programming languages I use, which are of the
functional paradigm and thus get along really well with flat arrays or maps.
Nesting means hard-coding extra logic to extract the individual Elements out
of the extra structure.
Also, donning my outreach-hat now, I'd fully agree with your final statement; I
think the communication for 3.0 (both for publicity and education) will be
improved by prioritising the role of individual, atomic and discrete Elements.
Best wishes,
Sebastian
On Tue, Aug 09, 2022 at 02:34:47PM -0400, David Kemp wrote:
> William,
>
> I typed in the SBOM example from the model diagram.
> I then modified it to move the element of type SBOM from the beginning
> to the array of elements.
>
> Neither the original nor the modified JSON-LD serialized file has an
> element of type SpdxDocument containing statements about the
> serialized file. This is good :-). But if in addition to the three
> elements (SBOM, Person, Package) there were a fourth SpdxDocument
> element, it would replace and eliminate the need for ExternalMap by
> providing URL, elements, and verification information, simplifying the model.
>
> Neither the original nor the modified file has an array as root. In
> example2 the root object still has creation/default properties, and it
> has external elements, and it has element values. The difference is
> that the element values are all serialized together. There is no need
> for a special rule that you can nest values one level deep, because
> there is no nesting at all.
>
> Question: Is the second file a valid serialization? Is there any
> reason to use a special nested JSON-LD serialization instead of
> keeping all the elements together in an array?
>
> Original:
> {
> SBOM: ...
> creationInfo: ...
> externalMap: ...
> elements: [
> Person: ...
> Package: ...
> ]
> }
>
> Modified, Not nested:
> {
> creationInfo: ...
> externalMap: ...
> elements: [
> SBOM: ...
> Person: ...
> Package: ...
> ]
> }
>
> In my opinion, it is clearer to always say "this file contains these 3
> elements", instead of saying "this file contains this element and two
> other elements nested inside it". When hashing the SBOM element the
> hash doesn't cover other elements - this is more obvious when one SBOM
> contains another SBOM where the second isn't nested two levels deep.
>
> Regards,
> David