We should avoid the word “relationship” for this discussion; relationships are
something else entirely. What we’re really referring to here are properties in
the model that are references to elements (a relationship has two properties of
that type, but the relationship itself is not one of them).
You’re assuming a couple of things about nesting that I wasn’t intending. The
way I intend nesting to work, the following forms are logically equivalent,
would hash to the same value (because canonicalization would be over the
referenced IRIs), and could be switched between without affecting the hash or
validation:
Nested:
{
  "id": "urn:microsoft:windows:10:sbom",
  "type": "sbom",
  "elements": [
    { "id": "urn:microsoft:calculator:10.0.1.0", "type": "package", … }
  ]
}
Peers:
"elements": [
  { "id": "urn:microsoft:windows:10:sbom", "type": "sbom",
    "elements": [ "urn:microsoft:calculator:10.0.1.0" ] },
  { "id": "urn:microsoft:calculator:10.0.1.0", "type": "package", … }
]
Externalized:
"elements": [
  { "id": "urn:microsoft:windows:10:sbom", "type": "sbom",
    "elements": [ "calc:10.0.1.0" ] }
],
"externalMap": [
  { "externalId": "calc:10.0.1.0",
    "locationHint": "https://sbom.microsoft.com/calculator/10.0.1.0",
    "verifiedUsing": [ … ] }
]
This still has the properties (pun intended) that you desire: there is element
independence, you’re not forced into a nested structure, you’re not prevented
from using a nested structure, and you can use subsets without breaking the
signing of the elements.
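As a rough illustration of the equivalence claimed above, the following Python sketch canonicalizes each element by replacing any nested element value with its IRI before hashing. The helper names (flatten, element_hash), the use of sorted-key JSON as the canonical form, and SHA-256 are assumptions for illustration only; the thread does not specify a canonicalization algorithm. The externalized form is left out because the thread does not say how an externalId such as calc:10.0.1.0 resolves to an IRI.

```python
import hashlib
import json


def flatten(element, out):
    """Record `element` and any nested elements in `out`, keyed by IRI.

    Nested element values are replaced by their IRIs, so the nested and
    peer layouts reduce to the same per-element values.
    """
    flat = dict(element)
    if "elements" in element:
        refs = []
        for child in element["elements"]:
            if isinstance(child, dict):  # nested value: recurse, keep only its IRI
                flatten(child, out)
                refs.append(child["id"])
            else:  # already an IRI reference
                refs.append(child)
        flat["elements"] = refs
    out[flat["id"]] = flat
    return out


def element_hash(flat_element):
    """Hash a flattened element (assumed canonical form: sorted-key JSON)."""
    canonical = json.dumps(flat_element, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


SBOM = "urn:microsoft:windows:10:sbom"
CALC = "urn:microsoft:calculator:10.0.1.0"

nested = {"id": SBOM, "type": "sbom",
          "elements": [{"id": CALC, "type": "package"}]}
peers = [{"id": SBOM, "type": "sbom", "elements": [CALC]},
         {"id": CALC, "type": "package"}]

from_nested = flatten(nested, {})
from_peers = {}
for element in peers:
    flatten(element, from_peers)

nested_hashes = {iri: element_hash(e) for iri, e in from_nested.items()}
peer_hashes = {iri: element_hash(e) for iri, e in from_peers.items()}
print(nested_hashes == peer_hashes)  # True
```

Under these assumptions the per-element hashes are identical for both layouts, which is the property that would let a consumer switch between them freely.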
P.S. The Windows SBOM is over a hundred megabytes (though it should come down
when we move from SPDX 2.2 to 2.3).
Regards,
William Bartholomew (he/him)
Principal Security Strategist
Global Cybersecurity Policy – Microsoft
From: David Kemp <[email protected]>
Sent: Tuesday, July 26, 2022 7:59 AM
To: William Bartholomew (CELA) <[email protected]>
Cc: SPDX-list <[email protected]>
Subject: [EXTERNAL] Re: [spdx-tech] V3 serialization
I'll try again with an example:
* An SBOM for Windows 10 is a Collection that could have millions of
elements, yes? The serialized file containing the values of those elements
could be megabytes.
* An SBOM for My App is a Collection with a few elements. My App runs on /
depends on Windows.
When I serialize the SBOM for My App, how big is the file? Megabytes or
kilobytes? That is my definition of "depends on".
If Microsoft serializes and signs the file containing the Windows SBOM and its
millions of elements, the chain of integrity is broken if the MyApp SBOM file
has a copy of Windows element values instead of references.
The difference between a logical model and a data model is that the logical
model doesn't care how relationships are implemented; they just exist. A data
model *defines* how relationships are implemented - as either nested values or
a map/array of independent values. My definition of "depends on" is: the value
(and hash) of every element is independent of the value (and hash) of every
other element. Elements cannot be nested: an SBOM (Collection) element must
have an array of IRIs, not a map/array of values. That requirement exists at
the data level because a pure logical model doesn't care. But to the extent
that data shapes are hybridized into the SPDX model, the model must also
require that independence.
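One way to read this independence requirement is sketched below in Python: a collection that holds IRIs keeps the same hash when a member's value changes, while a collection that embeds member values (and is hashed naively over those embedded values) does not. The value_hash and make_sbom helpers, the urn:example:myapp:sbom IRI, the comment property, and the choice of SHA-256 over sorted-key JSON are all assumptions made for the example.

```python
import hashlib
import json


def value_hash(value):
    """Hash a value naively, exactly as serialized (sorted-key JSON, SHA-256)."""
    canonical = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_sbom(members, embed):
    """Build a collection whose members are embedded values or plain IRIs."""
    items = list(members) if embed else [m["id"] for m in members]
    return {"id": "urn:example:myapp:sbom", "type": "sbom", "elements": items}


CALC = "urn:microsoft:calculator:10.0.1.0"
calc_v1 = {"id": CALC, "type": "package", "comment": "first"}
calc_v2 = {"id": CALC, "type": "package", "comment": "edited"}

# By reference: the collection's value does not mention the member's content.
by_ref_same = (value_hash(make_sbom([calc_v1], embed=False))
               == value_hash(make_sbom([calc_v2], embed=False)))

# By value: editing the member changes the bytes of the collection as well.
by_val_same = (value_hash(make_sbom([calc_v1], embed=True))
               == value_hash(make_sbom([calc_v2], embed=True)))

print(by_ref_same, by_val_same)  # True False
```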
Note that the full SpdxFile is not an array, it is an object with namespace,
namespaceMap, defaults, elementValues, and references to other SpdxFiles. But
the elementValues property is an array because the values aren't nested.
Regards,
David
On Tue, Jul 19, 2022 at 12:47 AM William Bartholomew (CELA) via lists.spdx.org
<[email protected]> wrote:
Comments inline.
________________________________
From: [email protected] <[email protected]> on behalf of David Kemp via
lists.spdx.org <[email protected]>
Sent: Monday, July 18, 2022 6:18 PM
To: SPDX-list <[email protected]>
Subject: [EXTERNAL] Re: [spdx-tech] V3 serialization
One principle is that the goal of serialization is to put Elements into
physical format, NOT to create new elements that didn't exist prior to
serialization. If you have 6 elements going into serialization, you should
have 6 elements coming out, not 7.
[William] Agreed, does my example violate that? It would be difficult for a
serialization to "generate" elements because of the id and other required
properties so I had not considered this a possibility.
The second principle is that logical elements should be independent: the value
of one element does not depend on the value of any other element.
[William] I think it depends on your definition of "depends on" (pun intended).
Elements may have properties that are references to other elements and
serializers may choose to use that information for more compact serialization
but since this would get unwound on deserialization that's immaterial.
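The two positions above (same elements in and out, and nesting as a serializer choice that is unwound on deserialization) can be sketched together. In this Python example, nest inlines members for a more compact file and unnest restores independent elements; both function names, the acyclic-reference assumption, and the urn:example:not-in-this-file IRI are hypothetical and not part of any SPDX specification.

```python
import json


def nest(elements):
    """Serialize compactly: inline each locally available member into its collection.

    Assumes references are acyclic; members not present in this file stay as IRIs.
    """
    by_id = {e["id"]: e for e in elements}
    referenced = {ref for e in elements
                  for ref in e.get("elements", []) if ref in by_id}

    def expand(element):
        out = dict(element)
        if "elements" in element:
            out["elements"] = [expand(by_id[ref]) if ref in by_id else ref
                               for ref in element["elements"]]
        return out

    # Only elements that no collection in this file references stay at top level.
    return [expand(e) for e in elements if e["id"] not in referenced]


def unnest(nested):
    """Deserialize: turn nested values back into independent elements keyed by IRI."""
    flat = {}

    def visit(element):
        out = dict(element)
        if "elements" in element:
            refs = []
            for child in element["elements"]:
                if isinstance(child, dict):
                    visit(child)
                    refs.append(child["id"])
                else:
                    refs.append(child)
            out["elements"] = refs
        flat[out["id"]] = out

    for element in nested:
        visit(element)
    return flat


elements = [
    {"id": "urn:microsoft:windows:10:sbom", "type": "sbom",
     "elements": ["urn:microsoft:calculator:10.0.1.0",
                  "urn:example:not-in-this-file"]},
    {"id": "urn:microsoft:calculator:10.0.1.0", "type": "package"},
]

wire = json.dumps(nest(elements))          # nested form on the wire
round_tripped = unnest(json.loads(wire))   # independent elements again

print(len(elements), len(round_tripped))                   # 2 2
print(round_tripped == {e["id"]: e for e in elements})     # True
```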
I believe that those two principles are worth adopting as design requirements.
It is ugly to put something into serialization and get something else back out,
[William] Agreed, though a lot of serializers/deserializers end up making minor
changes as a result of normalization and other processes. Not ideal but that's
an implementation detail within each serializer/deserializer.
and it's really ugly to stuff one element's value inside another
[William] I don't agree with this, at least for "collection" elements. Also,
the serialization model for collection elements could support either element
references or the element itself so if you think it's ugly then you would have
the option of not doing nesting.
not least because you can wind up with infinite recursion with documents inside
documents inside documents inside documents
[William] This is avoidable, and using references instead of nesting doesn't
prevent this problem. In fact, if you only use nesting then it's impossible to
have infinite recursion; it's only when you use references that it becomes
possible.
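Since cycles can only arise through references, a tool that expands references into nested values could check for them first. The following Python sketch does a depth-first search over the "elements" references; find_cycle and the urn:example:* IRIs are hypothetical names used only for this illustration.

```python
def find_cycle(elements):
    """Return a list of IRIs forming a reference cycle, or None if there is none."""
    refs = {e["id"]: [r for r in e.get("elements", []) if isinstance(r, str)]
            for e in elements}
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = {iri: UNSEEN for iri in refs}

    def visit(iri, path):
        state[iri] = IN_PROGRESS
        path.append(iri)
        for target in refs[iri]:
            if target not in refs:
                continue  # reference to an element outside this file
            if state[target] == IN_PROGRESS:
                # Back edge: the cycle runs from `target` to the current element.
                return path[path.index(target):] + [target]
            if state[target] == UNSEEN:
                cycle = visit(target, path)
                if cycle:
                    return cycle
        path.pop()
        state[iri] = DONE
        return None

    for iri in refs:
        if state[iri] == UNSEEN:
            cycle = visit(iri, [])
            if cycle:
                return cycle
    return None


acyclic = [
    {"id": "urn:example:a", "type": "sbom", "elements": ["urn:example:b"]},
    {"id": "urn:example:b", "type": "package"},
]
cyclic = [
    {"id": "urn:example:a", "type": "sbom", "elements": ["urn:example:b"]},
    {"id": "urn:example:b", "type": "sbom", "elements": ["urn:example:a"]},
]

print(find_cycle(acyclic))  # None
print(find_cycle(cyclic))   # ['urn:example:a', 'urn:example:b', 'urn:example:a']
```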
Even two levels of element nesting makes things quite difficult to disentangle.
[William] I don't agree, for collections the nesting makes it obvious which
collection an element is part of without having to follow the id references.
Since the serialization model could support either approach I don't see this
being a blocker.
The fundamental principle is that a file containing data is not an element. A
Transfer Unit is defined by a data schema, just like the content of any XML
file or JSON file or ASN.1 file. If the logical model has a Document element
that describes an X.509 certificate, that element has interesting facts about
the certificate but does not define its content. It is essential to remember
the difference between the bytes in a file and the properties of a File or
Document element - the difference between a thing and metadata about that thing.
[William] We've had this discussion a number of times: the Collection element
(and its subclasses) isn't metadata about a collection, document, SBOM, etc.;
it is the collection, document, SBOM, etc. There is no "physical" thing
outside of the SPDX document that is the collection, document, SBOM, etc.; they
only exist in the SPDX graph. You could take that SBOM, serialize it to disk,
and then have a File element that talks about the physical serialization of the
SBOM, but that's different to the SBOM SPDX element.
* defaults:
I created a separate defaults property to hold the five defaultable properties
in order to distinguish them from non-defaultable properties. Gary and I like
the idea, but I'm not wedded to it. The transfer unit schema could have
"defaultCreatedBy", "defaultCreated", etc properties at the top level, to
highlight that they are defaults, unlike name, description, comments, etc.
Whatever the mechanism, there must be a way to ensure that "name" doesn't take
an inappropriate default value if it isn't populated, while the default for
"profiles" is appropriate.
[William] I'm struggling with multiple properties that have the same definition
having different names and different locations on the objects, it feels like a
lot to explain. We could flag certain properties as inheritable in the schema,
and this only applies to collection elements so I think the scope is quite
narrow.
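The idea of flagging properties as inheritable can be sketched as follows. The thread names only createdBy, created, and profiles among the five defaultable properties, so the INHERITABLE set below is deliberately limited to those three; the apply_defaults helper and the urn:example:* values are hypothetical.

```python
# Properties a collection may pass down to its members. Only the three named
# in the thread are listed; the remaining defaultable properties are not given.
INHERITABLE = {"createdBy", "created", "profiles"}


def apply_defaults(collection_defaults, element):
    """Fill in missing inheritable properties; never default anything else."""
    resolved = dict(element)
    for name, value in collection_defaults.items():
        if name in INHERITABLE and name not in resolved:
            resolved[name] = value
    return resolved


defaults = {
    "createdBy": "urn:example:agent:alice",
    "profiles": ["core"],
    "name": "must not be inherited",
}
package = {"id": "urn:example:pkg", "type": "package"}
signed_elsewhere = {"id": "urn:example:other", "type": "package",
                    "createdBy": "urn:example:agent:bob"}

print(apply_defaults(defaults, package))
print(apply_defaults(defaults, signed_elsewhere)["createdBy"])
```

In this sketch "name" is never filled in from the collection, while "profiles" is, and an element's own createdBy takes precedence over the default.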
* array vs map
I used map as a conversation starter, because it fits the "unique" semantics of
element ids, and because mapping types are ubiquitous now: XML schema had it
in 2005 (https://www.w3.org/2005/07/xml-schema-patterns.html#Maps),
and it's a built-in part of JSON.
[William] While it is built-in to XML and JSON my experience is that it's not
been supported well by schema languages and serializers/deserializers. I know
I've had situations where I had to duplicate the id property in the class to
ensure that other things work correctly (and to maintain the independence of
the class). Also, in most object oriented languages there is not a way to get
the key from the object so you end up having to track the key independently of
the object which is a pain.
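A minimal Python sketch of the array-based alternative: each element carries its own id as an ordinary property, the serialized form is an array, and a map is built only as an in-memory index where uniqueness is checked. The Element class and index_by_id function are hypothetical names for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    id: str    # ordinary property, so the object knows its own identifier
    type: str


def index_by_id(elements):
    """Build an id -> element lookup from an array, rejecting duplicate ids."""
    index = {}
    for element in elements:
        if element.id in index:
            raise ValueError(f"duplicate element id: {element.id}")
        index[element.id] = element
    return index


elements = [
    Element("urn:microsoft:windows:10:sbom", "sbom"),
    Element("urn:microsoft:calculator:10.0.1.0", "package"),
]
index = index_by_id(elements)

# The element can be passed around on its own; no separate key has to be tracked.
calculator = index["urn:microsoft:calculator:10.0.1.0"]
print(calculator.id, calculator.type)
```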
JSON-LD even treats ID differently from other properties by giving it a
reserved @id keyword, and SQL databases have primary keys with the special
characteristic that they uniquely identify the record rather than being just
another column. Autogenerated ids are often hidden because they are ubiquitous.
[William] In both JSON-LD and SQL the properties are still normal properties:
in JSON-LD it's still a property on the object, it just has a special name; in
SQL it's still a column in the table, it just has special metadata attached to
it. Even autogenerated ids are typically normal columns; they're just system
generated and you can't change their definition.
And finally, you introduced Map to the logical model for Extensions. If it's
OK for extensions, it's OK for Elements :-).
[William] Not the same 🙂. The map for extensions is a map of "extension type"
to value, not of "id" to value. It is a consequence of us deciding that each
type can only be assigned once that it can also be used as an id, but it is
primarily a type, not an id. If we changed that design decision it would no
longer function as an id.
Seriously though, I'm not wedded to Map. Treating Id as any other property but
having some prose saying that it can be used as a primary key / unique
identifier is OK, it's just kind of loose given that references from foreign to
primary keys is a universal concept.
[William] In SQL Server (others are similar) a foreign key takes the form
FOREIGN KEY (ChildCol1, ChildCol2) REFERENCES parentTable (ParentCol1,
ParentCol2, ...), they're still just columns, nothing magical about them, not
even their names.
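To make the point concrete, here is a small runnable sketch using Python's built-in sqlite3 module. SQLite is swapped in for SQL Server purely so the example is self-contained; the childTable name and the sample rows are hypothetical. The key columns are declared like any other column, and the constraint is separate metadata.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Key columns are ordinary columns; the constraints are declared separately.
conn.execute("""
    CREATE TABLE parentTable (
        ParentCol1 TEXT NOT NULL,
        ParentCol2 TEXT NOT NULL,
        PRIMARY KEY (ParentCol1, ParentCol2)
    )""")
conn.execute("""
    CREATE TABLE childTable (
        ChildCol1 TEXT NOT NULL,
        ChildCol2 TEXT NOT NULL,
        FOREIGN KEY (ChildCol1, ChildCol2)
            REFERENCES parentTable (ParentCol1, ParentCol2)
    )""")
conn.execute("INSERT INTO parentTable VALUES ('a', 'b')")


def insert_child(col1, col2):
    """Return True if the row was accepted, False if the foreign key rejected it."""
    try:
        conn.execute("INSERT INTO childTable VALUES (?, ?)", (col1, col2))
        return True
    except sqlite3.IntegrityError:
        return False


print(insert_child("a", "b"), insert_child("x", "y"))  # True False
```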
* type property
Since JSON does not have types it's good practice to ensure that "type:
identity" cannot collide with a property named "identity". At the core profile
all type and property names are defined and don't collide, but if "type" goes
away we'll need to ensure that properties defined in any profile cannot collide
with types defined in any profile. Again JSON-LD treats @type as a reserved
property: https://w3c.github.io/json-ld-syntax/#typed-values.
[William] Agreed, and type isn't in the logical model, a JSON-LD serializer
would use @type, an XML one would use XML namespaces and element names, a
ProtoBuf one would use message types. Since my examples were "plain" JSON which
does not have a built-in way of declaring types I used a "plain" property to
capture the type, I agree that the name of this property should avoid potential
conflict (e.g. by prefixing with an _).
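A minimal sketch of the prefixing idea for "plain" JSON, in Python. The reserved key name _type, the to_plain_json helper, and the urn:example:alice value are assumptions for illustration; the thread only suggests prefixing with an underscore as one option.

```python
import json

TYPE_KEY = "_type"  # assumed reserved name; kept out of the property namespace


def to_plain_json(type_name, properties):
    """Serialize an element with its type under a reserved key."""
    if TYPE_KEY in properties:
        raise ValueError(f"property name {TYPE_KEY!r} is reserved for the type")
    return json.dumps({TYPE_KEY: type_name, **properties}, sort_keys=True)


# A type called "identity" and a property called "identity" can coexist,
# because the type name is a value under the reserved key rather than a key.
doc = to_plain_json("identity", {"id": "urn:example:alice", "identity": "Alice"})
print(doc)
```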
* document root
A transfer unit file is not an Element and not a logical type or a class. The
bytes in SPDX documents are not defined by the logical model, they just have to
be able to be de-serialized into element instances.
[William] Same disagreement as above.
Data schemas (for JSON, XML, ASN.1, ...) explicitly do not define classes, they
define only data types.
[William] I'm not sure what definition of "class" you're using here, but the
boxes on the diagram could be represented in an OO language as classes or
interfaces; for our purposes I don't think the distinction between class and
data type is meaningful.
Regards,
Dave
View/Reply Online (#4695): https://lists.spdx.org/g/Spdx-tech/message/4695