CIL

From: David Kemp <[email protected]>
Sent: Tuesday, December 7, 2021 5:35 AM
To: William Bartholomew (CELA) <[email protected]>
Cc: SPDX-list <[email protected]>
Subject: Re: [EXTERNAL] Re: [spdx-tech] SPDX v3 Serialization, with examples

Sean emphasizes the independence principle - every Element stands alone, with 
no dependency on any other Element.  That principle is what distinguishes SPDX 
v3 from v2.

1. A document is a file or sequence of serialized data bytes.  You agree.
1a. A file is described by a File Element, which is defined in the software 
package but inherits from Artifact in the core package.
1b. An Element can be created in core to describe a document containing data 
bytes representing serialized SPDX Elements.
1c. The logical name for an element that describes an SPDX document is 
Document, which inherits from Artifact.

Therefore:
2. A Document Element that describes an SPDX document inherits from Artifact.  
This shouldn't be controversial. But if it is, we could call it something like 
"SPDX-File" and move on.
[William] This was controversial, and we had a lot of discussion about it 
earlier in the year. The controversy wasn't over the name (although that was an 
aspect of it) but over the role of document vs. file. The conclusion at the 
time was that the "Document" element is a fundamental part of the serialized 
data bytes, i.e., it is part of the document itself, not merely a description 
of the document. There could still be a File that describes the physical file 
the document is serialized as. Other issues were that hashes of the document 
can't be in the document itself, and that the Document element provides 
information that allows round-tripping of the document across different formats.

Question for today's meeting: Do we call an Element defined in core that 
describes an SPDX document "Document"?
[William] We can certainly ask that question today, but that's not the point of 
contention; the point above is.

3a. Every Element stands alone and has an id IRI.
3b. We can create a serialized document containing a single SPDX Element, for 
example an Annotation.
3c. The Annotation Element stands alone even though it contains IRIs for other 
SPDX Elements.
3d. A serialized document of an Annotation element can contain nothing else, or 
it can contain Elements referenced by the annotation, or it can contain 
Elements that have nothing to do with the annotation.
3e. An SBOM Element stands alone even though it contains IRIs for other SPDX 
Elements.  SBOM inherits from Collection / ContextualCollection.
3f. A serialized document of an SBOM Element can also contain nothing else, or 
other related or unrelated elements.
3g. A serialized document of an SBOM Element does not inherit from Collection; 
a document is just serialized bytes for any kind of Element.

Therefore:
3. Every document contains a single Element plus optional other stuff 
(DocumentContext).
[William] There's a jump from your list above to a "single" element. Why is 
this limited to a single element rather than multiple elements? Take my example 
from yesterday afternoon of transferring a list of licenses: it's not an SBOM, 
it's just a list of licenses.

Question for today:  Do we agree that every Element is independent of every 
other Element, and that every Element can be serialized into a document file 
containing no other Elements?
[William] This would be a good question, and that is generally true today. 
However, Sean wanted to ensure that collections always provide their 
referenced elements, so it would be good to close on that decision.
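
To make points 3a-3g concrete, here is a minimal Python sketch. It assumes a 
hypothetical Element record with only id, type, and refs fields, and a plain 
JSON array as the document format; none of these names come from the SPDX v3 
model itself. It only illustrates that an Element refers to other Elements by 
IRI, so its value is the same whether or not those Elements are serialized in 
the same document.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """Hypothetical stand-in for an SPDX Element (field names are assumptions)."""
    id: str            # full IRI
    type: str
    refs: tuple = ()   # IRIs of other Elements, never nested Elements


def serialize(elements):
    """Serialize any set of Elements into document bytes (a JSON array here)."""
    payload = [{"id": e.id, "type": e.type, "refs": list(e.refs)} for e in elements]
    return json.dumps(payload, sort_keys=True).encode("utf-8")


def deserialize(data):
    """Return the Elements found in a document, keyed by IRI."""
    return {
        item["id"]: Element(item["id"], item["type"], tuple(item["refs"]))
        for item in json.loads(data)
    }


pkg = Element("https://example.org/elements/pkg1", "Package")
note = Element("https://example.org/elements/note1", "Annotation", refs=(pkg.id,))

alone = deserialize(serialize([note]))          # document with only the Annotation
together = deserialize(serialize([note, pkg]))  # same Annotation plus its subject

# The Annotation has the same value in both documents.
assert alone[note.id] == together[note.id]
```

Whether a Collection must always carry the Elements it references, as Sean 
proposed, is a separate policy choice that this sketch does not enforce.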


Regards,
Dave


On Mon, Dec 6, 2021 at 7:01 PM William Bartholomew (CELA) 
<[email protected]> wrote:
CIL

From: David Kemp <[email protected]>
Sent: Monday, December 6, 2021 6:58 AM
To: William Bartholomew (CELA) <[email protected]>
Cc: SPDX-list <[email protected]>
Subject: Re: [EXTERNAL] Re: [spdx-tech] SPDX v3 Serialization, with examples

William,

Attached is a proposed update to the logical model that allows each Element to 
be serialized either individually or with other Elements. The value of an 
Element must not be affected by where or how many documents it is serialized 
in, which leads to the requirement for optional "context" data that is included 
in the serialized document but is not part of any Element.

  1.  A document is a file or sequence of serialized data bytes.
[William] Agreed.

  2.  A Document Element describing the document inherits from Artifact, not 
Collection
[William] See discussion on point #3.

  3.  Every document contains a single Element plus optional DocumentContext.
[William] I think you might have the cardinality indicators reversed. If you 
want to transfer multiple elements (say a list of licenses or identities) how 
would you do that? They may not be contextually related which is why we had the 
distinction between Collection and ContextualCollection (with Document being a 
concrete example of a Collection).

  4.  DocumentContext is defined in the logical and information models but is 
not part of any Element, it exists only in serialized documents.
[William] This was a point of contention for a while, and I think Sean will 
strongly disagree with this. Elements can exist independent of a Document and 
define which profiles they conform to, who created them and when, etc.

  5.  DocumentContext contains Element property defaults, namespace and 
namespaceMap, and other Elements serialized in or referenced from the document.
[William] I don't think Element property defaults or namespace would be on 
DocumentContext in the logical model. A serialization model may need those if 
it is doing an optimization (such as removing repeated information), but that 
would be specific to that serialization model. It would be like a binary 
representation that assigns an integer identifier to each class: that 
information would live in the serialization model, not the logical model.
I don't know why I wrote "invisible at the information level".  I meant context 
is invisible to the universe of linked data Elements because it does not exist 
in any Element. Namespace and NamespaceMap are included in DocumentContext and 
are invisible to Elements because Elements always have full IRI ids when they 
are not serialized.  Deserialized Elements always have the required common 
properties (specVersion etc.) even though they can default to values included 
in the context when serialized.

[William] Agreed, although again I don't think the concept of "defaulting" 
lives in the logical model, that would be specific to the serialization method.
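
One way to picture "defaulting" as a serialization-only concern is the 
following Python sketch. It assumes a hypothetical compact document layout 
with "context", "namespaceMap", "defaults", and "elements" keys and 
illustrative property values; the thread does not define such a layout. The 
expand step fills in defaults and resolves prefixed ids, so the deserialized 
Elements carry full IRIs and all common properties, with no trace of the 
context.

```python
def expand(document):
    """Return fully expanded Elements from a compact serialized document.

    The key names used here are assumptions made for illustration only.
    """
    context = document.get("context", {})
    namespaces = context.get("namespaceMap", {})
    defaults = context.get("defaults", {})

    expanded = []
    for raw in document["elements"]:
        element = {**defaults, **raw}  # explicit values win over defaults
        prefix, sep, local = element["id"].partition(":")
        if sep and prefix in namespaces:
            # Replace a known prefix with its namespace IRI.
            element["id"] = namespaces[prefix] + local
        expanded.append(element)
    return expanded


doc = {
    "context": {
        "namespaceMap": {"ex": "https://example.org/elements/"},
        "defaults": {"specVersion": "3.0", "dataLicense": "CC0-1.0"},
    },
    "elements": [
        {"id": "ex:pkg1", "type": "Package"},
        {"id": "https://other.example/e/7", "type": "File", "specVersion": "3.1"},
    ],
}

elements = expand(doc)
```

Here the first Element picks up both defaults and a full IRI, while the second 
keeps its own specVersion and its already-complete id.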

No deserialized Element ever contains another Element. The ability to serialize 
a bunch of related or unrelated Elements in the same document allows them to be 
1) compressed, 2) referenced, and 3) verified as a unit, but that unit is a 
file, not an Element.

[William] Agreed, although a serialization model may choose to represent 
elements as a hierarchy.

Because Document is an Artifact, not a Collection, it doesn't compete with 
ContextualCollection, the parent of BOM.  The DocumentContext component of 
Document contains serialization context.  If there is "real-world context" 
shared by Elements of a BOM, it seems like it could be captured in the 
description, summary, and/or comment properties of the BOM Element unless it 
can be more rigorously defined as a structured property.

[William] I'm not sure how it was competing with ContextualCollection? They 
were siblings under the same base class (Collection). It's not entirely clear 
to me what problem we're trying to solve here.

Misc notes:

  *   the "created" property could be a structure with "by" and "when" 
properties - semantically they refer to a single event that can be represented 
by a single structure.
[William] Agreed, they only existed in one place, so we hadn't needed a 
structure to contain them.

  *   Since "verifiedUsing" does not apply to (unserialized) Element values, I 
suggest removing it from Element and attaching it to Artifact.  A Document 
containing Element(s) is always what is verified.
[William] We believed that "verifiedUsing" had application beyond artifacts, 
for example, identities could have verification information associated with 
them. Future classes inheriting from Element may want to support verification 
methods. Sean does have a proposal that we haven't adopted yet that splits 
identities from the representation of the identity (e.g. person vs email 
address). In this model the representation of the identity could be an artifact 
and moving verifiedUsing to artifact may make sense in that model. So yes, we 
could make this change but we need to make sure any class we have today that 
should be "verifiable" is an artifact.

  *   ExternalMap indicates what Document is used to verify a particular 
Element.  "elementURL" is redundant with Document's "artifactURL".
[William] Possibly, in your model if you reference an external document do you 
need to copy the document element into the referencing document? If not, then 
you won't have the artifact url and would need to include it in the external 
map. Likewise, if you're not copying the document element into the referencing 
document then you will need verifiedUsing on the external map entry. I could 
see an argument for having to copy the document (and its verified using) into 
the referencing document but that's not how the model works today.

  *   dataLicense can presumably be any license identifier, not hardwired to 
one.
[William] This is a point of contention... 
https://github.com/spdx/spdx-spec/issues/159
I would like to see us adopt your recommendation; the legal team would like to 
understand the rationale.

  *   The parent of BOM could be called Collection or ContextualCollection.
[William] We need to discuss this in more detail based on the prior answers and 
including Sean.

  *   namespaceMap is 0..*, not 0..1.
[William] Agreed, fixed.
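
Two of the notes above can be sketched in Python under stated assumptions. 
The Created record groups "by" and "when" into one structure, as suggested. 
The ExternalMapEntry record is hypothetical: it assumes the entry itself 
carries the document URL and a SHA-256 digest of the serialized document 
bytes, which is one of the options William describes and not a settled part 
of the model. All field names and URLs are illustrative.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Created:
    """Single creation event: who created the Element and when."""
    by: str          # IRI of the creating identity
    when: datetime


@dataclass(frozen=True)
class ExternalMapEntry:
    """Hypothetical entry locating the document that contains an Element."""
    element_id: str    # IRI of the externally defined Element
    document_url: str  # where the serialized document can be fetched
    sha256: str        # hex digest of the serialized document's bytes


def verify(entry, document_bytes):
    """Check the fetched document bytes against the digest in the entry."""
    return hashlib.sha256(document_bytes).hexdigest() == entry.sha256


document_bytes = b'[{"id": "https://example.org/e/pkg1", "type": "Package"}]'
entry = ExternalMapEntry(
    element_id="https://example.org/e/pkg1",
    document_url="https://example.org/docs/pkg1.json",
    sha256=hashlib.sha256(document_bytes).hexdigest(),
)
created = Created(
    by="https://example.org/e/alice",
    when=datetime(2021, 12, 6, tzinfo=timezone.utc),
)
```

The digest is computed over the document bytes and stored outside them, which 
is consistent with the observation that a document cannot contain its own hash.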

Regards,
Dave
On Tue, Nov 30, 2021 at 11:49 AM William Bartholomew (CELA) 
<[email protected]> wrote:
Thanks David, I really like this visualization and how you have framed the 
problem. As an aside, we've been using the term "logical model" for the 
information model and "physical model" or "serialization" for how this is 
represented as "bytes that go over the wire".

We have had some requirements that blur the lines between physical and logical, 
and I think that those might not be fully captured by what you have here. For 
example, I don't think "context is invisible and irrelevant at the information 
level". If you deserialized a JSON SPDXv3 document to the information model and 
then serialized the information model to a YAML SPDXv3 document, you would not 
expect the context to be lost, and it would be if it's not in the information 
model. This is one of the reasons that we added NamespaceMap to the information 
model: without it, the namespace mappings wouldn't round-trip.
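
The round-trip concern can be illustrated with a small Python sketch. It 
assumes a toy document holding only a "namespaceMap" and a list of "ids", and 
it uses JSON on both sides to stay within the standard library; the 
JSON-to-YAML case follows the same logic. The keep_map flag is a hypothetical 
switch that models an information model which does or does not retain the 
namespace map.

```python
import json


def load(text):
    """Deserialize: expand prefixed ids to full IRIs and keep the map in the model."""
    raw = json.loads(text)
    ns = raw.get("namespaceMap", {})

    def full(identifier):
        prefix, sep, local = identifier.partition(":")
        return ns[prefix] + local if sep and prefix in ns else identifier

    return {"namespaceMap": ns, "ids": [full(i) for i in raw["ids"]]}


def dump(model, keep_map=True):
    """Serialize: compact IRIs using whatever namespace map the model retained."""
    ns = model["namespaceMap"] if keep_map else {}

    def compact(iri):
        for prefix, base in ns.items():
            if iri.startswith(base):
                return prefix + ":" + iri[len(base):]
        return iri

    out = {"namespaceMap": ns, "ids": [compact(i) for i in model["ids"]]}
    return json.dumps(out, sort_keys=True)


original = json.dumps(
    {"namespaceMap": {"ex": "https://example.org/e/"}, "ids": ["ex:a", "ex:b"]},
    sort_keys=True,
)
model = load(original)

# With the map in the model, the document round-trips unchanged.
assert dump(model) == original
# Without it, the Elements are still identifiable but the prefix mappings are lost.
assert dump(model, keep_map=False) != original
assert load(dump(model, keep_map=False))["ids"] == model["ids"]
```

The sketch separates the two things at stake: the Element ids survive either 
way, but the document's own prefix choices only survive if the model keeps them.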

I agree that we don't need ContextualCollection. It was added to communicate 
that the elements are contextually related in some way (described by the 
context). We could absolutely remove it and make Collection a concrete class 
(instead of abstract), but we would lose the ability to communicate that 
elements have a contextual relationship that's not described by Relationship 
elements. Maybe we don't need that; I think that's the question on the table.


Regards,

William Bartholomew (he/him)
Principal Security Strategist
Global Cybersecurity Policy - Microsoft

My working day may not be your working day. Please don't feel obliged to reply 
to this e-mail outside of your normal working hours.




View/Reply Online (#4278): https://lists.spdx.org/g/Spdx-tech/message/4278