CIL

From: David Kemp <[email protected]>
Sent: Monday, December 6, 2021 6:58 AM
To: William Bartholomew (CELA) <[email protected]>
Cc: SPDX-list <[email protected]>
Subject: Re: [EXTERNAL] Re: [spdx-tech] SPDX v3 Serialization, with examples

William,

Attached is a proposed update to the logical model that allows each Element to 
be serialized either individually or with other Elements. The value of an 
Element must not be affected by where, or in how many documents, it is 
serialized, which leads to the requirement for optional "context" data that is 
included in the serialized document but is not part of any Element.

  1.  A document is a file or sequence of serialized data bytes.
[William] Agreed.

  2.  A Document Element describing the document inherits from Artifact, not 
Collection.
[William] See discussion on point #3.

  3.  Every document contains a single Element plus optional DocumentContext.
[William] I think you might have the cardinality indicators reversed. If you 
want to transfer multiple elements (say a list of licenses or identities), how 
would you do that? They may not be contextually related, which is why we had the 
distinction between Collection and ContextualCollection (with Document being a 
concrete example of a Collection).

  4.  DocumentContext is defined in the logical and information models but is 
not part of any Element; it exists only in serialized documents.
[William] This was a point of contention for a while, and I think Sean will 
strongly disagree with this. Elements can exist independently of a Document and 
define which profiles they conform to, who created them and when, etc.

  5.  DocumentContext contains Element property defaults, namespace and 
namespaceMap, and other Elements serialized in or referenced from the document.
[William] I don't think Element property defaults or namespace would be on 
DocumentContext in the logical model. A serialization model may need those if 
it is doing an optimization (such as removing repeated information), but that 
would be specific to that serialization model. This would be like a binary 
representation that assigns an integer identifier to each class: that 
information would live in the serialization model, not the logical model.

I don't know why I wrote "invisible at the information level". I meant that 
context is invisible to the universe of linked data Elements because it does not 
exist in any Element. Namespace and NamespaceMap are included in DocumentContext 
and are invisible to Elements because Elements always have full IRI ids when 
they are not serialized. Deserialized Elements always have the required common 
properties (specVersion, etc.) even though they can default to values included 
in the context when serialized.

[William] Agreed, although again I don't think the concept of "defaulting" 
lives in the logical model, that would be specific to the serialization method.
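To make the deserialization behavior described above concrete, here is a minimal Python sketch. DocumentContext and its contents (namespace, namespaceMap, property defaults) come from the proposal; the Python field names, the example IRIs, and the "3.0" specVersion value are assumptions for illustration only, not part of the SPDX model.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentContext:
    # Hypothetical Python shape for the proposed DocumentContext; it exists
    # only in the serialized document, never in a deserialized Element.
    namespace: str                                     # base IRI for bare ids
    namespace_map: dict = field(default_factory=dict)  # prefix -> namespace IRI
    defaults: dict = field(default_factory=dict)       # property -> default value


def expand_id(compact: str, ctx: DocumentContext) -> str:
    """Expand "prefix:local" or a bare "local" id into a full IRI."""
    # Simplification: anything containing "://" is treated as already absolute.
    if "://" in compact:
        return compact
    prefix, sep, local = compact.partition(":")
    if sep and prefix in ctx.namespace_map:
        return ctx.namespace_map[prefix] + local
    return ctx.namespace + compact


def deserialize(serialized: dict, ctx: DocumentContext) -> dict:
    """Build a standalone Element: full IRI id, defaulted properties filled in."""
    element = dict(ctx.defaults)   # start from the context's defaults
    element.update(serialized)     # values written on the Element take precedence
    element["id"] = expand_id(serialized["id"], ctx)
    return element


ctx = DocumentContext(
    namespace="https://example.com/doc1#",
    namespace_map={"ex": "https://example.org/spdx/"},
    defaults={"specVersion": "3.0"},
)
print(deserialize({"id": "ex:pkg1", "name": "foo"}, ctx))
# {'specVersion': '3.0', 'id': 'https://example.org/spdx/pkg1', 'name': 'foo'}
```

In this sketch the defaulting happens only while reading a document, which fits William's point that defaulting belongs to the serialization method rather than the logical model.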

No deserialized Element ever contains another Element. The ability to serialize 
a bunch of related or unrelated Elements in the same document allows them to be 
1) compressed, 2) referenced, and 3) verified as a unit, but that unit is a 
file, not an Element.

[William] Agreed, although a serialization model may choose to represent 
elements as a hierarchy.
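The two positions above can be reconciled by flattening on read: a serialization may nest Elements, but deserialization replaces every nested Element with a reference to its id. A minimal Python sketch, assuming a hypothetical "elements" property as the only place nesting occurs and illustrative urn:example ids:

```python
from typing import Optional

# Hypothetical: the property that may hold nested Elements in a hierarchical
# serialization. In the flattened result it holds only id references.
NESTED_PROPERTY = "elements"


def flatten(node: dict, out: Optional[dict] = None) -> dict:
    """Collect every Element in a nested tree into a flat {id: Element} map."""
    if out is None:
        out = {}
    element = dict(node)  # shallow copy so the input tree is not modified
    if NESTED_PROPERTY in element:
        refs = []
        for child in element[NESTED_PROPERTY]:
            if isinstance(child, dict):
                flatten(child, out)       # nested Element: record it separately
                refs.append(child["id"])  # and keep only a reference here
            else:
                refs.append(child)        # already a reference to an Element
        element[NESTED_PROPERTY] = refs
    out[element["id"]] = element
    return out


tree = {
    "id": "urn:example:bom1",
    "type": "BOM",
    "elements": [{"id": "urn:example:pkg1", "type": "Package"}, "urn:example:lic1"],
}
for element_id, element in flatten(tree).items():
    print(element_id, element)
```

After flattening, no Element contains another Element; the hierarchy survives only as id references.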

Because Document is an Artifact, not a Collection, it doesn't compete with 
ContextualCollection, the parent of BOM.  The DocumentContext component of 
Document contains serialization context.  If there is "real-world context" 
shared by Elements of a BOM, it seems like it could be captured in the 
description, summary, and/or comment properties of the BOM Element unless it 
can be more rigorously defined as a structured property.

[William] I'm not sure how it was competing with ContextualCollection; they 
were siblings under the same base class (Collection). It's not entirely clear 
to me what problem we're trying to solve here.

Misc notes:

  *   the "created" property could be a structure with "by" and "when" 
properties - semantically they refer to a single event that can be represented 
by a single structure.
[William] Agreed. They only existed in one place, so we hadn't needed a 
structure to contain them.

  *   Since "verifiedUsing" does not apply to (unserialized) Element values, I 
suggest removing it from Element and attaching it to Artifact.  A Document 
containing Element(s) is always what is verified.
[William] We believed that "verifiedUsing" had application beyond artifacts; 
for example, identities could have verification information associated with 
them, and future classes inheriting from Element may want to support 
verification methods. Sean does have a proposal that we haven't adopted yet that 
splits identities from the representation of the identity (e.g. person vs. email 
address). In that model the representation of the identity could be an artifact, 
and moving verifiedUsing to Artifact may make sense. So yes, we could make this 
change, but we need to make sure that any class we have today that should be 
"verifiable" is an artifact.

  *   ExternalMap indicates which Document is used to verify a particular 
Element.  "elementURL" is redundant with Document's "artifactURL".
[William] Possibly. In your model, if you reference an external document, do 
you need to copy the document element into the referencing document? If not, 
then you won't have the artifactURL and would need to include it in the 
external map. Likewise, if you're not copying the document element into the 
referencing document, then you will need verifiedUsing on the external map 
entry. I could see an argument for having to copy the document (and its 
verifiedUsing) into the referencing document, but that's not how the model 
works today.

  *   dataLicense can presumably be any license identifier, not hardwired to 
one.
[William] This is a point of contention... 
https://github.com/spdx/spdx-spec/issues/159. I would like to see us adopt your 
recommendation; the legal team would like to understand the rationale.

  *   The parent of BOM could be called Collection or ContextualCollection.
[William] We need to discuss this in more detail based on the prior answers and 
including Sean.

  *   namespaceMap is 0..*, not 0..1.
[William] Agreed, fixed.
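William's ExternalMap point can be illustrated with a small sketch: when the referencing document does not copy the external Document element, the map entry itself has to carry where to fetch the document and how to verify it, and the verification applies to the document as a unit. The field names are hypothetical, SHA-256 is assumed as the verifiedUsing method, and fetching is left out (the bytes are passed in directly).

```python
import hashlib
from dataclasses import dataclass


@dataclass
class ExternalMapEntry:
    # Hypothetical fields: the id of the external Element, where its
    # containing document can be fetched, and a verifiedUsing-style digest.
    external_id: str
    document_url: str
    sha256: str


def verify_document(entry: ExternalMapEntry, document_bytes: bytes) -> bool:
    """Verify the fetched document as a unit against the referencing entry."""
    return hashlib.sha256(document_bytes).hexdigest() == entry.sha256.lower()


doc_bytes = b'{"id": "urn:example:pkg1"}'  # stand-in for the fetched file
entry = ExternalMapEntry(
    external_id="urn:example:pkg1",
    document_url="https://example.com/doc1.json",
    sha256=hashlib.sha256(doc_bytes).hexdigest(),
)
print(verify_document(entry, doc_bytes))          # True
print(verify_document(entry, doc_bytes + b" "))   # False
```

Note that the digest covers the serialized bytes, not any individual Element, which matches Dave's observation that a Document containing Element(s) is what actually gets verified.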

Regards,
Dave

On Tue, Nov 30, 2021 at 11:49 AM William Bartholomew (CELA) 
<[email protected]> wrote:
Thanks David, I really like this visualization and how you have framed the 
problem. As an aside, we've been using the term "logical model" for the 
information model and "physical model" or "serialization" for how this is 
represented as "bytes that go over the wire".

We have had some requirements that blur the lines between physical and logical, 
and I think those might not be fully captured by what you have here. For 
example, I don't think "context is invisible and irrelevant at the information 
level". If you deserialized a JSON SPDXv3 document into the information model 
and then serialized the information model to a YAML SPDXv3 document, you would 
not expect the context to be lost, yet it would be lost if it were not in the 
information model. This is one of the reasons that we added NamespaceMap to the 
information model: without it, the namespace mappings wouldn't round-trip.
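The round-trip argument can be sketched in Python: if the deserialized model keeps namespaceMap alongside the Elements, re-serialization can restore the original prefixes; if the map is dropped, the output is still valid but every id comes back as a full IRI. JSON is used on both sides here only because YAML would need a third-party library, and namespaceMap is assumed to be a simple prefix-to-IRI dictionary; both are illustrative choices, not the SPDX serialization.

```python
import json


def expand(compact_id: str, namespace_map: dict) -> str:
    """Replace a known "prefix:" with its namespace IRI."""
    prefix, sep, local = compact_id.partition(":")
    if sep and prefix in namespace_map:
        return namespace_map[prefix] + local
    return compact_id


def compact(iri: str, namespace_map: dict) -> str:
    """Rewrite an IRI using the longest matching namespace, if any."""
    best = None
    for prefix, ns in namespace_map.items():
        if iri.startswith(ns) and (best is None or len(ns) > len(namespace_map[best])):
            best = prefix
    if best is None:
        return iri
    return best + ":" + iri[len(namespace_map[best]):]


def load(text: str) -> dict:
    """Deserialize into a model that keeps namespaceMap next to the Elements."""
    doc = json.loads(text)
    nsmap = doc.get("namespaceMap", {})
    elements = [{**e, "id": expand(e["id"], nsmap)} for e in doc["elements"]]
    return {"namespaceMap": nsmap, "elements": elements}


def dump(model: dict) -> str:
    """Serialize the model again, re-applying whatever namespaceMap it kept."""
    nsmap = model.get("namespaceMap", {})
    elements = [{**e, "id": compact(e["id"], nsmap)} for e in model["elements"]]
    return json.dumps({"namespaceMap": nsmap, "elements": elements})


source = '{"namespaceMap": {"ex": "https://example.org/spdx/"}, "elements": [{"id": "ex:pkg1"}]}'
model = load(source)
print(dump(model))                            # prefixes restored
print(dump({"elements": model["elements"]}))  # map dropped: full IRIs only
```

This only illustrates why the mappings are lost if they live nowhere in the model; it does not settle whether they belong in the logical model or in a serialization-specific model.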

I agree that we don't need ContextualCollection; it was added to communicate 
that the elements are contextually related in some way (described by the 
context). We could absolutely remove it and make Collection a concrete class 
(instead of abstract), but we would lose the ability to communicate that 
elements have a contextual relationship that's not described by Relationship 
elements. Maybe we don't need that; I think that's the question on the table.


Regards,

William Bartholomew (he/him)
Principal Security Strategist
Global Cybersecurity Policy - Microsoft

My working day may not be your working day. Please don't feel obliged to reply 
to this e-mail outside of your normal working hours.




-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#4276): https://lists.spdx.org/g/Spdx-tech/message/4276
-=-=-=-=-=-=-=-=-=-=-=-

