From: David Kemp <[email protected]>
Sent: Monday, December 6, 2021 6:58 AM
To: William Bartholomew (CELA) <[email protected]>
Cc: SPDX-list <[email protected]>
Subject: Re: [EXTERNAL] Re: [spdx-tech] SPDX v3 Serialization, with examples
William,

Attached is a proposed update to the logical model that allows each Element to be serialized either individually or with other Elements. The value of an Element must not be affected by where or how many documents it is serialized in, which leads to the requirement for optional "context" data that is included in the serialized document but is not part of any Element.

1. A document is a file or sequence of serialized data bytes.
   [William] Agreed.

2. A Document Element describing the document inherits from Artifact, not Collection.
   [William] See discussion on point #3.

3. Every document contains a single Element plus optional DocumentContext.
   [William] I think you might have the cardinality indicators reversed. If you want to transfer multiple elements (say a list of licenses or identities) how would you do that? They may not be contextually related which is why we had the distinction between Collection and ContextualCollection (with Document being a concrete example of a Collection).

4. DocumentContext is defined in the logical and information models but is not part of any Element, it exists only in serialized documents.
   [William] This was a point of contention for a while, and I think Sean will strongly disagree with this. Elements can exist independent of a Document and define which profiles they conform to, who created them and when, etc.

5. DocumentContext contains Element property defaults, namespace and namespaceMap, and other Elements serialized in or referenced from the document.
   [William] I don't think Element property defaults or namespace would be on DocumentContext for the logical model, a serialization model may need those if it is doing an optimization (such as removing repeated information) but that would be something specific to that serialization model. This would be like a binary representation that assigns an integer identifier to each class, that information would live in the serialization model not the logical model.
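[Editorial sketch] The exchange above turns on whether defaults and namespace prefixes are serialization-only conveniences. The following minimal Python sketch illustrates that reading: context is written into the serialized document, and deserialized Elements always come back with full IRI ids and all common properties, whether they were serialized together or one per document. The JSON layout, the key names ("context", "defaults", "elements"), the function names, and the example IRIs and version string are all assumptions made for illustration; none of them come from the SPDX specification or from the attached proposal.

```python
import json


def compact_iri(iri, namespace_map):
    """Shorten a full IRI to prefix:local if a namespace matches (hypothetical helper)."""
    for prefix, namespace in namespace_map.items():
        if iri.startswith(namespace):
            return f"{prefix}:{iri[len(namespace):]}"
    return iri


def expand_iri(value, namespace_map):
    """Expand prefix:local back to a full IRI; unknown prefixes are left untouched."""
    prefix, sep, local = value.partition(":")
    if sep and prefix in namespace_map:
        return namespace_map[prefix] + local
    return value


def serialize(elements, namespace_map, defaults):
    """Write elements plus serialization context into one JSON document."""
    items = []
    for element in elements:
        # Drop any property whose value equals the document-level default.
        item = {k: v for k, v in element.items() if defaults.get(k) != v}
        item["id"] = compact_iri(element["id"], namespace_map)
        items.append(item)
    context = {"namespaceMap": namespace_map, "defaults": defaults}
    return json.dumps({"context": context, "elements": items})


def deserialize(text):
    """Rebuild standalone elements; the context itself is not part of any element."""
    document = json.loads(text)
    context = document.get("context", {})
    namespace_map = context.get("namespaceMap", {})
    defaults = context.get("defaults", {})
    elements = []
    for item in document["elements"]:
        element = {**defaults, **item}  # explicit values override defaults
        element["id"] = expand_iri(item["id"], namespace_map)
        elements.append(element)
    return elements


if __name__ == "__main__":
    # Illustrative data only; the IRIs and version string are made up.
    elements = [
        {"id": "https://example.org/acme/pkg1", "type": "Package",
         "specVersion": "3.0", "name": "pkg1"},
        {"id": "https://example.org/acme/file1", "type": "File",
         "specVersion": "3.0", "name": "file1"},
    ]
    ns = {"acme": "https://example.org/acme/"}
    defaults = {"specVersion": "3.0"}

    together = deserialize(serialize(elements, ns, defaults))
    one_by_one = [deserialize(serialize([e], ns, defaults))[0] for e in elements]
    assert together == elements == one_by_one
```

In this sketch the context never appears in a deserialized Element, which is the property David describes. William's objection is about where the concept is specified (serialization model versus logical model), which code of this kind cannot settle.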
I don't know why I wrote "invisible at the information level". I meant context is invisible to the universe of linked data Elements because it does not exist in any Element. Namespace and NamespaceMap are included in DocumentContext and are invisible to Elements because Elements always have full IRI ids when they are not serialized. Deserialized Elements always have the required common properties (specVersion etc.) even though they can default to values included in the context when serialized.
[William] Agreed, although again I don't think the concept of "defaulting" lives in the logical model, that would be specific to the serialization method.

No deserialized Element ever contains another Element. The ability to serialize a bunch of related or unrelated Elements in the same document allows them to be 1) compressed, 2) referenced, and 3) verified as a unit, but that unit is a file, not an Element.
[William] Agreed, although a serialization model may choose to represent elements as a hierarchy.

Because Document is an Artifact, not a Collection, it doesn't compete with ContextualCollection, the parent of BOM. The DocumentContext component of Document contains serialization context. If there is "real-world context" shared by Elements of a BOM, it seems like it could be captured in the description, summary, and/or comment properties of the BOM Element unless it can be more rigorously defined as a structured property.
[William] I'm not sure how it was competing with ContextualCollection? They were siblings under the same base class (Collection). It's not entirely clear to me what problem we're trying to solve here.

Misc notes:

* the "created" property could be a structure with "by" and "when" properties - semantically they refer to a single event that can be represented by a single structure.
  [William] Agreed, they only existed in one place, so we hadn't needed a structure to contain them.
* Since "verifiedUsing" does not apply to (unserialized) Element values, I suggest removing it from Element and attaching it to Artifact. A Document containing Element(s) is always what is verified.
  [William] We believed that "verifiedUsing" had application beyond artifacts, for example, identities could have verification information associated with them. Future classes inheriting from Element may want to support verification methods. Sean does have a proposal that we haven't adopted yet that splits identities from the representation of the identity (e.g. person vs email address). In this model the representation of the identity could be an artifact and moving verifiedUsing to artifact may make sense in that model. So yes, we could make this change but we need to make sure any class we have today that should be "verifiable" is an artifact.

* ExternalMap indicates what Document is used to verify a particular Element. "elementURL" is redundant with Document's "artifactURL".
  [William] Possibly, in your model if you reference an external document do you need to copy the document element into the referencing document? If not, then you won't have the artifact url and would need to include it in the external map. Likewise, if you're not copying the document element into the referencing document then you will need verifiedUsing on the external map entry. I could see an argument for having to copy the document (and its verified using) into the referencing document but that's not how the model works today.

* dataLicense can presumably be any license identifier, not hardwired to one.
  [William] This is a point of contention... https://github.com/spdx/spdx-spec/issues/159. I would like to see us adopt your recommendation; the legal team would like to understand the rationale.

* The parent of BOM could be called Collection or ContextualCollection.
  [William] We need to discuss this in more detail based on the prior answers and including Sean.

* namespaceMap is 0..*, not 0..1.
  [William] Agreed, fixed.

Regards,
Dave

On Tue, Nov 30, 2021 at 11:49 AM William Bartholomew (CELA) <[email protected]> wrote:

Thanks David, I really like this visualization and how you have framed the problem.

As an aside, we've been using the term "logical model" for the information model and "physical model" or "serialization" for how this is represented as "bytes that go over the wire". We have had some requirements that blur the lines between physical and logical, and I think those might not be fully captured by what you have here. For example, I don't think "context is invisible and irrelevant at the information level", because if you deserialized a JSON SPDXv3 document to the information model and then serialized the information model to a YAML SPDXv3 document, you would not expect the context to be lost, and it would be if it's not in the information model. This is one of the reasons that we added NamespaceMap to the information model: without it the namespace mappings wouldn't round-trip.

I agree that we don't need ContextualCollection; it was added to communicate that the elements are contextually related in some way (described by the context). We could absolutely remove it and have Collection be a concrete class (instead of abstract), but we would lose the ability to communicate that elements have a contextual relationship that's not described by Relationship elements. Maybe we don't need that; I think that's the question on the table.
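[Editorial sketch] The round-trip concern raised earlier in this message (deserialize one format into the information model, then serialize to another without losing namespace mappings) can be illustrated with a small Python sketch. Everything here is hypothetical: the DocumentModel class, the JSON layout, and the toy line-oriented output format that stands in for a second serialization such as YAML. It is not the SPDX information model, only a demonstration of why prefixes are lost when the model does not carry the namespace map.

```python
import json
from dataclasses import dataclass, field


@dataclass
class DocumentModel:
    """Hypothetical in-memory model: element ids are always full IRIs."""
    element_ids: list
    namespace_map: dict = field(default_factory=dict)  # prefix -> namespace IRI


def expand(value, namespace_map):
    """Expand prefix:local to a full IRI when the prefix is known."""
    prefix, sep, local = value.partition(":")
    if sep and prefix in namespace_map:
        return namespace_map[prefix] + local
    return value


def compact(iri, namespace_map):
    """Shorten a full IRI to prefix:local when a namespace matches."""
    for prefix, namespace in namespace_map.items():
        if iri.startswith(namespace):
            return f"{prefix}:{iri[len(namespace):]}"
    return iri


def from_json(text, keep_namespace_map=True):
    """Deserialize; optionally discard the namespace map to show what is lost."""
    raw = json.loads(text)
    namespace_map = raw.get("namespaceMap", {})
    ids = [expand(i, namespace_map) for i in raw["elements"]]
    return DocumentModel(ids, dict(namespace_map) if keep_namespace_map else {})


def to_text(model):
    """Toy second serialization: one 'namespace' or 'element' entry per line."""
    lines = [f"namespace {p} {ns}" for p, ns in sorted(model.namespace_map.items())]
    lines += [f"element {compact(i, model.namespace_map)}" for i in model.element_ids]
    return "\n".join(lines)


if __name__ == "__main__":
    source = json.dumps({
        "namespaceMap": {"acme": "https://example.org/acme/"},  # made-up namespace
        "elements": ["acme:pkg1"],
    })
    print(to_text(from_json(source, keep_namespace_map=True)))
    print(to_text(from_json(source, keep_namespace_map=False)))
```

In both cases the element keeps the same full IRI, so its identity is preserved either way. Only the model that retains the namespace map can reproduce the original prefixes in the second format, which is the round-trip property described above.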
Regards,
William Bartholomew (he/him)
Principal Security Strategist
Global Cybersecurity Policy - Microsoft
