CIL

From: David Kemp <[email protected]>
Sent: Tuesday, December 7, 2021 5:35 AM
To: William Bartholomew (CELA) <[email protected]>
Cc: SPDX-list <[email protected]>
Subject: Re: [EXTERNAL] Re: [spdx-tech] SPDX v3 Serialization, with examples

Sean emphasizes the independence principle: every Element stands alone, with no dependency on any other Element. That principle is what distinguishes SPDX v3 from v2.

1. A document is a file or sequence of serialized data bytes. You agree.
1a. A file is described by a File Element, which is defined in the software package but inherits from Artifact in the core package.
1b. An Element can be created in core to describe a document containing data bytes representing serialized SPDX Elements.
1c. The logical name for an element that describes an SPDX document is Document, which inherits from Artifact.

Therefore:
2. A Document Element that describes an SPDX document inherits from Artifact. This shouldn't be controversial. But if it is, we could call it something like "SPDX-File" and move on.

[William] This was controversial, and we had a lot of discussion about it earlier in the year. The controversy wasn't over the name (although that was an aspect of it) but over the role of document vs. file. The conclusion at the time was that the "Document" element is a fundamental part of the serialized data bytes, i.e., it was part of the document itself, not merely a description of the document. There could still be a File that describes the physical file the document is serialized as. Other issues were that hashes of the document can't be in the document itself, and that the Document element provides information to allow round-tripping of the document across different formats.

Question for today's meeting: Do we call an Element defined in core that describes an SPDX document "Document"?

[William] We can certainly ask that question today, but that's not the point of contention; the point above is.

3a. Every Element stands alone and has an id IRI.
3b. We can create a serialized document containing a single SPDX Element, for example an Annotation.
3c. The Annotation Element stands alone even though it contains IRIs for other SPDX Elements.
3d. A serialized document of an Annotation element can contain nothing else, or it can contain Elements referenced by the annotation, or it can contain Elements that have nothing to do with the annotation.
3e. An SBOM Element stands alone even though it contains IRIs for other SPDX Elements. SBOM inherits from Collection / ContextualCollection.
3f. A serialized document of an SBOM Element can also contain nothing else, or other related or unrelated elements.
3g. A serialized document of an SBOM Element does not inherit from Collection; a document is just serialized bytes for any kind of Element.

Therefore:
3. Every document contains a single Element plus optional other stuff (DocumentContext).

[William] There's a jump from your list above to a "single" element. Why is this limited to a single element rather than multiple elements? For example, take my example from yesterday afternoon of transferring a list of licenses: it's not an SBOM, it's just a list of licenses.

Question for today: Do we agree that every Element is independent of every other Element, and that every Element can be serialized into a document file containing no other Elements?

[William] This would be a good question, and that is generally true today, although Sean wanted to ensure that collections always provide their referenced elements, so it would be good to close on that decision.

Regards,
Dave

On Mon, Dec 6, 2021 at 7:01 PM William Bartholomew (CELA) <[email protected]> wrote:

CIL

From: David Kemp <[email protected]>
Sent: Monday, December 6, 2021 6:58 AM
To: William Bartholomew (CELA) <[email protected]>
Cc: SPDX-list <[email protected]>
Subject: Re: [EXTERNAL] Re: [spdx-tech] SPDX v3 Serialization, with examples

William,

Attached is a proposed update to the logical model that allows each Element to be serialized either individually or with other Elements.
The value of an Element must not be affected by where or how many documents it is serialized in, which leads to the requirement for optional "context" data that is included in the serialized document but is not part of any Element.

1. A document is a file or sequence of serialized data bytes.
[William] Agreed.

2. A Document Element describing the document inherits from Artifact, not Collection.
[William] See discussion on point #3.

3. Every document contains a single Element plus optional DocumentContext.
[William] I think you might have the cardinality indicators reversed. If you want to transfer multiple elements (say a list of licenses or identities), how would you do that? They may not be contextually related, which is why we had the distinction between Collection and ContextualCollection (with Document being a concrete example of a Collection).

4. DocumentContext is defined in the logical and information models but is not part of any Element; it exists only in serialized documents.
[William] This was a point of contention for a while, and I think Sean will strongly disagree with this. Elements can exist independent of a Document and define which profiles they conform to, who created them and when, etc.

5. DocumentContext contains Element property defaults, namespace and namespaceMap, and other Elements serialized in or referenced from the document.
[William] I don't think Element property defaults or namespace would be on DocumentContext for the logical model. A serialization model may need those if it is doing an optimization (such as removing repeated information), but that would be something specific to that serialization model. This would be like a binary representation that assigns an integer identifier to each class: that information would live in the serialization model, not the logical model.

I don't know why I wrote "invisible at the information level".
I meant context is invisible to the universe of linked data Elements because it does not exist in any Element. Namespace and NamespaceMap are included in DocumentContext and are invisible to Elements because Elements always have full IRI ids when they are not serialized. Deserialized Elements always have the required common properties (specVersion etc.) even though they can default to values included in the context when serialized.
[William] Agreed, although again I don't think the concept of "defaulting" lives in the logical model; that would be specific to the serialization method.

No deserialized Element ever contains another Element. The ability to serialize a bunch of related or unrelated Elements in the same document allows them to be 1) compressed, 2) referenced, and 3) verified as a unit, but that unit is a file, not an Element.
[William] Agreed, although a serialization model may choose to represent elements as a hierarchy.

Because Document is an Artifact, not a Collection, it doesn't compete with ContextualCollection, the parent of BOM. The DocumentContext component of Document contains serialization context. If there is "real-world context" shared by Elements of a BOM, it seems like it could be captured in the description, summary, and/or comment properties of the BOM Element unless it can be more rigorously defined as a structured property.
[William] I'm not sure how it was competing with ContextualCollection. They were siblings under the same base class (Collection). It's not entirely clear to me what problem we're trying to solve here.

Misc notes:
* The "created" property could be a structure with "by" and "when" properties; semantically they refer to a single event that can be represented by a single structure.
[William] Agreed, they only existed in one place, so we hadn't needed a structure to contain them.
* Since "verifiedUsing" does not apply to (unserialized) Element values, I suggest removing it from Element and attaching it to Artifact.
A Document containing Element(s) is always what is verified.
[William] We believed that "verifiedUsing" had application beyond artifacts; for example, identities could have verification information associated with them. Future classes inheriting from Element may want to support verification methods. Sean does have a proposal that we haven't adopted yet that splits identities from the representation of the identity (e.g. person vs. email address). In this model the representation of the identity could be an artifact, and moving verifiedUsing to Artifact may make sense in that model. So yes, we could make this change, but we need to make sure any class we have today that should be "verifiable" is an artifact.
* ExternalMap indicates what Document is used to verify a particular Element. "elementURL" is redundant with Document's "artifactURL".
[William] Possibly. In your model, if you reference an external document, do you need to copy the document element into the referencing document? If not, then you won't have the artifact URL and would need to include it in the external map. Likewise, if you're not copying the document element into the referencing document, then you will need verifiedUsing on the external map entry. I could see an argument for having to copy the document (and its verifiedUsing) into the referencing document, but that's not how the model works today.
* dataLicense can presumably be any license identifier, not hardwired to one.
[William] This is a point of contention... https://github.com/spdx/spdx-spec/issues/159.
I would like to see us adopt your recommendation; the legal team would like to understand the rationale.
* The parent of BOM could be called Collection or ContextualCollection.
[William] We need to discuss this in more detail based on the prior answers and including Sean.
* namespaceMap is 0..*, not 0..1.
[William] Agreed, fixed.

Regards,
Dave

On Tue, Nov 30, 2021 at 11:49 AM William Bartholomew (CELA) <[email protected]> wrote:

Thanks David, I really like this visualization and how you have framed the problem.

As an aside, we've been using the term "logical model" for the information model and "physical model" or "serialization" for how this is represented as "bytes that go over the wire". We have had some requirements that blur the lines between physical and logical, and I think that those might not be fully captured by what you have here. For example, I don't think "context is invisible and irrelevant at the information level", because if you deserialized a JSON SPDXv3 document to the information model and then serialized the information model to a YAML SPDXv3 document, you would not expect the context to be lost, and it would be lost if it's not in the information model. This is one of the reasons that we added NamespaceMap to the information model: without it the namespace mappings wouldn't round-trip.

I agree that we don't need ContextualCollection; it was added to communicate that the elements are contextually related in some way (described by the context). We could absolutely remove it and have Collection be a concrete class (instead of abstract), but we would lose the ability to communicate that elements have a contextual relationship that's not described by Relationship elements. Maybe we don't need that; I think that's the question on the table.

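[Illustration, not part of the original thread] The round-tripping concern can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the JSON layout, the "elements" key, the "ex" prefix, and the example.com IRIs are all hypothetical and are not taken from any SPDX v3 serialization. It shows that if the in-memory model keeps the namespace map, re-serializing reproduces the prefixed ids, and if the model drops it, the ids survive as full IRIs but the prefix mappings are lost.

```python
import json


def expand(compact_id, namespace_map):
    """Turn 'prefix:local' into a full IRI if the prefix is in the map."""
    prefix, sep, local = compact_id.partition(":")
    if sep and prefix in namespace_map:
        return namespace_map[prefix] + local
    return compact_id  # treated as a full IRI already


def compact(iri, namespace_map):
    """Turn a full IRI back into 'prefix:local' if a namespace matches."""
    for prefix, base in namespace_map.items():
        if iri.startswith(base):
            return f"{prefix}:{iri[len(base):]}"
    return iri


def deserialize(text):
    """Parse a document; Elements in the model always carry full IRI ids."""
    doc = json.loads(text)
    ns = doc.get("namespaceMap", {})
    elements = [dict(e, id=expand(e["id"], ns)) for e in doc["elements"]]
    return {"namespaceMap": ns, "elements": elements}


def serialize(model, keep_namespace_map=True):
    """Write the model back out, optionally discarding the namespace map."""
    ns = model["namespaceMap"] if keep_namespace_map else {}
    doc = {"elements": [dict(e, id=compact(e["id"], ns)) for e in model["elements"]]}
    if ns:
        doc["namespaceMap"] = ns
    return json.dumps(doc, sort_keys=True)


# Hypothetical document with one Element and one namespace mapping.
original = json.dumps(
    {
        "namespaceMap": {"ex": "https://example.com/spdx/"},
        "elements": [{"id": "ex:pkg1", "type": "Package"}],
    },
    sort_keys=True,
)
model = deserialize(original)
round_tripped = serialize(model)
without_map = serialize(model, keep_namespace_map=False)
```

Here `round_tripped` equals `original`, while `without_map` still identifies the same Element by its full IRI but no longer carries the "ex" mapping. Whether that loss matters is exactly the logical-versus-serialization question under discussion.
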
Regards,
William Bartholomew (he/him)
Principal Security Strategist
Global Cybersecurity Policy - Microsoft

View/Reply Online (#4278): https://lists.spdx.org/g/Spdx-tech/message/4278
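[Illustration, not part of the original thread] The distinction the thread draws between Element values and serialization context can be sketched as follows. This is a hedged sketch, not the SPDX model: the `Element` fields, the "context"/"defaults" keys, the version string "3.0", and the example IRIs are assumptions made for illustration. The idea shown is that shared property values may be hoisted into per-document defaults when serializing, while every deserialized Element still carries all of its own properties and has the same value whichever document it came from.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Element:
    id: str            # full IRI (hypothetical examples below)
    type: str
    spec_version: str
    created_by: str
    created_when: str


# Properties that may be hoisted into document-level defaults (an assumption).
COMMON = ("spec_version", "created_by", "created_when")


def serialize(elements):
    """Pack Elements into one document, hoisting values shared by all of them."""
    defaults = {}
    for name in COMMON:
        values = {getattr(e, name) for e in elements}
        if len(values) == 1:
            defaults[name] = values.pop()
    body = [
        {k: v for k, v in asdict(e).items() if defaults.get(k) != v}
        for e in elements
    ]
    return {"context": {"defaults": defaults}, "elements": body}


def deserialize(document):
    """Rebuild complete Elements; defaults are applied, then discarded."""
    defaults = document["context"]["defaults"]
    return [Element(**{**defaults, **fields}) for fields in document["elements"]]


annotation = Element(
    "https://example.com/a1", "Annotation", "3.0",
    "https://example.com/dave", "2021-12-07T00:00:00Z",
)
package = Element(
    "https://example.com/p1", "Package", "3.0",
    "https://example.com/dave", "2021-12-06T00:00:00Z",
)

alone = deserialize(serialize([annotation]))
together = deserialize(serialize([annotation, package]))
```

In this sketch `alone[0]` and `together[0]` are equal, even though the two documents hoist different properties into their defaults. That matches William's point that defaulting can be treated as a property of a particular serialization rather than of the logical model.
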
