William,

*[David]* In v2 Document means "The SBOM" and document means "the bytes of the serialized SBOM". There is no "SBOM Information" analogous to Package/File/Annotation Information in v2.

*[William]* This isn’t a correct comparison. v2 did not have a concept of SBOM, as Gary mentioned most people had a root Package in v2 documents which you could equate to the SBOM. Document was the Document Creation Information and anything else at the root. If I was sharing a list of Annotations in SPDX v2 that would not be an SBOM.

I'm not trying to play word games. CycloneDX calls their files SBOMs. SPDX calls their files ... *something*, and NTIA thinks they are SBOMs. I don't care what word you want to use, just pick one and I'll happily use it.

An "SPDX v2.2 Document", whatever you want to call it, is a serialized sequence of bytes that contains exactly one Document Creation Information plus zero or more of each of the six things shown as stacks in the picture: Package Information, File Information, Snippet Information, Other Licensing Information, Relationships, and Annotations.

The question is if you have one Document Creation Information plus three Annotations in an "SPDX v2.2 Document", then how many Elements are in the corresponding serialized "SPDX v3 Document"? SPDX v3 has an Element that is currently called SBOM that inherits from ContextualCollection. My position is that an SPDX v3 artifact carrying the same info as that v2.2 example has 3 Annotation Elements. If the v2.2 file had one Package Information and 3 Annotation Informations, then the v3 file would have a ContextualCollection Element (currently called BOM/SBOM) and 3 Annotations.

The document artifact is a sequence of bytes and if you have a Document Element that describes a document artifact, then the Element inherits from Artifact. Please don't argue about names, I don't care about the names. There is *some* Element that inherits from Artifact that describes an SPDX v3 document, call it Foo if you want. There is *some other* Element that the logical model currently calls Document that inherits from Collection, and there is a *third* Element that the logical model currently calls SBOM that also inherits from Collection.

My position is simple - the serialized document bytes can optionally contain a ContextualCollection element (that I'll call SBOM until the logical model calls it something else) plus three Annotation Elements. The serialized document does not need to contain a ContextualCollection element PLUS a NonContextualCollection element (currently called Document) plus three Annotation Elements. That works regardless of what you want to serialize:

* One annotation is serialized as DocumentContext plus one Element.
* Three annotations without a ContextualCollection are serialized as DocumentContext plus three Elements.
* Those three annotations are not serialized as DocumentContext plus a NonContextualCollection Element plus three Annotation Elements.

Putting a serialized NonContextualCollection Element in the serialized data serves no purpose, it is just a waste of space.

DocumentContext is my name for serialization information for the document. V2.2 calls it Document Creation Information, you can call it whatever you want. It contains document-default values for SpecVersion, Creation Information, Profile Identifier, DataLicense, NamespaceMap (prefixes including the null prefix), ElementValues (Elements serialized in this document), and ExternalMap (references to Elements serialized in other documents including optional integrity information).

Dave

On Tue, Dec 14, 2021 at 6:38 PM William Bartholomew (CELA) <[email protected]> wrote:

> CIL
>
> *From:* [email protected] <[email protected]> *On Behalf Of* David Kemp via lists.spdx.org
> *Sent:* Tuesday, December 14, 2021 1:51 PM
> *To:* SPDX-list <[email protected]>
> *Subject:* [EXTERNAL] [spdx-tech] Do serialized documents contain the bytes of a Document Element?
>
> William has been gathering punch list questions during the meetings, and we've tried to avoid taking up meeting time on engineering the solutions. This is my perspective on the subject question.
>
> * A document is a sequence of bytes. This is normally a file, but as Nisha pointed out it could also be the response to an SQL query, or to any API call. A File Element has an artifactUri, a media type, and a filePurpose - if this is sufficient to describe a byte sequence returned from an API at the artifactUri address, great. If not, the logical model could define a new Artifact sub-type for byte sequences returned from API calls.
> * An Artifact element (File or a new "ByteSequence" type) describes a sequence of bytes.
> * A sequence of bytes can be signed or hashed. (The details of what to hash, i.e., how to canonicalize a byte sequence, can be worked out later.)
>
> If we have a File Element referring to a JPG file, the bytes of the File Element are not included in the bytes of the JPG file. If we have a File Element referring to an SPDXv2 file, the bytes of the File Element are not included in the SPDXv2 file.
> So if we have a File Element referring to an SPDXv3 file, the bytes of the File Element are not required to be included in the SPDXv3 file.
>
> The problem with answering the question is:
>
> SPDXv2.2 Document contains:
> * Document Creation Information
> * Package Information
> * ...
> * Annotation Information
>
> But the SPDXv3 logical model currently says:
>
> SPDXv3 Document contains:
> * Document Element (NOT Document Creation Information)
>
> *[William]* It’s technically both the description of the document and the document creation information. To enable elements to stand alone, any element can have creation information, but the document is also an element, so its creation information lives there rather than a standalone class. The document has metadata about the document, from SPDX 2.2: SPDX Version, Data License (now on all elements including Document), SPDX Identifier (on all elements including Document), Document Name (now on all elements including Document – renamed to “name”), Document Namespace (this was removed as a standalone concept, however, the document’s SPDX Identifier does include a namespace portion), Creator (now on all elements including Document), Created (now on all elements including Document), Creator Comment (now on all elements including Document – renamed to “comment”), Document Comment (merged with “Creator Comment”).
>
> * SBOM Element(s)
> * Package Element(s)
> * ...
> * Annotation Element(s)
>
> In v2 Document means "The SBOM" and document means "the bytes of the serialized SBOM". There is no "SBOM Information" analogous to Package/File/Annotation Information in v2.
>
> *[William]* This isn’t a correct comparison. v2 did not have a concept of SBOM, as Gary mentioned most people had a root Package in v2 documents which you could equate to the SBOM. Document was the Document Creation Information and anything else at the root. If I was sharing a list of Annotations in SPDX v2 that would not be an SBOM.
>
> In v3 SBOM means "The SBOM" and document means "the bytes of the serialized SBOM".
>
> *[William]* An SPDX document does not need to carry an SBOM, it could be a list of annotations, or a list of licenses, those are not an SBOM. The Document element represents the metadata about the document (who created it, when it was created, gives it an identity that can be used in relationships to relate the document to other documents), this must live inside the serialized document otherwise you would always need two documents (in fact you may need infinite SPDX documents 😊).
>
> The Document Element in v3 is the Artifact referring to "the bytes of the serialized SBOM".
>
> *[William]* You could have an artifact that describes an SPDX document, but that doesn’t negate the need for a document to be self-describing which is what the Document element intends to do.
>
> (In v3 any Element can be serialized; I'm referring to implementing the v2 use case of serializing an SBOM in v3.)
>
> *[William]* While any Element can be serialized there’s an open question of whether the serialization requires a Document or not, I can see arguments in both directions. The creation information on the element tells you who created the element, but it doesn’t tell you who created the serialized file, do we need that? In some cases who created the “anthology” by bringing all the elements together into a document is interesting and creating that “anthology” may have legal meaning (I’ll defer to the legal team on that) and capturing who did it, when, and what license they apply to that anthology may be important.
>
> The SBOM Element (like all Elements) has its own creation information, so when it is serialized the document creation information can be (but is not constrained to be) that of the SBOM Element. Document creation information is serialized in DocumentContext (along with NamespaceMap), and if the document creator wishes, SBOM creation information can override document creation information.
>
> *[William]* In this model I think “DocumentContext” is really just “Document”. It would be very complex to have overrides from SBOM to Document, a Document could “contain” multiple SBOMs which would be ambiguous and overriding will likely make integrity even more difficult.
>
> So requiring a Document Element in addition to an SBOM Element in the serialized bytes is a departure from v2, not consistent with it. The logical model should allow serialization of a single SBOM Element plus context, assorted annotations, relationships, licenses, identities, etc., just as v2 allows.
>
> *[William]* I don’t think this conclusion follows naturally since v2 doesn’t have an SBOM element, what you refer to as “context” at the root of a v2 document is “Document”. In RDF it’s the SpdxDocument element <https://github.com/spdx/spdx-spec/blob/development/v2.2.2/examples/SPDXRdfExample-v2.2.spdx.rdf.xml#L1411>, in XML it’s the root Document element <https://github.com/spdx/spdx-spec/blob/development/v2.2.2/examples/SPDXXMLExample-v2.2.spdx.xml#L2> (including some of its child elements such as “creationInfo”), in YAML it’s the set of root properties that aren’t collections <https://github.com/spdx/spdx-spec/blob/development/v2.2.2/examples/SPDXYAMLExample-2.2.spdx.yaml#L2> (such as “SPDXID”, “documentNamespace”, “creationInfo”), in tag/value it’s the set of properties that occur before the first object <https://github.com/spdx/spdx-spec/blob/development/v2.2.2/examples/SPDXTagExample-v2.2.spdx#L3>, in Excel it’s the “Document Info” sheet <https://view.officeapps.live.com/op/view.aspx?src=https%3A%2F%2Fraw.githubusercontent.com%2Fspdx%2Fspdx-spec%2Fdevelopment%2Fv2.2.2%2Fexamples%2FSPDXSpreadsheetExample-v2.2.xlsx&wdOrigin=BROWSELINK>.
>
> Dave

View/Reply Online (#4297): https://lists.spdx.org/g/Spdx-tech/message/4297
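
For illustration only, the serialization position argued in Dave's reply at the top of this message (three annotations are serialized as DocumentContext plus three Elements, with no NonContextualCollection "Document" Element in the bytes) might be sketched as below. Nothing here comes from the SPDX specification or any SPDX library: the class names, field names, JSON layout, and all values are assumptions made up for the sketch, and the field list simply mirrors the one given in the reply.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Element:
    # Hypothetical minimal Element: an identifier, a type name, and properties.
    id: str
    type: str
    properties: dict = field(default_factory=dict)


@dataclass
class DocumentContext:
    # Hypothetical holder for the document-default values listed in the reply.
    spec_version: str
    creation_info: dict
    profile: str
    data_license: str
    namespace_map: dict = field(default_factory=dict)        # prefix -> namespace
    element_values: List[Element] = field(default_factory=list)
    external_map: dict = field(default_factory=dict)         # refs to other documents


def serialize(context: DocumentContext) -> bytes:
    """Serialize the context and its Elements; no extra collection Element is added."""
    return json.dumps(asdict(context), sort_keys=True).encode("utf-8")


# Three annotations and no ContextualCollection (placeholder values throughout).
annotations = [
    Element(id=f"annotation-{i}", type="Annotation", properties={"statement": f"note {i}"})
    for i in range(1, 4)
]
context = DocumentContext(
    spec_version="placeholder-version",
    creation_info={"creator": "Example Creator", "created": "2021-12-14T00:00:00Z"},
    profile="placeholder-profile",
    data_license="CC0-1.0",
    namespace_map={"": "https://example.org/doc#"},
    element_values=annotations,
)

document_bytes = serialize(context)
decoded = json.loads(document_bytes)
element_types = [e["type"] for e in decoded["element_values"]]
```

Under these assumptions the serialized bytes carry exactly three Elements, all of type Annotation. An SBOM (ContextualCollection) Element would appear in `element_values` only if the creator chose to include one.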
