Compare serialized examples for three use cases: 1) I want to transfer three Actors. 2) SBOM C uses SBOM B, which uses SBOM A. 3) I want the canonical hash of a Relationship.
Regards,
Dave

On Tue, Jul 19, 2022 at 12:47 AM William Bartholomew (CELA) via lists.spdx.org <[email protected]> wrote:

> CIL
>
> ------------------------------
> *From:* [email protected] <[email protected]> on behalf of
> David Kemp via lists.spdx.org <[email protected]>
> *Sent:* Monday, July 18, 2022 6:18 PM
> *To:* SPDX-list <[email protected]>
> *Subject:* [EXTERNAL] Re: [spdx-tech] V3 serialization
>
> One principle is that the goal of serialization is to put Elements into
> physical format, NOT to create new elements that didn't exist prior to
> serialization. If you have 6 elements going into serialization, you should
> have 6 elements coming out, not 7.
>
> *[William]* Agreed, does my example violate that? It would be difficult
> for a serialization to "generate" elements because of the id and other
> required properties so I had not considered this a possibility.
>
> The second principle is that logical elements should be independent: the
> value of one element does not depend on the value of any other element.
>
> *[William]* I think it depends on your definition of "depends on" (pun
> intended). Elements may have properties that are references to other
> elements and serializers may choose to use that information for more
> compact serialization but since this would get unwound on deserialization
> that's immaterial.
>
> I believe that those two principles are worth adopting as design
> requirements.
>
> It is ugly to put something into serialization and get something else back
> out,
>
> *[William]* Agreed, though a lot of serializers/deserializers end up
> making minor changes as a result of normalization and other processes. Not
> ideal but that's an implementation detail within each
> serializer/deserializer.
>
> and it's really ugly to stuff one element's value inside another
>
> *[William]* I don't agree with this, at least for "collection" elements.
> Also, the serialization model for collection elements could support either
> element references or the element itself so if you think it's ugly then you
> would have the option of not doing nesting.
>
> not least because you can wind up with infinite recursion with documents
> inside documents inside documents inside documents
>
> *[William]* This is avoidable and using references instead of nesting
> doesn't prevent this problem. In fact, if you only use nesting then it's
> impossible to have infinite recursion, it's only when you use references
> that it becomes possible.
>
> Even two levels of element nesting makes things quite difficult to
> disentangle.
>
> *[William]* I don't agree, for collections the nesting makes it obvious
> which collection an element is part of without having to follow the id
> references. Since the serialization model could support either approach I
> don't see this being a blocker.
>
> The fundamental principle is that a file containing data is not an
> element. A Transfer Unit is defined by a data schema, just like the
> content of any XML file or JSON file or ASN.1 file. If the logical model
> has a Document element that describes an X.509 certificate, that element
> has interesting facts about the certificate but does not define its
> content. It is essential to remember the difference between the bytes in a
> file and the properties of a File or Document element - the difference
> between a thing and metadata about that thing.
>
> *[William]* We've had this discussion a number of times, the Collection
> element (and its subclasses) aren't metadata about collections, document,
> SBOM, etc. they are the collection, document, SBOM, etc. There is no
> "physical" thing outside of the SPDX document that is the collection,
> document, SBOM, etc., they only exist in the SPDX graph. You could take
> that SBOM, serialize it to disk, and then have a File element that talks
> about the physical serialization of the SBOM, but that's different to the
> SBOM SPDX element.
>
> * defaults:
> I created a separate defaults property to hold the five defaultable
> properties in order to distinguish them from non-defaultable properties.
> Gary and I like the idea, but I'm not wedded to it. The transfer unit
> schema could have "defaultCreatedBy", "defaultCreated", etc properties at
> the top level, to highlight that they are defaults, unlike name,
> description, comments, etc. Whatever the mechanism, there must be a way to
> ensure that "name" doesn't take an inappropriate default value if it isn't
> populated, while the default for "profiles" is appropriate.
>
> *[William]* I'm struggling with multiple properties that have the same
> definition having different names and different locations on the objects,
> it feels like a lot to explain. We could flag certain properties as
> inheritable in the schema, and this only applies to collection elements so
> I think the scope is quite narrow.
>
> * array vs map
> I used map as a conversation starter, because it fits the "unique"
> semantics of element ids, and because mapping types are ubiquitous now,
> XML schema had it in 2005
> https://www.w3.org/2005/07/xml-schema-patterns.html#Maps,
> and it's a built-in part of JSON.
>
> *[William]* While it is built-in to XML and JSON my experience is that
> it's not been supported well by schema languages and
> serializers/deserializers. I know I've had situations where I had to
> duplicate the id property in the class to ensure that other things work
> correctly (and to maintain the independence of the class). Also, in most
> object oriented languages there is not a way to get the key from the object
> so you end up having to track the key independently of the object which is
> a pain.
>
> JSON-LD even treats ID differently from other properties by giving it a
> reserved @id keyword, and SQL databases have primary keys with the special
> characteristic that they uniquely identify the record rather than being
> just another column. Autogenerated ids are often hidden because they are
> ubiquitous.
>
> *[William]* In both JSON-LD and SQL the properties are still normal
> properties, in JSON-LD it's still a property on the object it just has a
> special name, in SQL it's still a column in the table it just has special
> metadata attached to it. Even autogenerated ids are typically normal
> columns they're just system generated and you can't change their definition.
>
> And finally, you introduced Map to the logical model for Extensions. If
> it's OK for extensions, it's OK for Elements :-).
>
> *[William]* Not the same 🙂. The map for extensions is a map of
> "extension type" to value, not of "id" to value. It is a consequence of us
> deciding that each type can only be assigned once that it can also be used
> as an id, but it is primarily a type, not an id. If we changed that design
> decision it would no longer function as an id.
>
> Seriously though, I'm not wedded to Map. Treating Id as any other
> property but having some prose saying that it can be used as a primary key
> / unique identifier is OK, it's just kind of loose given that references
> from foreign to primary keys is a universal concept.
>
> *[William]* In SQL Server (others are similar) a foreign key takes the
> form FOREIGN KEY (ChildCol1, ChildCol2) REFERENCES parentTable (ParentCol1,
> ParentCol2, ...), they're still just columns, nothing magical about them,
> not even their names.
>
> * type property
> Since JSON does not have types it's good practice to ensure that "type:
> identity" cannot collide with a property named "identity". At the core
> profile all type and property names are defined and don't collide, but if
> "type" goes away we'll need to ensure that properties defined in any
> profile cannot collide with types defined in any profile. Again JSON-LD
> treats @type as a reserved property:
> https://w3c.github.io/json-ld-syntax/#typed-values
>
> *[William]* Agreed, and type isn't in the logical model, a JSON-LD
> serializer would use @type, an XML one would use XML namespaces and element
> names, a ProtoBuf one would use message types. Since my examples were
> "plain" JSON which does not have a built-in way of declaring types I used a
> "plain" property to capture the type, I agree that the name of this
> property should avoid potential conflict (e.g. by prefixing with an _).
>
> * document root
> A transfer unit file is not an Element and not a logical type or a class.
> The bytes in SPDX documents are not defined by the logical model, they just
> have to be able to be de-serialized into element instances.
>
> *[William]* Same disagreement as above.
>
> Data schemas (for JSON, XML, ASN.1, ...) explicitly do not define classes,
> they define only data types.
>
> *[William]* I'm not sure what definition of "class" you're using here,
> but the boxes on the diagram could be represented in an OO language as
> classes or interfaces, for our purposes I don't think the distinction
> between class and data type is meaningful.
>
> Regards,
> Dave
>
> On Mon, Jul 18, 2022 at 7:08 PM William Bartholomew (CELA) via
> lists.spdx.org <[email protected]> wrote:
>
> There are some “proposed” examples at the bottom of the model diagram
> (note that I intended these to be representative until we define the exact
> serialization for each data format):
>
> https://github.com/spdx/spdx-3-model/blob/main/model.png
>
> Some of the key differences (with no implied support for either choice, I
> have included my reasoning for reference only):
>
> - Defaults being represented as the original properties on a
>   collection element *vs* being in their own “defaults” property.
>   - I was thinking about this as a traditional inheritance/overrides
>     structure. If a property doesn’t have a value you can walk the tree up
>     looking for the same property.
> - Array of elements *vs* map of elements.
>   - In the past I have found schema languages don’t have good support
>     for one of the properties of an object being outside of the object
>     (i.e. a key on the collection outside). Having a completely contained
>     object makes canonicalization etc. easier at the risk of the array
>     having multiple instances of the same element (which can be solved in
>     other ways).
> - Type being a string property *vs* an object property containing the
>   type.
>   - I mainly followed the JSON-LD style and it has one less level of
>     nesting.
> - Document root being an element *vs* a custom class.
>   - Tried to minimize custom classes by having everything as either
>     an element or a value type.
>
> Regards,
>
> William Bartholomew (he/him)
> Principal Security Strategist
> Global Cybersecurity Policy – Microsoft
>
> *My working day may not be your working day. Please don’t feel obliged to
> reply to this e-mail outside of your normal working hours.*
>
> *From:* [email protected] <[email protected]> *On Behalf Of*
> David Kemp via lists.spdx.org
> *Sent:* Monday, July 18, 2022 1:56 PM
> *To:* SPDX-list <[email protected]>
> *Subject:* [EXTERNAL] [spdx-tech] V3 serialization
>
> Last week I took an action item to describe what serialized data for the
> v3 logical model could look like, in order to clarify discussion of the
> types shown in the model.
>
> The thing to remember about v3 is that it is knowledge graph centric, not
> document centric. Element instances from the knowledge graph can be
> serialized into data instances, but the data definition is controlled by
> the logical model, not vice versa. Data examples in various formats can
> illustrate the logical model for readers of the v3 spec, but they do not
> define it as they do in SPDX v2.
>
> A collection of independent element values is shown in
> "logical-elements". JSON data is used to visualize the element values, but
> it is important to remember that the logical value itself is the ability to
> answer questions:
> * what is the id of this element?
> * what is the type of this element?
> * who created this element?
> etc. The element is a class with getters that allow each property of an
> instance to be retrieved, and those property values are independent of
> serialization format.
>
> That collection of elements can be serialized into a transfer unit file as
> shown in "transfer units".
>
> A Document element describes the contents of a transfer unit, but does not
> need to be present in the transfer unit. The example transfer unit
> containing six elements (an SBOM, a Package, two Files, a Relationship, and
> an Actor that created them) is:
>
> {
>   "namespace": "urn:acme.dev:",
>   "defaults": {
>     "createdBy": ["identities:fred"],
>     "created": "2022-04-05T22:00:00Z",
>     "specVersion": "3.0",
>     "profiles": ["Core", "Software"],
>     "dataLicense": "CC0-1.0"
>   },
>   "elementValues": {
>     "artifacts:gnu-coreutils/v9.1/src/du.c": {
>       "type": {
>         "file": {
>           "filePurpose": ["APPLICATION", "SOURCE"]
>         }
>       }
>     },
>     "artifacts:gnu-coreutils/v9.1/src/echo.c": {
>       "type": {
>         "file": {
>           "filePurpose": ["APPLICATION", "SOURCE"]
>         }
>       }
>     },
>     "artifacts:gnu-coreutils/v9.1": {
>       "type": {
>         "package": {
>           "packagePurpose": ["APPLICATION", "SOURCE"],
>           "downloadLocation": "http://mirror.rit.edu/gnu/coreutils/coreutils-9.1.tar.gz",
>           "homePage": "https://www.gnu.org/software/coreutils/"
>         }
>       },
>       "name": "GNU Coreutils"
>     },
>     "relationships:gnu-coreutils/v9.1": {
>       "type": {
>         "relationship": {
>           "relationshipType": "CONTAINS",
>           "from": "urn:acme.dev:artifacts:gnu-coreutils/v9.1",
>           "to": [
>             "artifacts:gnu-coreutils/v9.1/src/du.c",
>             "artifacts:gnu-coreutils/v9.1/src/echo.c"
>           ]
>         }
>       }
>     },
>     "identities:fred": {
>       "type": {
>         "actor": {}
>       },
>       "identifiedBy": [{"email": "[email protected]"}]
>     },
>     "sboms:gnu-coreutils/v9.1": {
>       "type": {
>         "sbom": {
>           "elements": [
>             "artifacts:gnu-coreutils/v9.1/src/du.c",
>             "artifacts:gnu-coreutils/v9.1/src/echo.c",
>             "artifacts:gnu-coreutils/v9.1",
>             "relationships:gnu-coreutils/v9.1",
>             "identities:fred"
>           ]
>         }
>       }
>     }
>   }
> }
>
> The element examples, transfer unit examples, and the SPDX v3 schema
> derived from the logical model are available in
> https://github.com/davaya/spdx-3-elements
>
> The intent is for these to assist in refining the logical model and its
> serializations together.
>
> Regards,
> Dave
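For anyone who wants to experiment with the example transfer unit quoted above, here is a rough Python sketch of one way it might be expanded back into independent elements. It is only an illustration under stated assumptions, not the schema from the davaya repository: the function name `expand_transfer_unit`, the `id` property name, and building the full id by concatenating the namespace with the map key are all hypothetical. It copies only the five defaultable properties, so "name" never picks up a default, and it leaves references inside properties (such as "createdBy" or "to") unresolved.

```python
import copy
import json

# Assumption: the five defaultable properties listed in the quoted example.
DEFAULTABLE = ("createdBy", "created", "specVersion", "profiles", "dataLicense")


def expand_transfer_unit(unit):
    """Return one self-contained dict per entry in unit["elementValues"]."""
    namespace = unit.get("namespace", "")
    defaults = unit.get("defaults", {})
    elements = []
    for local_id, value in unit["elementValues"].items():
        # Hypothetical convention: full id = namespace prefix + map key.
        element = {"id": namespace + local_id}
        # Copy only the defaultable properties, so that "name" and similar
        # properties never receive a default value.
        for prop in DEFAULTABLE:
            if prop in defaults:
                element[prop] = copy.deepcopy(defaults[prop])
        # Values present on the element itself override the defaults.
        element.update(copy.deepcopy(value))
        elements.append(element)
    return elements


# A cut-down version of the quoted transfer unit (two of the six elements).
unit = {
    "namespace": "urn:acme.dev:",
    "defaults": {
        "createdBy": ["identities:fred"],
        "created": "2022-04-05T22:00:00Z",
        "specVersion": "3.0",
    },
    "elementValues": {
        "artifacts:gnu-coreutils/v9.1/src/du.c": {
            "type": {"file": {"filePurpose": ["APPLICATION", "SOURCE"]}}
        },
        "artifacts:gnu-coreutils/v9.1": {
            "type": {"package": {"packagePurpose": ["APPLICATION", "SOURCE"]}},
            "name": "GNU Coreutils",
        },
    },
}

elements = expand_transfer_unit(unit)
# Same number of elements out as went in; the expansion creates none.
assert len(elements) == len(unit["elementValues"])
print(json.dumps(elements, indent=2))
```

The result is the array form, with the id carried inside each object, so the same sketch also shows that the map and array representations can be converted into one another without losing information.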
