Hi David,

If I've interpreted this correctly (and I admit I may not have), this sounds like the approach that SPDX 2.x uses, where each file (document) has a namespace and all the identifiers within that file are relative to that namespace. Those identifiers could be meaningful or meaningless because they're opaque to SPDX (although SPDX 2.x does impose some structural requirements on the identifier).
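As a minimal sketch of the namespace-relative scheme described above, assuming the SPDX 2.x conventions that element ids look like `SPDXRef-` followed by letters, digits, `.` or `-`, and that the document namespace contains no `#`: the `resolve` helper and the example namespace URL are hypothetical names used only for illustration.

```python
import re

# Assumed SPDX 2.x id shape: "SPDXRef-" plus letters, digits, "." or "-".
SPDXREF = re.compile(r"SPDXRef-[A-Za-z0-9.\-]+")


def resolve(document_namespace: str, spdx_id: str) -> str:
    """Join a per-document namespace and a local id into one global identifier."""
    if "#" in document_namespace:
        # "#" is reserved as the separator between namespace and local id.
        raise ValueError("document namespace must not contain '#'")
    if not SPDXREF.fullmatch(spdx_id):
        raise ValueError(f"not a valid local id: {spdx_id!r}")
    return f"{document_namespace}#{spdx_id}"


# Hypothetical namespace; the local id is opaque apart from its required shape.
print(resolve("https://example.com/spdxdocs/demo-1.0", "SPDXRef-Package1"))
# prints https://example.com/spdxdocs/demo-1.0#SPDXRef-Package1
```

The only structure checked here is the shape of the id; whether it is meaningful or not is left entirely to the author.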
In the SPDX 3.x model you can still produce files with this same structure if you want. You have complete control over the namespaces and identifiers you use, so a namespace per file with unique identifiers within that namespace is perfectly fine. However, over the last 18 months participants have expressed a few desired requirements that led us to make the model more flexible while still supporting that use case. Specifically:

* Elements not being tied to documents. This was to support use cases where you may not be transferring the elements and just want to load up a graph of elements and traverse them.
* Being able to include an element that was originally created in one document into another document (for example, to create an aggregate document to transfer into an air-gapped environment).
* Encouraging re-use of elements when feasible. This becomes interesting as more and more "canonical" SBOMs are produced (i.e. the SBOM produced by the original software creator rather than a third party). Re-using the canonical elements is preferable to creating new elements.

These approaches aren't mutually exclusive and I believe both are supported today.

Regards,

William Bartholomew (he/him)
Principal Security Strategist
Cybersecurity Policy - Digital Diplomacy

From: [email protected] <[email protected]> On Behalf Of David Kemp via lists.spdx.org
Sent: Friday, November 12, 2021 1:04 PM
To: SPDX-list <[email protected]>
Subject: [EXTERNAL] [spdx-tech] Abstraction, Collections and Graphs

All,

We had a good discussion of Element IDs and Collections at the last tech meeting. Here are a couple of slides illustrating an abstract view of collections.

1) We have used small hand-tailored examples, and full SPDX files in various formats for discussion.
Here's a third approach: I generated a random small graph using a page Google found at http://bl.ocks.org/erkal/9746513, then used that as a basis for an SPDX Element graph. The reason to start with random input is to avoid cherry-picking simple examples. The intent is for the same process to work correctly for any random input. The undirected input graph had 15 anonymous nodes, and the process is to create SPDX collection and leaf Elements to represent each of them.

2) Sean correctly points out that a Collection is simply a set of element references, and everyone supported the KISS principle at the meeting. There is nothing simpler than an unlabeled graph, and the slides illustrate the process of assigning SPDX Element types to an unlabeled graph. There are many different ways to do that assignment, but all of them must result in a directed graph with edges from Collection Elements to members of each Collection. Leaf Elements (Annotation, Relationship, Identity and Artifact) are not collections and thus have no outgoing edges to member Elements.

3) In the logical model each Element exists on its own. But in the information model Elements are serialized into data files. The creator/serializer can decide how many Elements to include in a single serialized file. The slides show three examples: 15 serialized files (1 per Element), 1 serialized file with 15 Elements, and 2 serialized files with the 15 Elements divided among them and references from one to the other.
The only difference between those files is how Element ids are assigned; the information model allows any mixture of large and small files.

4) The bottom line is that the more Elements an author chooses to include in a file:

   1. the less work the author needs to do in choosing unique IRIs - there is only one per file, and all of the others are derived from it (in whatever manner the author chooses), and
   2. the more efficient the serialization can be, assuming the author chooses to use small local ids.

I hope these are self-explanatory. The data for the Element graph is below; it can be pasted into http://sketchviz.com to view.

Regards,
Dave

```
digraph G {
  node [fontname=arial; style=filled; fillcolor=lightskyblue1];
  n1 [label="SBOM"];
  n1 -> n2; n1 -> n4; n1 -> n7; n1 -> n8; n1 -> n9; n1 -> n10;
  n2 [label="Package1"];
  n2 -> n3; n2 -> n4; n2 -> n6; n2 -> n10;
  n3 [label="Package2"];
  n3 -> n10; n3 -> n5;
  n4 [label="Package3"];
  n4 -> n8; n4 -> n12;
  n5 [label="Package4"];
  n5 -> n6;
  n6 [label="Package5"];
  n6 -> n7;
  n7 [label="file1"; style=filled; fillcolor=palegreen];
  n8 [label="file2"; style=filled; fillcolor=palegreen];
  n9 [label="Package6"];
  n9 -> n13; n9 -> n14;
  n10 [label="file3"; style=filled; fillcolor=palegreen];
  n11 [label="Anno1"; style=filled; fillcolor=palegreen];
  n11 -> n3;
  n12 [label="file5"; style=filled; fillcolor=palegreen];
  n13 [label="Package7"];
  n13 -> n15;
  n14 [label="file6"; style=filled; fillcolor=palegreen];
  n15 [label="file7"; style=filled; fillcolor=palegreen];
}
```
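Points 3 and 4 can be illustrated with a rough Python sketch over the edges of the DOT graph above. It assumes each file gets one author-chosen namespace IRI and every Element id is derived as `namespace#local-id`. The base URL, the file names, and the particular two-file split are hypothetical and are not taken from the slides.

```python
# Edges copied from the DOT data (source -> target); nodes are n1..n15.
EDGES = [
    ("n1", "n2"), ("n1", "n4"), ("n1", "n7"), ("n1", "n8"), ("n1", "n9"),
    ("n1", "n10"), ("n2", "n3"), ("n2", "n4"), ("n2", "n6"), ("n2", "n10"),
    ("n3", "n10"), ("n3", "n5"), ("n4", "n8"), ("n4", "n12"), ("n5", "n6"),
    ("n6", "n7"), ("n9", "n13"), ("n9", "n14"), ("n11", "n3"), ("n13", "n15"),
]
NODES = [f"n{i}" for i in range(1, 16)]


def assign_ids(partition, base="https://example.com/spdx/"):
    """Derive every Element id from the namespace of the file that holds it.

    `partition` maps a (hypothetical) file name to the nodes it serializes.
    """
    return {
        node: f"{base}{file_name}#{node}"
        for file_name, members in partition.items()
        for node in members
    }


def summarize(partition):
    """Count the namespaces an author must choose and the cross-file references."""
    file_of = {n: f for f, members in partition.items() for n in members}
    cross = sum(1 for src, dst in EDGES if file_of[src] != file_of[dst])
    return {"namespaces": len(partition), "cross_file_refs": cross}


# The three layouts from point 3; the two-file split is an arbitrary choice.
one_per_element = {f"doc-{n}": [n] for n in NODES}
single_file = {"doc-all": NODES}
two_files = {"doc-a": NODES[:8], "doc-b": NODES[8:]}

for name, layout in [("15 files", one_per_element),
                     ("1 file", single_file),
                     ("2 files", two_files)]:
    print(name, summarize(layout))
# 15 files {'namespaces': 15, 'cross_file_refs': 20}
# 1 file {'namespaces': 1, 'cross_file_refs': 0}
# 2 files {'namespaces': 2, 'cross_file_refs': 6}
```

The graph itself is identical in all three layouts; only the ids change, and fewer files means fewer IRIs to choose and more references that can stay as short local ids.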
