David, how do you think about the orphan problem in Option 1? Given that profiles will be attaching more and more information to elements over time, I see this as potentially a big con. Do any of the "orphan annotation fixes" options jump out to you over the others?
I understand your concern about the embedded files/relationships not being elements. However, this is only a serialization; I envisaged that they would be elements (which is why they included an id). I agree that we wouldn't want files/relationships that aren't elements, because we want consumers to be able to attach information to them even if that wasn't the intent of the producer.

Regards,

William Bartholomew (he/him) - Let's chat<https://outlook.office.com/findtime/[email protected]&anonymous&ep=plink>
Principal Security Strategist
Global Cybersecurity Policy - Microsoft
My working day may not be your working day. Please don't feel obliged to reply to this e-mail outside of your normal working hours.

From: David Kemp <[email protected]>
Sent: Friday, January 28, 2022 8:39 AM
To: William Bartholomew (CELA) <[email protected]>
Cc: [email protected]; Gary O'Neall <[email protected]>; SPDX-list <[email protected]>
Subject: Re: [EXTERNAL] Re: [spdx-tech] Follow-up on today's tech call about the "Amend Contains Relationship" discussion

William,

Thank you! Your examples are exactly the sort of specifics we need to discuss use cases, pros and cons. Some observations:

* I agree that your options 1A, 1B, 1C, 1D represent the same "concept 1", and option 2 is the other concept. 1A is how I envisioned the structure.
* In 1B Package, typo: files: #foo-contents should be #foo-contents-rev1
* 1B uses a doubly-linked list from package to "files: #foo-contents-rev1" and from relationship to "from: #foo-metadatarev1". Inconsistencies are detected because the forward link invalidates any attempt for other relationships to have the same "from" id. 1B is a complex way of expressing 1A, and any apparent flexibility implied by the ability of multiple relationships to reference the same package is incorrect, and tools would be required to do that error detection.
* In 1C and 1D, embedded structures are not Elements and are not reusable, as you note in 1D.
The "id" properties in 1C "id": "foo-contents-rev1" and 1D "id": "world-file" and "id": "hello-file" cannot be used to reference graph elements, do not have any other purpose, and are therefore misleading and should be deleted. Option 2 is identical to Option 1B except that there is no forward link "files": "#foo-contents-rev1" from package to relationship. That forward link is the safety belt that prevents multiple relationships from referencing the same package. Since the purpose of Option 2 is to allow that, the same note from Option 1B applies to Option 2: "# NOTE: This reflects what is in the Package.files so there is a possibility of inconsistency" That inconsistency is what I consider the overriding CON for Option 2. Inconsistency is not just a possibility. If Alice creates Revision 1 and Revision 2, she needs to include the AMENDS relationship to invalidate the Revision 1 file list, otherwise they are inconsistent. Bob could also create a different Revision 2 that AMENDS the file list, but which Revision 2 should consumers believe, assuming they obtained all of the AMENDS relationships in the graph. As Alice and Bob and Carol create Revisions 2 and 3 and 4 all from "foo-metadata-rev1", the graph can rapidly become complex and inconsistent, with no unique identifier for each of their visions of the contents of the unmodified package artifact. I believe each file list for a given package artifact deserves it's own unique id, as in Option 1A. 
Dave

On Thu, Jan 27, 2022 at 11:14 AM William Bartholomew (CELA) via lists.spdx.org <[email protected]> wrote:

Let me try something and provide some concrete examples that have two revisions. I want to take a single scenario, where a Package is thought to contain 2 files and subsequent analysis finds it only contains 1 file, and show how that scenario would look with the different model implementations. I chose the file removal scenario, even though it's less common, because removals are typically harder to express than additions. In all these scenarios the underlying artifacts do not change; it is only our knowledge of the artifacts that changes between the revisions.

To keep the scenarios as consistent as possible the goals were:

1) the revision is a separate physical document from the initial one
2) the documents are as standalone as possible
3) the documents re-use unchanged elements as much as possible
4) there is information that is not revised attached to the package id that we want a consumer to be able to access (the annotation)

I haven't included the property + relationship option because it's just a combination of the two, but the property option is broken out into more sub-options based on different data types of the property.
Option 1A - Files are expressed as a property on the element that is of type ID-ref<File>

Revision 1

{
  "type": "Package",
  "id": "foo-metadatarev1",
  "name": "foo",
  "files": [ "#hello-file", "#world-file" ]
},
{
  "type": "Annotation",
  "id": "annotation-rev1",
  "for": "#foo-metadatarev1",
  "text": "Annotation example"
},
{
  "type": "File",
  "id": "hello-file",
  "name": "hello"
},
{
  "type": "File",
  "id": "world-file",
  "name": "world"
}

Revision 2

{
  "type": "Package",
  "id": "foo-metadatarev2", # Element content changed so the id had to change
  "name": "foo",
  "files": [ "#hello-file" ]
},
{
  "type": "Annotation",
  "id": "annotation-rev1",
  "for": "#foo-metadatarev1", # Oh no, orphaned. See below for possible fixes.
  "text": "Annotation example"
},
{
  "type": "File",
  "id": "hello-file",
  "name": "hello"
},
# "world-file" element removed because it is obsolete, it could be left in but would be orphaned in the graph

Orphaned annotation fixes:

1. AMENDS relationship from foo-metadatarev2 to foo-metadatarev1 could signal to consumers that information attached to foo-metadatarev1 may be relevant to foo-metadatarev2.
   Con: Difficult for consumers to walk graph, ambiguous if attached information is amended, ambiguous if information attached to both rev1 and rev2.
   Pro: Simple for producer, maximizes re-use.
2. Copy the annotation and update the "for" reference and "id" property and add AMENDS relationship from annotation-rev2 to annotation-rev1.
   Con: Breaks integrity and signing, lose trust in the annotation, doesn't scale if there is lots of information attached.
   Pro: Easy for consumer to find attached information.
3. Attach annotation to package with relationship instead of "for" property, create a new relationship from "foo-metadatarev2" to "annotation-rev1".
   Con: Duplicate relationships, you may not know all the relationships to duplicate.
   Pro: Retains integrity of annotation, shows the author of new relationship, easy to follow for consumer, unambiguous.
   (Meta description: if you revise an element you have to revise the relationships that refer to it; consumers don't refer to relationships on previous revisions.)

My summary: Decent, especially with annotation fix #3. File list carried with the package but files don't need to be. Third parties can't add files without "taking ownership" of the package.

Option 1B - Files are expressed as a property on the element that is of type ID-ref<Relationship>

Revision 1

{
  "type": "Package",
  "id": "foo-metadatarev1",
  "name": "foo",
  "files": "#foo-contents"
},
{
  "type": "Relationship",
  "id": "foo-contents-rev1",
  "relationshipType": "CONTAINS",
  "from": "#foo-metadatarev1", # NOTE: This reflects what is in the Package.files so there is a possibility of inconsistency
  "to": ["#hello-file", "#world-file"]
},
{
  "type": "Annotation",
  "id": "annotation-rev1",
  "for": "#foo-metadatarev1",
  "text": "Annotation example"
},
{
  "type": "File",
  "id": "hello-file",
  "name": "hello"
},
{
  "type": "File",
  "id": "world-file",
  "name": "world"
}

Revision 2

{
  "type": "Package",
  "id": "foo-metadatarev2", # Had to change because files changed
  "name": "foo",
  "files": "#foo-contents-rev2" # Had to change because relationship id changed
},
{
  "type": "Relationship",
  "id": "foo-contents-rev2", # Had to change because "to" changed
  "relationshipType": "CONTAINS",
  "from": "#foo-metadatarev2", # NOTE: This reflects what is in the Package.files so there is a possibility of inconsistency
  "to": ["#hello-file"]
},
{
  "type": "Annotation",
  "id": "annotation-rev1",
  "for": "#foo-metadatarev1", # Oh no, orphaned.
  "text": "Annotation example"
},
{
  "type": "File",
  "id": "hello-file",
  "name": "hello"
}
# "world-file" element removed because it is obsolete, it could be left in but would be orphaned in the graph

Orphaned annotation fixes:

* Same as Option 1A.

My summary: Too many cascading changes. File list pointer carried with the package but the relationship and files don't need to be.
Third parties could amend the Relationship in a new document.

Option 1C - Files are expressed as a property on the element that is of type Relationship

Revision 1

{
  "type": "Package",
  "id": "foo-metadatarev1",
  "name": "foo",
  "files": { # This is of type Relationship; "type", "from", and "relationshipType" would be implied.
    "id": "foo-contents-rev1",
    "to": ["#hello-file", "#world-file"]
  }
},
{
  "type": "Annotation",
  "id": "annotation-rev1",
  "for": "#foo-metadatarev1",
  "text": "Annotation example"
},
{
  "type": "File",
  "id": "hello-file",
  "name": "hello"
},
{
  "type": "File",
  "id": "world-file",
  "name": "world"
}

Revision 2

{
  "type": "Package",
  "id": "foo-metadatarev2", # Had to change because "files" changed
  "name": "foo",
  "files": { # This is of type Relationship; "type", "from", and "relationshipType" would be implied.
    "id": "foo-contents-rev2", # Had to change because "to" changed
    "to": ["#hello-file"]
  }
},
{
  "type": "Annotation",
  "id": "annotation-rev1",
  "for": "#foo-metadatarev1", # Oh no, orphaned.
  "text": "Annotation example"
},
{
  "type": "File",
  "id": "hello-file",
  "name": "hello"
}
# "world-file" element removed because it is obsolete, it could be left in but would be orphaned in the graph

Orphaned annotation fixes:

* Same as Option 1A.

My summary: Better than option 1B, still a lot of cascading changes. File relationships have to be carried with the Package (but the files they reference don't need to be). Third parties can't amend the files list without "taking ownership" of the package.
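
The orphaned annotation that recurs in the Option 1 variants can be detected mechanically. The following is a minimal Python sketch, assuming a document is just a list of dictionaries shaped like the examples above and that references are the element id prefixed with "#". The function name find_orphaned_annotations is hypothetical and is not part of any SPDX tooling.

```python
from typing import Iterable, List


def find_orphaned_annotations(elements: Iterable[dict]) -> List[str]:
    """Return ids of annotations whose "for" target is absent from the document."""
    elements = list(elements)
    known_ids = {e["id"] for e in elements}
    orphans = []
    for e in elements:
        if e.get("type") != "Annotation":
            continue
        # Assumption: a reference is the target id with a leading "#".
        target = e["for"].lstrip("#")
        if target not in known_ids:
            orphans.append(e["id"])
    return orphans


# Revision 2 of Option 1A: the package id changed, the annotation did not.
revision_2 = [
    {"type": "Package", "id": "foo-metadatarev2", "name": "foo",
     "files": ["#hello-file"]},
    {"type": "Annotation", "id": "annotation-rev1",
     "for": "#foo-metadatarev1", "text": "Annotation example"},
    {"type": "File", "id": "hello-file", "name": "hello"},
]

print(find_orphaned_annotations(revision_2))  # ['annotation-rev1']
```

A check like this only reports the problem. Which of the three fixes listed under Option 1A a producer applies afterwards is a separate decision.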
Option 1D - Files are expressed as a property on the element that is of type File[]

Revision 1

{
  "type": "Package",
  "id": "foo-metadatarev1",
  "name": "foo",
  "files": [
    {
      "type": "File",
      "id": "hello-file",
      "name": "hello"
    },
    {
      "type": "File",
      "id": "world-file",
      "name": "world"
    }
  ]
},
{
  "type": "Annotation",
  "id": "annotation-rev1",
  "for": "#foo-metadatarev1",
  "text": "Annotation example"
}

Revision 2

{
  "type": "Package",
  "id": "foo-metadatarev2", # Had to change because "files" changed.
  "name": "foo",
  "files": [
    {
      "type": "File",
      "id": "hello-file",
      "name": "hello"
    }
  ]
},
{
  "type": "Annotation",
  "id": "annotation-rev1",
  "for": "#foo-metadatarev1", # Oh no, orphaned.
  "text": "Annotation example"
}

Orphaned annotation fixes:

* Same as Option 1A.

My summary: Succinct and all information easily accessible to consumer, doesn't support re-use of files between packages. Files have to be carried with the Package. Third parties can't amend the files list without "taking ownership" of the package.

Option 2 - Files are expressed as a CONTAINS relationship from Package to File

Revision 1

{
  "type": "Package",
  "id": "foo-metadatarev1",
  "name": "foo"
},
{
  "type": "Relationship",
  "id": "foo-contents-rev1",
  "relationshipType": "CONTAINS",
  "from": "#foo-metadatarev1",
  "to": ["#hello-file", "#world-file"]
},
{
  "type": "Annotation",
  "id": "annotation-rev1",
  "for": "#foo-metadatarev1",
  "text": "Annotation example"
},
{
  "type": "File",
  "id": "hello-file",
  "name": "hello"
},
{
  "type": "File",
  "id": "world-file",
  "name": "world"
}

Revision 2

{
  "type": "Package",
  "id": "foo-metadatarev1",
  "name": "foo"
},
{
  "type": "Relationship",
  "id": "foo-contents-rev2", # Had to change because "to" changed.
  "relationshipType": "CONTAINS",
  "from": "#foo-metadatarev1",
  "to": ["#hello-file"]
},
{ # Needed so consumers know to ignore foo-contents-rev1
  "type": "Relationship",
  "id": "foo-contents-amend-rev2",
  "relationshipType": "AMENDS",
  "from": "#foo-contents-rev2",
  "to": ["#foo-contents-rev1"]
},
{
  "type": "Annotation",
  "id": "annotation-rev1",
  "for": "#foo-metadatarev1",
  "text": "Annotation example"
},
{
  "type": "File",
  "id": "hello-file",
  "name": "hello"
}

My summary: Impact only to changed nodes, "append only" (no copying and re-id-ing of nodes not impacted by the revision). Consumer has to be given the appropriate amended relationship (though they don't need any previous ones); the producer would have to identify the correct set of nodes to give the consumer, or, if they are given extra, the consumer needs to follow the amends graph. Nothing has to be carried with the Package. Third parties can amend the files list without "taking ownership" of the package by creating a new document with a new relationship and amends.

Regards,

William Bartholomew
From: [email protected] <[email protected]> On Behalf Of Steve Winslow via lists.spdx.org
Sent: Wednesday, January 26, 2022 12:18 PM
To: Gary O'Neall <[email protected]>
Cc: SPDX-list <[email protected]>
Subject: [EXTERNAL] Re: [spdx-tech] Follow-up on today's tech call about the "Amend Contains Relationship" discussion

Hi Gary and team,

I wasn't on the tech team call, and I haven't been keeping up to date on the SPDX 3.0 modeling discussions, so please feel free to disregard the following if it isn't relevant...

From the SPDX 2.2 perspective, if there were a change to the Files that a Package CONTAINS, then presumably the Package Verification Code would need to be changed accordingly. In that sense, a change to the CONTAINS for a Package's Files is more than just a Relationship change; rather, it changes the properties of the Package element itself.

Given that, of the 3 choices given, I think that from the SPDX 2.2 perspective, it would be choice 3 -- both the Package's element and its Relationships would change.

Perhaps not relevant if there isn't going to be a Package Verification Code (or equivalent) for Packages in 3.0, but just throwing it out there in case it's helpful for how to think about this.
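
To illustrate why a changed file list changes a Package property in SPDX 2.2, here is a rough Python sketch of the Package Verification Code calculation as commonly described for that version: take the SHA1 of each file, sort the hex digests, concatenate them without separators, and take the SHA1 of the result. Excluded files are not handled, and the file contents below are placeholders invented for the example.

```python
import hashlib
from typing import Iterable


def package_verification_code(file_sha1s: Iterable[str]) -> str:
    """SHA1 over the sorted, concatenated per-file SHA1 hex digests."""
    ordered = sorted(s.lower() for s in file_sha1s)
    return hashlib.sha1("".join(ordered).encode("ascii")).hexdigest()


def sha1_of(content: bytes) -> str:
    """Hex SHA1 digest of a file's bytes."""
    return hashlib.sha1(content).hexdigest()


# Placeholder contents standing in for the "hello" and "world" files.
hello = sha1_of(b"hello contents")
world = sha1_of(b"world contents")

rev1_code = package_verification_code([hello, world])
rev2_code = package_verification_code([hello])

# The order in which files are listed does not matter, but the set does.
print(rev1_code == package_verification_code([world, hello]))  # True
print(rev1_code == rev2_code)  # False
```

Because the code is derived from the set of files, dropping "world" from the list yields a different value, which is the sense in which the Package element itself changes.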
Steve

On Wed, Jan 26, 2022 at 2:36 PM Gary O'Neall <[email protected]> wrote:

TL;DR: Please reply with any thoughts on the pros and cons to the 3 different options for amending a contains relationship. Please try not to duplicate - just reply with which scenario and which pro/con. If you need clarification on the scenario or options, please start a separate thread.

Here's the option description and pros and cons from the minutes:

(1) Anytime there is a change in the contains relationship for a package, one has to change the package element.
Pros: Concise for producer (initial production) without having to amend it. Improves the quality (harder to miss updating relationships). Graph navigation easier for un-amended.
Cons: Less concise for amendments. Makes element reuse difficult / less likely. Graph navigation is harder on amends, unless all copied.

(2) Anytime there is a change in the contains relationship, only the relationships are changed.
Pros: More concise for amendments. Makes element reuse easier. Graph navigation is easier on amends.
Cons: Less concise for producer (initial production) without having to amend it. Lower quality - you must traverse and understand the amends relationship to have a full understanding of what's changed. Graph navigation harder for un-amended.

(3) Anytime there is a change in the contains relationship, the element, the relationship or both can change.
Pros:
Cons:

Additional context (especially for those not on the tech call today):

Last week, we discussed one of the Meta Issues - "Should we have properties that duplicate relationships? And impact on round-tripping?"
as tracked in Issue #21<https://github.com/spdx/spdx-3-model/issues/21>. Based on the email conversations, we started this week's call with a discussion of whether the package Element contains relationship could be changed. We concluded that the Package Artifact contains relationships are fixed and do not change. However, the Package Element and associated Contains Relationships may change. This may be due to imperfect knowledge at the initial creation of the SBOM, different use cases, better tools etc. Since the Contains metadata may change, how would an SBOM be amended?

The specific scenario is:

1. An SBOM is produced with a Package. The package contains files, but the SBOM is not complete.
2. An amended SBOM is produced with a better/corrected list of files it contains.

In creating the amended SBOM, there are 3 possible ways of representing the changed SBOM:

1. Only the Package Element is amended:
   * a new Package Element is created with an Amends relationship to the previous incorrect Package element
   * the new Package Element properties are updated as appropriate
   * relationships to/from the Element are replaced if appropriate
2. Only the relationships are amended or added:
   * Any Contains relationship which is modified is amended by creating a new Relationship with an Amends relationship to the previous Relationship
   * Any new Contains relationship is added
3. Both - Both approaches to amending the SBOM are supported. A new Package Element may be created, relationships may be modified, or both

-------------------------------------------------
Gary O'Neall
Principal Consultant
Source Auditor Inc.
Mobile: 408.805.0586
Email: [email protected]

View/Reply Online (#4349): https://lists.spdx.org/g/Spdx-tech/message/4349
