Orphans was an ambiguous term for me to use. Option 1 creates a new Package
element with a new SPDXID, so everything that referenced the previous SPDXID
now refers to an obsolete Package. This introduces two problems:
1. If you have an Element (for example: an Annotation, or a Vulnerability)
that references the old SPDXID you need to follow the graph to find the current
revision of the Package.
2. If you have a Package then to discover the information associated with
that package you would need to walk through all previous revisions of the
Package to find Elements referencing those SPDXIDs.
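The two lookups above can be sketched roughly as follows. This is only an illustrative Python sketch: it assumes elements are plain dicts shaped like the JSON examples later in this thread, and that each revision is linked to its predecessor by a single, unbranched AMENDS chain. The helper names `current_revision` and `attached_elements` are hypothetical.

```python
# Hypothetical helpers; element shapes follow the JSON examples in this thread.

def current_revision(element_id, amends):
    """Problem 1: follow AMENDS links forward from an old id to the newest one.

    `amends` maps a newer id to the older id it amends (assumed unbranched).
    """
    amended_by = {old: new for new, old in amends.items()}
    seen = {element_id}
    while element_id in amended_by:
        element_id = amended_by[element_id]
        if element_id in seen:  # guard against a malformed, cyclic chain
            raise ValueError("cycle in AMENDS chain")
        seen.add(element_id)
    return element_id


def attached_elements(package_id, amends, elements):
    """Problem 2: walk back through all previous revisions and collect the
    elements whose "for" property references any of those ids."""
    revision_ids = {package_id}
    while package_id in amends:
        package_id = amends[package_id]
        if package_id in revision_ids:
            raise ValueError("cycle in AMENDS chain")
        revision_ids.add(package_id)
    return [e for e in elements if e.get("for", "").lstrip("#") in revision_ids]


amends = {"foo-metadatarev2": "foo-metadatarev1"}
elements = [
    {"type": "Annotation", "id": "annotation-rev1", "for": "#foo-metadatarev1"},
]
print(current_revision("foo-metadatarev1", amends))
print([e["id"] for e in attached_elements("foo-metadatarev2", amends, elements)])
```

Both functions need the whole AMENDS chain to be available, which is the cost being described.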
This problem can still occur in Option 2, but it occurs less frequently. One
option eliminates the problem entirely, though it may add too much complexity
to the model: do not include any "information" in Element, and use it to hold
identity only.
    Element
        id: SPDXID
        created: DateTime

    ElementDescription
        for: SPDXID
        created: DateTime
        name: String
        summary: String
        description: String
        comment: String
With this model everything in Element is immutable, so the id never changes.
You can iterate on the description by creating new ElementDescriptions, and
everything attached to the Element remains intact.
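A minimal Python sketch of this identity-only model, for illustration. The field names follow the outline above, except that `for` is renamed `for_id` because `for` is a Python keyword. Choosing the "current" description by latest `created` timestamp is an assumption of this sketch, not something the proposal specifies, and `latest_description` is a hypothetical helper.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Element:
    id: str            # SPDXID; never changes
    created: datetime


@dataclass(frozen=True)
class ElementDescription:
    for_id: str        # "for" in the outline above
    created: datetime
    name: str
    summary: str = ""
    description: str = ""
    comment: str = ""


def latest_description(element, descriptions):
    """Return the most recently created description of `element`, or None.

    Picking by timestamp is an assumption made for this sketch.
    """
    candidates = [d for d in descriptions if d.for_id == element.id]
    return max(candidates, key=lambda d: d.created, default=None)


foo = Element("foo", datetime(2022, 1, 27, tzinfo=timezone.utc))
descriptions = [
    ElementDescription("foo", datetime(2022, 1, 27, tzinfo=timezone.utc),
                       "foo", comment="thought to contain 2 files"),
    ElementDescription("foo", datetime(2022, 1, 28, tzinfo=timezone.utc),
                       "foo", comment="contains 1 file"),
]
print(latest_description(foo, descriptions).comment)
```

Anything that references `foo` keeps working across both descriptions, because the Element itself is never re-issued.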
Regards,
William Bartholomew (he/him) - Let's
chat<https://outlook.office.com/findtime/[email protected]&anonymous&ep=plink>
Principal Security Strategist
Global Cybersecurity Policy - Microsoft
My working day may not be your working day. Please don't feel obliged to reply
to this e-mail outside of your normal working hours.
From: David Kemp <[email protected]>
Sent: Friday, January 28, 2022 1:07 PM
To: William Bartholomew (CELA) <[email protected]>
Cc: [email protected]; Gary O'Neall <[email protected]>; SPDX-list
<[email protected]>
Subject: Re: [EXTERNAL] Re: [spdx-tech] Follow-up on today's tech call about
the "Amend Contains Relationship" discussion
William,
Nodes in the graph, once created, conceptually exist forever. I don't see
orphans as a problem that has to be solved except as an engineering exercise to
keep systems running. Data can be archived and then deleted when it is no
longer useful. Orphan elements can be garbage-collected when nothing
references them, but they don't cause any harm except by taking up space.
"Orphaned annotation fixes" is a different issue that might be addressed with
symbolic links - mechanisms to allow annotations to be attached to the
artifactUri instead of to the metadata about the artifact. I understand that's
what foo-metadatarev1 in Option 2 accomplishes, but it's not yet clear in my
head how to distinguish information that is constant across every revision in
a chain from information that can be revised.
Perhaps Option 1A Revision 2 could have an Orphan Option 4: get rid of the
Annotation and replace it with a DESCRIBED_BY relationship (for lack of a
better name; it's descriptive for annotations but should be applicable to every
element type) from "foo-metadatarev2" to ["annotation-rev1", plus any other
elements from foo-metadatarev1 that we want to keep]. That preserves integrity,
and is a constant size, versus the variable size and broken integrity of orphan
fix 2 (copying).
I accept your point about serializing elements as embedded structures. It does
mean that the hash of the containing element depends on the properties of the
contained element, not just the IRI. And if it is just serialization then the
elements can be referenced and validated from elsewhere if tooling is required
to dig them out of the nested structure.
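To illustrate the hashing point, here is a small Python sketch. The hashing scheme (SHA-256 over JSON with sorted keys) is purely an assumption for the example; nothing in this thread defines how element hashes would be computed, and `element_hash` is a hypothetical helper.

```python
import hashlib
import json


def element_hash(element):
    """SHA-256 over a canonical JSON form (sorted keys, no extra whitespace).

    This canonicalization is an assumption made for illustration only.
    """
    canonical = json.dumps(element, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


hello_v1 = {"type": "File", "id": "hello-file", "name": "hello"}
hello_v2 = {"type": "File", "id": "hello-file", "name": "hello",
            "comment": "reviewed"}

# Referencing by id: the package's own content does not mention the file's
# properties, so its hash is unaffected when the file element changes.
by_reference = {"type": "Package", "id": "foo", "files": ["#hello-file"]}

# Embedding: the file's properties are part of the package's content.
embedded_v1 = {"type": "Package", "id": "foo", "files": [hello_v1]}
embedded_v2 = {"type": "Package", "id": "foo", "files": [hello_v2]}

print(element_hash(embedded_v1) == element_hash(embedded_v2))
```

With embedding, a change to any property of the contained element changes the hash of the container, even though the IRI it points at is the same.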
Dave
On Fri, Jan 28, 2022 at 12:34 PM William Bartholomew (CELA)
<[email protected]> wrote:
David,
How do you think about the orphan problem in Option 1? Given that profiles will
be attaching more and more information to elements over time, I see this as
potentially a big con. Do any of the "orphaned annotation fixes" options jump
out to you over the others?
I understand your concern about the embedded files/relationships not being
elements. However, this is only a serialization; I envisaged that they would be
elements (which is why they include an id). I agree that we wouldn't want
files/relationships that aren't elements, because we want consumers to be able
to attach information to them even if that wasn't the intent of the
producer.
Regards,
William Bartholomew (he/him)
From: David Kemp <[email protected]>
Sent: Friday, January 28, 2022 8:39 AM
To: William Bartholomew (CELA) <[email protected]>
Cc: [email protected]; Gary O'Neall <[email protected]>; SPDX-list
<[email protected]>
Subject: Re: [EXTERNAL] Re: [spdx-tech] Follow-up on today's tech call about
the "Amend Contains Relationship" discussion
William,
Thank you! Your examples are exactly the sort of specifics we need to discuss
use cases and their pros and cons.
Some observations:
* I agree that your options 1A, 1B, 1C, and 1D represent the same "concept 1",
  and option 2 is the other concept. 1A is how I envisioned the structure.
* In the 1B Revision 1 Package there is a typo: "files": "#foo-contents"
  should be "#foo-contents-rev1".
* 1B uses a doubly-linked list: from package to "files": "#foo-contents-rev1"
  and from relationship to "from": "#foo-metadatarev1". Inconsistencies are
  detected because the forward link invalidates any attempt by other
  relationships to use the same "from" id. 1B is a complex way of expressing
  1A. Any apparent flexibility implied by the ability of multiple
  relationships to reference the same package is illusory, and tools would be
  required to do that error detection.
* In 1C and 1D, embedded structures are not Elements and are not reusable,
  as you note in 1D. The "id" properties in 1C ("id": "foo-contents-rev1") and
  1D ("id": "world-file" and "id": "hello-file") cannot be used to reference
  graph elements, do not have any other purpose, and are therefore misleading
  and should be deleted.
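The error detection that 1B would require of tools might look something like the following Python sketch. It assumes elements are dicts shaped like the Option 1B examples; `check_1b_links` and the `other-contents` relationship are hypothetical names introduced only for this illustration.

```python
def check_1b_links(elements):
    """Report CONTAINS relationships whose package does not link back to them."""
    by_id = {e["id"]: e for e in elements}
    problems = []
    for rel in elements:
        if rel.get("relationshipType") != "CONTAINS":
            continue
        pkg = by_id.get(rel["from"].lstrip("#"))
        if pkg is None:
            problems.append(f"{rel['id']}: package not found")
        elif pkg.get("files", "").lstrip("#") != rel["id"]:
            problems.append(f"{rel['id']}: {pkg['id']} does not link back to it")
    return problems


elements = [
    {"type": "Package", "id": "foo-metadatarev1", "name": "foo",
     "files": "#foo-contents-rev1"},
    {"type": "Relationship", "id": "foo-contents-rev1",
     "relationshipType": "CONTAINS", "from": "#foo-metadatarev1",
     "to": ["#hello-file", "#world-file"]},
    # Hypothetical second relationship that also claims the same package.
    {"type": "Relationship", "id": "other-contents",
     "relationshipType": "CONTAINS", "from": "#foo-metadatarev1",
     "to": ["#hello-file"]},
]
print(check_1b_links(elements))
```

The forward link from the package is what lets the check reject the second relationship; without it, as in Option 2, both relationships would be equally valid.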
Option 2 is identical to Option 1B except that there is no forward link
"files": "#foo-contents-rev1" from package to relationship. That forward link
is the safety belt that prevents multiple relationships from referencing the
same package. Since the purpose of Option 2 is to allow that, the same note
from Option 1B applies to Option 2: "# NOTE: This reflects what is in the
Package.files so there is a possibility of inconsistency"
That inconsistency is what I consider the overriding CON for Option 2.
Inconsistency is not just a possibility. If Alice creates Revision 1 and
Revision 2, she needs to include the AMENDS relationship to invalidate the
Revision 1 file list; otherwise they are inconsistent. Bob could also create a
different Revision 2 that AMENDS the file list, but which Revision 2 should
consumers believe, assuming they obtained all of the AMENDS relationships in
the graph? As Alice, Bob, and Carol create Revisions 2, 3, and 4, all from
"foo-metadatarev1", the graph can rapidly become complex and inconsistent,
with no unique identifier for each of their visions of the contents of the
unmodified package artifact.
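A consumer could at least detect the Alice-and-Bob situation, even if it cannot decide whom to believe. The Python sketch below assumes relationships shaped like the Option 2 examples; the ids `alice-contents-rev2` and `bob-contents-rev2`, and the function name, are hypothetical.

```python
from collections import defaultdict


def competing_amendments(relationships):
    """Map each amended id to the ids that amend it, keeping only targets
    that are amended more than once (where a consumer must choose)."""
    amenders = defaultdict(list)
    for rel in relationships:
        if rel["relationshipType"] == "AMENDS":
            for target in rel["to"]:
                amenders[target.lstrip("#")].append(rel["from"].lstrip("#"))
    return {t: a for t, a in amenders.items() if len(a) > 1}


relationships = [
    {"type": "Relationship", "id": "alice-amend", "relationshipType": "AMENDS",
     "from": "#alice-contents-rev2", "to": ["#foo-contents-rev1"]},
    {"type": "Relationship", "id": "bob-amend", "relationshipType": "AMENDS",
     "from": "#bob-contents-rev2", "to": ["#foo-contents-rev1"]},
]
print(competing_amendments(relationships))
```

Detection is straightforward; the open question raised above, which of the competing revisions is authoritative, is not answered by the graph itself.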
I believe each file list for a given package artifact deserves its own unique
id, as in Option 1A.
Dave
On Thu, Jan 27, 2022 at 11:14 AM William Bartholomew (CELA) via lists.spdx.org
<[email protected]> wrote:
Let me try to provide some concrete examples, each with two revisions. I want
to take a single scenario, where a Package is thought to contain 2 files and
subsequent analysis finds it contains only 1 file, and show how that scenario
would look with the different model implementations. I chose the file removal
scenario, even though it's less common, because removals are typically harder
to express than additions. In all these scenarios the underlying artifacts do
not change; it is only our knowledge of the artifacts that changes between the
revisions.
To keep the scenarios as consistent as possible the goals were: 1) the revision
is a separate physical document from the initial one, 2) the documents are as
standalone as possible, 3) the documents re-use unchanged elements as much
as possible, and 4) there is information that is not revised attached to the
package id that we want a consumer to be able to access (the annotation).
I haven't included the property + relationship option because it's just a
combination of the two, but the property option is broken out into more
sub-options based on different data types of the property.
Option 1A - Files are expressed as a property on the element that is of type
ID-ref<File>
Revision 1
{
  "type": "Package",
  "id": "foo-metadatarev1",
  "name": "foo",
  "files": [
    "#hello-file", "#world-file"
  ]
},
{
  "type": "Annotation",
  "id": "annotation-rev1",
  "for": "#foo-metadatarev1",
  "text": "Annotation example"
},
{
  "type": "File",
  "id": "hello-file",
  "name": "hello"
},
{
  "type": "File",
  "id": "world-file",
  "name": "world"
}
Revision 2
{
  "type": "Package",
  "id": "foo-metadatarev2",  # Element content changed so the id had to change
  "name": "foo",
  "files": [
    "#hello-file"
  ]
},
{
  "type": "Annotation",
  "id": "annotation-rev1",
  "for": "#foo-metadatarev1",  # Oh no, orphaned. See below for possible fixes.
  "text": "Annotation example"
},
{
  "type": "File",
  "id": "hello-file",
  "name": "hello"
}
# "world-file" element removed because it is obsolete; it could be left in but
# would be orphaned in the graph
Orphaned annotation fixes:
1. An AMENDS relationship from foo-metadatarev2 to foo-metadatarev1 could
   signal to consumers that information attached to foo-metadatarev1 may be
   relevant to foo-metadatarev2. Con: Difficult for consumers to walk the
   graph; ambiguous if the attached information is itself amended; ambiguous
   if information is attached to both rev1 and rev2. Pro: Simple for the
   producer; maximizes re-use.
2. Copy the annotation, update its "for" reference and "id" property, and
   add an AMENDS relationship from annotation-rev2 to annotation-rev1. Con:
   Breaks integrity and signing, so trust in the annotation is lost; doesn't
   scale if there is lots of information attached. Pro: Easy for the consumer
   to find attached information.
3. Attach the annotation to the package with a relationship instead of a "for"
   property, and create a new relationship from "foo-metadatarev2" to
   "annotation-rev1". Con: Duplicate relationships; you may not know all the
   relationships to duplicate. Pro: Retains integrity of the annotation, shows
   the author of the new relationship, easy for the consumer to follow,
   unambiguous. (The meta description is: if you revise an element you have to
   revise the relationships that refer to it; consumers don't refer to
   relationships on previous revisions.)
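Fix 3 could be sketched as follows in Python. This is only an illustration under stated assumptions: the relationship type name "DESCRIBED_BY", the id scheme for the new relationship, and the helper `carry_forward` are all hypothetical and are not defined by the examples above.

```python
def carry_forward(relationships, old_id, new_id):
    """Create a new relationship from `new_id` for every relationship that
    started at `old_id`. The targets (e.g. the annotation) are left untouched,
    so their integrity and signatures are preserved."""
    carried = []
    for rel in relationships:
        if rel["from"] == f"#{old_id}":
            carried.append({
                "type": "Relationship",
                "id": f"{rel['id']}-for-{new_id}",  # hypothetical id scheme
                "relationshipType": rel["relationshipType"],
                "from": f"#{new_id}",
                "to": list(rel["to"]),
            })
    return carried


relationships = [
    {"type": "Relationship", "id": "foo-annotation-rel",
     "relationshipType": "DESCRIBED_BY",  # placeholder type name
     "from": "#foo-metadatarev1", "to": ["#annotation-rev1"]},
]
for rel in carry_forward(relationships, "foo-metadatarev1", "foo-metadatarev2"):
    print(rel["id"], rel["from"], rel["to"])
```

The sketch also shows the stated con: it can only duplicate the relationships the reviser actually knows about.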
My summary: Decent, especially with annotation fix #3. File list carried with
the package but files don't need to be. Third parties can't add files without
"taking ownership" of the package.
Option 1B - Files are expressed as a property on the element that is of type
ID-ref<Relationship>
Revision 1
{
  "type": "Package",
  "id": "foo-metadatarev1",
  "name": "foo",
  "files": "#foo-contents"
},
{
  "type": "Relationship",
  "id": "foo-contents-rev1",
  "relationshipType": "CONTAINS",
  "from": "#foo-metadatarev1",  # NOTE: This reflects what is in the Package.files so there is a possibility of inconsistency
  "to": ["#hello-file", "#world-file"]
},
{
  "type": "Annotation",
  "id": "annotation-rev1",
  "for": "#foo-metadatarev1",
  "text": "Annotation example"
},
{
  "type": "File",
  "id": "hello-file",
  "name": "hello"
},
{
  "type": "File",
  "id": "world-file",
  "name": "world"
}
Revision 2
{
  "type": "Package",
  "id": "foo-metadatarev2",  # Had to change because files changed
  "name": "foo",
  "files": "#foo-contents-rev2"  # Had to change because relationship id changed
},
{
  "type": "Relationship",
  "id": "foo-contents-rev2",  # Had to change because "to" changed
  "relationshipType": "CONTAINS",
  "from": "#foo-metadatarev2",  # NOTE: This reflects what is in the Package.files so there is a possibility of inconsistency
  "to": ["#hello-file"]
},
{
  "type": "Annotation",
  "id": "annotation-rev1",
  "for": "#foo-metadatarev1",  # Oh no, orphaned.
  "text": "Annotation example"
},
{
  "type": "File",
  "id": "hello-file",
  "name": "hello"
}
# "world-file" element removed because it is obsolete; it could be left in but
# would be orphaned in the graph
Orphaned annotation fixes:
* Same as Option 1A.
My summary: Too many cascading changes. File list pointer carried with the
package but the relationship and files don't need to be. Third parties could
amend the Relationship in a new document.
Option 1C - Files are expressed as a property on the element that is of type
Relationship
Revision 1
{
  "type": "Package",
  "id": "foo-metadatarev1",
  "name": "foo",
  "files": {  # This is of type Relationship; "type", "from", and "relationshipType" would be implied.
    "id": "foo-contents-rev1",
    "to": ["#hello-file", "#world-file"]
  }
},
{
  "type": "Annotation",
  "id": "annotation-rev1",
  "for": "#foo-metadatarev1",
  "text": "Annotation example"
},
{
  "type": "File",
  "id": "hello-file",
  "name": "hello"
},
{
  "type": "File",
  "id": "world-file",
  "name": "world"
}
Revision 2
{
  "type": "Package",
  "id": "foo-metadatarev2",  # Had to change because "files" changed
  "name": "foo",
  "files": {  # This is of type Relationship; "type", "from", and "relationshipType" would be implied.
    "id": "foo-contents-rev2",  # Had to change because "to" changed
    "to": ["#hello-file"]
  }
},
{
  "type": "Annotation",
  "id": "annotation-rev1",
  "for": "#foo-metadatarev1",  # Oh no, orphaned.
  "text": "Annotation example"
},
{
  "type": "File",
  "id": "hello-file",
  "name": "hello"
}
# "world-file" element removed because it is obsolete; it could be left in but
# would be orphaned in the graph
Orphaned annotation fixes:
* Same as Option 1A.
My summary: Better than option 1B, still a lot of cascading changes. File
relationships have to be carried with the Package (but the files they reference
don't need to be). Third parties can't amend the files list without "taking
ownership" of the package.
Option 1D - Files are expressed as a property on the element that is of type
File[]
Revision 1
{
  "type": "Package",
  "id": "foo-metadatarev1",
  "name": "foo",
  "files": [
    {
      "type": "File",
      "id": "hello-file",
      "name": "hello"
    },
    {
      "type": "File",
      "id": "world-file",
      "name": "world"
    }
  ]
},
{
  "type": "Annotation",
  "id": "annotation-rev1",
  "for": "#foo-metadatarev1",
  "text": "Annotation example"
}
Revision 2
{
  "type": "Package",
  "id": "foo-metadatarev2",  # Had to change because "files" changed.
  "name": "foo",
  "files": [
    {
      "type": "File",
      "id": "hello-file",
      "name": "hello"
    }
  ]
},
{
  "type": "Annotation",
  "id": "annotation-rev1",
  "for": "#foo-metadatarev1",  # Oh no, orphaned.
  "text": "Annotation example"
}
Orphaned annotation fixes:
* Same as Option 1A.
My summary: Succinct and all information easily accessible to consumer, doesn't
support re-use of files between packages. Files have to be carried with the
Package. Third parties can't amend the files list without "taking ownership" of
the package.
Option 2 - Files are expressed as a CONTAINS relationship from Package to File
Revision 1
{
  "type": "Package",
  "id": "foo-metadatarev1",
  "name": "foo"
},
{
  "type": "Relationship",
  "id": "foo-contents-rev1",
  "relationshipType": "CONTAINS",
  "from": "#foo-metadatarev1",
  "to": ["#hello-file", "#world-file"]
},
{
  "type": "Annotation",
  "id": "annotation-rev1",
  "for": "#foo-metadatarev1",
  "text": "Annotation example"
},
{
  "type": "File",
  "id": "hello-file",
  "name": "hello"
},
{
  "type": "File",
  "id": "world-file",
  "name": "world"
}
Revision 2
{
  "type": "Package",
  "id": "foo-metadatarev1",
  "name": "foo"
},
{
  "type": "Relationship",
  "id": "foo-contents-rev2",  # Had to change because "to" changed.
  "relationshipType": "CONTAINS",
  "from": "#foo-metadatarev1",
  "to": ["#hello-file"]
},
{  # Needed so consumers know to ignore foo-contents-rev1
  "type": "Relationship",
  "id": "foo-contents-amend-rev2",
  "relationshipType": "AMENDS",
  "from": "#foo-contents-rev2",
  "to": ["#foo-contents-rev1"]
},
{
  "type": "Annotation",
  "id": "annotation-rev1",
  "for": "#foo-metadatarev1",
  "text": "Annotation example"
},
{
  "type": "File",
  "id": "hello-file",
  "name": "hello"
}
My summary: Impact only to changed nodes, "append only" (no copying and
re-id-ing of nodes not impacted by the revision). Consumer has to be given the
appropriate amended relationship (though they don't need any previous ones),
producer would have to identify the correct set of nodes to give the consumer,
or, if they are given extra, the consumer needs to follow the amends graph.
Nothing has to be carried with the Package. Third parties can amend the files
list without "taking ownership" of the package by creating a new document with
new relationship and amends.
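For a consumer that has been given more than the latest nodes, "following the amends graph" in Option 2 might look like this Python sketch. It assumes the elements from both revisions above have been merged into one list of dicts; `effective_files` is a hypothetical helper name.

```python
def effective_files(package_id, elements):
    """Return the file ids a package CONTAINS, skipping any CONTAINS
    relationship that is the target of an AMENDS relationship."""
    rels = [e for e in elements if e["type"] == "Relationship"]
    amended = {
        target.lstrip("#")
        for r in rels if r["relationshipType"] == "AMENDS"
        for target in r["to"]
    }
    files = []
    for r in rels:
        if (r["relationshipType"] == "CONTAINS"
                and r["from"] == f"#{package_id}"
                and r["id"] not in amended):
            files.extend(t.lstrip("#") for t in r["to"])
    return files


elements = [
    {"type": "Package", "id": "foo-metadatarev1", "name": "foo"},
    {"type": "Relationship", "id": "foo-contents-rev1",
     "relationshipType": "CONTAINS", "from": "#foo-metadatarev1",
     "to": ["#hello-file", "#world-file"]},
    {"type": "Relationship", "id": "foo-contents-rev2",
     "relationshipType": "CONTAINS", "from": "#foo-metadatarev1",
     "to": ["#hello-file"]},
    {"type": "Relationship", "id": "foo-contents-amend-rev2",
     "relationshipType": "AMENDS", "from": "#foo-contents-rev2",
     "to": ["#foo-contents-rev1"]},
]
print(effective_files("foo-metadatarev1", elements))
```

If the AMENDS relationship is missing from what the consumer received, both CONTAINS relationships are treated as current, which is the risk noted in the summary.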
Regards,
William Bartholomew (he/him)
From: [email protected] <[email protected]> On Behalf Of Steve Winslow via
lists.spdx.org
Sent: Wednesday, January 26, 2022 12:18 PM
To: Gary O'Neall <[email protected]>
Cc: SPDX-list <[email protected]>
Subject: [EXTERNAL] Re: [spdx-tech] Follow-up on today's tech call about the
"Amend Contains Relationship" discussion
Hi Gary and team,
I wasn't on the tech team call, and I haven't been keeping up to date on the
SPDX 3.0 modeling discussions, so please feel free to disregard the following
if it isn't relevant...
From the SPDX 2.2 perspective, if there were a change to the Files that a
Package CONTAINS, then presumably the Package Verification Code would need to
be changed accordingly. In that sense, a change to the CONTAINS for a
Package's Files is more than just a Relationship change; rather, it changes
the properties of the Package element itself.
Given that, of the 3 choices given, I think that from the SPDX 2.2 perspective,
it would be choice 3 -- both the Package's element and its Relationships would
change.
Perhaps not relevant if there isn't going to be a Package Verification Code (or
equivalent) for Packages in 3.0, but just throwing it out there in case it's
helpful for how to think about this.
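For reference, the SPDX 2.2 Package Verification Code is a SHA-1 taken over the sorted, concatenated SHA-1 digests of the package's files. The Python sketch below is simplified: it omits the handling of excluded files, uses in-memory byte strings as stand-ins for real file contents, and `package_verification_code` is a name chosen for this example.

```python
import hashlib


def package_verification_code(file_sha1s):
    """SHA-1 over the sorted, concatenated SHA-1 hex digests of the files.

    Simplified: excluded files are not handled here.
    """
    ordered = sorted(digest.lower() for digest in file_sha1s)
    return hashlib.sha1("".join(ordered).encode("ascii")).hexdigest()


# Stand-in file contents; a real tool would hash the files on disk.
hello = hashlib.sha1(b"hello").hexdigest()
world = hashlib.sha1(b"world").hexdigest()

two_files = package_verification_code([hello, world])
one_file = package_verification_code([hello])
print(two_files != one_file)
```

Because the code is a property of the Package, removing a file from the list changes the Package element itself, which is the point being made above.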
Steve
On Wed, Jan 26, 2022 at 2:36 PM Gary O'Neall <[email protected]> wrote:
TL;DR: Please reply with any thoughts on the pros and cons to the 3 different
options for amending a contains relationship. Please try not to duplicate -
just reply with which scenario and which pro/con. If you need clarification on
the scenario or options, please start a separate thread.
Here's the option description and pros and cons from the minutes:
(1) Anytime there is a change in the contains relationship for a package,
    one has to change the package element.
    Pros: Concise for producer (initial production) without having to
    amend it. Improves quality (harder to miss updating relationships). Graph
    navigation easier for un-amended.
    Cons: Less concise for amendments. Makes element reuse difficult /
    less likely. Graph navigation is harder on amends, unless all copied.
(2) Anytime there is a change in the contains relationship, only the
    relationships are changed.
    Pros: More concise for amendments. Makes element reuse easier.
    Graph navigation is easier on amends.
    Cons: Less concise for producer (initial production) without
    having to amend it. Lower quality - you must traverse and understand the
    amends relationship to have a full understanding of what's changed. Graph
    navigation harder for un-amended.
(3) Anytime there is a change in the contains relationship, the element,
    the relationship or both can change.
    Pros:
    Cons:
Additional context (especially for those not on the tech call today):
Last week, we discussed one of the Meta Issues - "Should we have properties
that duplicate relationships? And impact on round-tripping?" as tracked in
Issue #21<https://github.com/spdx/spdx-3-model/issues/21>.
Based on the email conversations, we started this week's call with a discussion
of whether the Package Element's contains relationships could be changed. We
concluded that the Package Artifact's contains relationships are fixed and do
not change. However, the Package Element and its associated Contains
Relationships may change. This may be due to imperfect knowledge at the initial
creation of the SBOM, different use cases, better tools, etc.
Since the Contains metadata may change, how would an SBOM be amended? The
specific scenario is:
1. An SBOM is produced with a Package. The package contains files, but the
SBOM is not complete.
2. An amended SBOM is produced with a better/corrected list of files it
contains.
In creating the amended SBOM, there are 3 possible ways of representing the
changed SBOM:
1. Only the Package Element is amended:
   * a new Package Element is created with an Amends relationship to the
     previous incorrect Package element
   * the new Package Element properties are updated as appropriate
   * relationships to/from the Element are replaced if appropriate
2. Only the relationships are amended or added:
   * Any Contains relationship which is modified is amended by creating a
     new Relationship with an Amends relationship to the previous Relationship
   * Any new Contains relationship is added
3. Both - Both approaches to amending the SBOM are supported. A new Package
   Element may be created, relationships may be modified, or both
-------------------------------------------------
Gary O'Neall
Principal Consultant
Source Auditor Inc.
Mobile: 408.805.0586
Email: [email protected]<mailto:[email protected]>
-=-=-=-=-=-=-=-=-=-=-=-
View/Reply Online (#4357): https://lists.spdx.org/g/Spdx-tech/message/4357
-=-=-=-=-=-=-=-=-=-=-=-