Thanks, Dave.
The big problem I see today, with regard to vulnerability reporting on SBOM components, exists within NIST NVD. I’ve proposed that NIST NVD include an “SBOM view” as a supplement to existing search capabilities into the NVD database, where each CVE correlates with a specific “registered” SBOM and component.

The only gap I see with regard to the NTIA “primary key” elements is within the SPDX File object, which lacks a FileVersion element. This presents a fidelity issue when converting from CycloneDX, which has a version element in File objects, to SPDX, which doesn’t have a version element on the File object.

Thanks,

Dick Brooks
<https://reliableenergyanalytics.com/products>
Never trust software, always verify and report! ™
http://www.reliableenergyanalytics.com
Email: [email protected]
Tel: +1 978-696-1788

From: David Kemp <[email protected]>
Sent: Friday, January 21, 2022 11:54 AM
To: [email protected]
Cc: William Bartholomew (CELA) <[email protected]>; [email protected]; SPDX-list <[email protected]>
Subject: Re: [EXTERNAL] Re: [spdx-tech] Relationships, properties, and syntactic sugar

Dick,

Yes. For real life artifacts (packages), SPDX could demand that PURLs contain Supplier+ProductName+Version_Timestamp (which seems unlikely) or require that SPDX tooling generate ArtifactURLs that contain PURL + Version_SBOMTimestamp if the PURL can't be guaranteed to be unique. I'm a data guy, not an operations guy, so I don't have an opinion on how to achieve sufficiently descriptive ArtifactURLs. We can have multiple Elements by different authors at different times describing the "same" artifact, but the definition of "same" is beyond my zone of experience.
With respect to the example elements below, there is already an AMENDS relationship, so it would be possible to create a relationship saying element 492a27 AMENDS element aa9c35 by providing non-conflicting additional information. SPDX could also create a SUPERSEDES relationship that says an element both amends and invalidates a previous element, because the additional information conflicts with the previous. If Package 492a27 were "incomplete" then an additional file would not conflict, but if it were asserted to be complete, the only option would be to supersede it.

Dave

On Fri, Jan 21, 2022 at 10:17 AM Dick Brooks <[email protected]> wrote:

On the consumer side, a risk assessment requires that an SBOM be provided by only one verifiable, authorized party for a given Supplier+ProductName+Version_SBOMTimestamp, so the scenario of having three SBOMs referring to the same item would present issues with regard to Supplier and SBOM verification.

Thanks,

Dick Brooks

From: David Kemp <[email protected]>
Sent: Friday, January 21, 2022 9:55 AM
To: William Bartholomew (CELA) <[email protected]>
Cc: [email protected]; [email protected]; SPDX-list <[email protected]>
Subject: Re: [EXTERNAL] Re: [spdx-tech] Relationships, properties, and syntactic sugar

[William] I also think you’ve oversimplified the question “does Package X contain File Y”, the question is really “given this set of metadata I have about Package X and File Y does Package X contain File Y”.
This distinction is important because metadata is a representation of knowledge about an artifact, there could be multiple representations of knowledge (from different creators, different points in time, different levels of details) and you can only answer that question based on the set of inputs you’re looking at. Microsoft and VMware could both create an SPDX document describing the same package and one could say a file is in there and one could say a file isn’t, then depending on which I look at I get different answers, and if I look at both I get “Maybe” (which should then trigger me doing investigation and producing my own correct SPDX document or asking the incorrect vendor for a corrected SPDX document), then when I bring the correction into scope my Maybe will shift to either Yes or No depending on the “truth”.

Yes, I tried to fully capture the question in the first message but deliberately simplified (or oversimplified) it later. The full answers you could get from three SBOMs are:

1) In SBOM A aa9c34 (Element IRI) on [Wednesday Microsoft] (creation info) says aa9c35:Package X (id:ArtifactURL / PURL) does not contain nil:File 3
2) In SBOM B 492a26 on Thursday Microsoft says 492a27:Package X contains 492a28:File 3
3) In SBOM C b298e4 on Friday VMWare says b298e5:Package X contains b298e6:File 4

The Elements are:

    aa9c34  SBOM A
    aa9c35  Package X = (File1, File2)
    aa9c36  File 1
    aa9c37  File 2

    492a26  SBOM B
    492a27  Package X = (File1, File2, File3)
    492a28  File 3

    b298e4  SBOM C
    b298e5  Package X = (File1, File2, File3, File4)
    b298e6  File 4

Those three SBOMs are unambiguous and can be represented by either collection property or CONTAINS relationship. As you say, the property and relationship are syntactic sugar, one can be converted to the other when serializing/deserializing. But in addition you can create relationships that cannot be represented by the collection property.
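To make the "syntactic sugar" point concrete, here is a minimal Python sketch of converting a collection property into CONTAINS relationships and back. The `Package` and `Relationship` classes and both helper functions are hypothetical names invented for this illustration, not part of any SPDX library or of the SPDX 3.0 model. The element IDs are taken from the listing above; reusing the File 1 and File 2 IDs from SBOM A inside SBOM B's package is an assumption of the sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Relationship:
    """A standalone assertion: `source` <kind> `target` (hypothetical type)."""
    source: str
    kind: str
    target: str


@dataclass
class Package:
    """Package metadata that lists its files as a collection property (hypothetical type)."""
    element_id: str
    name: str
    files: List[str] = field(default_factory=list)


def to_relationships(pkg: Package) -> List[Relationship]:
    """Expand the collection property into one CONTAINS relationship per file."""
    return [Relationship(pkg.element_id, "CONTAINS", f) for f in pkg.files]


def to_package(element_id: str, name: str, rels: List[Relationship]) -> Package:
    """Fold the CONTAINS relationships sourced at this package back into a property."""
    files = [r.target for r in rels
             if r.kind == "CONTAINS" and r.source == element_id]
    return Package(element_id, name, files)


# SBOM B's Package X. Reusing aa9c36 / aa9c37 (File 1, File 2 from SBOM A)
# here is an assumption made for this sketch.
package_b = Package("492a27", "Package X", ["aa9c36", "aa9c37", "492a28"])

rels = to_relationships(package_b)
round_tripped = to_package("492a27", "Package X", rels)
```

In this sketch the round trip is lossless only because every relationship has the package itself as its source. A relationship whose source is another author's element has no property form to fold back into.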
The question is do those non-collection relationships represent valuable useful flexibility, or do they represent errors/nonsense? My position is that a CONTAINS relationship between aa9c35:Package X (Microsoft) and b298e6:File4 is nonsense, because it says "On Friday VMWare says that Microsoft's Wednesday description of Package X contains File4", which is a lie. So if someone can create a set of elements that 1) cannot be represented by the collection property and 2) are not a lie and do provide useful flexibility, that would be helpful in deciding whether the COLLECTION relationship is redundant.

Dave

On Thu, Jan 20, 2022 at 3:35 PM William Bartholomew (CELA) <[email protected]> wrote:

CIL

From: David Kemp <[email protected]>
Sent: Thursday, January 20, 2022 11:07 AM
To: William Bartholomew (CELA) <[email protected]>
Cc: [email protected]; [email protected]; SPDX-list <[email protected]>
Subject: [EXTERNAL] Re: [spdx-tech] Relationships, properties, and syntactic sugar

William,

I agree with your explanation to Nisha, but it is incomplete.

Another way of thinking about it, is that relationships are about the structure of the artifacts and collections are about the structure of the metadata about the artifacts.

But there is also a requirement that metadata about artifacts must describe the artifacts. If one file DEPENDS_ON another file, that does not describe the structure of artifacts or structure of metadata: foo.exe does not have a structural relationship to foo.cpp. So DEPENDS_ON is always "a logical relationship that is independent of physical structure" because there is no physical structure between source and executable.

[William] It doesn’t invalidate your argument but just so others don’t get confused, this wouldn’t be a DEPENDS_ON relationship, it would be a GENERATED_FROM relationship.
I agree that there are different characterizations of relationships, structural and non-structural. It’s possible that CONTAINS and CONTAINED_BY are the only structural relationships, STATIC_LINK could be considered structural if you were to go finer grained than a file, PACKAGE_OF could be considered structural. I think there’s a good argument here for grouping relationships in the spec.

On the other hand, CONTAINS metadata cannot be independent of the containing structure of artifacts. If a package artifact contains a file artifact, then the package metadata must contain the file metadata.

[William] This has not been true in SPDX 2.2 and I haven’t seen a push to change it for SPDX 3.0. I can have a Package element without choosing to describe all of the files in it, I imagine this will actually become more common when SPDX is used to communicate non-SBOM information but you want to carry along extra context about the package. For example, if I was using SPDX to transport vulnerability information, I might want to include information about the packages those vulnerabilities relate to, but I wouldn’t want to be obligated to break those packages apart and describe all the files within them. Similarly, if generating an SPDX file from a package.lock I might know the package identities, so I can create Package elements, but I don’t necessarily know the files in those packages. This is one of the reasons that relationships have a way of describing completeness, a relationship might be known incomplete, known complete, or unknown completeness. SPDX also today allows you to describe the relationship from either end, you can have a package with no files and then have file elements that have a CONTAINED_BY relationship to the package (I would generally avoid this but it’s possible today).

The problem with making the CONTAINS metadata structure independent of physical structure is that you aren't guaranteed a straight answer to, for example, "does Package X contain File Y?"
Yes, No, and Maybe (incomplete) are all straight answers. But "Yes and No" is not a straight answer - it can't be both. Using a CONTAINS metadata structure that is decoupled from the artifact structure allows creation of inconsistent/conflicting answers.

[William] I’m struggling with this because I feel the whole reason we made Package no longer inherit from Collection was to address the reality that CONTAINS and the elements within a Collection have different semantic meanings, that was the whole breakthrough we made a few weeks ago that let us move forward.

I also think you’ve oversimplified the question “does Package X contain File Y”, the question is really “given this set of metadata I have about Package X and File Y does Package X contain File Y”. This distinction is important because metadata is a representation of knowledge about an artifact, there could be multiple representations of knowledge (from different creators, different points in time, different levels of details) and you can only answer that question based on the set of inputs you’re looking at. Microsoft and VMware could both create an SPDX document describing the same package and one could say a file is in there and one could say a file isn’t, then depending on which I look at I get different answers, and if I look at both I get “Maybe” (which should then trigger me doing investigation and producing my own correct SPDX document or asking the incorrect vendor for a corrected SPDX document), then when I bring the correction into scope my Maybe will shift to either Yes or No depending on the “truth”.

So I'd answer Nisha's question:

I wonder if the relationship “CONTAINS” is redundant in the current SPDX 3.0 model.

with "Yes".

[William] I still think No 😊.

Dave

On Thu, Jan 20, 2022 at 11:17 AM William Bartholomew (CELA) <[email protected]> wrote:

Nisha: By “has” are you referring to the “element” association from Collection to Element?
If so (and this is still an outstanding discussion item on our punch list), this is currently proposed to be used for physically grouping elements rather than creating logical relationships (this was how we were able to stop Package inheriting from collection). CONTAINS is a logical relationship that is independent from the physical structure. Another way of thinking about it, is that relationships are about the structure of the artifacts and collections are about the structure of the metadata about the artifacts.

Dick: We wouldn’t expect Package Element IRIs to be used in NVD, we promoted Package URL to be a top level property of artifacts and we’d like to see these incorporated into CVEs (they could be incorporated into the existing CVE data model by adding them as a reference but in the future I’d like to see them promoted to be a top level property – disclosure: I’m one of the maintainers on Package URL). We also support attaching multiple additional identifiers to an artifact via External References and CPE is one of the supported identifiers that can be added which adds a strong tie to NVD.

One of the big challenges with NVD data is the granularity, there is often insufficient information to understand exactly which physical artifacts are affected, for example, if a vulnerability just lists the source repository of the vulnerable project what are all the package identifiers that were produced from that source repository? Or, if the vulnerability lists one package (such as an npm) how else was that same vulnerable version repackaged? Relationships and multiple identifiers attached to artifacts should allow us to build a graph that can be navigated even when a CVE only references one node in that graph.
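The graph navigation described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: every identifier and edge below is made up (none comes from NVD, a real SBOM, or a real package), `build_graph` and `reachable` are hypothetical helper names, and treating every relationship or shared identifier as an undirected edge is a simplification.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple

# Hypothetical edges between artifact identifiers. An edge stands for either a
# relationship (built from, repackaged as) or a second identifier attached to
# the same artifact. All values are invented for illustration.
EDGES = [
    ("https://example.com/git/example-lib", "pkg:npm/example-lib@1.2.3"),
    ("pkg:npm/example-lib@1.2.3", "pkg:deb/debian/node-example-lib@1.2.3"),
    ("pkg:npm/example-lib@1.2.3",
     "cpe:2.3:a:example:example-lib:1.2.3:*:*:*:*:*:*:*"),
    ("pkg:npm/unrelated@0.1.0", "pkg:deb/debian/node-unrelated@0.1.0"),
]


def build_graph(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map from identifier pairs."""
    graph: Dict[str, Set[str]] = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def reachable(graph: Dict[str, Set[str]], start: str) -> Set[str]:
    """Breadth-first search: every identifier connected to `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


graph = build_graph(EDGES)
# Suppose a CVE references only the source repository.
affected = reachable(graph, "https://example.com/git/example-lib")
```

Starting from the single node the CVE names, the search reaches the npm package, its repackaged form, and the CPE, while the unrelated package stays out of the result. A real implementation would likely need directed, typed edges so that only relationships that actually carry the vulnerable code are followed.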
Regards,

William Bartholomew (he/him)
Principal Security Strategist
Global Cybersecurity Policy – Microsoft

View/Reply Online (#4326): https://lists.spdx.org/g/Spdx-tech/message/4326
