On Fri, Feb 20, 2026, 10:10 PM Stefano Tondo <[email protected]> wrote:
> From: Stefano Tondo <[email protected]>
>
> When consolidating SPDX documents via expand_collection(), objects
> with the same SPDX ID can appear in multiple source documents with
> different levels of completeness. The previous implementation used
> simple set union (self.objects |= other.objects), which would keep
> an arbitrary version when duplicates existed.
>
> This caused data loss during consolidation, particularly affecting
> externalIdentifier arrays where one version might have a basic PURL
> while another has multiple PURLs with Git metadata qualifiers.
>
> Fix by implementing intelligent object merging that:
> - Detects objects with duplicate SPDX IDs
> - Compares completeness based on externalIdentifier count
> - Keeps the more complete version (more externalIdentifiers)
> - Preserves objects without IDs as-is
>
> This ensures that consolidated SBOMs contain the most complete
> metadata available from all source documents.
>
> The bug was discovered while testing multi-PURL support where
> packages can have varying externalIdentifier counts (base PURL
> vs base + Git commit + Git branch PURLs), but affects any
> scenario with duplicate SPDX IDs during consolidation.
>

This doesn't sound correct. Each generated Element should have a
completely unique spdxid and only live in a single document. If that
isn't the case then I think it's a bug. Can you provide a concrete
example where this is happening?
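One way to produce such an example is to scan the generated documents for any ID that appears in more than one of them. The script below is a minimal sketch, assuming SPDX 3.0 JSON-LD output where each file has a top-level `@graph` array and Elements are serialized with an `spdxId` key; the script name and file paths are placeholders:

```python
import json
import sys
from collections import defaultdict


def find_duplicate_spdx_ids(paths):
    """Map each spdxId found in more than one document to those documents.

    Assumes SPDX 3.0 JSON-LD files whose top-level "@graph" array holds the
    serialized objects, with Elements identified by an "spdxId" key.
    """
    seen = defaultdict(set)
    for path in paths:
        with open(path, encoding="utf-8") as f:
            doc = json.load(f)
        for obj in doc.get("@graph", []):
            spdx_id = obj.get("spdxId")
            if spdx_id:  # blank nodes such as CreationInfo use "@id" instead
                seen[spdx_id].add(str(path))
    # Keep only IDs that live in more than one document
    return {i: sorted(docs) for i, docs in seen.items() if len(docs) > 1}


if __name__ == "__main__":
    # Usage (placeholder names): python3 find_dup_spdxids.py path/to/*.json
    for spdx_id, docs in sorted(find_duplicate_spdx_ids(sys.argv[1:]).items()):
        print(spdx_id)
        for d in docs:
            print("    " + d)
```

Each reported ID is followed by the documents that contain it; an empty result would mean the IDs are unique across the inputs.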

> Signed-off-by: Stefano Tondo <[email protected]>
> ---
>  meta/lib/oe/sbom30.py | 47 ++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 46 insertions(+), 1 deletion(-)
>
> diff --git a/meta/lib/oe/sbom30.py b/meta/lib/oe/sbom30.py
> index 227ac51877..c77e18f4e8 100644
> --- a/meta/lib/oe/sbom30.py
> +++ b/meta/lib/oe/sbom30.py
> @@ -822,7 +822,52 @@ class ObjectSet(oe.spdx30.SHACLObjectSet):
>                  if not e.externalSpdxId in imports:
>                      imports[e.externalSpdxId] = e
>
> -            self.objects |= other.objects
> +            # Merge objects intelligently: if same SPDX ID exists, keep the one with more complete data
> +            #
> +            # WHY DUPLICATES OCCUR: When consolidating SPDX documents (e.g., recipe -> package -> image),
> +            # the same package can be referenced at different build stages, each with varying levels of
> +            # detail. Early stages may have basic PURLs, while later stages add Git metadata qualifiers.
> +            # This is architectural - multi-stage builds naturally create multiple representations of
> +            # the same entity.
> +            #
> +            # However, preserve object identity for types that get referenced (like CreationInfo)
> +            # to avoid breaking serialization
> +            other_by_id = {}
> +            for obj in other.objects:
> +                obj_id = getattr(obj, '_id', None)
> +                if obj_id:
> +                    other_by_id[obj_id] = obj
> +
> +            self_by_id = {}
> +            for obj in self.objects:
> +                obj_id = getattr(obj, '_id', None)
> +                if obj_id:
> +                    self_by_id[obj_id] = obj
> +
> +            # Merge: for duplicate IDs, prefer the object with more externalIdentifier entries
> +            # but only for Element types (not CreationInfo, Agent, Tool, etc.)
> +            for obj_id, other_obj in other_by_id.items():
> +                if obj_id in self_by_id:
> +                    self_obj = self_by_id[obj_id]
> +                    # Only replace Elements with more complete data
> +                    # Do NOT replace CreationInfo or other supporting types to preserve object identity
> +                    if isinstance(self_obj, oe.spdx30.Element):
> +                        # If both have externalIdentifier, keep the one with more entries
> +                        self_ext_ids = getattr(self_obj, 'externalIdentifier', [])
> +                        other_ext_ids = getattr(other_obj, 'externalIdentifier', [])
> +                        if len(other_ext_ids) > len(self_ext_ids):
> +                            # Replace self object with other (more complete) object
> +                            self.objects.discard(self_obj)
> +                            self.objects.add(other_obj)
> +                    # For non-Element types (CreationInfo, Agent, Tool), keep existing to preserve identity
> +                else:
> +                    # New object, just add it
> +                    self.objects.add(other_obj)
> +
> +            # Add any objects without IDs
> +            for obj in other.objects:
> +                if not getattr(obj, '_id', None):
> +                    self.objects.add(obj)
>
>          for o in add_objectsets:
>              merge_doc(o)
> --
> 2.53.0
>
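Reduced to plain Python, the merge rule the patch proposes behaves roughly as follows. This is only an illustration: the `Obj` dataclass and the `merge_objects()` helper are hypothetical stand-ins, not the oe.spdx30 API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity-based hashing, so instances can live in a set
class Obj:
    """Hypothetical stand-in for an SPDX object; not the oe.spdx30 API."""
    id: Optional[str]
    is_element: bool = True
    external_ids: list = field(default_factory=list)


def merge_objects(mine: set, theirs: set) -> None:
    """Merge `theirs` into `mine` using the patch's completeness rule."""
    mine_by_id = {o.id: o for o in mine if o.id}
    for o in theirs:
        if not o.id:
            mine.add(o)  # objects without an ID are always added
            continue
        cur = mine_by_id.get(o.id)
        if cur is None:
            mine.add(o)  # previously unseen ID: just add it
        elif cur.is_element and len(o.external_ids) > len(cur.external_ids):
            mine.discard(cur)  # strictly more identifiers replaces the old Element
            mine.add(o)
        # otherwise (tie, fewer identifiers, or a non-Element) keep the existing object
```

Only the number of externalIdentifier entries is compared: a tie keeps the object already present, and the losing object is dropped as a whole rather than merged field by field.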
