On Fri, Feb 20, 2026, 10:10 PM Stefano Tondo <[email protected]> wrote:

> From: Stefano Tondo <[email protected]>
>
> When consolidating SPDX documents via expand_collection(), objects
> with the same SPDX ID can appear in multiple source documents with
> different levels of completeness. The previous implementation used
> simple set union (self.objects |= other.objects), which would keep
> an arbitrary version when duplicates existed.
>
> This caused data loss during consolidation, particularly affecting
> externalIdentifier arrays where one version might have a basic PURL
> while another has multiple PURLs with Git metadata qualifiers.
>
> Fix by implementing intelligent object merging that:
> - Detects objects with duplicate SPDX IDs
> - Compares completeness based on externalIdentifier count
> - Keeps the more complete version (more externalIdentifiers)
> - Preserves objects without IDs as-is
>
> This ensures that consolidated SBOMs contain the most complete
> metadata available from all source documents.
>
> The bug was discovered while testing multi-PURL support where
> packages can have varying externalIdentifier counts (base PURL
> vs base + Git commit + Git branch PURLs), but affects any
> scenario with duplicate SPDX IDs during consolidation.
>

This doesn't sound correct. Each generated Element should have a completely
unique SPDX ID and live in only a single document. If that isn't the case,
then I think it's a bug. Can you provide a concrete example where this is
happening?
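
Something along these lines could help produce such an example. It is only a
rough sketch: find_duplicate_ids() and the FakeElement stand-in are
hypothetical names, not part of sbom30.py, and it assumes each object exposes
its SPDX ID as `_id`, the same way the patch reads it. It reports every ID
that shows up in more than one document:

```python
from collections import defaultdict


def find_duplicate_ids(named_object_sets):
    """Return {spdx_id: [document names]} for IDs seen in more than one document.

    named_object_sets is a hypothetical mapping of document name -> iterable
    of objects. Each object is assumed to carry its SPDX ID in `_id`.
    """
    seen = defaultdict(set)
    for name, objects in named_object_sets.items():
        for obj in objects:
            obj_id = getattr(obj, "_id", None)
            if obj_id:  # objects without an ID cannot collide
                seen[obj_id].add(name)
    return {i: sorted(names) for i, names in seen.items() if len(names) > 1}


class FakeElement:
    """Stand-in for an SPDX object, used only to exercise the helper."""

    def __init__(self, _id):
        self._id = _id


docs = {
    "recipe.spdx.json": [FakeElement("http://example.com/pkg-a"), FakeElement(None)],
    "image.spdx.json": [
        FakeElement("http://example.com/pkg-a"),
        FakeElement("http://example.com/pkg-b"),
    ],
}
duplicates = find_duplicate_ids(docs)
```

Running the same check over the real object sets passed to
expand_collection() should show which documents, if any, share an ID.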



> Signed-off-by: Stefano Tondo <[email protected]>
> ---
>  meta/lib/oe/sbom30.py | 47 ++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 46 insertions(+), 1 deletion(-)
>
> diff --git a/meta/lib/oe/sbom30.py b/meta/lib/oe/sbom30.py
> index 227ac51877..c77e18f4e8 100644
> --- a/meta/lib/oe/sbom30.py
> +++ b/meta/lib/oe/sbom30.py
> @@ -822,7 +822,52 @@ class ObjectSet(oe.spdx30.SHACLObjectSet):
>                  if not e.externalSpdxId in imports:
>                      imports[e.externalSpdxId] = e
>
> -            self.objects |= other.objects
> +            # Merge objects intelligently: if same SPDX ID exists, keep the one with more complete data
> +            #
> +            # WHY DUPLICATES OCCUR: When consolidating SPDX documents (e.g., recipe -> package -> image),
> +            # the same package can be referenced at different build stages, each with varying levels of
> +            # detail. Early stages may have basic PURLs, while later stages add Git metadata qualifiers.
> +            # This is architectural - multi-stage builds naturally create multiple representations of
> +            # the same entity.
> +            #
> +            # However, preserve object identity for types that get referenced (like CreationInfo)
> +            # to avoid breaking serialization
> +            other_by_id = {}
> +            for obj in other.objects:
> +                obj_id = getattr(obj, '_id', None)
> +                if obj_id:
> +                    other_by_id[obj_id] = obj
> +
> +            self_by_id = {}
> +            for obj in self.objects:
> +                obj_id = getattr(obj, '_id', None)
> +                if obj_id:
> +                    self_by_id[obj_id] = obj
> +
> +            # Merge: for duplicate IDs, prefer the object with more externalIdentifier entries
> +            # but only for Element types (not CreationInfo, Agent, Tool, etc.)
> +            for obj_id, other_obj in other_by_id.items():
> +                if obj_id in self_by_id:
> +                    self_obj = self_by_id[obj_id]
> +                    # Only replace Elements with more complete data
> +                    # Do NOT replace CreationInfo or other supporting types to preserve object identity
> +                    if isinstance(self_obj, oe.spdx30.Element):
> +                        # If both have externalIdentifier, keep the one with more entries
> +                        self_ext_ids = getattr(self_obj, 'externalIdentifier', [])
> +                        other_ext_ids = getattr(other_obj, 'externalIdentifier', [])
> +                        if len(other_ext_ids) > len(self_ext_ids):
> +                            # Replace self object with other (more complete) object
> +                            self.objects.discard(self_obj)
> +                            self.objects.add(other_obj)
> +                    # For non-Element types (CreationInfo, Agent, Tool), keep existing to preserve identity
> +                else:
> +                    # New object, just add it
> +                    self.objects.add(other_obj)
> +
> +            # Add any objects without IDs
> +            for obj in other.objects:
> +                if not getattr(obj, '_id', None):
> +                    self.objects.add(obj)
>
>          for o in add_objectsets:
>              merge_doc(o)
> --
> 2.53.0
>
>