On Wed, Jun 3, 2026 at 10:57 AM Mark Hatle
<[email protected]> wrote:
>
>
>
> On 6/3/26 6:35 AM, Richard Purdie via lists.openembedded.org wrote:
> > On Wed, 2026-06-03 at 13:22 +0200, Marta Rybczynska via
> > lists.openembedded.org wrote:
> >>
> >> Because of this, I believe the minimum information that should be included
> >> in the generated SBOM is:
> >>
> >>   * The list of layers used for the build.
> >>   * The version or revision of each layer.
> >>   * Potentially the exact source revision used for each layer repository.
> >>
> >> It may also be worth considering whether layers should have a more explicit
> >> and consistently available versioning mechanism... (but that in step 2)
> >>
> >> When we have the list of layers, we also have information on bbappends, so
> >> that does not need to be directly present.
> >>
> >> Some of this (or most) is available with SPDX_INCLUDE_BUILD_VARIABLES = "1"
> >> and SPDX_INCLUDE_BITBAKE_PARENT_BUILD = "1", but in a non-standard and
> >> non-portable way.
> >>
> >> What do you think?
> >>
> > I'm worried about this since it implies every time you update your metadata,
> > you have to regenerate every spdx file. You update oe-core, it changes the
> > top level README and everything would then have to rebuild. It also means the
> > spdx sstate is never reusable without the exact same layer config, you can't
> > add or remove any layer and have reuse work, even if that layer has no effect
> > on the recipes in question.
> >
> > So whilst I see why you might want this information in there, it would
> > effectively destroy some of the key things OE brings to the builds.
>
> This is really what the task hashing is supposed to accomplish, combine the
> build environment, metadata and related into something that indicates a build
> changed or not.  While it doesn't indicate the contents of the configuration,
> it does indicate it changed.
>
> Is there any way we can incorporate not the hash itself, but what the hash
> indicates that something built into the recipe SPDX files?  (I can't think of
> anything right off the top of my head.)

The siginfo would contain that, if we could figure out a way to inject
that into the SBoM. The biggest hurdle there is that it would have to
be per sstate task instead of the single "create_spdx" task we have
today.
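
To make the reuse concern concrete, here is a toy sketch. It is NOT how
bitbake actually computes task signatures; toy_task_hash, the recipe
values and the layer names are all made up for illustration. It only
shows that once layer revisions are part of the hashed input, bumping a
layer that has no effect on the recipe still changes the hash.

```python
import hashlib


def toy_task_hash(recipe_inputs, layer_revisions=None):
    """Toy stand-in for a task hash; NOT BitBake's real signature algorithm."""
    digest = hashlib.sha256()
    for key in sorted(recipe_inputs):
        digest.update(f"{key}={recipe_inputs[key]}\n".encode())
    # Optionally fold every layer revision into the hash (the case being debated).
    for layer in sorted(layer_revisions or {}):
        digest.update(f"layer:{layer}={layer_revisions[layer]}\n".encode())
    return digest.hexdigest()


# Hypothetical recipe inputs and layer revisions, made up for illustration.
recipe = {
    "PN": "example",
    "PV": "1.0",
    "SRC_URI": "https://example.invalid/example-1.0.tar.gz",
}
layers_before = {"meta": "1111111", "meta-unrelated": "2222222"}
layers_after = {"meta": "1111111", "meta-unrelated": "3333333"}

# Recipe-only hashing: bumping an unrelated layer leaves the hash alone.
reusable = toy_task_hash(recipe) == toy_task_hash(recipe)
# Layer revisions mixed in: the same bump changes the hash, so nothing is reused.
invalidated = toy_task_hash(recipe, layers_before) != toy_task_hash(recipe, layers_after)
print(reusable, invalidated)
```

Both values come out True: the recipe-only hash is stable, while the
hash that includes layer revisions changes with the unrelated bump.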

>
> When the recipe SPDX files are rolled up into a filesystem SBOM though, we
> could include specific build information, as we know that a "new" build would
> change the filesystem hashes and cause a rebuild anyway.

This is always a tricky problem. There is a lot to be said for our
builds always being reproducible, but my impression is that
regulatory agencies don't understand the nuanced repercussions
when they ask for things like this (see SBoM timestamp requirements
for a classic example). I also fully expect that this information will
be wanted, so our choice is more or less to say "sorry, we won't do
that, figure it out yourself", or to figure out some compromise that
allows it to work. I'm not sure which is the correct one, or even
which one I would vouch for at the moment. While the taskhashes do
conceptually cover this, I don't think that is going to satisfy anyone
trying to do an audit unless we can do something like the siginfo
described above (and even then, I would postulate that to an outsider
it's not clear why that is actually sufficient, or how to correctly use
it to, for example, tell that there was a security vulnerability in
bitbake).

There is another angle to think about also, which is that this could
actually make sstate _more_ usable from a supply chain perspective.
Before SPDX, you had no idea of the provenance of a given sstate
object; you just downloaded it from the server and used it, having no
idea who built it and under what conditions. This encouraged clean
builds with no sstate reuse whenever you needed a build where you
actually wanted to know the provenance of everything. With SPDX, we
now have at least a slightly better idea, since
SPDX_INCLUDE_BITBAKE_PARENT_BUILD will include the parent bitbake
build from sstate, giving at least some provenance. It's far from
perfect for many, many reasons, but it's better than nothing. The
layer SHAs would also fall into this category.

I have done some work to see if we could include the SHA-1 of the
layer in the SBoM, similar to what we do in the buildinfo file. The
hashes themselves are not too hard (reproducibility concerns aside),
except that you really also need the upstream URL to go with it, which
gets very tricky when the repository has multiple remotes; I'm not
sure of the best way to deal with that.
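
As a rough sketch of the multiple-remotes problem (an assumption on my
part about how one might collect this, not what the buildinfo code
does): the helper names below are hypothetical, and the only git
commands used are "git rev-parse HEAD" and "git remote -v". Rather than
guessing which remote is "the" upstream, it records every fetch URL
next to the revision.

```python
import subprocess


def parse_remotes(remote_v_output):
    """Map remote name -> fetch URL from the output of `git remote -v`."""
    remotes = {}
    for line in remote_v_output.splitlines():
        parts = line.split()
        # Each remote is listed twice, as "(fetch)" and "(push)"; keep fetch only.
        if len(parts) >= 3 and parts[2] == "(fetch)":
            remotes[parts[0]] = parts[1]
    return remotes


def run_git(layer_path, *args):
    """Run a git command inside layer_path and return its stripped stdout."""
    result = subprocess.run(
        ["git", "-C", layer_path, *args],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def layer_provenance(layer_path):
    """Collect the checked-out revision and all fetch remotes for one layer."""
    return {
        "revision": run_git(layer_path, "rev-parse", "HEAD"),
        "remotes": parse_remotes(run_git(layer_path, "remote", "-v")),
    }
```

Calling layer_provenance() on a layer checkout returns a dict with the
revision and a name-to-URL map of remotes. That sidesteps choosing an
upstream, but it also means local mirror URLs end up in the output,
which has its own reproducibility and privacy implications.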

If you really want more precise information about every possible build
step that was performed (including things outside of the knowledge of
bitbake), that starts to get into full SLSA territory, and there
are already tools to do that, such as
https://in-toto.readthedocs.io/en/latest/command-line-tools/in-toto-record.html,
so one choice would be to tell people to just use that, although it
doesn't really give the sstate provenance described above.

Again, these are just some thoughts I've had on this for a while. I'm
not ready to give an opinion as to a direction.


>
> --Mark
>
> > Cheers,
> >
> > Richard
View/Reply Online (#2380): 
https://lists.openembedded.org/g/openembedded-architecture/message/2380