On Thu, Jun 4, 2026, 5:08 AM Marta Rybczynska <[email protected]> wrote:

>
>
> On Wed, Jun 3, 2026 at 7:39 PM Joshua Watt <[email protected]> wrote:
>
>> On Wed, Jun 3, 2026 at 10:57 AM Mark Hatle
>> <[email protected]> wrote:
>> >
>> >
>> >
>> > On 6/3/26 6:35 AM, Richard Purdie via lists.openembedded.org wrote:
>> > > On Wed, 2026-06-03 at 13:22 +0200, Marta Rybczynska via
>> > > lists.openembedded.org wrote:
>> > >>
>> > >> Because of this, I believe the minimum information that should be
>> > >> included in the generated SBOM is:
>> > >>
>> > >>   * The list of layers used for the build.
>> > >>   * The version or revision of each layer.
>> > >>   * Potentially, the exact source revision used for each layer
>> > >>     repository.
>> > >>
>> > >> It may also be worth considering whether layers should have a more
>> > >> explicit and consistently available versioning mechanism (but that
>> > >> is for step 2).
>> > >>
>> > >> When we have the list of layers, we also have information on the
>> > >> bbappends, so that does not need to be directly present.
>> > >>
>> > >> Some (or most) of this is available with
>> > >> SPDX_INCLUDE_BUILD_VARIABLES = "1" and
>> > >> SPDX_INCLUDE_BITBAKE_PARENT_BUILD = "1", but in a non-standard and
>> > >> non-portable way.
>> > >>
>> > >> What do you think?
>> > >>
>> > > I'm worried about this since it implies that every time you update
>> > > your metadata, you have to regenerate every SPDX file. If you update
>> > > oe-core and it only changes the top-level README, everything would
>> > > still have to rebuild. It also means the SPDX sstate is never reusable
>> > > without the exact same layer config: you can't add or remove any
>> > > layer and have reuse work, even if that layer has no effect on the
>> > > recipes in question.
>> > >
>> > > So whilst I see why you might want this information in there, it would
>> > > effectively destroy some of the key things OE brings to the builds.
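To make the sstate concern concrete, here is a toy model, assuming a task signature is simply a hash over a recipe's inputs (BitBake's real signature generation is far more involved, and `toy_task_hash` is a hypothetical name). It shows that once layer revisions are mixed into every recipe's signature, a README-only bump of oe-core, or adding an unrelated layer, changes the hash of a recipe that did not otherwise change. The recipe variables and revision strings are made up for illustration.

```python
import hashlib
from typing import Dict, Optional


def toy_task_hash(recipe_inputs: Dict[str, str],
                  layer_revisions: Optional[Dict[str, str]] = None) -> str:
    """Toy signature: SHA-256 over sorted key=value inputs.

    Simplified model for illustration only; this is not BitBake's siggen.
    """
    h = hashlib.sha256()
    for key in sorted(recipe_inputs):
        h.update(f"{key}={recipe_inputs[key]}\n".encode())
    # Hypothetical: mix every layer's revision into every recipe's signature.
    for layer in sorted(layer_revisions or {}):
        h.update(f"layer:{layer}={layer_revisions[layer]}\n".encode())
    return h.hexdigest()


recipe = {"PN": "example", "PV": "1.0", "do_compile": "oe_runmake"}
layers_v1 = {"meta": "rev-1", "meta-example": "rev-a"}
# oe-core gets a README-only update: new revision, no change to this recipe.
layers_v2 = {"meta": "rev-2", "meta-example": "rev-a"}
# An unrelated layer is added to the build.
layers_v3 = {**layers_v1, "meta-unrelated": "rev-x"}

print(toy_task_hash(recipe) == toy_task_hash(recipe))                        # True
print(toy_task_hash(recipe, layers_v1) == toy_task_hash(recipe, layers_v2))  # False
print(toy_task_hash(recipe, layers_v1) == toy_task_hash(recipe, layers_v3))  # False
```

Without the layer data the signature is stable across both metadata changes, which is exactly the reuse property that would be lost.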
>> >
>> > This is really what the task hashing is supposed to accomplish:
>> > combining the build environment, metadata and related inputs into
>> > something that indicates whether a build changed or not. While it
>> > doesn't indicate the contents of the configuration, it does indicate
>> > that it changed.
>> >
>> > Is there any way we can incorporate into the recipe SPDX files not the
>> > hash itself, but what the hash indicates about how something was built?
>> > (I can't think of anything right off the top of my head.)
>>
>> The siginfo would contain that, if we could figure out a way to inject
>> that into the SBoM. The biggest hurdle there is that it would have to
>> be per sstate task instead of the single "create_spdx" task we have
>> today.
>>
>> >
>> > When the recipe SPDX files are rolled up into a filesystem SBOM, though,
>> > we could include specific build information, as we know that a "new"
>> > build would change the filesystem hashes and cause a rebuild anyway.
>>
>> This is always a tricky problem. There is a lot to be said for our
>> builds always being reproducible, but my impression is also that
>> regulatory agencies don't understand the nuanced repercussions when
>> they ask for things like this (see SBoM timestamp requirements for a
>> classic example). I also fully expect that this information will be
>> wanted, so our choice is more or less to say "sorry, we won't do that,
>> figure it out yourself", or to find some compromise that allows it to
>> work. I'm not sure which is the correct one, or even which one I would
>> vouch for ATM. While the taskhashes do conceptually cover this, I don't
>> think they are going to satisfy anyone trying to do an audit unless we
>> can do something like the siginfo described above (and even then, I
>> would postulate that to an outsider, it's not clear why that is
>> actually sufficient or how to correctly use it to, for example, tell
>> that there was a security vulnerability in bitbake).
>>
>> There is another angle to think about as well: this could actually
>> make sstate _more_ usable from a supply chain perspective. Before SPDX,
>> you had no idea of the provenance of a given sstate object; you just
>> downloaded it from the server and used it, with no idea who built it
>> or under what conditions. This encouraged clean builds with no sstate
>> reuse whenever you actually needed to know the provenance of
>> everything. With SPDX, we now have at least a slightly better idea,
>> since SPDX_INCLUDE_BITBAKE_PARENT_BUILD will include the parent bitbake
>> build from sstate, giving at least some provenance. It's far from
>> perfect for many, many reasons, but it's better than nothing. The layer
>> SHAs would also fall into this category.
>>
>> I have done some work to see if we could include the SHA-1 of each
>> layer in the SBoM, similar to what we do in the buildinfo file. The
>> hashes themselves are not too hard (reproducibility concerns aside),
>> except that you really also need the upstream URL to go with them,
>> which gets very tricky when the repository has multiple remotes; I'm
>> not sure of the best way to deal with that.
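One possible policy for the multiple-remotes problem, sketched in Python under stated assumptions: prefer the remote that the checked-out branch tracks, then a remote named `origin`, then a lone remote, and otherwise record no URL rather than guess. The git commands used (`rev-parse HEAD`, `remote`, `remote get-url`, and `@{u}` for the upstream) are standard git; the policy and the function names (`choose_remote`, `describe_layer`) are hypothetical and not taken from the existing buildinfo code.

```python
import subprocess
from typing import Dict, Optional, Tuple


def _git(path: str, *args: str) -> Optional[str]:
    """Run git in `path`; return stripped stdout, or None if the command fails."""
    try:
        result = subprocess.run(["git", "-C", path, *args],
                                capture_output=True, text=True, check=False)
    except OSError:  # git not installed
        return None
    return result.stdout.strip() if result.returncode == 0 else None


def choose_remote(remotes: Dict[str, str],
                  upstream_ref: Optional[str]) -> Optional[Tuple[str, str]]:
    """Pick one (name, url) from a layer's remotes, or None if ambiguous."""
    # 1. The remote tracked by the current branch, e.g. upstream_ref "origin/master".
    #    Longest match wins in case one remote name is a prefix of another.
    if upstream_ref:
        matches = [n for n in remotes if upstream_ref.startswith(n + "/")]
        if matches:
            name = max(matches, key=len)
            return name, remotes[name]
    # 2. Fall back to the conventional default remote name.
    if "origin" in remotes:
        return "origin", remotes["origin"]
    # 3. A single remote is unambiguous.
    if len(remotes) == 1:
        return next(iter(remotes.items()))
    # 4. Several remotes and no hint: record nothing rather than guess.
    return None


def describe_layer(path: str) -> Dict[str, Optional[str]]:
    """Collect the revision and a chosen remote URL for the layer repo at `path`."""
    remotes = {}
    for name in (_git(path, "remote") or "").split():
        url = _git(path, "remote", "get-url", name)
        if url:
            # Note: URLs can embed credentials; scrub them before publishing.
            remotes[name] = url
    # Returns None on a detached HEAD or a branch with no upstream configured.
    upstream = _git(path, "rev-parse", "--abbrev-ref",
                    "--symbolic-full-name", "@{u}")
    chosen = choose_remote(remotes, upstream)
    return {
        "revision": _git(path, "rev-parse", "HEAD"),
        "remote": chosen[0] if chosen else None,
        "url": chosen[1] if chosen else None,
    }
```

Because `git -C` works from any subdirectory, several layers living in one repository (as meta, meta-poky and meta-yocto-bsp do in poky) would simply report the same revision and URL. In a detached-HEAD checkout, `@{u}` fails and the sketch falls back to `origin`.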
>>
>> If you really want more precise information about every possible build
>> step that was performed (including things outside of the knowledge of
>> bitbake), that starts to get into full SLSA territory, and there are
>> already tools to do that, such as in-toto-record:
>>
>> https://in-toto.readthedocs.io/en/latest/command-line-tools/in-toto-record.html
>>
>> So one choice would be to tell people to just use that, although it
>> doesn't really give the sstate provenance described above.
>>
>> Again, these are just some thoughts I've had on this for a while. I'm
>> not ready to give an opinion as to a direction.
>>
>>
>
> I think an assumption is being made here that I didn't make: that I'm
> asking to store the build state and build hashes of the layers.
>
> This isn't useful without other information (host configuration, config
> files, etc.). As each layer can potentially influence every single
> package, I do not think it makes sense to store those dependencies. I do
> not see the need at this time.
>
> What I'm asking about is the layer name and its version. What could also
> be useful is the exact hash and the download location (the same things
> as for all other sources).
>
> That could be at the top level of the SBOM (as an image dependency).
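A minimal sketch of what a per-layer entry at the top (image) level of the SBOM could look like, assuming SPDX 2.3 JSON field names purely for illustration (OE's SPDX 3.0 output would need a different mapping) and the SPDX 2.x VCS download-location syntax `git+<url>@<revision>`. The function names, the `SPDXRef-Layer-` ID prefix and the choice of `BUILD_DEPENDENCY_OF` as the relationship type are hypothetical.

```python
import re
from typing import Dict, Optional


def spdx_vcs_location(url: Optional[str], revision: Optional[str]) -> str:
    """Format a git URL as an SPDX 2.x VCS download location: git+<url>@<rev>."""
    if not url:
        return "NOASSERTION"
    if "://" not in url:
        # scp-like form such as "git@example.com:org/layer.git" -> ssh:// URL.
        m = re.fullmatch(r"(?:([\w.-]+)@)?([\w.-]+):(.+)", url)
        if not m:
            return "NOASSERTION"  # e.g. a local path: nothing portable to record
        user = f"{m.group(1)}@" if m.group(1) else ""
        url = f"ssh://{user}{m.group(2)}/{m.group(3).lstrip('/')}"
    location = url if url.startswith("git+") else f"git+{url}"
    return f"{location}@{revision}" if revision else location


def layer_package(name: str, version: Optional[str], url: Optional[str],
                  revision: Optional[str]) -> Dict[str, str]:
    """Hypothetical top-level SBOM package entry for one layer (SPDX 2.3 JSON names)."""
    pkg = {
        # SPDX IDs allow only letters, digits, "." and "-".
        "SPDXID": "SPDXRef-Layer-" + re.sub(r"[^A-Za-z0-9.-]", "-", name),
        "name": name,
        "downloadLocation": spdx_vcs_location(url, revision),
    }
    if version:
        pkg["versionInfo"] = version
    return pkg


def layer_relationship(layer: Dict[str, str], image_id: str) -> Dict[str, str]:
    """Link a layer to the image it was used to build (relationship type assumed)."""
    return {
        "spdxElementId": layer["SPDXID"],
        "relationshipType": "BUILD_DEPENDENCY_OF",
        "relatedSpdxElement": image_id,
    }
```

Keeping this at the image level, rather than in each recipe's document, matches the point above that recipe SPDX (and its sstate) would not be invalidated by layer changes.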
>
> And at the same stage we can also add the bootloader, device tree and so on
> (see https://lists.openembedded.org/g/openembedded-core/message/237625)
>

FYI I started looking at capturing the output from do_deploy et al.
yesterday, so hopefully I will have something to help with this soon.
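As an illustration of the kind of data that capturing deploy outputs could record, here is a sketch that hashes every regular file under a deploy directory. The directory argument stands in for something like `${DEPLOYDIR}`; the function name and the decision to skip symlinks (deploy directories often hold versioned files plus unversioned symlinks to them) are assumptions, not a description of the work mentioned above.

```python
import hashlib
import os
from typing import Dict


def hash_deployed_files(deploy_dir: str) -> Dict[str, str]:
    """Map each regular file's path (relative to deploy_dir) to its SHA-256."""
    digests = {}
    for root, dirs, files in os.walk(deploy_dir):
        dirs.sort()  # deterministic traversal order
        for fname in sorted(files):
            path = os.path.join(root, fname)
            # Assumption: symlinks point at files hashed under their real name.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            h = hashlib.sha256()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 16), b""):
                    h.update(chunk)
            digests[os.path.relpath(path, deploy_dir)] = h.hexdigest()
    return digests
```

A real implementation might record the symlinks as aliases of the files they point to instead of dropping them.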



> Kind regards,
> Marta
>
-=-=-=-=-=-=-=-=-=-=-=-
View/Reply Online (#2391): 
https://lists.openembedded.org/g/openembedded-architecture/message/2391
-=-=-=-=-=-=-=-=-=-=-=-
