On Wed, Jun 3, 2026 at 10:57 AM Mark Hatle <[email protected]> wrote:
>
> On 6/3/26 6:35 AM, Richard Purdie via lists.openembedded.org wrote:
> > On Wed, 2026-06-03 at 13:22 +0200, Marta Rybczynska via lists.openembedded.org wrote:
> >>
> >> Because of this, I believe the minimum information that should be included in
> >> the generated SBOM is:
> >>
> >> * The list of layers used for the build.
> >> * The version or revision of each layer.
> >> * Potentially the exact source revision used for each layer repository.
> >>
> >> It may also be worth considering whether layers should have a more explicit
> >> and consistently available versioning mechanism... (but that in step 2)
> >>
> >> When we have the list of layers, we also have information on bbappends, so
> >> that does not need to be directly present.
> >>
> >> Some of this (or most) is available with SPDX_INCLUDE_BUILD_VARIABLES = "1" and
> >> SPDX_INCLUDE_BITBAKE_PARENT_BUILD = "1", but in a non-standard and
> >> non-portable way.
> >>
> >> What do you think?
> >>
> > I'm worried about this since it implies every time you update your metadata, you
> > have to regenerate every spdx file. You update oe-core, it changes the top level
> > README and everything would then have to rebuild. It also means the spdx sstate
> > is never reusable without the exact same layer config, you can't add or remove
> > any layer and have reuse work, even if that layer has no effect on the recipes
> > in question.
> >
> > So whilst I see why you might want this information in there, it would
> > effectively destroy some of the key things OE brings to the builds.
>
> This is really what the task hashing is supposed to accomplish, combine the
> build environment, metadata and related into something that indicates a build
> changed or not. While it doesn't indicate the contents of the configuration, it
> does indicate it changed.
>
> Is there any way we can incorporate not the hash itself, but what the hash
> indicates that something built into the recipe SPDX files? (I can't think of
> anything right off the top of my head.)

The siginfo would contain that, if we could figure out a way to inject that into the SBoM. The biggest hurdle there is that it would have to be per sstate task instead of the single "create_spdx" task we have today.

> When the recipe SPDX files are rolled up into a filesystem SBOM though, we could
> include specific build information, as we know that a "new" build would change
> the filesystem hashes and cause a rebuild anyway.

This is always a tricky problem. There is a lot to be said for our builds always being reproducible, but my impression is that regulatory agencies don't understand the nuanced repercussions when they ask for things like this (see SBoM timestamp requirements for a classic example). I also fully expect that this information will be wanted, so our choice is more or less to say "sorry, we won't do that, figure it out yourself", or figure out some compromise that allows it to work. I'm not sure which is the correct one, or even which one I would vouch for ATM.

While the taskhashes do conceptually cover this, I don't think it's going to satisfy anyone trying to do an audit unless we can do something like the siginfo described above (and even then, I would postulate that to an outsider, it's not clear why that is actually sufficient or how to correctly use that to, for example, tell that there was a security vulnerability in bitbake).

There is another angle to think about also, which is that this could actually make sstate _more_ usable from a supply chain perspective. Before SPDX, you had no idea of the provenance of a given sstate object; you just downloaded it from the server and used it, having no idea who built it and under what conditions. This encouraged clean builds with no sstate reuse whenever you actually needed to know the provenance of everything in the build.
With SPDX, we now have at least a slightly better idea, since SPDX_INCLUDE_BITBAKE_PARENT_BUILD will include the parent bitbake build from sstate, giving at least some provenance. It's far from perfect for many, many reasons, but it's better than nothing. The layer SHAs would also fall into this category.

I have done some work to see if we could include the SHA-1 of the layer in the SBoM, similar to what we do in the buildinfo file. The hashes themselves are not too hard (reproducibility concerns aside), except that you really also need the upstream URL to go with it, which gets very tricky when the repository has multiple remotes; I'm not sure of the best way to deal with that.

If you really want more precise information about every possible build step that was performed (including things outside of the knowledge of bitbake), that starts to get into full SLSA territory and there are already tools to do that, such as https://in-toto.readthedocs.io/en/latest/command-line-tools/in-toto-record.html, so one choice would be to tell people to just use that, although it doesn't really give the sstate provenance described above.

Again, these are just some thoughts I've had on this for a while. I'm not ready to give an opinion as to a direction.

>
> --Mark
>
> > Cheers,
> >
> > Richard
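
To make the multiple-remotes problem above a bit more concrete, here is a rough sketch of what collecting a layer's revision and candidate upstream URLs might look like. This is not what the SPDX classes or buildinfo do today; `parse_remotes` and `describe_layer` are hypothetical helper names, the example URLs are placeholders, and the only external commands assumed are `git rev-parse HEAD` and `git remote -v`.

```python
import subprocess
from collections import defaultdict


def parse_remotes(remote_v_output):
    """Parse `git remote -v` output into {remote_name: sorted fetch URLs}.

    Hypothetical helper: only "(fetch)" entries are kept, since the fetch
    URL is the closest thing to "where this revision came from".
    """
    remotes = defaultdict(set)
    for line in remote_v_output.splitlines():
        parts = line.split()
        # Expected shape: "<name> <url> (fetch)"; extra trailing fields are ignored.
        if len(parts) >= 3 and parts[2] == "(fetch)":
            remotes[parts[0]].add(parts[1])
    return {name: sorted(urls) for name, urls in remotes.items()}


def describe_layer(layer_dir):
    """Return the checked-out revision and all fetch remotes for a layer.

    Hypothetical helper: it deliberately does not pick a single "upstream",
    because nothing in the repository says which remote is authoritative.
    """
    def git(*args):
        result = subprocess.run(
            ["git", "-C", layer_dir, *args],
            check=True, capture_output=True, text=True,
        )
        return result.stdout

    return {
        "path": layer_dir,
        "revision": git("rev-parse", "HEAD").strip(),
        "remotes": parse_remotes(git("remote", "-v")),
    }


if __name__ == "__main__":
    # Placeholder output for a checkout with two remotes.
    sample = (
        "origin\thttps://example.invalid/meta-a.git (fetch)\n"
        "origin\thttps://example.invalid/meta-a.git (push)\n"
        "fork\tssh://example.invalid/meta-a-fork.git (fetch)\n"
    )
    print(parse_remotes(sample))
```

With two remotes the result has two candidate URLs and no principled way to choose between them, which is exactly the ambiguity I don't know how to resolve; any real implementation would need some policy (or layer-provided metadata) to pick one.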
View/Reply Online (#2380): https://lists.openembedded.org/g/openembedded-architecture/message/2380
