On Thu, Jun 4, 2026, 5:08 AM Marta Rybczynska <[email protected]> wrote:
>
> On Wed, Jun 3, 2026 at 7:39 PM Joshua Watt <[email protected]> wrote:
>
>> On Wed, Jun 3, 2026 at 10:57 AM Mark Hatle <[email protected]> wrote:
>> >
>> > On 6/3/26 6:35 AM, Richard Purdie via lists.openembedded.org wrote:
>> > > On Wed, 2026-06-03 at 13:22 +0200, Marta Rybczynska via
>> > > lists.openembedded.org wrote:
>> > >>
>> > >> Because of this, I believe the minimum information that should be
>> > >> included in the generated SBOM is:
>> > >>
>> > >> * The list of layers used for the build.
>> > >> * The version or revision of each layer.
>> > >> * Potentially the exact source revision used for each layer
>> > >>   repository.
>> > >>
>> > >> It may also be worth considering whether layers should have a more
>> > >> explicit and consistently available versioning mechanism... (but
>> > >> that in step 2)
>> > >>
>> > >> When we have the list of layers, we also have information on
>> > >> bbappends, so that does not need to be directly present.
>> > >>
>> > >> Some of this (or most) is available with
>> > >> SPDX_INCLUDE_BUILD_VARIABLES = "1" and
>> > >> SPDX_INCLUDE_BITBAKE_PARENT_BUILD = "1", but in a non-standard and
>> > >> non-portable way.
>> > >>
>> > >> What do you think?
>> > >>
>> > > I'm worried about this since it implies every time you update your
>> > > metadata, you have to regenerate every spdx file. You update
>> > > oe-core, it changes the top level README and everything would then
>> > > have to rebuild. It also means the spdx sstate is never reusable
>> > > without the exact same layer config, you can't add or remove any
>> > > layer and have reuse work, even if that layer has no effect on the
>> > > recipes in question.
>> > >
>> > > So whilst I see why you might want this information in there, it
>> > > would effectively destroy some of the key things OE brings to the
>> > > builds.
>> >
>> > This is really what the task hashing is supposed to accomplish,
>> > combine the build environment, metadata and related into something
>> > that indicates a build changed or not. While it doesn't indicate the
>> > contents of the configuration, it does indicate it changed.
>> >
>> > Is there any way we can incorporate not the hash itself, but what the
>> > hash indicates that something built into the recipe SPDX files? (I
>> > can't think of anything right off the top of my head.)
>>
>> The siginfo would contain that, if we could figure out a way to inject
>> that into the SBoM. The biggest hurdle there is that it would have to
>> be per sstate task instead of the single "create_spdx" task we have
>> today.
>>
>> >
>> > When the recipe SPDX files are rolled up into a filesystem SBOM
>> > though, we could include specific build information, as we know that
>> > a "new" build would change the filesystem hashes and cause a rebuild
>> > anyway.
>>
>> This is always a tricky problem. There is a lot to be said for our
>> builds always being reproducible, but my impression is also that
>> regulatory agencies also don't understand the nuanced repercussions
>> when they ask for things like this (see SBoM timestamp requirements
>> for a classic example). I also fully expect that this information will
>> be wanted, so our choice is more or less to say "sorry, we won't do
>> that, figure it out yourself", or figure out some compromise that
>> allows it to work. I'm not sure which is the correct one, or even
>> which one I would vouch for ATM. While the taskhashes do conceptually
>> cover this, I don't think it's going to satisfy anyone trying to do an
>> audit unless we can do something like the siginfo described above (and
>> even then, I would postulate that to an outsider, it's not clear why
>> that is actually sufficient or how to correctly use that to, for
>> example, tell that there was a security vulnerability in bitbake).
>>
>> There is another angle to think about also, which is that this could
>> actually make sstate _more_ usable from a supply chain perspective.
>> Before SPDX, you had no idea of the provenance of a given sstate
>> object; you just downloaded it from the server and used it, having no
>> idea who built it and under what conditions. This encouraged clean
>> builds with no sstate reuse when you needed that build where you
>> actually wanted to know the provenance for everything. With SPDX, we
>> now have at least a slightly better idea, since
>> SPDX_INCLUDE_BITBAKE_PARENT_BUILD will include the parent bitbake
>> build from sstate, giving at least some provenance. It's far from
>> perfect for many, many reasons, but it's better than nothing. The
>> layer SHAs would also fall into this category.
>>
>> I have done some work to see if we could include the SHA-1 of the
>> layer in the SBoM, similar to what we do in the buildinfo file. The
>> hashes themselves are not too hard (reproducibility concerns aside),
>> except that you really also need the upstream URL to go with it, which
>> gets very tricky when the repository has multiple remotes; I'm not
>> sure the best way to deal with that.
>>
>> If you really want more precise information about every possible build
>> step that was performed (including things outside of the knowledge of
>> bitbake), that starts to get more in the full SLSA territory and there
>> are already tools to do that, such as
>> https://in-toto.readthedocs.io/en/latest/command-line-tools/in-toto-record.html,
>> so one choice would be to tell people to just use that, although it
>> doesn't really give the sstate provenance described above.
>>
>> Again, this is just some thoughts I've had on this for a while. I'm
>> not ready to give an opinion as to a direction.
>>
>
> I think there is an assumption here that I didn't make: that I'm asking
> to store the build state and build hashes of layers.
>
> This isn't useful without other information (host configuration, config
> files etc). As each layer can potentially influence every single
> package, I do not think it makes sense to store those dependencies. I do
> not see the need at this time.
>
> What I'm asking about is the layer name and its version. What could be
> useful is the exact hash and the download location (the same things as
> for all other sources).
>
> That could be at the high level of the SBOM (image dependency).
>
> And at the same stage we can also add the bootloader, device tree and so
> on (see https://lists.openembedded.org/g/openembedded-core/message/237625)
>

FYI, I started looking at capturing the output from do_deploy et al.
yesterday, so I should hopefully have something to help with this soon.

> Kind regards,
> Marta
>
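
To make the "layer name, revision and download location" idea above a bit
more concrete, here is a rough Python sketch of how that data could be
collected from a layer checkout. It is only an illustration and not how
the SPDX classes do it. The helper names (parse_remotes,
pick_download_location, describe_layer) are hypothetical, the layer name
is simply assumed to be the directory name, and the "prefer origin, else
first remote by name" policy is just one possible answer to the
multiple-remotes problem mentioned earlier in the thread.

```python
import subprocess
from pathlib import Path


def parse_remotes(remote_output: str) -> dict[str, str]:
    """Map remote name -> fetch URL from the text printed by `git remote -v`."""
    remotes = {}
    for line in remote_output.splitlines():
        parts = line.split()
        # Expected shape: "<name> <url> (fetch)" or "<name> <url> (push)"
        if len(parts) == 3 and parts[2] == "(fetch)":
            remotes[parts[0]] = parts[1]
    return remotes


def pick_download_location(remotes, preferred="origin"):
    """Hypothetical policy: use the preferred remote if present,
    otherwise the first remote in alphabetical order, otherwise None."""
    if not remotes:
        return None
    if preferred in remotes:
        return remotes[preferred]
    return remotes[sorted(remotes)[0]]


def run_git(layer_dir, *args):
    """Run a git command inside layer_dir and return its stripped stdout."""
    result = subprocess.run(
        ["git", "-C", str(layer_dir), *args],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def describe_layer(layer_dir):
    """Collect name, revision and download location for one layer checkout.

    Assumption: the directory name stands in for the layer name; a real
    implementation would more likely take it from the layer configuration.
    """
    remotes = parse_remotes(run_git(layer_dir, "remote", "-v"))
    return {
        "name": Path(layer_dir).name,
        "revision": run_git(layer_dir, "rev-parse", "HEAD"),
        "download_location": pick_download_location(remotes),
        "remotes": remotes,
    }
```

Calling describe_layer() on each layer directory would give one record per
layer that could be attached at the image level of the SBOM, without
touching the per-recipe SPDX output.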
View/Reply Online (#2391): https://lists.openembedded.org/g/openembedded-architecture/message/2391
