On Thu, Jun 27, 2024 at 10:33 AM Joshua Watt <[email protected]> wrote:
>
> On Tue, Jun 25, 2024 at 12:41 PM Mark Hatle
> <[email protected]> wrote:
> >
> > Comments inline below
> >
> > On 6/24/24 2:10 PM, Joshua Watt wrote:
> -- snip --
> > > +
> > > +SPDX_BUILD_HOST[doc] = "The base variable name to describe the build host on \
> > > +    which a build is running. Must be an SPDX_IMPORTS key"
> >
> > Is there any sort of documentation or external reference for the variable above
> > (as well as the SPDX_ below) that explains what the SPDX standard is expecting
> > to be put in there?
>
> Not specifically for this, but for the SPDX 3.0 spec in general, the
> web docs are pretty comprehensive:
> https://spdx.github.io/spdx-spec/v3.0/ . Be aware that the
> navigation sidebar is really annoying ATM, but that's supposed to get
> fixed soon.
>
> For a starter on how SPDX documents are written, see:
> https://github.com/spdx/spdx-spec/blob/development/v3.0.1/docs/annexes/getting-started.md
>
> It's a little tricky to encode the SPDX 3 structured data in bitbake
> variables; this is what I could come up with so far, but if you have
> suggestions for improvements, let me know.
>
> Specifically for this variable, it's referencing a "key" in
> SPDX_IMPORTS. SPDX_IMPORTS in turn encodes entries in the
> "imports" property of
> https://spdx.github.io/spdx-spec/v3.0/model/Core/Classes/SpdxDocument/
> . The indirection is necessary so that you can tell users where the
> SPDX ID lives, since it's external to this document. It's pretty much
> impossible for us to validate that the SPDX ID you put in is real,
> short of downloading the referenced document, parsing it, and seeing
> if the SPDX ID is present.
>
> For a walk through of how to cross-link SPDX 3 documents, look at:
> https://github.com/spdx/spdx-spec/blob/development/v3.0.1/docs/annexes/cross-reference.md
>
> >
> > I.e. machine name, host type, an arbitrary string that means something to the
> > agent constructing the SPDX, etc. I.e. how would I know what is valid in these
> > various things?
>
> There are pretty good tools to validate SPDX documents (offline
> even!).
> https://github.com/spdx/spdx-3-model/blob/main/serialization/json_ld/validation.md
> gives an overview on how to do this. It still won't check
> that the external SPDX ID is valid, for the same reasons as above, but
> it's pretty good otherwise.
>
> >
> > This then leads to a second question, deterministic behavior/reproducibility. I
> > believe the purpose of this is reproducible builds, but we should have a more
> > deterministic approach in the Yocto Project where we provide (and/or check for
> > host capabilities) to help allow this to be a more generic, many different host
> > process.
>
> I've attempted to make this process as deterministic as SPDX 3 allows.
> As an example, SPDX IDs are generated by hashing deterministic data
> (whereas random SPDX IDs would be decidedly simpler!). However, there
> are parts of SPDX 3 that are simply not deterministic for various
> reasons (please read to the end of this section). They generally fall
> under a few categories:
>
> The first category is probably best classified as "not very useful if
> deterministic" and omitted by default; SPDX_BUILD_HOST would be one
> such example, since if you are going to set this to the same value
> all the time and not reference the _actual_ host, you may as well not
> include it at all. For these, I've set no default value (there isn't
> one that would make sense anyway), so their omission keeps things
> deterministic. However, if users do actually want this, it will
> necessarily result in non-deterministic builds unless you always do
> your builds on the same host. I think it might be helpful to annotate
> in the doc string which variables will introduce such non-determinism,
> so I'll do that.
>
> The second category is also "not very useful if deterministic", but is
> included in the output by this patch. Examples of this would be build
> timestamps and the bitbake parent build tracking (which basically
> tracks the invocation of bitbake itself as the "parent" build, so you
> can tell which tasks ran in the same invocation). These are useful
> pieces of information, and consumers do actually care about these
> things, so if push comes to shove we could add a flag to enable them,
> but I'm also leery of having too many configuration options for SPDX.
>
> The last category is the required non-deterministic fields in SPDX 3.
> The primary offender here is the SPDX creation info "created"
> datestamp:
> https://spdx.github.io/spdx-spec/v3.0/model/Core/Classes/CreationInfo/
> . This is a mandatory field that is the timestamp of when the SPDX
> data itself was created, and every SPDX object you create links to
> this so you can track exactly when each object in a merged document
> was created. I did attempt to make the argument to SPDX that it was
> mandatory non-determinism, but it is very important to the SPDX
> community (for reasons I've not fully understood), and they _really_
> want it to be the actual document creation date, not SOURCE_DATE_EPOCH
> or similar, so I really am not sure what to do about that one. I was
> more or less told to ignore these fields when calculating if output is
> "deterministic", which is a little annoying, but not an argument I
> could win.

Heh, of course I just read the spec a little more carefully, and
SOURCE_DATE_EPOCH is allowed here, so we can do that and make everything
deterministic by default, with options to enable non-determinism.

https://spdx.github.io/spdx-spec/v3.0/model/Core/Properties/created/

*facepalm*

>
> I'm a little stuck between the SPDX side and the Yocto side on the
> determinism front. It's easy to say "SPDX must be deterministic", and
> "it's awful if non-deterministic" if you're looking at it from just the
> point of view of wanting to spit out some data, but equally, it's
> easy to say "determinism doesn't really matter" and "the
> non-deterministic data is important information" when you are
> consuming the data. This particular patch series errs on the side of
> making the data the most useful to the end consumers, in part because
> I really want Yocto to generate the most comprehensive and useful
> output it can; we are a pretty early adopter of SPDX 3, so being able
> to provide the most useful data we can early means consumers
> writing downstream tools can use us as a reference, which drastically
> improves their compatibility with our output. Yocto has a phenomenal
> supply chain story to tell, and I really want to tell it to the
> fullest extent that we can in our SPDX data. I can't reconcile that
> with "everything must be deterministic" though, so..... ?
>
> >
> > Which leads to a third question, when a build uses sstate-cache, each host could
> > be different than the build host that actually combines the builds into an image
> > (SBOM). Is this a concern?
>
> That is very much on purpose so you can track where your sstate came
> from, who built it, etc, but also why we don't set any of these by
> default (see above). This is pretty important from a supply chain
> tracking perspective, and part of the comprehensive story we can tell
> about the supply chain (IMHO anyway).
>
> Currently, we are only tracking the actual do_create_spdx() task as
> the "build" (which isn't clear in the generated SPDX, but I'll make a
> change to fix that), so you have to be a little bit careful about some
> of the conclusions you draw from that. Once we get the base SPDX 3
> support in place, I want to look at having other sstate tasks generate
> SPDX fragments when they run, which would allow us to trace those
> tasks more precisely. I don't really want to solve that now though, as
> it is going to be quite a bit more complex to solve in a satisfactory
> manner.
>
> >
> > -- snip --
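
To make the two determinism techniques discussed in this thread concrete, here is a rough sketch, not the code from the patch. It shows (1) deriving an SPDX ID by hashing deterministic inputs instead of generating a random one, and (2) taking the "created" timestamp from SOURCE_DATE_EPOCH when it is set. The function names, the namespace prefix, and the choice of SHA-256 are all assumptions made for illustration; the timestamp format is assumed to be the UTC "YYYY-MM-DDThh:mm:ssZ" form.

```python
import hashlib
import os
from datetime import datetime, timezone

# Hypothetical namespace prefix, for illustration only (not from the patch).
NAMESPACE = "http://example.org/spdx"


def deterministic_spdxid(*parts):
    """Derive a stable ID by hashing the strings that identify an object.

    The same inputs always produce the same ID, unlike a random UUID.
    """
    h = hashlib.sha256()
    for part in parts:
        data = part.encode("utf-8")
        # Length-prefix each part so ("ab", "c") and ("a", "bc") differ.
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return f"{NAMESPACE}/{h.hexdigest()}"


def spdx_created(environ=os.environ):
    """Return a 'created' timestamp string in UTC.

    Uses SOURCE_DATE_EPOCH (seconds since the Unix epoch) when present,
    so repeated builds emit the same value; otherwise falls back to the
    current time, which is not reproducible.
    """
    epoch = environ.get("SOURCE_DATE_EPOCH")
    if epoch is not None:
        when = datetime.fromtimestamp(int(epoch), tz=timezone.utc)
    else:
        when = datetime.now(timezone.utc)
    return when.strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    print(deterministic_spdxid("recipe", "busybox", "do_create_spdx"))
    print(spdx_created({"SOURCE_DATE_EPOCH": "0"}))
```

With both pieces in place, two runs over the same inputs with the same SOURCE_DATE_EPOCH would produce identical IDs and timestamps. Fields such as the build host would still need to be opted into separately.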
