On Thu, Jun 27, 2024 at 10:33 AM Joshua Watt <[email protected]> wrote:
>
> On Tue, Jun 25, 2024 at 12:41 PM Mark Hatle
> <[email protected]> wrote:
> >
> > Comments inline below
> >
> > On 6/24/24 2:10 PM, Joshua Watt wrote:
> -- snip --
> > > +
> > > +SPDX_BUILD_HOST[doc] = "The base variable name to describe the build host on \
> > > +    which a build is running. Must be an SPDX_IMPORTS key"
> >
> > Is there any sort of documentation or external reference for the
> > variable above (as well as the SPDX_ below) that explains what the
> > SPDX standard is expecting to be put in there?
>
> Not specifically for this, but for the SPDX 3.0 spec in general, the
> web docs are pretty comprehensive:
> https://spdx.github.io/spdx-spec/v3.0/ . Although, be aware the
> navigation sidebar is really annoying ATM, but that's supposed to get
> fixed soon.
>
> For a starter on how SPDX documents are written, see:
> https://github.com/spdx/spdx-spec/blob/development/v3.0.1/docs/annexes/getting-started.md
>
> It's a little tricky to encode the SPDX 3 structured data in bitbake
> variables; this is what I could come up with so far but if you have
> suggestions on improvements, let me know.
>
> Specifically for this variable, it's referencing a "key" in
> SPDX_IMPORTS. SPDX_IMPORTS in turn is encoding entries in the
> "imports" property of
> https://spdx.github.io/spdx-spec/v3.0/model/Core/Classes/SpdxDocument/
> . The indirection is necessary so that you can tell users where the
> SPDX ID lives, since it's external to this document. It's pretty much
> impossible for us to validate that the SPDX ID you put in is real,
> short of downloading the referenced document, parsing it, and seeing
> if the SPDX ID is present.
>
> For a walk through of how to cross-link SPDX 3 documents, look at:
> https://github.com/spdx/spdx-spec/blob/development/v3.0.1/docs/annexes/cross-reference.md
>
> >
> > I.e. machine name, host type, an arbitrary string that means
> > something to the agent constructing the SPDX, etc.  I.e. how would I
> > know what is valid in these various things?
>
> There are pretty good tools to validate SPDX documents (offline
> even!). 
> https://github.com/spdx/spdx-3-model/blob/main/serialization/json_ld/validation.md
> gives an overview on how to do this. It still won't do the validation
> that the external SPDX ID is valid for the same reasons as above, but
> it's pretty good otherwise.
>
> >
> > This then leads to a second question, deterministic
> > behavior/reproducibility.  I believe the purpose of this is
> > reproducible builds, but we should have a more deterministic approach
> > in the Yocto Project where we provide (and/or check for host
> > capabilities) to help allow this to be a more generic, many different
> > host process.
>
> I've attempted to make this process as deterministic as SPDX 3 allows.
> As an example, SPDX IDs are generated by hashing deterministic data
> (whereas random SPDX IDs would be decidedly simpler!). However, there
> are parts of SPDX 3 that are simply not deterministic for various
> reasons (please read to the end of this section). They generally fall
> under a few categories:
>
> The first category is probably best classified as "not very useful if
> deterministic" and omitted by default; SPDX_BUILD_HOST would be one
> such example, since if you are going to set this to the same value
> all the time and not reference the _actual_ host, you may as well not
> include it at all. For these, I've set no default value (there isn't
> one that would make sense anyway), so their omission keeps things
> deterministic. However, if users do actually want this, it will
> necessarily result in non-deterministic builds unless you always do
> your builds on the same host. I think it might be helpful to annotate
> in the doc string which variables will introduce such non-determinism,
> so I'll do that.
>
> The second category is also "not very useful if deterministic", but is
> included in the output by this patch. Examples of this would be build
> timestamps and the bitbake parent build tracking (which basically
> tracks the invocation of bitbake itself as the "parent" build, so you
> can tell which tasks ran in the same invocation). These are useful
> pieces of information, and consumers do actually care about these
> things, so if push comes to shove we could add a flag to enable them,
> but I'm also leery of having too many configuration options for SPDX.
>
> The last category is the required non-deterministic fields in SPDX 3.
> The primary offender here is the SPDX creation info "created"
> datestamp: 
> https://spdx.github.io/spdx-spec/v3.0/model/Core/Classes/CreationInfo/
> . This is a mandatory field that is the timestamp of when the SPDX
> data itself was created, and every SPDX object you create links to
> this so you can track exactly when each object in a merged document
> was created. I did attempt to make the argument to SPDX that it was
> mandatory non-determinism, but it is very important to the SPDX
> community (for reasons I've not fully understood), and they _really_
> want it to be the actual document creation date, not SOURCE_DATE_EPOCH
> or similar, so I really am not sure what to do about that one. I was
> more or less told to ignore these fields when calculating if output is
> "deterministic", which is a little annoying, but not an argument I
> could win.

Heh, of course I just read the spec a little more carefully, and
SOURCE_DATE_EPOCH is allowed here, so we can do that and make
everything deterministic by default, with options to enable
non-determinism.
https://spdx.github.io/spdx-spec/v3.0/model/Core/Properties/created/

*facepalm*
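
A minimal sketch of that idea, assuming the "created" value is written as
a UTC timestamp in the form YYYY-MM-DDThh:mm:ssZ. The spdx_created() name
and its environ parameter are hypothetical, for illustration only, and are
not taken from the patch:

```python
import os
from datetime import datetime, timezone


def spdx_created(environ=None):
    """Return a "created" timestamp string (hypothetical helper).

    If SOURCE_DATE_EPOCH is set, use it so the output is reproducible;
    otherwise fall back to the current time (non-deterministic).
    """
    env = os.environ if environ is None else environ
    epoch = env.get("SOURCE_DATE_EPOCH")
    if epoch is not None:
        # SOURCE_DATE_EPOCH is an integer number of seconds since the
        # Unix epoch, interpreted in UTC
        ts = datetime.fromtimestamp(int(epoch), tz=timezone.utc)
    else:
        # Actual creation time, truncated to whole seconds
        ts = datetime.now(timezone.utc).replace(microsecond=0)
    # Assumed serialization: UTC, second precision, trailing "Z"
    return ts.strftime("%Y-%m-%dT%H:%M:%SZ")


# Same epoch in, same string out
print(spdx_created({"SOURCE_DATE_EPOCH": "1719446400"}))
```

Passing the environment in as a mapping keeps the helper easy to test; an
option to enable non-determinism would then just mean not setting (or
ignoring) SOURCE_DATE_EPOCH.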

>
> I'm a little stuck between the SPDX side and the Yocto side on the
> determinism front. It's easy to say "SPDX must be deterministic", and
> "it's awful if non-deterministic" if you're looking at it from just the
> point of view of wanting to spit out some data, but equally, it's
> easy to say "determinism doesn't really matter" and "the
> non-deterministic data is important information" when you are
> consuming the data. This particular patch series errs on the side of
> making the data the most useful to the end consumers, in part because
> I really want Yocto to generate the most comprehensive and useful
> output it can; we are a pretty early adopter of SPDX 3, so being able
> to provide the most useful data we can early means consumers
> writing downstream tools can use us as a reference which drastically
> improves their compatibility with our output. Yocto has a phenomenal
> supply chain story to tell, and I really want to tell it to the
> fullest extent that we can in our SPDX data. I can't reconcile that
> with "everything must be deterministic" though, so..... ?
>
> >
> > Which leads to a third question, when a build uses sstate-cache, each
> > host could be different than the build host that actually combines
> > the builds into an image (SBOM).  Is this a concern?
>
> That is very much on purpose so you can track where your sstate came
> from, who built it, etc, but also why we don't set any of these by
> default (see above). This is pretty important from a supply chain
> tracking perspective, and part of the comprehensive story we can tell
> about the supply chain (IMHO anyway).
>
> Currently, we are only tracking the actual do_create_spdx() task as
> the "build" (which isn't clear in the generated SPDX, but I'll make
> a change to fix that), so you have to be a little bit careful about some
> of the conclusions you draw from that. Once we get the base SPDX 3
> support in place, I want to look at having other sstate tasks generate
> SPDX fragments when they run which would allow us to trace those
> tasks more precisely. I don't really want to solve that now though as
> it is going to be quite a bit more complex to solve in a satisfactory
> manner.
>
>
> >
> -- snip --