On Mon, Feb 17, 2025 at 12:29 PM Richard Purdie <
[email protected]> wrote:

> FWIW I did try and discuss this in the OE TSC meeting today but it is a
> US holiday. There were only two people who showed up.
>

Also a Canadian Holiday! I was one of the missing attendees, so I'll follow
up here.


>
> On Mon, 2025-02-17 at 17:38 +0100, Stefan Herbrechtsmeier wrote:
> >  Am 17.02.2025 um 13:43 schrieb Richard Purdie:
> >  On Mon, 2025-02-17 at 12:00 +0100, Stefan Herbrechtsmeier wrote:
> > > >  Am 13.02.2025 um 11:43 schrieb Richard Purdie via
> lists.openembedded.org:
> > > > >
> > > > > e) add the ability to add custom hooks in the fetch process to
> > > > > handle
> > > > > the cases of needing to alter the flow for patching the components
> > > > > list
> > > > >
> > > > I fear this isn't easy possible and we have a design problem. Bitbake
> > > > and oe-core assume that a patch doesn't influence the fetch and that
> > > > the SRC_URI contains all fetched sources. Many code use the Fetch
> > > > class or the SRC_URI direct and doesn't expect recursive implicit
> > > > URIs of gitsm. In reality the SRC_URI describe the given URIs only
> > > > and doesn't contain the additional implicit URIs. The task flow
> > > > applies patches after the unpack and prevent patches to the gitsm. In
> > > > reality a patch could influence the implicit URIs and isn't
> > > > independent.
> > > >
> > > > I would recommend to keep the patching outside of bitbake and instead
> > > > handle the implicit URIs inside oe-core. We could write the implicit
> > > > URIs into files in the work directory and migrate the scattered
> > > > direct users of the SRC_URI / Fetch class to a common oe-core
> > > > function. This functions returns the given URIs and optional any
> > > > additional implicit URIs. This allows the usage of the task
> > > > dependencies to ensure that the implicit URIs are resolved before
> > > > use.
> > > >
> > > > The package manager lock file needs a single unpack and patch step
> > > > whereas the gitsm needs a recursive unpack and patch of every
> > > > successive gitsm. This makes it impossible to use additional tasks.
> > > > We could add the resolve of the implicit URIs to the fetch task to
> > > > support recursive resolve (gitsm).
> > > >
> > > > 1. download SRC_URIs
> > > > 2. exit if no URI is marked as unresolved
> > > > 3. unpack unresolved URIs
> > > > 4. apply associated patches
> > > > 5. resolve URIs
> > > > 6. goto 2
> > > >
> > > > This will eliminate the need for the partial implicit URI and gitsm
> > > > support in bitbake and makes the fetcher code simpler. The patching
> > > > could remain in oe-core and patches could be applied to gitsm and
> > > > package manager lock files. Additionally we remove code duplication
> > > > and simplify future changes to the SRC_URI determination.
> > > >
> > > > It will add an indirection for the SRC_URI but the current code
> > > > already shows that this makes problems. Only the archiver class uses
> > > > the expanded_urldata function and all other classes including the
> > > > spdx class ignore the implicit SRC_URIs.
> > >
> > > I'm a bit worried we're getting caught up by the terminology. We don't
> > > necessarily have to "patch" the component list so much as allow
> > > something like a function to hook and adjust things. That then removes
> > > the need for it to be a specific task.
> > >
> >
> >  We can use a common patch without a do_patch task. We could even
> > call the patch code from the do_patch without an extra task. I
> > propose to call the code from the do_patch task in the do_fetch task.
> > It doesn't matter if we use a hook or task. We should minimize the
> > work for the user. The common solution to manipulate a package
> > manager lock file is a patch file and we have the code to handle
> > patch files.
>
> One of the challenges is the "patch" code/module lives in OE-Core. That
> means the fetcher can't call into it.
>
> I appreciate this means you'll argue the fetcher code should move to
> OE, or the vendor code should be in OE, however I am pretty decided
> that is a really bad idea. Whether I can convey why, I don't know but I
> have tried.
>
> I also very strongly want to maintain our separation of do_fetch,
> do_unpack and do_patch.
>

I also agree with this, which is why I've been suggesting the "drop-in"
lock files. That's a fetch and copy and not a patching.  I ended up
doing a custom patch step for some of the kernel meta data and to
this day it is still on my list to change (and I'll be doing that shortly).


> > > I'm also not sure that drawing gitsm into this is a great idea when it
> > > is effectively already working  quite well.
> >
> > To my knowledge it isn't possible to patch a sub module revision. Is
> > it intended that the SBOM doesn't contain the recursive implicit git
> > repositories?
>
> A patch was merged to allow the data about implicit git repos (amongst
> other things) to be available:
>
>
> https://git.yoctoproject.org/poky/commit/bitbake/?id=ef3e46afd910d4b7727d42c4c18b501525c65695
>
> That tracer code was added after a lot of discussion that took up a ton
> of my time, then people moved on and it hasn't seem much
> development/use :(.
>
> So there were mechanisms added to make the data available.
>
> > >  I appreciate the desire to
> > > have neat abstractions applying accross everything equally but I'm not
> > > sure we can achieve that with the level of usability we need and I'd
> > > rather improve usability at the cost of the inclusion of gitsm, which
> > > is the most functional fetcher we currently have in this group.
> >
> > I include gitsm because it is similar to the package manager and have
> > some drawbacks which I solve with my approach. It is the only fetcher
> > that parse a file to determine additional URIs. Its integration into
> > oe-core is sub-optimal.
>
> I believe there was a way added to solve the drawback you mention. The
> same mechanism could be used for other fetchers too yet we're talking
> about much more radical re-designs.
>
> > > I will repeat again that I do strongly feel that a strong API in
> > > bitbake is going to lead to an overall better design that something
> > > bolted onto the fetcher in OE-Core.
>
> The "fetcher" is an API exposed by Bitbake with the intent of
> standardising the fetching of source code. It's aim is to support
> objectives such as allowing resilience through source mirrors and
> support build reproducibility. There is a document in bitbake which
> tries to at least summarise some of that after it was requested it be
> written down.
>

FWIW. This is the mental model that I've always worked with
during my reviews of the series.


> > What is a fetcher? We have fetchers for protocols (wget, git),
>
> These were the original fetchers using the original APIs.
>
> > wrapper around other fetchers (crate, gomod and gomodgit)
>
> These are newer and some work better than others are people have
> attempted to git the newer languages into the older framework.
>
> >  and wrapper around other fetchers which embedded file parser (gitsm,
> > npmsw).
>
> Of these, gitsm is now relatively successfully integrated and
> functional.
>
> >  Thereby the gitsm fetcher fetch a source and parse the embedded
> > file. The npmsw fetcher parse a file from the meta layer. The gitsm
> > fetcher recursive combines the download and unpack function call in
> > mostly every class function.
>
> Sadly there did not seem to be any other way to handle that. I'd
> imagine that some of the other fetchers are going to have to work
> similarly, unfortunately.
>
>
> > The wrappers has the disadvantage that they use uncommon URIs which
> > are useless for the SBOM. We support PyPI without the need for an
> > extra fetcher.
>
> We support pypi in so much as we have a recipe per dependency. That was
> not deemed possible for npm or crates. I would disagree that uncommon
> URIS are useless for SBOMs. They can be problematic but having the data
> there means it is at least listed. There are also ways they can be
> extended (e.g. the intercept code I link to above).
>
> > Is PyPI or crate the desired way to go?
>
> Compared to the complexity in some of the patches proposed, I'd
> actually say yes.
>

I think I'm following this part correctly. When we say "crate the desired
way to go", I read that as "one dependency fetch by the language
specific 'protocol' (crate:// in this example)".  If so, then I also agree.

It became clear very quickly that packaging go dependencies as
separate recipes wasn't going to work, which I assume is the pypi
reference (that works for languages with "stricter" package release
processes).


>
> > Instead of expand the fetcher I would focus the fetcher on the
> > download of provides URIs and keep the file parsing and post unpack
> > processing outside of it.
>
> The challenge is that once you process it, you need to fetch more
> things. Our ecosystem is built on the idea that fetching happens in
> do_fetch. If we change that most mirroring scripts break for example.
> It is like compiling happening during do_configure or do_install - it
> is not what the user expects and that is problematic. I am against
> changing behaviour in ways users do not expect.
>

Which is how we ended up with the create:// lines in the .inc files
and also how I ended up with all my git:// fetches in my bigger
golang recipes. That we could loop, recurse, do whatever we needed
to fully resolve the dependencies, but when the actual fetch and
build of the recipe happened, that has all been detangled and is
a simple iterative processing of the lines.

For me, that separates my debugging into distinct phases. At one
point I'm scratching my head and looking at the language dependency
files, and fixing any issues there. Later on, if there are any issues my
problems are with mirrors, downloading, etc.

I'd also add a last stage of "getting the files in the position that
the language needs for vendoring". I then debug that as a separate
thing.


> > The package manager lock file is a simple file which references to
> > package sources. Thereby a main source could contain multiple package
> > manager lock files. At the moment we have no fetcher which reference
> > an other source as base or combines multiple fetchers.
>
> This is the case for good reason. If the lock file references all the
> sources, perhaps we should just include it alongside the recipe and be
> done with it?
>

I'm definitely ok with this as well, as long as it is complete and is
fully expanded, we don't need the SRC_URIs explicitly expanded in
the recipes. I may still prefer the fully expanded SRC_URIs, but I
also see a route to add that later in my own recipes as needed.


> > Based on my two proposed solutions and the work the last month, I
> > have the following feasible requirements:
> >  * Support patches for the package manager lock file to simply
>      updates, back ports and upsteaming
> >  * Support multiple package manager lock files per main source
> >  * Include the real https or git URI in the SBOM
> >  * Include the dependency name and version in the SBOM
> >  * No manual bitbake command call after a SRC_URI, PV or SRCREV
> > change
>
> I think we may have to relax some of these requirements as we can't do
> everything.
>

Avoiding both some bikeshedding and feature creep! :)

Richard: Which ones would you say don't need to be in something initial ?

To me, I'd say the "multiple package manager files" or the "no manual
bitbake call after"  and "support for patching the lockfile".


> > The following code is mostly used in oe-core and ignore the implicit
> > URIs:
> >
> >     src_uri = (d.getVar('SRC_URI') or "").split()
> >     fetcher = bb.fetch2.Fetch(src_uri, d)
> >     for url in fetcher.urls:
> >
> > The following code is used in the archiver class only and includes
> > the implicit URIs:
> >
> >     src_uri = get_src_uris(d)
> >     fetcher = bb.fetch2.Fetch(src_uri, d)
> >     for ud in fetcher.expanded_urldata():
>
> Ironically, I think this an example of the kind of "strong API" I'm
> referring to, where a standard API is exposed by bitbake. In some ways
> this was already attempted with the tracer API above where the
> expand_urldata call would actually be a call into the tracer API to
> obtain the actual urls used.
>
>
> > My last series enable oe-core to generate implicit URIs and therefore
> > replace the usage of the plain SRC_URI with a function:
> >
> >      fetcher = bb.fetch.Fetch(get_src_uris(d), d)
> >     for url in fetcher.urls:
> >
> >  The SRC_URI contains the given URIs and the get_src_uri function
> > contains the given and implicit URIs. The concept is taken from other
> > classes like the spdx classes which use files to exchange data
> > between tasks.
>
> I believe the get_src_uris() functionality should happen within the
> fetcher module and not outside of it as a bolt on interface.
>
> I do not know where we go from here. You disagree with me on several
> key things such as whether the code should be in bitbake or OE-Core,
> whether gitsm should be changed, whether we need to have some kind of
> checksum to verify the "lockfile" is correct as originally intended in
> the recipe and probably more.
>

As for where something belongs vs another, let's just say I have
no experience to distrust Richard's experience in these areas. I've
only rarely had to delve deep enough to even know some of the
pitfalls. But I do strongly agree that we need to keep the phases
separate with some sort of clear handoff (via files, but not necessarily
files in recipes) between them.

Cheers,

Bruce


> I'm in the position of trying to mediate things (i.e. see the different
> viewpoints and bring people together, trying to find common ground) and
> yet at the same time, express views as someone who as spent a lot of
> years trying to maintain and improve this code.
>
> I have tried to seek help from the people who effectively oversee me as
> a maintainer (i.e. the TSC) with limited success. I'm not sure what to
> do from here.
>
> Regards,
>
> Richard
>
>

-- 
- Thou shalt not follow the NULL pointer, for chaos and madness await thee
at its end
- "Use the force Harry" - Gandalf, Star Trek II
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#211638): 
https://lists.openembedded.org/g/openembedded-core/message/211638
Mute This Topic: https://lists.openembedded.org/mt/111229335/21656
Group Owner: [email protected]
Unsubscribe: https://lists.openembedded.org/g/openembedded-core/unsub 
[[email protected]]
-=-=-=-=-=-=-=-=-=-=-=-

Reply via email to