FWIW I did try and discuss this in the OE TSC meeting today but it is a
US holiday. There were only two people who showed up.

On Mon, 2025-02-17 at 17:38 +0100, Stefan Herbrechtsmeier wrote:
>  Am 17.02.2025 um 13:43 schrieb Richard Purdie:
>  On Mon, 2025-02-17 at 12:00 +0100, Stefan Herbrechtsmeier wrote:
> > >  Am 13.02.2025 um 11:43 schrieb Richard Purdie via lists.openembedded.org:
> > > > 
> > > > e) add the ability to add custom hooks in the fetch process to
> > > > handle
> > > > the cases of needing to alter the flow for patching the components
> > > > list
> > > >  
> > > I fear this isn't easy possible and we have a design problem. Bitbake
> > > and oe-core assume that a patch doesn't influence the fetch and that
> > > the SRC_URI contains all fetched sources. Many code use the Fetch
> > > class or the SRC_URI direct and doesn't expect recursive implicit
> > > URIs of gitsm. In reality the SRC_URI describe the given URIs only
> > > and doesn't contain the additional implicit URIs. The task flow
> > > applies patches after the unpack and prevent patches to the gitsm. In
> > > reality a patch could influence the implicit URIs and isn't
> > > independent.
> > > 
> > > I would recommend to keep the patching outside of bitbake and instead
> > > handle the implicit URIs inside oe-core. We could write the implicit
> > > URIs into files in the work directory and migrate the scattered
> > > direct users of the SRC_URI / Fetch class to a common oe-core
> > > function. This functions returns the given URIs and optional any
> > > additional implicit URIs. This allows the usage of the task
> > > dependencies to ensure that the implicit URIs are resolved before
> > > use.
> > > 
> > > The package manager lock file needs a single unpack and patch step
> > > whereas the gitsm needs a recursive unpack and patch of every
> > > successive gitsm. This makes it impossible to use additional tasks.
> > > We could add the resolve of the implicit URIs to the fetch task to
> > > support recursive resolve (gitsm).
> > >  
> > > 1. download SRC_URIs
> > > 2. exit if no URI is marked as unresolved
> > > 3. unpack unresolved URIs
> > > 4. apply associated patches
> > > 5. resolve URIs
> > > 6. goto 2 
> > >  
> > > This will eliminate the need for the partial implicit URI and gitsm
> > > support in bitbake and makes the fetcher code simpler. The patching
> > > could remain in oe-core and patches could be applied to gitsm and
> > > package manager lock files. Additionally we remove code duplication
> > > and simplify future changes to the SRC_URI determination.
> > > 
> > > It will add an indirection for the SRC_URI but the current code
> > > already shows that this makes problems. Only the archiver class uses
> > > the expanded_urldata function and all other classes including the
> > > spdx class ignore the implicit SRC_URIs.
> > 
> > I'm a bit worried we're getting caught up by the terminology. We don't
> > necessarily have to "patch" the component list so much as allow
> > something like a function to hook and adjust things. That then removes
> > the need for it to be a specific task.
> > 
>
>  We can use a common patch without a do_patch task. We could even
> call the patch code from the do_patch without an extra task. I
> propose to call the code from the do_patch task in the do_fetch task.
> It doesn't matter if we use a hook or task. We should minimize the
> work for the user. The common solution to manipulate a package
> manager lock file is a patch file and we have the code to handle
> patch files.

One of the challenges is the "patch" code/module lives in OE-Core. That
means the fetcher can't call into it.

I appreciate this means you'll argue the fetcher code should move to
OE, or the vendor code should be in OE, however I am pretty decided
that is a really bad idea. Whether I can convey why, I don't know but I
have tried.

I also very strongly want to maintain our separation of do_fetch,
do_unpack and do_patch.

> > I'm also not sure that drawing gitsm into this is a great idea when it
> > is effectively already working  quite well.
> 
> To my knowledge it isn't possible to patch a sub module revision. Is
> it intended that the SBOM doesn't contain the recursive implicit git
> repositories?

A patch was merged to allow the data about implicit git repos (amongst
other things) to be available:

https://git.yoctoproject.org/poky/commit/bitbake/?id=ef3e46afd910d4b7727d42c4c18b501525c65695

That tracer code was added after a lot of discussion that took up a ton
of my time, then people moved on and it hasn't seem much
development/use :(.

So there were mechanisms added to make the data available.

> >  I appreciate the desire to
> > have neat abstractions applying accross everything equally but I'm not
> > sure we can achieve that with the level of usability we need and I'd
> > rather improve usability at the cost of the inclusion of gitsm, which
> > is the most functional fetcher we currently have in this group.
> 
> I include gitsm because it is similar to the package manager and have
> some drawbacks which I solve with my approach. It is the only fetcher
> that parse a file to determine additional URIs. Its integration into
> oe-core is sub-optimal.

I believe there was a way added to solve the drawback you mention. The
same mechanism could be used for other fetchers too yet we're talking
about much more radical re-designs.

> > I will repeat again that I do strongly feel that a strong API in
> > bitbake is going to lead to an overall better design that something
> > bolted onto the fetcher in OE-Core.

The "fetcher" is an API exposed by Bitbake with the intent of
standardising the fetching of source code. It's aim is to support
objectives such as allowing resilience through source mirrors and
support build reproducibility. There is a document in bitbake which
tries to at least summarise some of that after it was requested it be
written down.

> What is a fetcher? We have fetchers for protocols (wget, git),

These were the original fetchers using the original APIs.

> wrapper around other fetchers (crate, gomod and gomodgit)

These are newer and some work better than others are people have
attempted to git the newer languages into the older framework.

>  and wrapper around other fetchers which embedded file parser (gitsm,
> npmsw).

Of these, gitsm is now relatively successfully integrated and
functional.

>  Thereby the gitsm fetcher fetch a source and parse the embedded
> file. The npmsw fetcher parse a file from the meta layer. The gitsm
> fetcher recursive combines the download and unpack function call in
> mostly every class function.

Sadly there did not seem to be any other way to handle that. I'd
imagine that some of the other fetchers are going to have to work
similarly, unfortunately.


> The wrappers has the disadvantage that they use uncommon URIs which
> are useless for the SBOM. We support PyPI without the need for an
> extra fetcher.

We support pypi in so much as we have a recipe per dependency. That was
not deemed possible for npm or crates. I would disagree that uncommon
URIS are useless for SBOMs. They can be problematic but having the data
there means it is at least listed. There are also ways they can be
extended (e.g. the intercept code I link to above).

> Is PyPI or crate the desired way to go?

Compared to the complexity in some of the patches proposed, I'd
actually say yes.

> Instead of expand the fetcher I would focus the fetcher on the
> download of provides URIs and keep the file parsing and post unpack
> processing outside of it.

The challenge is that once you process it, you need to fetch more
things. Our ecosystem is built on the idea that fetching happens in
do_fetch. If we change that most mirroring scripts break for example.
It is like compiling happening during do_configure or do_install - it
is not what the user expects and that is problematic. I am against
changing behaviour in ways users do not expect.

> The package manager lock file is a simple file which references to
> package sources. Thereby a main source could contain multiple package
> manager lock files. At the moment we have no fetcher which reference
> an other source as base or combines multiple fetchers.

This is the case for good reason. If the lock file references all the
sources, perhaps we should just include it alongside the recipe and be
done with it?

> Based on my two proposed solutions and the work the last month, I
> have the following feasible requirements:
>  * Support patches for the package manager lock file to simply
     updates, back ports and upsteaming
>  * Support multiple package manager lock files per main source
>  * Include the real https or git URI in the SBOM
>  * Include the dependency name and version in the SBOM
>  * No manual bitbake command call after a SRC_URI, PV or SRCREV
> change

I think we may have to relax some of these requirements as we can't do
everything.
 
> The following code is mostly used in oe-core and ignore the implicit
> URIs:
> 
>     src_uri = (d.getVar('SRC_URI') or "").split()
>     fetcher = bb.fetch2.Fetch(src_uri, d)
>     for url in fetcher.urls:
> 
> The following code is used in the archiver class only and includes
> the implicit URIs:
> 
>     src_uri = get_src_uris(d)
>     fetcher = bb.fetch2.Fetch(src_uri, d)
>     for ud in fetcher.expanded_urldata():
 
Ironically, I think this an example of the kind of "strong API" I'm
referring to, where a standard API is exposed by bitbake. In some ways
this was already attempted with the tracer API above where the
expand_urldata call would actually be a call into the tracer API to
obtain the actual urls used.
 
 
> My last series enable oe-core to generate implicit URIs and therefore
> replace the usage of the plain SRC_URI with a function:
> 
>      fetcher = bb.fetch.Fetch(get_src_uris(d), d)
>     for url in fetcher.urls:
>
>  The SRC_URI contains the given URIs and the get_src_uri function
> contains the given and implicit URIs. The concept is taken from other
> classes like the spdx classes which use files to exchange data
> between tasks.

I believe the get_src_uris() functionality should happen within the
fetcher module and not outside of it as a bolt on interface.

I do not know where we go from here. You disagree with me on several
key things such as whether the code should be in bitbake or OE-Core,
whether gitsm should be changed, whether we need to have some kind of
checksum to verify the "lockfile" is correct as originally intended in
the recipe and probably more.

I'm in the position of trying to mediate things (i.e. see the different
viewpoints and bring people together, trying to find common ground) and
yet at the same time, express views as someone who as spent a lot of
years trying to maintain and improve this code.

I have tried to seek help from the people who effectively oversee me as
a maintainer (i.e. the TSC) with limited success. I'm not sure what to
do from here.
 
Regards,

Richard 

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#211547): 
https://lists.openembedded.org/g/openembedded-core/message/211547
Mute This Topic: https://lists.openembedded.org/mt/111229335/21656
Group Owner: openembedded-core+ow...@lists.openembedded.org
Unsubscribe: https://lists.openembedded.org/g/openembedded-core/unsub 
[arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-

Reply via email to