On 17.02.2025 at 13:43, Richard Purdie wrote:
On Mon, 2025-02-17 at 12:00 +0100, Stefan Herbrechtsmeier wrote:
On 13.02.2025 at 11:43, Richard Purdie via lists.openembedded.org wrote:
e) add the ability to add custom hooks in the fetch process to handle
the cases of needing to alter the flow for patching the components list
I fear this isn't easily possible and that we have a design problem.
Bitbake and oe-core assume that a patch doesn't influence the fetch and
that the SRC_URI contains all fetched sources. Much of the code uses the
Fetch class or the SRC_URI directly and doesn't expect the recursive
implicit URIs of gitsm. In reality the SRC_URI describes only the given
URIs and doesn't contain the additional implicit URIs. The task flow
applies patches after the unpack and so prevents patches to the gitsm.
In reality a patch could influence the implicit URIs and isn't
independent.
I would recommend keeping the patching outside of bitbake and instead
handling the implicit URIs inside oe-core. We could write the implicit
URIs into files in the work directory and migrate the scattered
direct users of the SRC_URI / Fetch class to a common oe-core
function. This function returns the given URIs and, optionally, any
additional implicit URIs. This allows the use of task dependencies to
ensure that the implicit URIs are resolved before use.
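
A minimal sketch of what such a common function could look like, assuming
the implicit URIs are stored one per line in a file in the work directory.
The file name and both function names are hypothetical, and plain strings
stand in for the bitbake datastore so that the example runs on its own:

```python
import os

# Hypothetical file name; the thread does not name the file.
IMPLICIT_URIS_FILE = "implicit-src-uris.txt"


def write_implicit_uris(workdir, uris):
    """Record resolved implicit URIs, one per line, in the work directory."""
    path = os.path.join(workdir, IMPLICIT_URIS_FILE)
    with open(path, "w") as f:
        for uri in uris:
            f.write(uri + "\n")


def collect_src_uris(src_uri, workdir, include_implicit=True):
    """Return the given URIs and, optionally, any recorded implicit URIs.

    src_uri is the whitespace-separated SRC_URI value. In a real class it
    would come from d.getVar('SRC_URI'); here it is passed in directly.
    """
    uris = (src_uri or "").split()
    if not include_implicit:
        return uris
    path = os.path.join(workdir, IMPLICIT_URIS_FILE)
    if os.path.exists(path):
        with open(path) as f:
            uris.extend(line.strip() for line in f if line.strip())
    return uris
```

A task that consumes the URIs would then depend on the task that writes
the file, so that the file exists before collect_src_uris() is called.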
The package manager lock file needs a single unpack and patch step
whereas gitsm needs a recursive unpack and patch of every successive
gitsm. This makes it impossible to use additional tasks.
We could add the resolution of the implicit URIs to the fetch task to
support recursive resolution (gitsm):
1. download SRC_URIs
2. exit if no URI is marked as unresolved
3. unpack unresolved URIs
4. apply associated patches
5. resolve URIs
6. goto 2
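
The six steps above can be sketched as a loop. This is only an
illustration under stated assumptions: every callback (download,
needs_resolve, unpack, patch, resolve) is a hypothetical stand-in and not
an existing bitbake or oe-core API, and the download of newly discovered
URIs is folded into the resolve step because the list does not say where
it happens:

```python
def fetch_with_resolve(src_uris, download, needs_resolve, unpack, patch,
                       resolve, max_rounds=100):
    """Download the given URIs, then resolve implicit URIs until none remain.

    Returns the given URIs followed by every implicit URI that was found.
    """
    all_uris = list(src_uris)
    download(all_uris)                                    # step 1
    pending = [u for u in all_uris if needs_resolve(u)]
    rounds = 0
    while pending:                                        # step 2
        rounds += 1
        if rounds > max_rounds:
            raise RuntimeError("implicit URIs did not settle")
        new_uris = []
        for uri in pending:
            unpack(uri)                                   # step 3
            patch(uri)                                    # step 4
            for found in resolve(uri):                    # step 5
                if found not in all_uris and found not in new_uris:
                    new_uris.append(found)
        if new_uris:
            # Assumption: newly discovered URIs are downloaded right away.
            download(new_uris)
        all_uris.extend(new_uris)
        pending = [u for u in new_uris if needs_resolve(u)]  # step 6
    return all_uris
```

A package manager lock file would go through the loop once, whereas
nested gitsm repositories would go through it once per nesting level.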
This will eliminate the need for the partial implicit URI and gitsm
support in bitbake and make the fetcher code simpler. The patching
could remain in oe-core and patches could be applied to gitsm and
package manager lock files. Additionally we remove code duplication
and simplify future changes to the SRC_URI determination.
It will add an indirection for the SRC_URI but the current code
already shows that the missing indirection causes problems. Only the
archiver class uses the expanded_urldata function and all other classes,
including the spdx class, ignore the implicit SRC_URIs.
I'm a bit worried we're getting caught up by the terminology. We don't
necessarily have to "patch" the component list so much as allow
something like a function to hook and adjust things. That then removes
the need for it to be a specific task.
We can use a common patch without a do_patch task. We could even call
the patch code from the do_patch without an extra task. I propose to
call the code from the do_patch task in the do_fetch task. It doesn't
matter if we use a hook or task. We should minimize the work for the
user. The common solution to manipulate a package manager lock file is a
patch file and we have the code to handle patch files.
I'm also not sure that drawing gitsm into this is a great idea when it
is effectively already working quite well.
To my knowledge it isn't possible to patch a submodule revision. Is it
intended that the SBOM doesn't contain the recursive implicit git
repositories?
I appreciate the desire to
have neat abstractions applying across everything equally but I'm not
sure we can achieve that with the level of usability we need and I'd
rather improve usability at the cost of the inclusion of gitsm, which
is the most functional fetcher we currently have in this group.
I include gitsm because it is similar to the package manager and has
some drawbacks which I solve with my approach. It is the only fetcher
that parses a file to determine additional URIs. Its integration into
oe-core is sub-optimal.
I will repeat again that I do strongly feel that a strong API in
bitbake is going to lead to an overall better design than something
bolted onto the fetcher in OE-Core.
What is a fetcher? We have fetchers for protocols (wget, git), wrappers
around other fetchers (crate, gomod and gomodgit) and wrappers around
other fetchers which embed a file parser (gitsm, npmsw). The gitsm
fetcher fetches a source and parses a file embedded in it. The npmsw
fetcher parses a file from the meta layer. The gitsm fetcher recursively
combines the download and unpack function calls in nearly every class
function.
The wrappers have the disadvantage that they use uncommon URIs which are
useless for the SBOM. We support PyPI without the need for an extra fetcher.
Is PyPI or crate the desired way to go?
Instead of expanding the fetcher I would focus the fetcher on the download
of the provided URIs and keep the file parsing and post-unpack processing
outside of it.
The package manager lock file is a simple file which references
package sources. A main source could contain multiple package
manager lock files. At the moment we have no fetcher which references
another source as its base or combines multiple fetchers.
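
To illustrate how a lock file references package sources, here is a
minimal sketch that lists name, version and download URI for each entry.
It assumes the npm package-lock.json layout with a top-level "packages"
map (lockfileVersion 2 or 3); the thread itself does not name a specific
lock file format here, and the function name is hypothetical:

```python
import json


def lock_file_sources(lock_text):
    """Return name, version and URI for each package in an npm lock file."""
    data = json.loads(lock_text)
    sources = []
    for path, entry in data.get("packages", {}).items():
        if not path:
            # The empty key describes the root project, not a dependency.
            continue
        resolved = entry.get("resolved")
        if not resolved or entry.get("link"):
            # Skip entries without a download URI and local links.
            continue
        # Nested paths look like node_modules/a/node_modules/@scope/b.
        name = entry.get("name") or path.rpartition("node_modules/")[2]
        sources.append({
            "name": name,
            "version": entry.get("version"),
            "uri": resolved,
        })
    return sources
```

The returned records carry the plain https or git URI together with the
dependency name and version, which is the information an SBOM needs.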
Based on my two proposed solutions and the work of the last month, I have
the following feasible requirements:
* Support patches for the package manager lock file to simplify updates,
backports and upstreaming
* Support multiple package manager lock files per main source
* Include the real https or git URI in the SBOM
* Include the dependency name and version in the SBOM
* No manual bitbake command call after a SRC_URI, PV or SRCREV change
The following code is used in most places in oe-core and ignores the
implicit URIs:
src_uri = (d.getVar('SRC_URI') or "").split()
fetcher = bb.fetch2.Fetch(src_uri, d)
for url in fetcher.urls:
The following code is used in the archiver class only and includes the
implicit URIs:
src_uri = get_src_uris(d)
fetcher = bb.fetch2.Fetch(src_uri, d)
for ud in fetcher.expanded_urldata():
My last series enables oe-core to generate implicit URIs and therefore
replaces the usage of the plain SRC_URI with a function:
fetcher = bb.fetch.Fetch(get_src_uris(d), d)
for url in fetcher.urls:
The SRC_URI contains the given URIs and the get_src_uris function
returns the given and implicit URIs. The concept is taken from other
classes like the spdx classes which use files to exchange data between
tasks.
I have no idea what this "strong API" should look like.
View/Reply Online (#211545):
https://lists.openembedded.org/g/openembedded-core/message/211545