FWIW I did try and discuss this in the OE TSC meeting today but it is a US holiday. There were only two people who showed up.
On Mon, 2025-02-17 at 17:38 +0100, Stefan Herbrechtsmeier wrote: > Am 17.02.2025 um 13:43 schrieb Richard Purdie: > On Mon, 2025-02-17 at 12:00 +0100, Stefan Herbrechtsmeier wrote: > > > Am 13.02.2025 um 11:43 schrieb Richard Purdie via lists.openembedded.org: > > > > > > > > e) add the ability to add custom hooks in the fetch process to > > > > handle > > > > the cases of needing to alter the flow for patching the components > > > > list > > > > > > > I fear this isn't easy possible and we have a design problem. Bitbake > > > and oe-core assume that a patch doesn't influence the fetch and that > > > the SRC_URI contains all fetched sources. Many code use the Fetch > > > class or the SRC_URI direct and doesn't expect recursive implicit > > > URIs of gitsm. In reality the SRC_URI describe the given URIs only > > > and doesn't contain the additional implicit URIs. The task flow > > > applies patches after the unpack and prevent patches to the gitsm. In > > > reality a patch could influence the implicit URIs and isn't > > > independent. > > > > > > I would recommend to keep the patching outside of bitbake and instead > > > handle the implicit URIs inside oe-core. We could write the implicit > > > URIs into files in the work directory and migrate the scattered > > > direct users of the SRC_URI / Fetch class to a common oe-core > > > function. This functions returns the given URIs and optional any > > > additional implicit URIs. This allows the usage of the task > > > dependencies to ensure that the implicit URIs are resolved before > > > use. > > > > > > The package manager lock file needs a single unpack and patch step > > > whereas the gitsm needs a recursive unpack and patch of every > > > successive gitsm. This makes it impossible to use additional tasks. > > > We could add the resolve of the implicit URIs to the fetch task to > > > support recursive resolve (gitsm). > > > > > > 1. download SRC_URIs > > > 2. exit if no URI is marked as unresolved > > > 3. unpack unresolved URIs > > > 4. apply associated patches > > > 5. resolve URIs > > > 6. goto 2 > > > > > > This will eliminate the need for the partial implicit URI and gitsm > > > support in bitbake and makes the fetcher code simpler. The patching > > > could remain in oe-core and patches could be applied to gitsm and > > > package manager lock files. Additionally we remove code duplication > > > and simplify future changes to the SRC_URI determination. > > > > > > It will add an indirection for the SRC_URI but the current code > > > already shows that this makes problems. Only the archiver class uses > > > the expanded_urldata function and all other classes including the > > > spdx class ignore the implicit SRC_URIs. > > > > I'm a bit worried we're getting caught up by the terminology. We don't > > necessarily have to "patch" the component list so much as allow > > something like a function to hook and adjust things. That then removes > > the need for it to be a specific task. > > > > We can use a common patch without a do_patch task. We could even > call the patch code from the do_patch without an extra task. I > propose to call the code from the do_patch task in the do_fetch task. > It doesn't matter if we use a hook or task. We should minimize the > work for the user. The common solution to manipulate a package > manager lock file is a patch file and we have the code to handle > patch files. One of the challenges is the "patch" code/module lives in OE-Core. That means the fetcher can't call into it. I appreciate this means you'll argue the fetcher code should move to OE, or the vendor code should be in OE, however I am pretty decided that is a really bad idea. Whether I can convey why, I don't know but I have tried. I also very strongly want to maintain our separation of do_fetch, do_unpack and do_patch. > > I'm also not sure that drawing gitsm into this is a great idea when it > > is effectively already working quite well. > > To my knowledge it isn't possible to patch a sub module revision. Is > it intended that the SBOM doesn't contain the recursive implicit git > repositories? A patch was merged to allow the data about implicit git repos (amongst other things) to be available: https://git.yoctoproject.org/poky/commit/bitbake/?id=ef3e46afd910d4b7727d42c4c18b501525c65695 That tracer code was added after a lot of discussion that took up a ton of my time, then people moved on and it hasn't seem much development/use :(. So there were mechanisms added to make the data available. > > I appreciate the desire to > > have neat abstractions applying accross everything equally but I'm not > > sure we can achieve that with the level of usability we need and I'd > > rather improve usability at the cost of the inclusion of gitsm, which > > is the most functional fetcher we currently have in this group. > > I include gitsm because it is similar to the package manager and have > some drawbacks which I solve with my approach. It is the only fetcher > that parse a file to determine additional URIs. Its integration into > oe-core is sub-optimal. I believe there was a way added to solve the drawback you mention. The same mechanism could be used for other fetchers too yet we're talking about much more radical re-designs. > > I will repeat again that I do strongly feel that a strong API in > > bitbake is going to lead to an overall better design that something > > bolted onto the fetcher in OE-Core. The "fetcher" is an API exposed by Bitbake with the intent of standardising the fetching of source code. It's aim is to support objectives such as allowing resilience through source mirrors and support build reproducibility. There is a document in bitbake which tries to at least summarise some of that after it was requested it be written down. > What is a fetcher? We have fetchers for protocols (wget, git), These were the original fetchers using the original APIs. > wrapper around other fetchers (crate, gomod and gomodgit) These are newer and some work better than others are people have attempted to git the newer languages into the older framework. > and wrapper around other fetchers which embedded file parser (gitsm, > npmsw). Of these, gitsm is now relatively successfully integrated and functional. > Thereby the gitsm fetcher fetch a source and parse the embedded > file. The npmsw fetcher parse a file from the meta layer. The gitsm > fetcher recursive combines the download and unpack function call in > mostly every class function. Sadly there did not seem to be any other way to handle that. I'd imagine that some of the other fetchers are going to have to work similarly, unfortunately. > The wrappers has the disadvantage that they use uncommon URIs which > are useless for the SBOM. We support PyPI without the need for an > extra fetcher. We support pypi in so much as we have a recipe per dependency. That was not deemed possible for npm or crates. I would disagree that uncommon URIS are useless for SBOMs. They can be problematic but having the data there means it is at least listed. There are also ways they can be extended (e.g. the intercept code I link to above). > Is PyPI or crate the desired way to go? Compared to the complexity in some of the patches proposed, I'd actually say yes. > Instead of expand the fetcher I would focus the fetcher on the > download of provides URIs and keep the file parsing and post unpack > processing outside of it. The challenge is that once you process it, you need to fetch more things. Our ecosystem is built on the idea that fetching happens in do_fetch. If we change that most mirroring scripts break for example. It is like compiling happening during do_configure or do_install - it is not what the user expects and that is problematic. I am against changing behaviour in ways users do not expect. > The package manager lock file is a simple file which references to > package sources. Thereby a main source could contain multiple package > manager lock files. At the moment we have no fetcher which reference > an other source as base or combines multiple fetchers. This is the case for good reason. If the lock file references all the sources, perhaps we should just include it alongside the recipe and be done with it? > Based on my two proposed solutions and the work the last month, I > have the following feasible requirements: > * Support patches for the package manager lock file to simply updates, back ports and upsteaming > * Support multiple package manager lock files per main source > * Include the real https or git URI in the SBOM > * Include the dependency name and version in the SBOM > * No manual bitbake command call after a SRC_URI, PV or SRCREV > change I think we may have to relax some of these requirements as we can't do everything. > The following code is mostly used in oe-core and ignore the implicit > URIs: > > src_uri = (d.getVar('SRC_URI') or "").split() > fetcher = bb.fetch2.Fetch(src_uri, d) > for url in fetcher.urls: > > The following code is used in the archiver class only and includes > the implicit URIs: > > src_uri = get_src_uris(d) > fetcher = bb.fetch2.Fetch(src_uri, d) > for ud in fetcher.expanded_urldata(): Ironically, I think this an example of the kind of "strong API" I'm referring to, where a standard API is exposed by bitbake. In some ways this was already attempted with the tracer API above where the expand_urldata call would actually be a call into the tracer API to obtain the actual urls used. > My last series enable oe-core to generate implicit URIs and therefore > replace the usage of the plain SRC_URI with a function: > > fetcher = bb.fetch.Fetch(get_src_uris(d), d) > for url in fetcher.urls: > > The SRC_URI contains the given URIs and the get_src_uri function > contains the given and implicit URIs. The concept is taken from other > classes like the spdx classes which use files to exchange data > between tasks. I believe the get_src_uris() functionality should happen within the fetcher module and not outside of it as a bolt on interface. I do not know where we go from here. You disagree with me on several key things such as whether the code should be in bitbake or OE-Core, whether gitsm should be changed, whether we need to have some kind of checksum to verify the "lockfile" is correct as originally intended in the recipe and probably more. I'm in the position of trying to mediate things (i.e. see the different viewpoints and bring people together, trying to find common ground) and yet at the same time, express views as someone who as spent a lot of years trying to maintain and improve this code. I have tried to seek help from the people who effectively oversee me as a maintainer (i.e. the TSC) with limited success. I'm not sure what to do from here. Regards, Richard
-=-=-=-=-=-=-=-=-=-=-=- Links: You receive all messages sent to this group. View/Reply Online (#211547): https://lists.openembedded.org/g/openembedded-core/message/211547 Mute This Topic: https://lists.openembedded.org/mt/111229335/21656 Group Owner: openembedded-core+ow...@lists.openembedded.org Unsubscribe: https://lists.openembedded.org/g/openembedded-core/unsub [arch...@mail-archive.com] -=-=-=-=-=-=-=-=-=-=-=-