Re: [Openembedded-architecture] WORKDIR fetcher interaction issue
On 2022-12-29 16:06 Richard Purdie wrote:
On Thu, 2022-12-29 at 08:50 -0600, Joshua Watt wrote:
On Thu, Dec 29, 2022 at 7:56 AM Richard Purdie wrote:
>
> I was asked about a WORKDIR/fetcher interaction problem and the bugs it
> results in. I've tried to write down my thoughts.
>
> The unpack task writes its output to WORKDIR, as base.bbclass says:
>
> fetcher = bb.fetch2.Fetch(src_uri, d)
> fetcher.unpack(d.getVar('WORKDIR'))
>
> We historically dealt with tarballs, which usually have a NAME-VERSION
> directory within them, so when you extract them, they go into a
> subdirectory which tar creates. We usually call that subdirectory "S".
>
> When we wrote the git fetcher, we emulated this by using a "git"
> directory to extract into rather than WORKDIR.
>
> For local files, there is no subdirectory, so they go into WORKDIR.
> This includes patches, which do_patch looks for in WORKDIR and applies
> from there.
>
> What issues does this cause? If you have an existing WORKDIR and run a
> build with:
>
> SRC_URI = "file://a file://b"
>
> then change it to:
>
> SRC_URI = "file://a"
>
> and rebuild the recipe, the fetch and unpack tasks will rerun and their
> hashes will change, but the file "b" is still in WORKDIR. Nothing in the
> codebase knows that it should delete "b" from there. If you have code
> which does "if exists(b)", which is common, it will break.
>
> There are variations on this, such as a conditional append on some
> override to SRC_URI, but the fundamental problem is one of cleanup when
> unpack is to rerun.
>
> The naive approach is then to think "let's just delete WORKDIR" when
> running do_unpack. There is the small problem of WORKDIR/temp with logs
> in it. There is also the pseudo database and other things tasks could
> have done. Basically, whilst tempting, it doesn't work out well in
> practice, particularly as, whilst unpack might rerun, not all other
> tasks might.
>
> I did also try a couple of other ideas.
> We could fetch into a subdirectory, then either copy or symlink into
> place, depending on which set of performance/usability challenges you
> want to deal with. You could involve a manifest of the files and then
> move them into position, so later you'd know which ones to delete.
>
> Part of the problem is that in some cases recipes do:
>
> S = "${WORKDIR}"
>
> for simplicity. This means that you also can't wipe out S, as it might
> point at WORKDIR.
>
> SPDX users have requested a json file of files and checksums after the
> unpack and before do_patch. Such a manifest could also be useful for
> attempting cleanup of an existing WORKDIR, so I suspect the solution
> probably lies in that direction: probably unpacking into a subdir,
> indexing it, then moving into position.

By "moving it into position" do you mean moving the files from the clean subdirectory to the locations they would occupy today?

If so, I don't understand why that's strictly necessary. It seems like almost all of the complexity of this will be to support a use case we don't really like anyway (S = "${WORKDIR}"). Manifests are great and all, but they cause a lot of problems if they get out of sync, and I suspect that would happen more often than we would like, e.g. with devtool, make config, manual editing, etc. If we can keep it simple and not rely on external state (e.g. a manifest), I think it will be a lot easier to maintain in the long run.

Dropping S = "${WORKDIR}" doesn't solve the problem being described here; it just removes something which complicates current code and makes that problem harder to solve. Even without supporting S = "${WORKDIR}", do_unpack still unpacks to WORKDIR, with the S directory created by the tarball.
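To make the proposed direction a little more concrete, here is a minimal sketch (plain Python, not BitBake code; the helper names `index_tree` and `clean_stale` are invented for illustration) of how a checksum manifest taken at unpack time could later be used to clean stale files out of an existing WORKDIR without deleting WORKDIR/temp or anything else unpack did not create:

```python
import hashlib
import os

def index_tree(root):
    """Walk an unpacked tree and record relpath -> sha256 for every file."""
    manifest = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            with open(path, "rb") as f:
                manifest[rel] = hashlib.sha256(f.read()).hexdigest()
    return manifest

def clean_stale(workdir, old_manifest, new_manifest):
    """Delete files a previous unpack placed in workdir that the new
    unpack no longer provides, leaving everything else untouched."""
    for rel in set(old_manifest) - set(new_manifest):
        target = os.path.join(workdir, rel)
        if os.path.isfile(target):
            os.remove(target)
```

This is only a sketch of the idea; a real implementation would also have to decide what to do when a file recorded in the old manifest was modified in place by a later task.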
Cheers, Richard

Hi Richard,

I'm not sure I'm one of the SPDX guys you intended to refer to :) but I definitely support the idea of having a separate manifest with file relpaths and checksums (and also download location) for each item in SRC_URI before do_patch (and before putting everything together into S).

That would be a game changer for software composition analysis and IP compliance, and it would also help the creation of an SPDX license calculation tool I was asked to contribute[^1]: such a manifest would allow us to analyze real upstream sources and match them with license metadata coming from existing online resources[^2].

I understand Joshua's concerns about using such a manifest to handle source cleanup in case of SRC_URI modifications, and I don't have an answer for that (it is not my field). By the way, IMHO the requirement from SPDX users would be a strong enough motivation to implement such manifest generation. I would be glad to contribute, if you decide to do it.

Cheers, Alberto

[^1]: https://bugzilla.yoctoproject.org/show_bug.cgi?id=4517#c2
[^2]: Apart from the well-known ones (ClearlyDefined, Software Heritage, Debian, etc.), there's an interesting new project by osadl.org, which may become a good source of trusted license metadata.
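For illustration only, a per-SRC_URI-entry manifest of the kind described here might look something like the following. The field names are hypothetical, not an agreed format:

```json
{
  "src_uri": "https://example.com/foo-1.0.tar.gz",
  "download_location": "https://example.com/foo-1.0.tar.gz",
  "files": [
    { "relpath": "foo-1.0/COPYING", "sha256": "<sha256 hex digest>" },
    { "relpath": "foo-1.0/src/main.c", "sha256": "<sha256 hex digest>" }
  ]
}
```

Keeping the entries grouped by SRC_URI item, rather than as one flat list, is what would let an SPDX tool attribute each file to its real upstream origin before do_patch mixes everything together.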
Re: [Openembedded-architecture] WORKDIR fetcher interaction issue
On 12/29/22 9:06 AM, Richard Purdie wrote:
> [earlier messages quoted in full; snipped]

In this particular piece, it's always bugged me that I don't have control over the place it unpacks (whatever it is), where it patches, and the S directory. (These are NOT the same thing in some cases, but we end up having to "make them the same".)
For instance, I've got software that is going to download (currently) into:

WORKDIR/embeddedsw/
  apps/
    app1/
      variant1
      variant2
    app2/
      variant1
      variant2

(I don't have ownership over the structure, so I have to live with it...)

Each app & variant is a separate recipe. So we end up having to play games with S and patchlevel and other things so that I can have 2 recipes (app1 and app2) that will build the correct variant for their machine. If I could say:

  unpack to $WORKDIR
  patch in $WORKDIR/embeddedsw/apps
  source in $WORKDIR/embeddedsw/apps/app1/variant1

it would make it easier in this extreme case. I would guess in most other cases, the complexity could be hidden by defaults to preserve existing behavior.

In some ways, what we're really trying to do is define, for a given task, what directory it should be working within. So maybe that is a better way of thinking of this. (Adding that configure, compile and install would operate within B.)

Anyway, just a few thoughts from reading through this.

--Mark
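Part of this separation is already expressible with current syntax: patch entries in SRC_URI accept a `patchdir` parameter, and S is free to point deeper into the unpacked tree. A hedged sketch of what an app1 recipe in this layout might look like today (the repository URL and patch name are placeholders, and the git fetcher is assumed to unpack into ${WORKDIR}/git):

```
SRC_URI = "git://example.com/embeddedsw.git;protocol=https;branch=master \
           file://app1-fix.patch;patchdir=${WORKDIR}/git/apps"

S = "${WORKDIR}/git/apps/app1/variant1"
```

What this cannot express is the fuller request above: choosing the unpack directory itself independently of WORKDIR, S and the patch directory.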
Re: [Openembedded-architecture] WORKDIR fetcher interaction issue
On Thu, Dec 29, 2022 at 5:28 PM Trevor Woerner wrote:
> On Thu 2022-12-29 @ 03:51:08 PM, Martin Jansa wrote:
> > On Thu, Dec 29, 2022 at 3:38 PM Trevor Woerner wrote:
> > > On Thu 2022-12-29 @ 01:56:51 PM, Richard Purdie wrote:
> > > > There are variations on this, such as a conditional append on some
> > > > override to SRC_URI but the fundamental problem is one of cleanup when
> > > > unpack is to rerun.
> > >
> > > ...just to elaborate a bit more on this variation for everyone's benefit
> > > (Richard already understands the details of my scenario):
> > >
> > > Some recipes require us to generate config files by hand in order to
> > > get a piece of software/service to work correctly in our environment.
> > > A concrete example could be specifying the IP address of a time server
> > > to use for clock synchronization in chrony's /etc/chrony.conf file.
> > > Another example could be to provide an /etc/network/interfaces file so
> > > networking works on a given device in our specific network.
> > >
> > > In my case I might want to build the same image, for the same device,
> > > but use two different sets of config files. If the device is going to
> > > run on my non-routable network then it will use CONDITION1 config files.
> > > If I want to build a set of images for devices running on my routable
> > > network then I'll need to use the CONDITION2 set of config files:
> > >
> > > meta-project
> > > ├── README
> > > ├── conf
> > > │   └── layer.conf
> > > └── recipes-configfiles
> > >     ├── chrony
> > >     │   ├── chrony_%.bbappend
> > >     │   └── files
> > >     │       ├── condition1
> > >     │       │   └── chrony.conf
> > >     │       └── condition2
> > >     │           └── chrony.conf
> > >     └── init-ifupdown
> > >         ├── files
> > >         │   ├── condition1
> > >         │   │   └── interfaces
> > >         │   └── condition2
> > >         │       └── interfaces
> > >         └── init-ifupdown_%.bbappend
> > >
> > > Then, somewhere, I either specify:
> > >
> > > MACHINEOVERRIDES .= ":condition1"
> > >
> > > or:
> > >
> > > MACHINEOVERRIDES .= ":condition2"
> > >
> > > NOTE: using OVERRIDES .= ":conditionX" doesn't work; it has to be
> > > MACHINEOVERRIDES, since not all overrides are evaluated for the
> > > fetcher in order to save parsing time (is that correct?)
> > >
> > > If I do a:
> > >
> > > $ bitbake -c cleansstate
> > >
> > > (perhaps "-c clean" would be enough?) then perform a build, I always
> > > get the correct set of config files in my image. If I don't do a
> > > cleansstate between builds in which I change the override, then I
> > > simply get the last config file that's in the WORKDIR.
> >
> > This example is a bit surprising to me.
> >
> > I understand the case mentioned by Richard that files aren't removed
> > from WORKDIR when they are no longer in SRC_URI (happens to me all the
> > time when e.g. renaming a .patch file and then seeing both the old and
> > new .patch file in WORKDIR).
> >
> > But why doesn't the fetcher overwrite your chrony.conf and interfaces
> > file after MACHINEOVERRIDES is changed?
>
> I spent a fair amount of time yesterday proving to myself that it wasn't
> changing the config file by simply changing the MACHINEOVERRIDES.
> But it wouldn't be the first time I was certain something was working a
> certain way, then later couldn't reproduce it.
>
> > And are you really changing MACHINEOVERRIDES while MACHINE stays the
> > same? I would expect 2 MACHINEs, each with its own set of
> > MACHINEOVERRIDES, and recipes like this being MACHINE_ARCH not
> > TUNE_PKGARCH, and then each will have its own WORKDIR with its own set
> > of files.
>
> If there's a better way, I'd be quite interested in learning it. I'm
> pretty sure MACHINEOVERRIDES wasn't designed for this, and probably isn't
> the right way to go about it, but it's a tool that I have and a tool
> that, in theory, should do what I want (?)

I was always using MACHINEOVERRIDES to have a common override for multiple different MACHINEs (like the SOC_FAMILY, MACHINE_CLASS, MACHINE_VARIANT variables various projects use). I guess changing MACHINE_FEATURES or something like that would be a slightly better fit for the use case, but I understand that you wanted to take advantage of MACHINEOVERRIDES being included in:

meta/conf/bitbake.conf:FILESOVERRIDES = "${TRANSLATED_TARGET_ARCH}:${MACHINEOVERRIDES}:${DISTROOVERRIDES}"

> it has to be a MACHINEOVERRIDES since not all overrides are evaluated for
> the fetcher in order to save parsing time (is that correct?)

It's partially correct: it's FILESOVERRIDES which is used to define FILESPATH, which is already quite long, and all possible locations are tried until the first existing file is found.
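The first-match behaviour described here can be illustrated with a small sketch. `resolve_file` is a hypothetical helper, and the real FILESPATH is assembled from more components (BP, BPN, override priority ordering) than shown:

```python
import os

def resolve_file(basename, base_dirs, overrides):
    """Emulate FILESPATH lookup: for each base directory (e.g. a recipe's
    'files' dir), try each override subdirectory in priority order, then
    the base directory itself; return the first candidate that exists."""
    for base in base_dirs:
        for sub in overrides + [""]:
            candidate = os.path.join(base, sub, basename)
            if os.path.isfile(candidate):
                return candidate
    return None
```

This also shows why a stale file in WORKDIR is a separate problem: the lookup decides which file to *unpack*, but nothing in it knows about files a previous unpack left behind.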
Re: [Openembedded-architecture] WORKDIR fetcher interaction issue
On Thu 2022-12-29 @ 03:51:08 PM, Martin Jansa wrote:
> [earlier discussion quoted in full; snipped]
>
> But why doesn't the fetcher overwrite your chrony.conf and interfaces
> file after MACHINEOVERRIDES is changed?

I spent a fair amount of time yesterday proving to myself that it wasn't changing the config file by simply changing the MACHINEOVERRIDES. But it wouldn't be the first time I was certain something was working a certain way, then later couldn't reproduce it.

> And are you really changing MACHINEOVERRIDES while MACHINE stays the same?
> I would expect 2 MACHINEs, each with its own set of MACHINEOVERRIDES, and
> recipes like this being MACHINE_ARCH not TUNE_PKGARCH, and then each will
> have its own WORKDIR with its own set of files.

If there's a better way, I'd be quite interested in learning it. I'm pretty sure MACHINEOVERRIDES wasn't designed for this, and probably isn't the right way to go about it, but it's a tool that I have and a tool that, in theory, should do what I want (?)

There are a couple of things I'm doing, and maybe I'm not doing them the right way. First off, the decision as to which set of config files to use should be made by the user at build time. As such, I'm tweaking MACHINEOVERRIDES in conf/local.conf. Maybe that's too late in the parsing process? Second, I'm checking the contents of the config file by looking in chrony's packages-split area. Maybe that's the wrong place?

Would I have to create multiple machine.conf files? If so, that's not really the correct semantics for this use case either. Creating multiple binary packages that are just dropped in at the end could work too, but would also be cumbersome (assuming the set of config files would have to be tarballed up).
Re: [Openembedded-architecture] WORKDIR fetcher interaction issue
On Thu, 2022-12-29 at 08:50 -0600, Joshua Watt wrote: > On Thu, Dec 29, 2022 at 7:56 AM Richard Purdie > wrote: > > > > I was asked about a WORKDIR/fetcher interaction problem and the bugs it > > results in. I've tried to write down my thoughts. > > > > The unpack task writes it's output to WORKDIR as base.bbclass says: > > > > fetcher = bb.fetch2.Fetch(src_uri, d) > > fetcher.unpack(d.getVar('WORKDIR') > > > > We historically dealt with tarballs which usually have a NAME-VERSION > > directory within them, so when you extract them, they go into a sub > > directory which tar creates. We usually call that subdirectory "S". > > > > When we wrote the git fetcher, we emulated this by using a "git" > > directory to extract into rather than WORKDIR. > > > > For local files, there is no sub directory so they go into WORKDIR. > > This includes patches, which do_patch looks for in WORKDIR and applies > > them from there. > > > > What issues does this cause? If you have an existing WORKDIR and run a > > build with: > > > > SRC_URI = "file://a file://b" > > > > then change it to: > > > > SRC_URI = "file://a" > > > > and rebuild the recipe, the fetch and unpack tasks will rerun and their > > hashes will change but the file "b" is still in WORKDIR. Nothing in the > > codebase knows that it should delete "b" from there. If you have code > > which does "if exists(b)", which is common, it will break. > > > > There are variations on this, such as a conditional append on some > > override to SRC_URI but the fundamental problem is one of cleanup when > > unpack is to rerun. > > > > The naive approach is then to think "lets just delete WORKDIR" when > > running do_unpack. There is the small problem of WORKDIR/temp with logs > > in. There is also the pseudo database and other things tasks could have > > done. Basically, whilst tempting, it doesn't work out well in practise > > particularly as that whilst unpack might rerun, not all other tasks > > might. 
> > > > I did also try a couple of other ideas. We could fetch into a > > subdirectory, then either copy or symlink into place depending on which > > set of performance/usabiity challenges you want to deal with. You could > > involve a manifest of the files and then move into position so later > > you'd know which ones to delete. > > > > Part of the problem is that in some cases recipes do: > > > > S = "${WORKDIR}" > > > > for simplicity. This means that you also can't wipe out S as it might > > point at WORKDIR. > > > > SPDX users have requested a json file of file and checksums after the > > unpack and before do_patch. Such a manifest could also be useful for > > attempting cleanup of an existing WORKDIR so I suspect the solution > > probably lies in that direction, probably unpacking into a subdir, > > indexing it, then moving into position. > > By "moving it into position" do you mean moving the files from the > clean subdirectory to the locations they would occupy today? > > If so I don't understand why that's strictly necessary. It seems > like almost all of the complexity of this will be to support a > use-case we don't really like anyway (S = "${WORKDIR}"). Manifests are > great and all, but it causes a lot of problems if they get out of sync > and I suspect that would happen more often than we would like, e.g. > with devtool, make config, manual editing, etc. If we can keep it > simple and not rely on external state (e.g. a manifest) I think it > will be a lot easier to maintain in the long run. Dropping S = "${WORKDIR}" doesn't solve the problem being described here, it just removes something which complicates current code and makes that problem harder to solve. Even not supporting S = "${WORKDIR}", do_unpack still unpacks to WORKDIR with the S directory created by the tarball. Cheers, Richard -=-=-=-=-=-=-=-=-=-=-=- Links: You receive all messages sent to this group. 
View/Reply Online (#1694): https://lists.openembedded.org/g/openembedded-architecture/message/1694
Mute This Topic: https://lists.openembedded.org/mt/95936561/21656
Unsubscribe: https://lists.openembedded.org/g/openembedded-architecture/unsub
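Richard's point above is that the "S" subdirectory is an artifact of how release tarballs are laid out, not something the fetcher invents. A minimal Python sketch (illustrative only, not OE code; the `foo-1.0` name is made up) of extracting a typical NAME-VERSION tarball into a WORKDIR:

```python
# Extracting a NAME-VERSION tarball into WORKDIR creates the
# subdirectory that recipes then point S at; tar, not the fetcher,
# creates it. All names here are illustrative.
import io
import tarfile
import tempfile
from pathlib import Path

def make_release_tarball() -> bytes:
    """Build an in-memory tarball laid out like foo-1.0/..."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        data = b"int main(void) { return 0; }\n"
        info = tarfile.TarInfo("foo-1.0/main.c")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

def unpack_into_workdir(tarball: bytes, workdir: Path) -> Path:
    """Extract into WORKDIR; the NAME-VERSION subdir becomes S."""
    with tarfile.open(fileobj=io.BytesIO(tarball)) as tar:
        tar.extractall(workdir)
    return workdir / "foo-1.0"   # what a recipe's S would point at

workdir = Path(tempfile.mkdtemp())
s_dir = unpack_into_workdir(make_release_tarball(), workdir)
```

After extraction `s_dir` exists under WORKDIR, which is why wiping S is not enough when S points at WORKDIR itself.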
Re: [Openembedded-architecture] WORKDIR fetcher interaction issue
On Thu, Dec 29, 2022 at 3:38 PM Trevor Woerner wrote:
> [...]
> If I do a:
>
> $ bitbake -c cleansstate
>
> (perhaps "-c clean" would be enough?) then perform a build, I always
> get the correct set of config files in my image. If I don't do a
> cleansstate between builds in which I change the override, then I
> simply get the last config file that's in the WORKDIR.

This example is a bit surprising to me. I understand the case mentioned
by Richard that files aren't removed from WORKDIR when they are no
longer in SRC_URI (it happens to me all the time when e.g. renaming a
.patch file and then seeing both the old and the new .patch file in
WORKDIR). But why doesn't the fetcher overwrite your chrony.conf and
interfaces files after MACHINEOVERRIDES is changed? And are you really
changing MACHINEOVERRIDES while MACHINE stays the same? I would expect
two MACHINEs, each with its own set of MACHINEOVERRIDES, and recipes
like this being MACHINE_ARCH rather than TUNE_PKGARCH, so that each
would have its own WORKDIR with its own set of files.

View/Reply Online (#1693): https://lists.openembedded.org/g/openembedded-architecture/message/1693
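The renamed-.patch failure mode described above comes from unpack being purely additive. A tiny Python sketch (assumed names, not bitbake code) showing why both the old and new patch names end up in WORKDIR:

```python
# "Unpack" copies whatever is in SRC_URI into WORKDIR but never removes
# files dropped from SRC_URI, so a renamed .patch leaves both the old
# and the new name behind. Names are illustrative.
import tempfile
from pathlib import Path

def naive_unpack(src_files: dict[str, bytes], workdir: Path) -> None:
    """Copy each SRC_URI local file into WORKDIR; no cleanup step."""
    for name, content in src_files.items():
        (workdir / name).write_bytes(content)

workdir = Path(tempfile.mkdtemp())
naive_unpack({"fix-build.patch": b"old"}, workdir)       # first build
naive_unpack({"0001-fix-build.patch": b"new"}, workdir)  # after rename
leftover = sorted(p.name for p in workdir.iterdir())
# both names are now present: the old one was never deleted
```

This is also why identically named files (like Trevor's chrony.conf) *do* get overwritten: the problem only shows when a name disappears from SRC_URI.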
Re: [Openembedded-architecture] WORKDIR fetcher interaction issue
On Thu, Dec 29, 2022 at 7:56 AM Richard Purdie wrote:
> [...]
> SPDX users have requested a json file of files and checksums after the
> unpack and before do_patch. Such a manifest could also be useful for
> attempting cleanup of an existing WORKDIR so I suspect the solution
> probably lies in that direction, probably unpacking into a subdir,
> indexing it, then moving into position.

By "moving it into position" do you mean moving the files from the
clean subdirectory to the locations they would occupy today?

If so, I don't understand why that's strictly necessary. It seems like
almost all of the complexity of this will be to support a use case we
don't really like anyway (S = "${WORKDIR}"). Manifests are great and
all, but it causes a lot of problems if they get out of sync, and I
suspect that would happen more often than we would like, e.g. with
devtool, make config, manual editing, etc. If we can keep it simple and
not rely on external state (e.g. a manifest) I think it will be a lot
easier to maintain in the long run.

> Personally, I'd also like to see S = "${WORKDIR}" deprecated and
> dropped so that a subdir is always used, just to stop our code getting
> too full of corner cases which are hard to maintain.
>
> I've had a few experiments with variations on both issues on various
> branches at different times, I just haven't had enough time to
> socialise the changes, migrate code and handle the inevitable fallout.
>
> Cheers,
>
> Richard
View/Reply Online (#1692): https://lists.openembedded.org/g/openembedded-architecture/message/1692
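The manifest the SPDX users ask for (files plus checksums, captured after unpack and before do_patch) is straightforward to sketch. This is a hedged illustration of the idea only; `index_tree` and `write_manifest` are hypothetical names, not an existing OE API:

```python
# Index an unpacked tree as {relative path: sha256} and dump it to
# JSON. Function names are hypothetical, not an existing OE API.
import hashlib
import json
import tempfile
from pathlib import Path

def index_tree(root: Path) -> dict[str, str]:
    """Return {relative path: sha256 hex digest} for files under root."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[str(path.relative_to(root))] = digest
    return manifest

def write_manifest(root: Path, out: Path) -> None:
    """Write the index of root as a JSON manifest at out."""
    out.write_text(json.dumps(index_tree(root), indent=2))

# demo on a throwaway tree
root = Path(tempfile.mkdtemp())
(root / "src").mkdir()
(root / "src" / "main.c").write_bytes(b"hello")
manifest = index_tree(root)
out = Path(tempfile.mkdtemp()) / "unpack-manifest.json"
write_manifest(root, out)
```

Joshua's objection applies directly to such a file: once devtool or a manual edit touches WORKDIR, the JSON and the tree silently diverge unless something regenerates it.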
Re: [Openembedded-architecture] WORKDIR fetcher interaction issue
On Thu 2022-12-29 @ 01:56:51 PM, Richard Purdie wrote:
> There are variations on this, such as a conditional append on some
> override to SRC_URI but the fundamental problem is one of cleanup when
> unpack is to rerun.

...just to elaborate a bit more on this variation for everyone's benefit
(Richard already understands the details of my scenario):

Some recipes require us to generate config files by hand in order to get
a piece of software/service to work correctly in our environment. A
concrete example could be specifying the IP address of a time server to
use for clock synchronization in chrony's /etc/chrony.conf file. Another
example could be to provide an /etc/network/interfaces file so
networking works on a given device in our specific network.

In my case I might want to build the same image, for the same device,
but use two different sets of config files. If the device is going to
run on my non-routable network then it will use CONDITION1 config files.
If I want to build a set of images for devices running on my routable
network then I'll need to use the CONDITION2 set of config files:

meta-project
├── README
├── conf
│   └── layer.conf
└── recipes-configfiles
    ├── chrony
    │   ├── chrony_%.bbappend
    │   └── files
    │       ├── condition1
    │       │   └── chrony.conf
    │       └── condition2
    │           └── chrony.conf
    └── init-ifupdown
        ├── files
        │   ├── condition1
        │   │   └── interfaces
        │   └── condition2
        │       └── interfaces
        └── init-ifupdown_%.bbappend

Then, somewhere, I either specify:

MACHINEOVERRIDES .= ":condition1"

or:

MACHINEOVERRIDES .= ":condition2"

NOTE: using OVERRIDES .= ":conditionX" doesn't work; it has to be
MACHINEOVERRIDES, since not all overrides are evaluated for the
fetcher in order to save parsing time (is that correct?)

If I do a:

$ bitbake -c cleansstate

(perhaps "-c clean" would be enough?) then perform a build, I always get
the correct set of config files in my image. If I don't do a cleansstate
between builds in which I change the override, then I simply get the
last config file that's in the WORKDIR.

View/Reply Online (#1691): https://lists.openembedded.org/g/openembedded-architecture/message/1691
[Openembedded-architecture] WORKDIR fetcher interaction issue
I was asked about a WORKDIR/fetcher interaction problem and the bugs it
results in. I've tried to write down my thoughts.

The unpack task writes its output to WORKDIR, as base.bbclass says:

    fetcher = bb.fetch2.Fetch(src_uri, d)
    fetcher.unpack(d.getVar('WORKDIR'))

We historically dealt with tarballs which usually have a NAME-VERSION
directory within them, so when you extract them, they go into a
subdirectory which tar creates. We usually call that subdirectory "S".

When we wrote the git fetcher, we emulated this by using a "git"
directory to extract into rather than WORKDIR.

For local files, there is no subdirectory so they go into WORKDIR. This
includes patches, which do_patch looks for in WORKDIR and applies from
there.

What issues does this cause? If you have an existing WORKDIR and run a
build with:

    SRC_URI = "file://a file://b"

then change it to:

    SRC_URI = "file://a"

and rebuild the recipe, the fetch and unpack tasks will rerun and their
hashes will change, but the file "b" is still in WORKDIR. Nothing in the
codebase knows that it should delete "b" from there. If you have code
which does "if exists(b)", which is common, it will break.

There are variations on this, such as a conditional append on some
override to SRC_URI, but the fundamental problem is one of cleanup when
unpack is to rerun.

The naive approach is then to think "let's just delete WORKDIR" when
running do_unpack. There is the small problem of WORKDIR/temp with logs
in it. There is also the pseudo database and other things tasks could
have done. Basically, whilst tempting, it doesn't work out well in
practice, particularly as, whilst unpack might rerun, not all other
tasks might.

I did also try a couple of other ideas. We could fetch into a
subdirectory, then either copy or symlink into place depending on which
set of performance/usability challenges you want to deal with. You could
involve a manifest of the files and then move them into position so
later you'd know which ones to delete.
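The SRC_URI example above can be reproduced with a few lines of Python. This is an illustrative model only (`do_unpack` here is a stand-in, not the real task):

```python
# Minimal reproduction of the stale-file problem: rerunning unpack
# with "b" removed from SRC_URI leaves the old "b" in WORKDIR, so
# "if exists(b)" checks give the wrong answer. Illustrative only.
import tempfile
from pathlib import Path

def do_unpack(src_uri: list[str], workdir: Path) -> None:
    """Drop each local file into WORKDIR, as the fetcher does; no cleanup."""
    for name in src_uri:
        (workdir / name).write_text(f"content of {name}")

workdir = Path(tempfile.mkdtemp())
do_unpack(["a", "b"], workdir)   # SRC_URI = "file://a file://b"
do_unpack(["a"], workdir)        # SRC_URI = "file://a", rerun
# (workdir / "b") still exists: nothing cleaned it up
```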
Part of the problem is that in some cases recipes do:

    S = "${WORKDIR}"

for simplicity. This means that you also can't wipe out S, as it might
point at WORKDIR.

SPDX users have requested a json file of files and checksums after the
unpack and before do_patch. Such a manifest could also be useful for
attempting cleanup of an existing WORKDIR, so I suspect the solution
probably lies in that direction: probably unpacking into a subdir,
indexing it, then moving into position.

Personally, I'd also like to see S = "${WORKDIR}" deprecated and
dropped so that a subdir is always used, just to stop our code getting
too full of corner cases which are hard to maintain.

I've had a few experiments with variations on both issues on various
branches at different times, I just haven't had enough time to
socialise the changes, migrate code and handle the inevitable fallout.

Cheers,

Richard

View/Reply Online (#1690): https://lists.openembedded.org/g/openembedded-architecture/message/1690
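The manifest-based cleanup Richard hints at can be sketched as follows. A hedged illustration under assumed names (the `unpack-manifest.json` file and `unpack_with_manifest` helper are hypothetical, not anything in OE-Core): record which files the last unpack owned, and on rerun delete only those it did not recreate, leaving WORKDIR/temp, the pseudo database, and other task output untouched.

```python
# On each unpack, compare against the previous manifest and delete
# only files the previous unpack owned but this one did not recreate.
# All names are hypothetical.
import json
import tempfile
from pathlib import Path

MANIFEST = "unpack-manifest.json"

def unpack_with_manifest(files: list[str], workdir: Path) -> None:
    mpath = workdir / MANIFEST
    old = set(json.loads(mpath.read_text())) if mpath.exists() else set()
    for name in files:                      # the "unpack" step
        (workdir / name).write_text(f"content of {name}")
    for stale in old - set(files):          # remove only owned leftovers
        (workdir / stale).unlink(missing_ok=True)
    mpath.write_text(json.dumps(sorted(files)))

workdir = Path(tempfile.mkdtemp())
unpack_with_manifest(["a", "b"], workdir)   # SRC_URI = "file://a file://b"
unpack_with_manifest(["a"], workdir)        # SRC_URI = "file://a"
# "b" is gone this time: the manifest knew the last unpack created it
```

The trade-off Joshua raises still applies: anything that modifies WORKDIR behind the manifest's back (devtool, manual edits) makes the recorded ownership stale.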