Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)
Florian Schmaus writes:

> On 30/06/2023 10.22, Sam James wrote:
>> Florian Schmaus writes:
>>> [in reply to a gentoo-project@ post, but it was asked to continue this
>>> on gentoo-dev@]
>>> On 28/06/2023 16.46, Sam James wrote:
>>>> and questions remain unanswered on the ML (why not implement a check
>>>> in pkgcheck similar to what is in Portage, for example)?
>>>
>>> On 2023-05-30 [1], I proposed a limit in the range of 2 to 1.5 MiB for
>>> the total package-directory size. I only care a little about the tool
>>> that checks this limit, but pkgcheck is an obvious choice. I also
>>> suggested that we review this policy once the number of Go packages
>>> has doubled or two years after this policy was established (whatever
>>> comes first).
>>>
>>> But I fear you may be referring to another kind of check. You may be
>>> talking about a check that forbids EGO_SUM in ::gentoo but allows it
>>> in overlays.
>> My position on this has been consistent: a check is needed to
>> statically determine when the environment size is too big. Copying the
>> Portage check into pkgcheck (in terms of the metrics) would satisfy this.
>
> It is not as easy as merely copying existing portage code into
> pkgcheck (unless I am missing something).

That's why I said "in terms of the metrics".

> I've talked to arthurzam, and there appears to be a .environment file
> created by pkgcheck, which we could use to approximate the exported
> environment.
>
> Another option would be to have pkgcheck count the EGO_SUM
> entries. The tree-sitter API for Bash, which pkgcheck already uses,
> seems to allow for that. But that would be different from the check in
> portage. Although, IMHO, counting EGO_SUM entries would be sufficient.

Right.

>> That is, regardless of raw size, I'm asking for a calculation based on
>> the contents of EGO_SUM where, if exceeded, the package will not be
>> installable on some systems. You didn't have an issue implementing this
>> for Portage and I've mentioned this a bunch of times since, so I thought
>> it was clear what I was hoping to see.
>
> So pkgcheck counting EGO_SUM entries would be sufficient for the
> purpose of having a static check that notices if the ebuild would
> likely run into the environment limit?

If you check it actually fires in some of the old broken scenarios (see
Bugzilla), then yes. But I'd be interested in your thoughts on
radhermit's reply (please reply there).

> To find a common compromise, I would possibly invest my time in
> developing such a test. Even though I do not deem such a check a
> strict prerequisite to reintroduce EGO_SUM.

Yes, you've made clear you disagree.

>>> Intelligibly, EGO_SUM can be considered ugly. Compared to a
>>> traditional Gentoo package, EGO_SUM-based ones are larger. The same is
>>> true for Rust packages. However, looking at the bigger picture,
>>> EGO_SUM's advantages outweigh its disadvantages.
>>>
>> Again, I am on record as being fine with the general EGO_SUM approach,
>> even if I wish we didn't need it, as I see it as inevitable for things
>> like yarn, .NET, and of course Rust as we already have it.
>> Just ideally not huge ones, and certainly not huge ones which then
>> aren't even reliably installable because of environment size.
>
> Talking about "reliably installable" makes it sound to me like there
> are cases where installing an EGO_SUM-based package sometimes works and
> sometimes not. But the kernel-limit is fixed and not even
> configurable, besides, of course patching the source (and in the
> absence of architectures with a page size below 4 KiB) [1].

ulm's reply notes that this is a limitation in the Linux kernel, so I
have no idea why musl tinderboxes seemed to disproportionately hit these
issues and I assume one of us is either missing something or it was just
a crazy fluke.

> Any developer testing whether or not an ebuild is installable would
> become immediately aware if the ebuild runs into the environment
> limit, or not.

This clearly didn't happen with the previous examples (see what I said
above too), as there were times when they installed for some people, but
not in CI/tinderboxes. I don't know why and it merits investigation.
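The static check discussed in this message (counting EGO_SUM entries rather than measuring the exported environment) could be sketched roughly as follows. This is an illustration only: pkgcheck would use its tree-sitter Bash parser rather than a regular expression, and the function names and the threshold of 1500 entries are assumptions made up for the example. The thread does not settle on a number.

```python
import re

# Hypothetical threshold; it would have to be calibrated against the
# ebuilds that are known to have hit the environment limit (see Bugzilla).
MAX_EGO_SUM_ENTRIES = 1500

# Matches the body of a top-level `EGO_SUM=( ... )` array assignment.
EGO_SUM_RE = re.compile(r'^EGO_SUM=\((.*?)\)', re.DOTALL | re.MULTILINE)
# Each entry is a double-quoted string such as "github.com/foo/bar v1.0.0".
ENTRY_RE = re.compile(r'"([^"]+)"')


def ego_sum_entries(ebuild_text: str) -> list[str]:
    """Return the quoted entries of the EGO_SUM array, or [] if absent."""
    match = EGO_SUM_RE.search(ebuild_text)
    if match is None:
        return []
    return ENTRY_RE.findall(match.group(1))


def check_ego_sum(ebuild_text: str, limit: int = MAX_EGO_SUM_ENTRIES):
    """Return (entry count, rough byte size of the entries, over-limit flag)."""
    entries = ego_sum_entries(ebuild_text)
    # One extra byte per entry stands in for the separator between entries.
    approx_bytes = sum(len(entry.encode()) + 1 for entry in entries)
    return len(entries), approx_bytes, len(entries) > limit


EXAMPLE_EBUILD = '''\
EAPI=8
inherit go-module
EGO_SUM=(
	"github.com/foo/bar v1.0.0"
	"github.com/foo/bar v1.0.0/go.mod"
)
go-module_set_globals
'''

if __name__ == "__main__":
    count, size, too_big = check_ego_sum(EXAMPLE_EBUILD)
    print(f"{count} entries, about {size} bytes, over limit: {too_big}")
```

Counting entries is only a proxy for the real failure, which depends on the size of what ends up in the environment, so a check like this would still need to be validated against the old broken cases before relying on it.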
Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)
Zoltan Puskas writes: > On Tue, Jul 04, 2023 at 01:13:30AM -0600, Tim Harder wrote: >> On 2023-07-03 Mon 04:17, Florian Schmaus wrote: >> >On 30/06/2023 13.33, Eray Aslan wrote: >> >>On Fri, Jun 30, 2023 at 03:38:11AM -0600, Tim Harder wrote: >> >>>Why do we have to keep exporting the related variables that generally >> >>>cause these size issues to the environment? >> >> >> >>I really do not want to make a +1 response but this is an excellent >> >>question that we need to answer before implementing EGO_SUM. >> > >> >Could you please discuss why you make the reintroduction of EGO_SUM >> >dependent on this question? >> >> Just to be clear, I don't particularly care about EGO_SUM enough to gate >> its reintroduction (and don't have any leverage to do so anyway). I'm >> just tired of the circular discussions around env issues that all seem >> to avoid actual fixes, catering instead to functionality used by a >> vanishingly small subset of ebuilds in the main repo that compels a >> certain design mostly due to how portage functioned before EAPI 0. >> >> Other than that, supporting EGO_SUM (or any other language ecosystem >> trending towards distro-unfriendly releases) is fine as long as devs are >> cognizant how the related global-scope eclass design affects everyone >> running or working on the raw repo. I hope devs continue leveraging the >> relatively recent benchmark tooling (and perhaps more future support) to >> improve their work. Along those lines, it could be nice to see sample >> benchmark data in commit messages for large, global-scope eclass work >> just to reinforce that it was taken into account. >> >> Tim >> > > I've been following the EGO_SUM thread for quite some time now. One other > thing > I did not see mentioned in favour of EGO_SUM so far: reproducibility. > > The problem with external tarballs is that they are gone once the ebuild is > dropped from the tree. 
> Should a user ever want to roll back to a previous
> version of an application, either by checking out an older version of the
> portage tree or copying said ebuild into their local overlay, they still
> cannot simply run an emerge on it as they have to somehow recreate the
> tarball itself too.

I believe Hank's email covers this.
Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open
On Thu, Jul 6, 2023 Zoltan Puskas wrote:

> I've been following the EGO_SUM thread for quite some time now. One
> other thing I did not see mentioned in favour of EGO_SUM so far:
> reproducibility.
>
> The problem with external tarballs is that they are gone once the
> ebuild is dropped from the tree. Should a user ever want to roll back
> to a previous version of an application, either by checking out an
> older version of the portage tree or copying said ebuild into their
> local overlay, they still cannot simply run an emerge on it as
> they have to somehow recreate the tarball itself too.
>
> While upstream may not host everything forever, it's pretty much
> guaranteed to be available for much longer than Gentoo's custom
> tarball bundles of dependencies.

I see this brought up every once in a while in these EGO_SUM threads,
but I think reproducible tarballs are a solved problem, or at least, the
tools exist and we just need to decide how to best equip people with
them.

thesamesam/sam-gentoo-scripts has maint/bump-go which builds these
tarballs smartly and reproducibly:

- use --sort=name to order files inside in a consistent way
- use consistent owner:group (portage:portage)
- use consistent LC and TZ settings
- set a standard timestamp (since 'go mod download' doesn't preserve
  upstream timestamps anyway, this loses no useful information)

With that, multiple developers can independently generate a -deps
tarball for a given Go package version with checksums that match.

The main distro tarball's checksums are verified against Manifest, and
then within it are the list and checksums of the individual downloads
which would be verified by go mod download (right?) and the resulting
-deps files should also match Manifest entries.

So a similar approach could be used in the case of expired ::gentoo
versions being installed, or overlays using -deps files without a way to
host them. Set things up so this can be done easily on demand or perhaps
automatically as needed (maybe through a variation on pkg_nofetch in a
Go eclass; that part is not obvious to me).

Thanks,

--
Hank Leininger
9606 3BF9 B593 4CBC E31A A384 6200 F6E3 781E 3DD7
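To make the normalization steps listed above concrete, here is a minimal sketch in Python using only the standard library. It is not the maint/bump-go script and makes no claim to produce the same bytes as that tool; the function names, the numeric ID 250 for portage, the fixed permission bits, and the timestamp 0 are assumptions chosen for illustration. Compression is left out to keep the example short.

```python
import hashlib
import io
import os
import tarfile


def reproducible_tar(src_dir: str, mtime: int = 0,
                     owner: str = "portage", owner_id: int = 250) -> bytes:
    """Pack src_dir into an uncompressed tar with normalized metadata."""
    paths = []
    for root, dirs, files in os.walk(src_dir):
        for name in dirs + files:
            paths.append(os.path.join(root, name))
    # Plain code-point sort: plays the role of `--sort=name` and does not
    # depend on the locale, so LC settings cannot change the order.
    paths.sort()

    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for path in paths:
            info = tar.gettarinfo(path, arcname=os.path.relpath(path, src_dir))
            # Consistent owner:group, independent of who built the tarball.
            info.uid = info.gid = owner_id
            info.uname = info.gname = owner
            # Standard timestamp instead of the local download time.
            info.mtime = mtime
            # Assumption: also flatten permissions so umask differences vanish.
            executable = info.isdir() or bool(info.mode & 0o100)
            info.mode = 0o755 if executable else 0o644
            if info.isreg():
                with open(path, "rb") as fh:
                    tar.addfile(info, fh)
            else:
                tar.addfile(info)
    return buf.getvalue()


def tar_sha512(src_dir: str) -> str:
    """Checksum that two developers could compare against a Manifest entry."""
    return hashlib.sha512(reproducible_tar(src_dir)).hexdigest()
```

Two checkouts of the same module tree, created at different times and by different users, should then yield the same `tar_sha512` value, which is the property that lets several developers regenerate a -deps tarball independently.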
Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)
On Tue, Jul 04, 2023 at 01:13:30AM -0600, Tim Harder wrote: > On 2023-07-03 Mon 04:17, Florian Schmaus wrote: > >On 30/06/2023 13.33, Eray Aslan wrote: > >>On Fri, Jun 30, 2023 at 03:38:11AM -0600, Tim Harder wrote: > >>>Why do we have to keep exporting the related variables that generally > >>>cause these size issues to the environment? > >> > >>I really do not want to make a +1 response but this is an excellent > >>question that we need to answer before implementing EGO_SUM. > > > >Could you please discuss why you make the reintroduction of EGO_SUM > >dependent on this question? > > Just to be clear, I don't particularly care about EGO_SUM enough to gate > its reintroduction (and don't have any leverage to do so anyway). I'm > just tired of the circular discussions around env issues that all seem > to avoid actual fixes, catering instead to functionality used by a > vanishingly small subset of ebuilds in the main repo that compels a > certain design mostly due to how portage functioned before EAPI 0. > > Other than that, supporting EGO_SUM (or any other language ecosystem > trending towards distro-unfriendly releases) is fine as long as devs are > cognizant how the related global-scope eclass design affects everyone > running or working on the raw repo. I hope devs continue leveraging the > relatively recent benchmark tooling (and perhaps more future support) to > improve their work. Along those lines, it could be nice to see sample > benchmark data in commit messages for large, global-scope eclass work > just to reinforce that it was taken into account. > > Tim > I've been following the EGO_SUM thread for quite some time now. One other thing I did not see mentioned in favour of EGO_SUM so far: reproducibility. The problem with external tarballs is that they are gone once the ebuild is dropped from the tree. 
Should a user ever want to roll back to a previous version of an application, either by checking out an older version of the portage tree or copying said ebuild into their local overlay, they still cannot simply run an emerge on it as they have to somehow recreate the tarball itself too. While upstream may not host everything forever, it's pretty much guaranteed to be available for much longer than Gentoo's custom tarball bundles of dependencies.

Regarding space we are also likely making a trade-off. By deprecating EGO_SUM we are saving space in the portage tree but in exchange inflating distfiles, as it will start accumulating the same dependencies potentially multiple times since now the content is hidden in tarballs containing a combination of dependencies. This is essentially the source file version of "statically linking".

Finally a personal opinion: I find dependency tarballs opaque. With EGO_SUM the ebuild defines all the upstream sources it needs to build the package as well as how to build it, but with the dependency tarball the sources are all hidden, which makes verification all that much harder.

Zoltan
Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)
On Wed, Jul 05, 2023 at 20:40:34 +0200, Gerion Entrup wrote: > Am Mittwoch, 5. Juli 2023, 01:09:30 CEST schrieb Oskari Pirhonen: > > On Tue, Jul 04, 2023 at 21:56:26 +, Robin H. Johnson wrote: > > > On Tue, Jul 04, 2023 at 12:44:39PM +0200, Gerion Entrup wrote: > > > > just to be curious about the whole discussion. I did not follow in the > > > > deepest detail but what I got is: > > > > - EGO_SUM blows up the Manifest file, since every little Go module needs > > > > to be respected. A lot of these Manifest files lead to a extremely > > > > increased Portage tree size. EGO_SUM is just one example (though the > > > > biggest one). Statically linked languages like Rust etc. have the same > > > > problem. > > > > - The current solution is to prepackage all modules, put it somewhere on > > > > a webserver and just manifest that file. This make the Portage tree > > > > small in size again, but requires a webserver/mirror and is thus > > > > unfriendly for overlay devs. > > > > > > > > I'm not sure if it was mentioned before but has anyone considered hash > > > > trees / Merkle trees for the manifest file? The idea would be to hash > > > > the standard manifest file a second time if it gets too big and write > > > > down that hash as new manifest file and leave EGO_SUM as is. > > > This is out-of-tree/indirect Manifests, that I proposed here, more than > > > a year ago: > > > https://marc.info/?l=gentoo-dev&m=168280762310716&w=2 > > > https://marc.info/?l=gentoo-dev&m=165472088822215&w=2 > > > > > > Developing it requires PMS work in addition to package manager > > > development, because it introduces phases. > > > > > > - primary fetch of $SRC_URI per ebuild, including indirect Manifest > > > - primary validation of distfiles > > > - secondary fetch of $SRC_URI per indirect Manifest > > > - secondary validation of additional distfiles > > > > > > A significantly impacted use case is "emerge -f", it now needs to run > > > downloads twice. 
> > >
> >
> > I'm not sure double downloading is required. Consider a flow similar to
> > this:
> >
> > 1. distfiles are fetched as per the ebuild
> > 2. distfiles are hashed into a temporary Manifest
> > 3. temporary Manifest is hashed and compared with the hashes stored in
> >    the in-tree Manifest for the direct Manifest
>
> This is exactly what I meant. Web storage is not needed. A second
> download process is also not needed. Just an additional Manifest format
> is needed for ebuilds with more than n distfiles.
>
> > A new Manifest format would be required in order to differentiate the
> > current ones from an indirect one. This may require PMS changes,
> > although I suspect amending GLEP 74 may be enough since the PMS seems
> > to just refer to the GLEP for a description of Manifests.
> >
> > This would also either rely on a stable ordering of Manifest contents
> > when generating it or having a separate file listing in the indirect
> > Manifest which corresponds to the order in the direct Manifest. For the
> > latter, it should also have separate entries for different package
> > versions so that every single distfile for every single version of said
> > package does not need to be fetched in order to build the direct
> > Manifest.
> >
> > I'm imagining something along these lines:
> >
> >     INDIRECT true
> >     PACKAGE category/package-version distfile1 distfile2 ... ALGO1 hash1
> >     ALGO2 hash2 ...
> >     PACKAGE ...
>
> Maybe it is reasonable to skip the distfile names altogether (or just
> provide a hash value of the concatenated file names). Then the manifest
> would just contain two/three hashes (for as many distfiles as the ebuild
> needs). Since these kinds of indirect Manifests should be more rare than
> the normal ones, a slightly longer processing time does not have much
> impact I would say.
>

My reasoning behind having the list of files is so that the
intermediate/direct Manifest can be accurately recreated. Consider the
following (not-so-)hypothetical Manifest:

    DIST dist.tar.gz 84703 BLAKE2B ... SHA512 ...
    DIST dist.tar.gz.asc 228 BLAKE2B ... SHA512 ...
    EBUILD package-r1.ebuild 1535 BLAKE2B ... SHA512 ...
    EBUILD package.ebuild 1536 BLAKE2B ... SHA512 ...
    MISC metadata.xml 959 BLAKE2B ... SHA512 ...

It is "well behaved" because pkgdev created it. My main concern is if
$OTHER_TOOLING generates the Manifest in a different order, which would
mean the Manifest may be correct, but you get a false negative since the
hashes don't match what is in the in-tree indirect Manifest. Having the
order specified in the indirect Manifest renders this moot because
$OTHER_TOOLING would have to respect this in order to correctly handle
indirect Manifests.

Additionally, in repos without thin-manifests, the SRC_URI is not enough
to build up the Manifest. This may or may not be an issue depending on
if a repo's metadata/layout.conf is parsed as part of the Manifest
verification process.

> >
> > Here `ALGO1` a
Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)
On Wed, Jul 5, 2023 at 2:40 PM Gerion Entrup wrote:
>
> On Wednesday, 5 July 2023, 01:09:30 CEST, Oskari Pirhonen wrote:
> > On Tue, Jul 04, 2023 at 21:56:26 +, Robin H. Johnson wrote:
> > >
> > > Developing it requires PMS work in addition to package manager
> > > development, because it introduces phases.
> > >
> > > - primary fetch of $SRC_URI per ebuild, including indirect Manifest
> > > - primary validation of distfiles
> > > - secondary fetch of $SRC_URI per indirect Manifest
> > > - secondary validation of additional distfiles
> > >
> > > A significantly impacted use case is "emerge -f", it now needs to run
> > > downloads twice.
> >
> > I'm not sure double downloading is required. Consider a flow similar to
> > this:
> >
> > 1. distfiles are fetched as per the ebuild
> > 2. distfiles are hashed into a temporary Manifest
> > 3. temporary Manifest is hashed and compared with the hashes stored in
> >    the in-tree Manifest for the direct Manifest
>
> This is exactly what I meant. Web storage is not needed. A second
> download process is also not needed. Just an additional Manifest format
> is needed for ebuilds with more than n distfiles.
>

I suspect that Robin was proposing indirect manifests AND SRC_URIs, and
not just indirect manifests. In any case, if he wasn't, then I'd
suggest it would make sense to have that so that we don't need giant
lists of SRC_URIs or go sums or whatever in ebuilds. Sure, the
manifests are even larger than the original file references, but those
will still be long. Plus if a file is used by 5 versions of an ebuild
it will be present in the manifests once per hash function, but in the
ebuilds 5 times.

I agree though that if only the manifests are moved to a fetched file
then you could fetch that on the first pass, though you'd still need
the extra logic to parse it. I'm not sure it really makes much of a
difference to the effort involved.

Aren't go sums already content hashes? It might make even more sense
to create some kind of modular manifest verification logic in portage
so that the same eclass that handles EGO_SUM could tell the package
manager how to check the integrity of the files that are fetched.
Well, assuming we trust whatever hash function they're using (I'm
afraid to check - maybe this isn't such a great idea...).

--
Rich
Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)
Am Mittwoch, 5. Juli 2023, 01:09:30 CEST schrieb Oskari Pirhonen: > On Tue, Jul 04, 2023 at 21:56:26 +, Robin H. Johnson wrote: > > On Tue, Jul 04, 2023 at 12:44:39PM +0200, Gerion Entrup wrote: > > > just to be curious about the whole discussion. I did not follow in the > > > deepest detail but what I got is: > > > - EGO_SUM blows up the Manifest file, since every little Go module needs > > > to be respected. A lot of these Manifest files lead to a extremely > > > increased Portage tree size. EGO_SUM is just one example (though the > > > biggest one). Statically linked languages like Rust etc. have the same > > > problem. > > > - The current solution is to prepackage all modules, put it somewhere on > > > a webserver and just manifest that file. This make the Portage tree > > > small in size again, but requires a webserver/mirror and is thus > > > unfriendly for overlay devs. > > > > > > I'm not sure if it was mentioned before but has anyone considered hash > > > trees / Merkle trees for the manifest file? The idea would be to hash > > > the standard manifest file a second time if it gets too big and write > > > down that hash as new manifest file and leave EGO_SUM as is. > > This is out-of-tree/indirect Manifests, that I proposed here, more than > > a year ago: > > https://marc.info/?l=gentoo-dev&m=168280762310716&w=2 > > https://marc.info/?l=gentoo-dev&m=165472088822215&w=2 > > > > Developing it requires PMS work in addition to package manager > > development, because it introduces phases. > > > > - primary fetch of $SRC_URI per ebuild, including indirect Manifest > > - primary validation of distfiles > > - secondary fetch of $SRC_URI per indirect Manifest > > - secondary validation of additional distfiles > > > > A significantly impacted use case is "emerge -f", it now needs to run > > downloads twice. > > > > I'm not sure double downloading is required. Consider a flow similar to > this: > > 1. distfiles are fetched as per the ebuild > 2. 
distfiles are hashed into a temporary Manifest
> 3. temporary Manifest is hashed and compared with the hashes stored in
>    the in-tree Manifest for the direct Manifest

This is exactly what I meant. Web storage is not needed. A second
download process is also not needed. Just an additional Manifest format
is needed for ebuilds with more than n distfiles.

> A new Manifest format would be required in order to differentiate the
> current ones from an indirect one. This may require PMS changes,
> although I suspect amending GLEP 74 may be enough since the PMS seems
> to just refer to the GLEP for a description of Manifests.
>
> This would also either rely on a stable ordering of Manifest contents
> when generating it or having a separate file listing in the indirect
> Manifest which corresponds to the order in the direct Manifest. For the
> latter, it should also have separate entries for different package
> versions so that every single distfile for every single version of said
> package does not need to be fetched in order to build the direct
> Manifest.
>
> I'm imagining something along these lines:
>
>     INDIRECT true
>     PACKAGE category/package-version distfile1 distfile2 ... ALGO1 hash1
>     ALGO2 hash2 ...
>     PACKAGE ...

Maybe it is reasonable to skip the distfile names altogether (or just
provide a hash value of the concatenated file names). Then the manifest
would just contain two/three hashes (for as many distfiles as the ebuild
needs). Since these kinds of indirect Manifests should be more rare than
the normal ones, a slightly longer processing time does not have much
impact I would say.

> Here `ALGO1` and `hash1` correspond to the hash of the direct Manifest
> containing the distfiles (and potentially other files if a repo does not
> have thin-manifests enabled) and their hashes in the order specified
> previously.
>
> The indirect Manifest as described above would be large-ish for a
> package that has lots of distfiles, but likely much smaller than if each
> distfile had its set of hashes stored directly.

Without storing the filenames, the Manifest file would have the same
small size for any number of distfiles needed.

Gerion

> Please correct me if there's some detail I've overlooked.
>
> - Oskari
>
> > The rest of the posts also go into the matter of duplication within
> > EGO_SUM & the indirect Manifests: limiting the growth requires some form
> > of content-addressed layout.
> >
> > It's absolutely something we should get developed, but it's a lot of
> > work.
> >
> > The indirect Manifests still provide a hosting challenge for overlays.
Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)
On Tue, Jul 04, 2023 at 21:56:26 +, Robin H. Johnson wrote: > On Tue, Jul 04, 2023 at 12:44:39PM +0200, Gerion Entrup wrote: > > just to be curious about the whole discussion. I did not follow in the > > deepest detail but what I got is: > > - EGO_SUM blows up the Manifest file, since every little Go module needs > > to be respected. A lot of these Manifest files lead to a extremely > > increased Portage tree size. EGO_SUM is just one example (though the > > biggest one). Statically linked languages like Rust etc. have the same > > problem. > > - The current solution is to prepackage all modules, put it somewhere on > > a webserver and just manifest that file. This make the Portage tree > > small in size again, but requires a webserver/mirror and is thus > > unfriendly for overlay devs. > > > > I'm not sure if it was mentioned before but has anyone considered hash > > trees / Merkle trees for the manifest file? The idea would be to hash > > the standard manifest file a second time if it gets too big and write > > down that hash as new manifest file and leave EGO_SUM as is. > This is out-of-tree/indirect Manifests, that I proposed here, more than > a year ago: > https://marc.info/?l=gentoo-dev&m=168280762310716&w=2 > https://marc.info/?l=gentoo-dev&m=165472088822215&w=2 > > Developing it requires PMS work in addition to package manager > development, because it introduces phases. > > - primary fetch of $SRC_URI per ebuild, including indirect Manifest > - primary validation of distfiles > - secondary fetch of $SRC_URI per indirect Manifest > - secondary validation of additional distfiles > > A significantly impacted use case is "emerge -f", it now needs to run > downloads twice. > I'm not sure double downloading is required. Consider a flow similar to this: 1. distfiles are fetched as per the ebuild 2. distfiles are hashed into a temporary Manifest 3. 
temporary Manifest is hashed and compared with the hashes stored in
   the in-tree Manifest for the direct Manifest

A new Manifest format would be required in order to differentiate the
current ones from an indirect one. This may require PMS changes,
although I suspect amending GLEP 74 may be enough since the PMS seems
to just refer to the GLEP for a description of Manifests.

This would also either rely on a stable ordering of Manifest contents
when generating it or having a separate file listing in the indirect
Manifest which corresponds to the order in the direct Manifest. For the
latter, it should also have separate entries for different package
versions so that every single distfile for every single version of said
package does not need to be fetched in order to build the direct
Manifest.

I'm imagining something along these lines:

    INDIRECT true
    PACKAGE category/package-version distfile1 distfile2 ... ALGO1 hash1 ALGO2 hash2 ...
    PACKAGE ...

Here `ALGO1` and `hash1` correspond to the hash of the direct Manifest
containing the distfiles (and potentially other files if a repo does not
have thin-manifests enabled) and their hashes in the order specified
previously.

The indirect Manifest as described above would be large-ish for a
package that has lots of distfiles, but likely much smaller than if each
distfile had its set of hashes stored directly.

Please correct me if there's some detail I've overlooked.

- Oskari

> The rest of the posts also go into the matter of duplication within
> EGO_SUM & the indirect Manifests: limiting the growth requires some form
> of content-addressed layout.
>
> It's absolutely something we should get developed, but it's a lot of
> work.
>
> The indirect Manifests still provide a hosting challenge for overlays.
>
> --
> Robin Hugh Johnson
> Gentoo Linux: Dev, Infra Lead, Foundation Treasurer
> E-Mail   : robb...@gentoo.org
> GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
> GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136
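The three-step flow and the `PACKAGE` line proposed in this message can be sketched as below. This is only an illustration of the idea, not an existing Portage or pkgdev feature: the function names are made up, distfiles are passed in as an in-memory dict instead of being fetched, the temporary direct Manifest contains only `DIST` lines (the thin-manifest case), and the `PACKAGE` line is parsed by peeling known `ALGO hash` pairs off its end, which is one assumption about how such a line could be read.

```python
import hashlib

# Manifest hash names mapped to hashlib constructors.
ALGOS = {"BLAKE2B": hashlib.blake2b, "SHA512": hashlib.sha512}


def hash_pairs(data: bytes) -> str:
    """Render 'ALGO hex ALGO hex ...' for the given bytes."""
    return " ".join(f"{algo} {fn(data).hexdigest()}" for algo, fn in ALGOS.items())


def direct_manifest(names: list[str], distfiles: dict[str, bytes]) -> bytes:
    """Step 2: rebuild the temporary direct Manifest in the listed order."""
    lines = [f"DIST {n} {len(distfiles[n])} {hash_pairs(distfiles[n])}" for n in names]
    return "".join(line + "\n" for line in lines).encode()


def package_line(cpv: str, names: list[str], distfiles: dict[str, bytes]) -> str:
    """Produce the in-tree indirect entry for one package version."""
    return f"PACKAGE {cpv} {' '.join(names)} {hash_pairs(direct_manifest(names, distfiles))}"


def verify(line: str, distfiles: dict[str, bytes]) -> bool:
    """Step 3: hash the rebuilt Manifest and compare with the indirect entry."""
    tokens = line.split()
    if len(tokens) < 3 or tokens[0] != "PACKAGE":
        raise ValueError("not a PACKAGE line")
    rest = tokens[2:]
    expected = {}
    # Trailing 'ALGO hash' pairs; everything before them is a distfile name.
    while len(rest) >= 2 and rest[-2] in ALGOS:
        expected[rest[-2]] = rest[-1]
        rest = rest[:-2]
    if not expected or any(name not in distfiles for name in rest):
        return False
    manifest = direct_manifest(rest, distfiles)
    return all(ALGOS[a](manifest).hexdigest() == h for a, h in expected.items())
```

Because the file names are listed in the entry, any tool that follows the listed order rebuilds byte-identical Manifest text, which addresses the ordering concern; the price is that the entry still grows with the number of distfiles.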
Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)
On Tue, Jul 04, 2023 at 12:44:39PM +0200, Gerion Entrup wrote: > just to be curious about the whole discussion. I did not follow in the > deepest detail but what I got is: > - EGO_SUM blows up the Manifest file, since every little Go module needs > to be respected. A lot of these Manifest files lead to a extremely > increased Portage tree size. EGO_SUM is just one example (though the > biggest one). Statically linked languages like Rust etc. have the same > problem. > - The current solution is to prepackage all modules, put it somewhere on > a webserver and just manifest that file. This make the Portage tree > small in size again, but requires a webserver/mirror and is thus > unfriendly for overlay devs. > > I'm not sure if it was mentioned before but has anyone considered hash > trees / Merkle trees for the manifest file? The idea would be to hash > the standard manifest file a second time if it gets too big and write > down that hash as new manifest file and leave EGO_SUM as is. This is out-of-tree/indirect Manifests, that I proposed here, more than a year ago: https://marc.info/?l=gentoo-dev&m=168280762310716&w=2 https://marc.info/?l=gentoo-dev&m=165472088822215&w=2 Developing it requires PMS work in addition to package manager development, because it introduces phases. - primary fetch of $SRC_URI per ebuild, including indirect Manifest - primary validation of distfiles - secondary fetch of $SRC_URI per indirect Manifest - secondary validation of additional distfiles A significantly impacted use case is "emerge -f", it now needs to run downloads twice. The rest of the posts also go into the matter of duplication within EGO_SUM & the indirect Manifests: limiting the growth requires some form of content-addressed layout. It's absolutely something we should get developed, but it's a lot of work. The indirect Manifests still provide a hosting challenge for overlays. 
--
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Treasurer
E-Mail   : robb...@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136
Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)
On Tuesday, 4 July 2023, 09:13:30 CEST, Tim Harder wrote:
> On 2023-07-03 Mon 04:17, Florian Schmaus wrote:
>> On 30/06/2023 13.33, Eray Aslan wrote:
>>> On Fri, Jun 30, 2023 at 03:38:11AM -0600, Tim Harder wrote:
>>>> Why do we have to keep exporting the related variables that generally cause these size issues to the environment?
>>>
>>> I really do not want to make a +1 response but this is an excellent question that we need to answer before implementing EGO_SUM.
>>
>> Could you please discuss why you make the reintroduction of EGO_SUM dependent on this question?
>
> Just to be clear, I don't particularly care about EGO_SUM enough to gate its reintroduction (and don't have any leverage to do so anyway). I'm just tired of the circular discussions around env issues that all seem to avoid actual fixes, catering instead to functionality used by a vanishingly small subset of ebuilds in the main repo that compels a certain design mostly due to how portage functioned before EAPI 0.
>
> Other than that, supporting EGO_SUM (or any other language ecosystem trending towards distro-unfriendly releases) is fine as long as devs are cognizant how the related global-scope eclass design affects everyone running or working on the raw repo. I hope devs continue leveraging the relatively recent benchmark tooling (and perhaps more future support) to improve their work. Along those lines, it could be nice to see sample benchmark data in commit messages for large, global-scope eclass work just to reinforce that it was taken into account.
>
> Tim

Hi,

I am just curious about the whole discussion. I did not follow it in the deepest detail, but what I got is:

- EGO_SUM blows up the Manifest file, since every little Go module needs to be listed. A lot of these Manifest files lead to an extremely increased Portage tree size. EGO_SUM is just one example (though the biggest one). Statically linked languages like Rust etc. have the same problem.
- The current solution is to prepackage all modules, put the result somewhere on a webserver and just manifest that file. This makes the Portage tree small in size again, but requires a webserver/mirror and is thus unfriendly for overlay devs.

I'm not sure if it was mentioned before, but has anyone considered hash trees / Merkle trees for the Manifest file? The idea would be to hash the standard Manifest file a second time if it gets too big, write down that hash as the new Manifest file, and leave EGO_SUM as is. When Portage tries to install the package, it can download all modules and build the "normal" Manifest file as usual, but instead of directly comparing it to the Manifest in the tree, it can hash it again and compare that to the provided Manifest.

With this, Portage should have more or less the same guarantees about the validity of the source code, but the Manifest file consists of just two hashes again.

What one would lose is the direct comparison of file names (they are included in the "meta"-hash, though), or do I miss something?

Gerion
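The double-hashing idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not an existing Portage feature: the function names (`manifest_line`, `full_manifest`, `meta_hash`, `verify`) are hypothetical, the DIST line layout merely imitates current Manifest entries, and the scheme is a two-level hash rather than a full Merkle tree.

```python
import hashlib


def manifest_line(name: str, data: bytes) -> str:
    """Build one DIST line in the style of a Gentoo Manifest entry."""
    blake2b = hashlib.blake2b(data).hexdigest()
    sha512 = hashlib.sha512(data).hexdigest()
    return f"DIST {name} {len(data)} BLAKE2B {blake2b} SHA512 {sha512}"


def full_manifest(distfiles: dict[str, bytes]) -> str:
    """Rebuild the 'normal' Manifest text from the downloaded distfiles.

    Entries are sorted by file name so that the text is reproducible.
    """
    lines = [manifest_line(name, data) for name, data in sorted(distfiles.items())]
    return "\n".join(lines) + "\n"


def meta_hash(manifest_text: str) -> str:
    """Hash the Manifest text a second time; only this value would be stored."""
    return hashlib.sha512(manifest_text.encode()).hexdigest()


def verify(distfiles: dict[str, bytes], expected_meta_hash: str) -> bool:
    """Recompute the full Manifest locally and compare its meta-hash."""
    return meta_hash(full_manifest(distfiles)) == expected_meta_hash
```

Because each DIST line contains the file name, renaming a distfile changes the meta-hash just like changing its content does. What is lost is the ability to tell which file caused a mismatch without having the full Manifest at hand.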
Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)
On 2023-07-03 Mon 04:17, Florian Schmaus wrote: On 30/06/2023 13.33, Eray Aslan wrote: On Fri, Jun 30, 2023 at 03:38:11AM -0600, Tim Harder wrote: Why do we have to keep exporting the related variables that generally cause these size issues to the environment? I really do not want to make a +1 response but this is an excellent question that we need to answer before implementing EGO_SUM. Could you please discuss why you make the reintroduction of EGO_SUM dependent on this question? Just to be clear, I don't particularly care about EGO_SUM enough to gate its reintroduction (and don't have any leverage to do so anyway). I'm just tired of the circular discussions around env issues that all seem to avoid actual fixes, catering instead to functionality used by a vanishingly small subset of ebuilds in the main repo that compels a certain design mostly due to how portage functioned before EAPI 0. Other than that, supporting EGO_SUM (or any other language ecosystem trending towards distro-unfriendly releases) is fine as long as devs are cognizant how the related global-scope eclass design affects everyone running or working on the raw repo. I hope devs continue leveraging the relatively recent benchmark tooling (and perhaps more future support) to improve their work. Along those lines, it could be nice to see sample benchmark data in commit messages for large, global-scope eclass work just to reinforce that it was taken into account. Tim
Re: [gentoo-dev] EGO_SUM
On Mon, 03 Jul 2023, Florian Schmaus wrote:
> So pkgcheck counting EGO_SUM entries would be sufficient for the purpose of having a static check that notices if the ebuild would likely run into the environment limit?
>
> To find a common compromise, I would possibly invest my time in developing such a test. Even though I do not deem such a check a strict prerequisite to reintroduce EGO_SUM.

The so-called "environment limit" is 32 pages, i.e. normally 128 KiB. With the A variable anywhere near this, the size of the Manifest file would be close to 1 MiB. IMHO this is way too large to be used on a regular basis.

I am aware that we have some packages with large Manifests (71 packages above 50 KiB, 6 packages above 200 KiB, out of 18812 packages in total), but these should really remain the exception.

Ulrich
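The arithmetic behind the 128 KiB figure can be written down as a small Python sketch. It assumes the per-string limit of 32 pages mentioned above and a 4 KiB page size as the default; the helper names are hypothetical, and the accounting for the "NAME=value" form and the terminating NUL byte is an approximation rather than a copy of the kernel's check.

```python
import os

# Each single environment string may span at most 32 pages.
PAGES_PER_STRING = 32


def max_env_string_bytes(page_size: int = 4096) -> int:
    """Upper bound for one environment string, in bytes."""
    return PAGES_PER_STRING * page_size


def env_string_fits(name: str, value: str, page_size: int = 4096) -> bool:
    """Approximate check whether NAME=value stays within the limit.

    One extra byte is counted for the terminating NUL.
    """
    needed = len(f"{name}={value}".encode()) + 1
    return needed <= max_env_string_bytes(page_size)


if __name__ == "__main__":
    # Query the page size of the running system instead of assuming 4 KiB.
    page_size = os.sysconf("SC_PAGE_SIZE")
    print(max_env_string_bytes(page_size) // 1024, "KiB per environment string")
```

With 4 KiB pages this yields 32 * 4096 = 131072 bytes, i.e. the 128 KiB quoted above.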
Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)
On 30/06/2023 13.33, Eray Aslan wrote:
> On Fri, Jun 30, 2023 at 03:38:11AM -0600, Tim Harder wrote:
>> Why do we have to keep exporting the related variables that generally cause these size issues to the environment?
>
> I really do not want to make a +1 response but this is an excellent question that we need to answer before implementing EGO_SUM.

Could you please discuss why you make the reintroduction of EGO_SUM dependent on this question?

Portage will show you a warning message if the exported environment approaches the kernel limit, and it will show a detailed error message if executing an ebuild failed due to the limit being reached.

There seems to be no reason why you should not be able to allow EGO_SUM again without first fixing, for example, https://bugs.gentoo.org/721088.

- Flow
Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)
On 30/06/2023 10.22, Sam James wrote:
> Florian Schmaus writes:
>> [in reply to a gentoo-project@ post, but it was asked to continue this on gentoo-dev@]
>>
>> On 28/06/2023 16.46, Sam James wrote:
>>> and questions remain unanswered on the ML (why not implement a check in pkgcheck similar to what is in Portage, for example)?
>>
>> On 2023-05-30 [1], I proposed a limit in the range of 2 to 1.5 MiB for the total package-directory size. I only care a little about the tool that checks this limit, but pkgcheck is an obvious choice. I also suggested that we review this policy once the number of Go packages has doubled or two years after this policy was established (whatever comes first).
>>
>> But I fear you may be referring to another kind of check. You may be talking about a check that forbids EGO_SUM in ::gentoo but allows it in overlays.
>
> My position on this has been consistent: a check is needed to statically determine when the environment size is too big. Copying the Portage check into pkgcheck (in terms of the metrics) would satisfy this.

It is not as easy as merely copying existing portage code into pkgcheck (unless I am missing something).

I've talked to arthurzam, and there appears to be a .environment file created by pkgcheck, which we could use to approximate the exported environment.

Another option would be to have pkgcheck count the EGO_SUM entries. The tree-sitter API for Bash, which pkgcheck already uses, seems to allow for that. But that would be different from the check in portage. Although, IMHO, counting EGO_SUM entries would be sufficient.

> That is, regardless of raw size, I'm asking for a calculation based on the contents of EGO_SUM where, if exceeded, the package will not be installable on some systems. You didn't have an issue implementing this for Portage and I've mentioned this a bunch of times since, so I thought it was clear what I was hoping to see.

So pkgcheck counting EGO_SUM entries would be sufficient for the purpose of having a static check that notices if the ebuild would likely run into the environment limit?

To find a common compromise, I would possibly invest my time in developing such a test. Even though I do not deem such a check a strict prerequisite to reintroduce EGO_SUM.

>> Intelligibly, EGO_SUM can be considered ugly. Compared to a traditional Gentoo package, EGO_SUM-based ones are larger. The same is true for Rust packages. However, looking at the bigger picture, EGO_SUM's advantages outweigh its disadvantages.
>
> Again, am on record as being fine with the general EGO_SUM approach, even if I wish we didn't need it, as I see it as inevitable for things like yarn, .NET, and of course Rust as we already have it. Just ideally not huge ones, and certainly not huge ones which then aren't even reliably installable because of environment size.

Talking about "reliably installable" makes it sound to me like there are cases where installing an EGO_SUM-based package sometimes works and sometimes does not. But the kernel limit is fixed and not even configurable, other than by patching the source (and in the absence of architectures with a page size below 4 KiB) [1]. Any developer testing whether or not an ebuild is installable would become immediately aware if the ebuild runs into the environment limit, or not.

That said, static code checks are always preferable over dynamic ones.

- Flow

1: https://elixir.bootlin.com/linux/v6.4.1/source/include/uapi/linux/binfmts.h#L15
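A static check that counts EGO_SUM entries, as discussed above, could look roughly like the following Python sketch. It is not pkgcheck code: it uses a plain regular expression as a stand-in for the tree-sitter based parsing mentioned in the message, the function names are hypothetical, and any threshold passed as `max_entries` would have to be derived from the environment limit rather than guessed.

```python
import re

# Matches the body of a Bash array assignment `EGO_SUM=( ... )`.
# Assumption: entries are double-quoted and contain no parentheses.
EGO_SUM_ARRAY = re.compile(r"EGO_SUM=\((.*?)\)", re.S)
QUOTED_ENTRY = re.compile(r'"([^"]+)"')


def ego_sum_entries(ebuild_text: str) -> list[str]:
    """Return the EGO_SUM entries found in the text of an ebuild."""
    match = EGO_SUM_ARRAY.search(ebuild_text)
    if match is None:
        return []
    return QUOTED_ENTRY.findall(match.group(1))


def too_many_entries(ebuild_text: str, max_entries: int) -> bool:
    """Report whether the ebuild lists more entries than the given threshold."""
    return len(ego_sum_entries(ebuild_text)) > max_entries
```

Counting entries is only a proxy for the exported environment size, which is why it differs from the check in Portage. It has the advantage of working on the ebuild source alone.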
Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)
On Fri, Jun 30, 2023 at 03:38:11AM -0600, Tim Harder wrote: > Why do we have to keep exporting the related variables that generally > cause these size issues to the environment? I really do not want to make a +1 response but this is an excellent question that we need to answer before implementing EGO_SUM. -- Eray
Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)
On 2023-06-30 Fri 02:22, Sam James wrote:
> My position on this has been consistent: a check is needed to statically determine when the environment size is too big. Copying the Portage check into pkgcheck (in terms of the metrics) would satisfy this.
>
> That is, regardless of raw size, I'm asking for a calculation based on the contents of EGO_SUM where, if exceeded, the package will not be installable on some systems. You didn't have an issue implementing this for Portage and I've mentioned this a bunch of times since, so I thought it was clear what I was hoping to see.
>
> I would also like (which is not what I was referring to here) some limit on the size, given that we already have a limit on the size of ${FILESDIR}, but this is less of a concern for me given it's bounded by the aforementioned environment size check.

Why do we have to keep exporting the related variables that generally cause these size issues to the environment?

I've asked as much on IRC multiple times (nearly every time this discussion has been brought up) and the answers I've gotten are some variation on "it's always been that way" or "not exporting them would break using commands as external programs" (e.g. calling via xargs).

The first response isn't a great argument and the second response, while more valid, also feels less important than having a more minimalistic, exported environment that causes fewer issues like this one and others such as potentially affecting a package's build system in an unexpected fashion. See bug #721088 for the related discussion on environment variable exports.

From my stance, the spec should state that the only variables to be exported are ones already "semi-standard" and used outside of package manager internals in the expected fashion, which probably only includes HOME, TMPDIR, and maybe ROOT.

This would of course currently break packages that use `xargs` while calling internal commands depending on some of those exported variables, but from a cursory glance at the gentoo repo, there aren't many ebuilds using that functionality and in general those that are could be written in an easier to understand fashion without using xargs. It should also be possible to proxy the required variables to those commands in various fashions without using the environment if using commands externally is extremely important to the few ebuild maintainers who make use of that functionality.

In short, adding checks to portage and pkgcheck feels like an ill-suited workaround that foists hacking around the error onto users or developers due to a poor decision made decades ago on environment handling.

Tim
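The effect of exporting only a small allow-list of variables can be illustrated with a short Python sketch. It is not how Portage spawns ebuild phases; the names `ALLOWED`, `minimal_env`, and `run_with_minimal_env` are hypothetical, and the allow-list simply mirrors the three variables named above.

```python
import subprocess

# Variables that would still be exported under the proposal above.
ALLOWED = ("HOME", "TMPDIR", "ROOT")


def minimal_env(full_env: dict[str, str], allowed=ALLOWED) -> dict[str, str]:
    """Keep only the allow-listed variables from the full environment."""
    return {key: value for key, value in full_env.items() if key in allowed}


def run_with_minimal_env(cmd: list[str], full_env: dict[str, str]):
    """Run an external command that sees only the allow-listed variables.

    Large variables such as A stay inside the package manager and therefore
    cannot push the child's environment over the kernel limit.
    """
    return subprocess.run(cmd, env=minimal_env(full_env), check=True)
```

Any helper that is invoked as an external program and needs one of the dropped variables would have to receive it some other way, for example as a command-line argument or through a file, which is the proxying mentioned in the message.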
Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)
Florian Schmaus writes:
> [in reply to a gentoo-project@ post, but it was asked to continue this on gentoo-dev@]
>
> On 28/06/2023 16.46, Sam James wrote:
>> Florian Schmaus writes:
>>> On 17/06/2023 10.37, Arthur Zamarin wrote:
>>>> I also want to nominate people who I feel contribute a lot to Gentoo and I have a lot of interaction with (ordered by name, not priority): […] flow
>>>
>>> I apologize for the late reply, and thank you for the nomination. I am honored and accept.
>>>
>>> As many of you know, I am spending a lot of time on the EGO_SUM situation, as it is one of the most critical issues to solve.
>>>
>>> I have used the last few days to carefully consider whether a seat on the council is more harmful or beneficial to my efforts regarding EGO_SUM. On the one hand, council work means I have less time to improve the EGO_SUM situation. On the other hand, a seat in the council increases the probability of positively influencing Gentoo's future, also regarding EGO_SUM.
>>
>> That's fine and it's great to see more people running!
>
> Excellent that we share this view. :)
>
>> But with regard to EGO_SUM: you didn't appear at the meeting where we discussed your previous EGO_SUM proposal,
>
> Naively, as I am, I expected that the mailing list would be used for discussion and that the council meeting would be used chiefly for voting and intra-council discussion. And since the request to the council to vote on a concrete proposal was preceded by a multiple-week, if not month-long, mailing list discussion, I assumed that my presence in the council meeting was optional.
>
> Had I known that my presence was required, or that the absence in the meeting would be blamed on me afterward, I would have appeared if possible.

I'm not blaming you for anything. But you didn't speak in #gentoo-council before the meeting (a few days before IIRC) when we were discussing the problem, I pinged you during the meeting, and you didn't appear there afterwards. You also didn't seem to respond to the council decision (or non-decision) in that meeting either, unless I've missed it.

It seems self-evident that discussion would happen in the meeting before voting...? What am I misunderstanding? We regularly discuss things before voting on them. Do you normally observe council meetings? I don't think what we did in this instance was at all unusual.

(Also: there's the issue of whether or not the council should really be voting on overriding an eclass maintainer who would then be forced to keep something working they don't want to. mgorny raised that.)

>> and questions remain unanswered on the ML (why not implement a check in pkgcheck similar to what is in Portage, for example)?
>
> On 2023-05-30 [1], I proposed a limit in the range of 2 to 1.5 MiB for the total package-directory size. I only care a little about the tool that checks this limit, but pkgcheck is an obvious choice. I also suggested that we review this policy once the number of Go packages has doubled or two years after this policy was established (whatever comes first).
>
> But I fear you may be referring to another kind of check. You may be talking about a check that forbids EGO_SUM in ::gentoo but allows it in overlays.

My position on this has been consistent: a check is needed to statically determine when the environment size is too big. Copying the Portage check into pkgcheck (in terms of the metrics) would satisfy this.

That is, regardless of raw size, I'm asking for a calculation based on the contents of EGO_SUM where, if exceeded, the package will not be installable on some systems. You didn't have an issue implementing this for Portage and I've mentioned this a bunch of times since, so I thought it was clear what I was hoping to see.

I would also like (which is not what I was referring to here) some limit on the size, given that we already have a limit on the size of ${FILESDIR}, but this is less of a concern for me given it's bounded by the aforementioned environment size check.

> Intelligibly, EGO_SUM can be considered ugly. Compared to a traditional Gentoo package, EGO_SUM-based ones are larger. The same is true for Rust packages. However, looking at the bigger picture, EGO_SUM's advantages outweigh its disadvantages.

Again, am on record as being fine with the general EGO_SUM approach, even if I wish we didn't need it, as I see it as inevitable for things like yarn, .NET, and of course Rust as we already have it. Just ideally not huge ones, and certainly not huge ones which then aren't even reliably installable because of environment size.
[gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)
[in reply to a gentoo-project@ post, but it was asked to continue this on gentoo-dev@]

On 28/06/2023 16.46, Sam James wrote:
> Florian Schmaus writes:
>> On 17/06/2023 10.37, Arthur Zamarin wrote:
>>> I also want to nominate people who I feel contribute a lot to Gentoo and I have a lot of interaction with (ordered by name, not priority): […] flow
>>
>> I apologize for the late reply, and thank you for the nomination. I am honored and accept.
>>
>> As many of you know, I am spending a lot of time on the EGO_SUM situation, as it is one of the most critical issues to solve.
>>
>> I have used the last few days to carefully consider whether a seat on the council is more harmful or beneficial to my efforts regarding EGO_SUM. On the one hand, council work means I have less time to improve the EGO_SUM situation. On the other hand, a seat in the council increases the probability of positively influencing Gentoo's future, also regarding EGO_SUM.
>
> That's fine and it's great to see more people running!

Excellent that we share this view. :)

> But with regard to EGO_SUM: you didn't appear at the meeting where we discussed your previous EGO_SUM proposal,

Naively, as I am, I expected that the mailing list would be used for discussion and that the council meeting would be used chiefly for voting and intra-council discussion. And since the request to the council to vote on a concrete proposal was preceded by a multiple-week, if not month-long, mailing list discussion, I assumed that my presence in the council meeting was optional.

Had I known that my presence was required, or that the absence in the meeting would be blamed on me afterward, I would have appeared if possible.

> and questions remain unanswered on the ML (why not implement a check in pkgcheck similar to what is in Portage, for example)?

On 2023-05-30 [1], I proposed a limit in the range of 2 to 1.5 MiB for the total package-directory size. I only care a little about the tool that checks this limit, but pkgcheck is an obvious choice. I also suggested that we review this policy once the number of Go packages has doubled or two years after this policy was established (whatever comes first).

But I fear you may be referring to another kind of check. You may be talking about a check that forbids EGO_SUM in ::gentoo but allows it in overlays. However, as stated before [2], this is not a viable approach. One reason why it is not practicable is auditability.

> The blocker is not a council seat, it's about addressing people's concerns...

Unfortunately, it appears that I am terrible at convincing everyone that the deprecation of EGO_SUM was a mistake. I tried to respond to every concern. Often, the response included arguments based on factual data. But eventually, I would only expect to convince some, as the EGO_SUM question touches the subjective realm of style.

I know that the EGO_SUM situation and the resulting discussion grew huge and left many understandably bored or confused, who then turned away. But that is a pity because it is a relevant discussion for Gentoo's long-term success.

The bottom line is that the EGO_SUM discussion yielded no evidence or even a slight indication that EGO_SUM was deprecated based on technical issues. Instead, it appears that EGO_SUM was deprecated because some deemed it unaesthetic.

Intelligibly, EGO_SUM can be considered ugly. Compared to a traditional Gentoo package, EGO_SUM-based ones are larger. The same is true for Rust packages. However, looking at the bigger picture, EGO_SUM's advantages outweigh its disadvantages.

- Flow

1: https://marc.info/?l=gentoo-dev&m=168546196902731 <25308876-7ac4-8c90-8641-1034cc67c...@gentoo.org>
2: https://marc.info/?l=gentoo-dev&m=168569387514376 <012fa74d-2910-ea90-6008-26cc23604...@gentoo.org>
Re: [gentoo-dev] EGO_SUM
On 01/06/2023 21.55, William Hubbs wrote:
>> The EGO_SUM alternatives
>> - do not have the same level of trust and therefore have a negative impact on security (a dubious tarball someone put somewhere, especially when proxy-maint)
>
> For this, I would argue that vetting the tarball falls to the developer who is proxying. If you don't trust the proxy maintainer you are pushing for, it is easy to make a dependency tarball yourself and add it to your dev space.
>
>> - are not easily verifiable
>
> I don't have a response to this other than to say that go does its own verification of modules with the dependency tarballs that it can't do with vendor tarballs.

Yes, go has "go mod verify", which was added to the go-module eclass after I asked on 2022-10-21 in #gentoo-dev if the eclass verifies the dependency tarball. robbat2 was so kind to provide a proof of concept of the security issue I was pointing out, which is available under https://gist.github.com/robbat2/82f4c208b6674e707081eda689096d55. This demonstration of the issue triggered https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=733b4944c1a061269f96219cc96530f89d8f439e, which made the go-module.eclass run "go mod verify".

Unfortunately, a malicious contributor can trivially sidestep this verification step, rendering it ineffective. First, neither portage [1] nor PMS requires that a later (source) archive can not override an existing file. This looseness allows, for example, the (non-upstream) dependency tarball to override (upstream's) go.sum. Secondly, a dependency tarball could create the vendor/ directory, preventing the condition under which the go-module.eclass runs "go mod verify".

Both approaches allow the dependency tarball to inject malicious code. With the first approach, "go mod verify" completes successfully; with the second, "go mod verify" is simply not invoked. The verification, as is, is ineffective.

>> Last but not least, we have the same situation in the Rust ecosystem, but we allow the EGO_SUM "equivalent" there.
>
> I'm not sure it is quite the same because Rust projects tend to have much smaller numbers of dependencies.

I am curious to know of any specific reason why Rust projects generally get by with fewer dependencies. This impression may be deceiving, caused by the fact that the Go-lang ecosystem hosts several projects with a larger number of dependencies. If you look at the analysis [2], you find that among the top 10 Go packages by EGO_SUM entry count are cri-o, prometheus, k3s, and k3d, among others. If someone rewrites any of those in Rust, they would probably end up with the same number of dependencies.

> Another thing to consider is that using EGO_SUM adds a significant amount of processing to the go-module eclass. I was advised recently that this isn't a good idea since bash is slow, so I am considering moving most of that processing into get-ego-vendor by having it generate the contents of SRC_URI directly instead of using the eclass code to do that.

Was this analyzed and quantified? Is this hurting us? The cache regeneration of an ebuild tree is an embarrassingly parallel operation, so this would need to be exponentially complex [3] to be of any significance. It may be possible to tune the existing EGO_SUM handling.

We should keep EGO_SUM if viable, as it directly maps Go's go.sum and makes developing Go-lang ebuilds as frictionless as possible.

- Flow

1: https://github.com/gentoo/portage/pull/1030
2: https://dev.gentoo.org/~flow/gentoo-tree-analysis-results/2023-05-17T100838-gentoo-at-2022-02-16-60dc7a03ff2f/post-processed-ego-sum.txt
3: something similar to what was recently found in the latex ebuilds, see https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=6ee282f0645dcfccf1836b9cc7ae6629eb8b
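The first sidestep described above, a later archive replacing a file from an earlier one, can be detected before unpacking. The following Python sketch shows one way to do that with the standard `tarfile` module. It is not part of Portage or the go-module eclass; `make_tar`, `member_names`, and `find_overrides` are hypothetical helpers, and the archives are built in memory purely for illustration.

```python
import io
import tarfile


def make_tar(files: dict[str, bytes]) -> bytes:
    """Build an uncompressed tar archive in memory (test helper)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def member_names(archive: bytes) -> set[str]:
    """Return the names of all regular files contained in an archive."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        return {member.name for member in tar.getmembers() if member.isfile()}


def find_overrides(archives: list[bytes]) -> set[str]:
    """Return file names that appear in more than one archive.

    Unpacking the archives in order into the same directory would let the
    later copy silently replace the earlier one.
    """
    seen: set[str] = set()
    clashes: set[str] = set()
    for archive in archives:
        names = member_names(archive)
        clashes |= seen & names
        seen |= names
    return clashes
```

A check like this would flag a dependency tarball that ships its own go.sum. It does not address the second sidestep, where the tarball creates a vendor/ directory so that "go mod verify" is never run.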
Re: [gentoo-dev] EGO_SUM
On 2.6.2023 21.06, William Hubbs wrote:
>> In theory it's "easy", but in practice how'd you work? This would be fine when a single developer is proxying a single maintainer, but when a stack of devs (project) are proxying hundreds of different people, it becomes messy and unsustainable rather fast.
>
> This comment is completely off topic for this thread, so start another thread for it if you want, but if hundreds of people are being proxied by proxy-maint, that seems to be a concern unrelated to this. It seems the fix for that is to advocate for some of these hundreds of people to become developers so they don't have to be proxied any more.

How is it offtopic when I'm answering concerns you raised?

Imagine there are tens of people who do 4 commits a year, roughly, to bump random go packages. What do you believe is the time investment for reviewing, testing and committing their contributions, vs. mentoring them to become devs if they don't involve themselves much outside bumping these packages? Also, will _you_ volunteer to mentor them? It's so easy to push more work for others to do.

Sorry if I come out harsh but this is reality, not just theory.

-- juippis
Re: [gentoo-dev] EGO_SUM
On Fri, Jun 02, 2023 at 10:13:55AM +0300, Joonas Niilola wrote:
> On 1.6.2023 22.55, William Hubbs wrote:
>>> The EGO_SUM alternatives
>>> - do not have the same level of trust and therefore have a negative impact on security (a dubious tarball someone put somewhere, especially when proxy-maint)
>>
>> For this, I would argue that vetting the tarball falls to the developer who is proxying. If you don't trust the proxy maintainer you are pushing for, it is easy to make a dependency tarball yourself and add it to your dev space.
>>
>>> - require additional effort when developing ebuilds
>>
>> This "additional effort" is pretty subjective. Making a dependency tarball isn't a lot of work, especially with the script that I posted in this thread.
>
> In theory it's "easy", but in practice how'd you work? This would be fine when a single developer is proxying a single maintainer, but when a stack of devs (project) are proxying hundreds of different people, it becomes messy and unsustainable rather fast.

This comment is completely off topic for this thread, so start another thread for it if you want, but if hundreds of people are being proxied by proxy-maint, that seems to be a concern unrelated to this. It seems the fix for that is to advocate for some of these hundreds of people to become developers so they don't have to be proxied any more.

> I do want to point out that any proxied maintainer can and should upload the vendor tarballs to their own Github / Gitlab distfile-repos for the time being, but allowing EGO_SUM to be used again would be the easiest solution here in my opinion for everyone involved. I'm aware it's pushed back due to technicalities.

Like I said at another point in the thread, I want to get rid of EGO_SUM by moving most of the processing for it out of the eclass. I'm looking into that now. This will still run into the same problem as EGO_SUM if $A is still exported, but it should speed up ebuild processing.

William
Re: [gentoo-dev] EGO_SUM
On 1.6.2023 22.55, William Hubbs wrote:
>> The EGO_SUM alternatives
>> - do not have the same level of trust and therefore have a negative impact on security (a dubious tarball someone put somewhere, especially when proxy-maint)
>
> For this, I would argue that vetting the tarball falls to the developer who is proxying. If you don't trust the proxy maintainer you are pushing for, it is easy to make a dependency tarball yourself and add it to your dev space.
>
>> - require additional effort when developing ebuilds
>
> This "additional effort" is pretty subjective. Making a dependency tarball isn't a lot of work, especially with the script that I posted in this thread.

In theory it's "easy", but in practice how'd you work? This would be fine when a single developer is proxying a single maintainer, but when a stack of devs (project) are proxying hundreds of different people, it becomes messy and unsustainable rather fast.

I do want to point out that any proxied maintainer can and should upload the vendor tarballs to their own Github / Gitlab distfile-repos for the time being, but allowing EGO_SUM to be used again would be the easiest solution here in my opinion for everyone involved. I'm aware it's pushed back due to technicalities.

-- juippis
Re: [gentoo-dev] EGO_SUM
I know I'm pretty late to this thread, but I'm going to respond to some of the concerns and suggest another alternative.

On Mon, Apr 17, 2023 at 09:37:32AM +0200, Florian Schmaus wrote:
> I want to continue the discussion to re-instate EGO_SUM, potentially leading to a democratic vote on whether EGO_SUM should be re-instated or deprecated.
>
> For the past months, I tried to find *technical reasons*, e.g., reasons that affect end-users, that justify the deprecation of EGO_SUM. However, I was unable to find any. The closest thing I could find was portage being unable to process an ebuild due to its large environment (bug 830187). However, as this happens while developing an ebuild, it should never affect users. Obviously this is a situation where EGO_SUM can not be used. Fortunately, it does not affect most Go packages, as seen in my previous analysis of Go packages in ::gentoo and their EGO_SUM size. Furthermore, newer portage versions, with USE=gentoo-dev, will proactively warn you if the environment caused by the ebuild becomes large.
>
> All further arguments for the deprecation of EGO_SUM were of cosmetic nature.
>
> However, the deprecation of EGO_SUM is harmful to Gentoo and its users. To briefly re-iterate the reasons:
>
> The EGO_SUM alternatives
> - do not have the same level of trust and therefore have a negative impact on security (a dubious tarball someone put somewhere, especially when proxy-maint)

For this, I would argue that vetting the tarball falls to the developer who is proxying. If you don't trust the proxy maintainer you are pushing for, it is easy to make a dependency tarball yourself and add it to your dev space.

> - are not easily verifiable

I don't have a response to this other than to say that go does its own verification of modules with the dependency tarballs that it can't do with vendor tarballs.

> - require additional effort when developing ebuilds

This "additional effort" is pretty subjective. Making a dependency tarball isn't a lot of work, especially with the script that I posted in this thread.

> - hinder the packaging and Gentoo's adoption of Go-based projects, which is worrisome as Go is very popular

I don't have a response here. I don't see it as much of a hindrance (this is obviously subjective).

> - prevent Go modules from being shared as DISTFILES on the mirrors across various packages

The issue here is really the duplicate data in the dependency or vendor tarballs, and yes, there is a lot of it.

> Last but not least, we have the same situation in the Rust ecosystem, but we allow the EGO_SUM "equivalent" there.

I'm not sure it is quite the same because Rust projects tend to have much smaller numbers of dependencies.

Another thing to consider is that using EGO_SUM adds a significant amount of processing to the go-module eclass. I was advised recently that this isn't a good idea since bash is slow, so I am considering moving most of that processing into get-ego-vendor by having it generate the contents of SRC_URI directly instead of using the eclass code to do that.

My thought is to have get-ego-vendor output the value for a variable, GO_SRC_URI, and add that to SRC_URI in the ebuild like so:

    # The output from get-ego-vendor:
    GO_SRC_URI="
        # dependency 1
        # dependency 2
    "

    SRC_URI="https://main-project-here
        ${GO_SRC_URI}"

This should speed things up some since most of the processing we are doing in the eclass would be removed, so I would rather not see the council force the use of EGO_SUM.

This, however, is still going to hit the limitation of bug 830187.

I am, however, open to another solution, so I will keep following this thread.

I think the better question should be around what we can do to get bug 721088 or bug 833567 to move forward.

Thanks,

William
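Generating the download URLs outside of the eclass, as proposed above, could be sketched in Python as follows. This is not the get-ego-vendor script: the function names are hypothetical, the default proxy URL is an assumption, and the renaming of distfiles that a real SRC_URI would need is left out. The sketch follows the Go module proxy layout, in which upper-case letters in module paths and versions are written as `!` followed by the lower-case letter.

```python
def escape_for_proxy(text: str) -> str:
    """Apply the Go module proxy case-encoding: 'B' becomes '!b'."""
    return "".join("!" + char.lower() if char.isupper() else char for char in text)


def src_uri_entries(ego_sum: list[str], proxy: str = "https://proxy.golang.org") -> list[str]:
    """Turn EGO_SUM-style entries into module proxy URLs.

    An entry is either 'module version' (the module zip) or
    'module version/go.mod' (only the go.mod file).
    """
    urls = []
    for entry in ego_sum:
        module, version = entry.split()
        suffix = "zip"
        if version.endswith("/go.mod"):
            version = version[: -len("/go.mod")]
            suffix = "mod"
        urls.append(
            f"{proxy}/{escape_for_proxy(module)}/@v/{escape_for_proxy(version)}.{suffix}"
        )
    return urls
```

The resulting list could be written out once when the ebuild is created, so that no per-entry processing remains at ebuild sourcing time. As noted in the message, this does not change the size of A.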
[gentoo-dev] EGO_SUM
I want to continue the discussion to re-instate EGO_SUM, potentially
leading to a democratic vote on whether EGO_SUM should be re-instated or
deprecated.

For the past months, I tried to find *technical reasons*, e.g., reasons
that affect end-users, that justify the deprecation of EGO_SUM. However,
I was unable to find any. The closest thing I could find was portage
being unable to process an ebuild due to its large environment (bug
830187). However, as this happens while developing an ebuild, it should
never affect users. Obviously this is a situation where EGO_SUM can not
be used. Fortunately, it does not affect most Go packages, as seen in my
previous analysis of Go packages in ::gentoo and their EGO_SUM size.
Furthermore, newer portage versions, with USE=gentoo-dev, will
proactively warn you if the environment caused by the ebuild becomes large.

All further arguments for the deprecation of EGO_SUM were of a cosmetic
nature.

However, the deprecation of EGO_SUM is harmful to Gentoo and its users.
To briefly re-iterate the reasons:

The EGO_SUM alternatives
- do not have the same level of trust and therefore have a negative
  impact on security (a dubious tarball someone put somewhere, especially
  when proxy-maint)
- are not easily verifiable
- require additional effort when developing ebuilds
- hinder the packaging and Gentoo's adoption of Go-based projects, which
  is worrisome as Go is very popular
- prevent Go modules from being shared as DISTFILES on the mirrors
  across various packages

Last but not least, we have the same situation in the Rust ecosystem,
but we allow the EGO_SUM "equivalent" there.

So with portage checking the environment of ebuilds and warning if it
becomes too large, and with the arguments above, I do not see any reason
we should outlaw EGO_SUM.
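To make the idea of a static size check concrete, here is a minimal
sketch that counts EGO_SUM entries in an ebuild and warns above a limit.
It is not how Portage's environment warning works: it only approximates
the problem by entry count, it assumes one quoted entry per line inside
EGO_SUM=( ... ), and the function names and the default limit of 1000
are placeholders chosen for the example.

```bash
# Count the quoted entries inside the EGO_SUM=( ... ) array of an ebuild.
# Assumes one entry per line and the closing ")" on a line of its own.
count_ego_sum() {
    awk '
        /^EGO_SUM=\(/         { inside = 1; next }
        inside && /^[ \t]*\)/ { inside = 0 }
        inside && /"/         { n++ }
        END                   { print n + 0 }
    ' "$1"
}

# Warn and return non-zero when the count exceeds the limit. The default
# limit is an arbitrary placeholder, not a value from Portage or pkgcheck.
check_ego_sum() {
    local ebuild="$1" limit="${2:-1000}" n
    n=$(count_ego_sum "$ebuild")
    if [ "$n" -gt "$limit" ]; then
        printf 'WARN: %s has %d EGO_SUM entries (limit %d)\n' \
            "$ebuild" "$n" "$limit"
        return 1
    fi
    return 0
}
```

A call such as `check_ego_sum foo-1.0.ebuild 1500` could run in CI or
before a commit. A byte-based estimate of the expanded SRC_URI would
track the real failure mode more closely than an entry count.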
- Flow

Previous discussions:
https://archives.gentoo.org/gentoo-dev/message/1a64a8e7694c3ee11cd48a58a95f2faa
https://archives.gentoo.org/gentoo-dev/message/d78af7f168cef24bfa302f7f75c3ef11