Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)

2023-07-08 Thread Sam James

Florian Schmaus  writes:

> On 30/06/2023 10.22, Sam James wrote:
>> Florian Schmaus  writes:
>>> [in reply to a gentoo-project@ post, but it was asked to continue this
>>> on gentoo-dev@]
>>> On 28/06/2023 16.46, Sam James wrote:
>>>> and questions remain unanswered on the
>>>> ML (why not implement a check in pkgcheck similar to what is in Portage,
>>>> for example)?
>>>
>>> On 2023-05-30 [1], I proposed a limit in the range of 2 to 1.5 MiB for
>>> the total package-directory size. I only care a little about the tool
>>> that checks this limit, but pkgcheck is an obvious choice. I also
>>> suggested that we review this policy once the number of Go packages
>>> has doubled or two years after this policy was established (whatever
>>> comes first).
>>>
>>> But I fear you may be referring to another kind of check. You may be
>>> talking about a check that forbids EGO_SUM in ::gentoo but allows it
>>> in overlays.
>> My position on this has been consistent: a check is needed to
>> statically determine when the environment size is too big. Copying the
>> Portage check into pkgcheck (in terms of the metrics) would satisfy this.
>
> It is not as easy as merely copying existing portage code into
> pkgcheck (unless I am missing something).
>

That's why I said "in terms of the metrics".

> I've talked to arthurzam, and there appears to be a .environment file
> created by pkgcheck, which we could use to approximate the exported
> environment.
>
> Another option would be to have pkgcheck count the EGO_SUM
> entries. The tree-sitter API for Bash, which pkgcheck already uses,
> seems to allow for that. But that would be different from the check in
> portage. Although, IMHO, counting EGO_SUM entries would be sufficient.

Right.

>
>
>> That is, regardless of raw size, I'm asking for a calculation based on
>> the contents of EGO_SUM where, if exceeded, the package will not be
>> installable on some systems. You didn't have an issue implementing this
>> for Portage and I've mentioned this a bunch of times since, so I thought
>> it was clear what I was hoping to see.
>
> So pkgcheck counting EGO_SUM entries would be sufficient for the
> purpose of having a static check that notices if the ebuild would
> likely run into the environment limit?
>

If you check it actually fires in some of the old broken scenarios
(see Bugzilla), then yes. But I'd be interested in your thoughts on
radhermit's reply (please reply there).
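
The entry-counting check discussed above could be sketched roughly as
follows. This is only an illustration under stated assumptions: it uses a
plain regular expression instead of the tree-sitter API that pkgcheck uses,
the function names are hypothetical, the size estimate (entries joined by
single spaces plus a terminating NUL) is a guess at what ends up exported,
and the threshold is the Linux per-string limit of 32 pages on a system
with 4 KiB pages. It is not the Portage check.

```python
import re

# Assumption: per-string limit of 32 pages (MAX_ARG_STRLEN) with 4 KiB pages.
MAX_ARG_STRLEN = 32 * 4096

# Simplification: a regex stands in for a real Bash parser, so it will
# misread unusual quoting or entries containing parentheses.
EGO_SUM_RE = re.compile(r'EGO_SUM=\((.*?)\)', re.DOTALL)
ENTRY_RE = re.compile(r'"([^"]*)"')


def ego_sum_entries(ebuild_text):
    """Return the double-quoted EGO_SUM entries found in the ebuild text."""
    match = EGO_SUM_RE.search(ebuild_text)
    if match is None:
        return []
    return ENTRY_RE.findall(match.group(1))


def estimated_export_size(entries):
    """Estimate the byte size of the entries exported as one string.

    Assumption: entries are joined by single spaces and NUL-terminated.
    """
    if not entries:
        return 0
    return sum(len(entry.encode("utf-8")) for entry in entries) + len(entries)


def exceeds_limit(entries, limit=MAX_ARG_STRLEN):
    """Report whether the estimated size reaches the assumed limit."""
    return estimated_export_size(entries) >= limit
```

Whether such an estimate fires on the previously broken ebuilds would still
have to be verified against the cases recorded in Bugzilla.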

> To find a common compromise, I would possibly invest my time in
> developing such a test. Even though I do not deem such a check a
> strict prerequisite to reintroduce EGO_SUM.

Yes, you've made clear you disagree.

>
>
>>> Intelligibly, EGO_SUM can be considered ugly. Compared to a
>>> traditional Gentoo package, EGO_SUM-based ones are larger. The same is
>>> true for Rust packages. However, looking at the bigger picture,
>>> EGO_SUM's advantages outweigh its disadvantages.
>>>
>> Again, am on record as being fine with the general EGO_SUM approach,
>> even if I wish we didn't need it, as I see it as inevitable for things
>> like yarn, .NET, and of course Rust as we already have it.
>> Just ideally not huge ones, and certainly not huge ones which then
>> aren't even reliably installable because of environment size.
>
> Talking about "reliably installable" makes it sound to me like there
> are cases where installing an EGO_SUM-based package sometimes works and
> sometimes not. But the kernel-limit is fixed and not even
> configurable, besides, of course patching the source (and in the
> absence of architectures with a page size below 4 KiB) [1].
>

ulm's reply notes that this is a limitation in the Linux kernel, so I
have no idea why musl tinderboxes seemed to disproportionately hit these
issues and I assume one of us is either missing something or it was just
a crazy fluke.

> Any developer testing whether or not an ebuild is installable would
> become immediately aware if the ebuild runs into the environment
> limit, or not.
>

This clearly didn't happen with the previous examples (see what I said
above too), as there were times when they installed for some people, but
not in CI/tinderboxes. I don't know why and it merits investigation.




Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)

2023-07-08 Thread Sam James

Zoltan Puskas  writes:

> On Tue, Jul 04, 2023 at 01:13:30AM -0600, Tim Harder wrote:
>> On 2023-07-03 Mon 04:17, Florian Schmaus wrote:
>> >On 30/06/2023 13.33, Eray Aslan wrote:
>> >>On Fri, Jun 30, 2023 at 03:38:11AM -0600, Tim Harder wrote:
>> >>>Why do we have to keep exporting the related variables that generally
>> >>>cause these size issues to the environment?
>> >>
>> >>I really do not want to make a +1 response but this is an excellent
>> >>question that we need to answer before implementing EGO_SUM.
>> >
>> >Could you please discuss why you make the reintroduction of EGO_SUM 
>> >dependent on this question?
>> 
>> Just to be clear, I don't particularly care about EGO_SUM enough to gate
>> its reintroduction (and don't have any leverage to do so anyway). I'm
>> just tired of the circular discussions around env issues that all seem
>> to avoid actual fixes, catering instead to functionality used by a
>> vanishingly small subset of ebuilds in the main repo that compels a
>> certain design mostly due to how portage functioned before EAPI 0.
>> 
>> Other than that, supporting EGO_SUM (or any other language ecosystem
>> trending towards distro-unfriendly releases) is fine as long as devs are
>> cognizant how the related global-scope eclass design affects everyone
>> running or working on the raw repo. I hope devs continue leveraging the
>> relatively recent benchmark tooling (and perhaps more future support) to
>> improve their work. Along those lines, it could be nice to see sample
>> benchmark data in commit messages for large, global-scope eclass work
>> just to reinforce that it was taken into account.
>> 
>> Tim
>> 
>
> I've been following the EGO_SUM thread for quite some time now. One other
> thing I did not see mentioned in favour of EGO_SUM so far: reproducibility.
>
> The problem with external tarballs is that they are gone once the ebuild is
> dropped from the tree. Should a user ever want to roll back to a previous
> version of an application, either by checking out an older version of the
> portage tree or copying said ebuild into their local overlay, they still
> cannot simply run an emerge on it as they have to somehow recreate the
> tarball itself too.

I believe Hank's email covers this.





Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)

2023-07-06 Thread Zoltan Puskas
On Tue, Jul 04, 2023 at 01:13:30AM -0600, Tim Harder wrote:
> On 2023-07-03 Mon 04:17, Florian Schmaus wrote:
> >On 30/06/2023 13.33, Eray Aslan wrote:
> >>On Fri, Jun 30, 2023 at 03:38:11AM -0600, Tim Harder wrote:
> >>>Why do we have to keep exporting the related variables that generally
> >>>cause these size issues to the environment?
> >>
> >>I really do not want to make a +1 response but this is an excellent
> >>question that we need to answer before implementing EGO_SUM.
> >
> >Could you please discuss why you make the reintroduction of EGO_SUM 
> >dependent on this question?
> 
> Just to be clear, I don't particularly care about EGO_SUM enough to gate
> its reintroduction (and don't have any leverage to do so anyway). I'm
> just tired of the circular discussions around env issues that all seem
> to avoid actual fixes, catering instead to functionality used by a
> vanishingly small subset of ebuilds in the main repo that compels a
> certain design mostly due to how portage functioned before EAPI 0.
> 
> Other than that, supporting EGO_SUM (or any other language ecosystem
> trending towards distro-unfriendly releases) is fine as long as devs are
> cognizant how the related global-scope eclass design affects everyone
> running or working on the raw repo. I hope devs continue leveraging the
> relatively recent benchmark tooling (and perhaps more future support) to
> improve their work. Along those lines, it could be nice to see sample
> benchmark data in commit messages for large, global-scope eclass work
> just to reinforce that it was taken into account.
> 
> Tim
> 

I've been following the EGO_SUM thread for quite some time now. One other thing
I did not see mentioned in favour of EGO_SUM so far: reproducibility.

The problem with external tarballs is that they are gone once the ebuild is
dropped from the tree. Should a user ever want to roll back to a previous
version of an application, either by checking out an older version of the
portage tree or copying said ebuild into their local overlay, they still cannot
simply run an emerge on it as they have to somehow recreate the tarball
itself too.

While upstream may not host everything forever, it's pretty much guaranteed to
be available for much longer than Gentoo's custom tarball bundles of
dependencies.

Regarding space, we are also likely making a trade-off. By deprecating EGO_SUM we
are saving space in the portage tree but in exchange inflating distfiles as it
will start accumulating the same dependencies potentially multiple times since
now the content is hidden in tarballs containing a combination of dependencies.
This is essentially the source file version of "statically linking".

Finally a personal opinion: I find dependency tarballs opaque. With EGO_SUM the
ebuild defines all the upstream sources it needs to build the package as well as
how to build it, but with the dependency tarball the sources are all hidden,
which makes verification all that much harder.

Zoltan



Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)

2023-07-05 Thread Oskari Pirhonen
On Wed, Jul 05, 2023 at 20:40:34 +0200, Gerion Entrup wrote:
> Am Mittwoch, 5. Juli 2023, 01:09:30 CEST schrieb Oskari Pirhonen:
> > On Tue, Jul 04, 2023 at 21:56:26 +, Robin H. Johnson wrote:
> > > On Tue, Jul 04, 2023 at 12:44:39PM +0200, Gerion Entrup wrote:
> > > > just to be curious about the whole discussion. I did not follow in the
> > > > deepest detail but what I got is:
> > > > - EGO_SUM blows up the Manifest file, since every little Go module needs
> > > >   to be respected. A lot of these Manifest files lead to an extremely
> > > >   increased Portage tree size. EGO_SUM is just one example (though the
> > > >   biggest one). Statically linked languages like Rust etc. have the same
> > > >   problem.
> > > > - The current solution is to prepackage all modules, put it somewhere on
> > > >   a webserver and just manifest that file. This makes the Portage tree
> > > >   small in size again, but requires a webserver/mirror and is thus
> > > >   unfriendly for overlay devs.
> > > > 
> > > > I'm not sure if it was mentioned before but has anyone considered hash
> > > > trees / Merkle trees for the manifest file? The idea would be to hash
> > > > the standard manifest file a second time if it gets too big and write
> > > > down that hash as new manifest file and leave EGO_SUM as is.
> > > This is out-of-tree/indirect Manifests, that I proposed here, more than
> > > a year ago:
> > > > https://marc.info/?l=gentoo-dev&m=168280762310716&w=2
> > > > https://marc.info/?l=gentoo-dev&m=165472088822215&w=2
> > > 
> > > Developing it requires PMS work in addition to package manager
> > > development, because it introduces phases.
> > > 
> > > - primary fetch of $SRC_URI per ebuild, including indirect Manifest
> > > - primary validation of distfiles
> > > - secondary fetch of $SRC_URI per indirect Manifest
> > > - secondary validation of additional distfiles
> > > 
> > > A significantly impacted use case is "emerge -f", it now needs to run
> > > downloads twice.
> > > 
> > 
> > I'm not sure double downloading is required. Consider a flow similar to
> > this:
> > 
> > 1. distfiles are fetched as per the ebuild
> > 2. distfiles are hashed into a temporary Manifest
> > 3. temporary Manifest is hashed and compared with the hashes stored in
> > >    the in-tree Manifest for the direct Manifest
> 
> This is exactly, what I meant. A webstorage is not needed. A second
> download process is also not needed. Just an additional Manifest format
> is needed for ebuilds with more than n distfiles.
> 
> 
> > A new Manifest format would be required in order to differentiate the
> > current ones from an indirect one. This may require PMS changes,
> > > although I suspect amending GLEP 74 may be enough since the PMS seems
> > to just refer to the GLEP for a description of Manifests.
> > 
> > This would also either rely on a stable ordering of Manifest contents
> > when generating it or having a separate file listing in the indirect
> > Manifest which corresponds to the order in the direct Manifest. For the
> > latter, it should also have separate entries for different package
> > versions so that every single distfile for every single version of said
> > package does not need to be fetched in order to build the direct
> > Manifest.
> > 
> > I'm imagining something along these lines:
> > 
> > INDIRECT true
> > > PACKAGE category/package-version distfile1 distfile2 ... ALGO1 hash1 ALGO2 hash2 ...
> > PACKAGE ...
> 
> Maybe it is reasonable to skip the distfile names at all (or just
> provide a hash value of the concatenated file names). Then the manifest
> would just contain two/three hashes (for as many distfiles as the ebuild
> needs). Since these kind of indirect Manifests should be more rare than
> the normal ones, a slightly longer processing time does not have much
> impact I would say.
> 

My reasoning behind having the list of files is so that the
intermediate/direct Manifest can be accurately recreated. Consider the
following (not-so-)hypothetical Manifest:

DIST dist.tar.gz 84703 BLAKE2B ... SHA512 ...
DIST dist.tar.gz.asc 228 BLAKE2B ... SHA512 ...
EBUILD package-r1.ebuild 1535 BLAKE2B ... SHA512 ...
EBUILD package.ebuild 1536 BLAKE2B ... SHA512 ...
MISC metadata.xml 959 BLAKE2B ... SHA512 ...

It is "well behaved" because pkgdev created it. My main concern is if
$OTHER_TOOLING generates the Manifest in a different order which would
mean the Manifest may be correct, but you get a false negative since the
hashes don't match what is in the in-tree indirect Manifest. Having the
order specified in the indirect Manifest renders this moot because
$OTHER_TOOLING would have to respect this in order to correctly handle
indirect Manifests.
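
The ordering concern can be illustrated with a small sketch. The function
names are hypothetical, SHA512 is picked arbitrarily as the outer hash, and
the DIST lines reuse the elided example above verbatim, so the "..." parts
are placeholders rather than real hashes.

```python
import hashlib


def manifest_hash(lines):
    """Hash Manifest lines exactly in the order given."""
    text = "".join(line + "\n" for line in lines)
    return hashlib.sha512(text.encode("utf-8")).hexdigest()


def hash_in_listed_order(lines, listed_names):
    """Reorder lines to follow the file list before hashing.

    Assumption: the second field of each line is the file name, and
    listed_names is the order recorded in the indirect Manifest. A name
    missing from the lines raises KeyError, which a real tool would
    report as a verification failure.
    """
    by_name = {line.split()[1]: line for line in lines}
    return manifest_hash([by_name[name] for name in listed_names])


lines = [
    "DIST dist.tar.gz 84703 BLAKE2B ... SHA512 ...",
    "DIST dist.tar.gz.asc 228 BLAKE2B ... SHA512 ...",
]
shuffled = list(reversed(lines))
order = ["dist.tar.gz", "dist.tar.gz.asc"]

# Same content, different order: the plain hashes differ.
print(manifest_hash(lines) == manifest_hash(shuffled))
# Following the listed order makes the result independent of the tool.
print(hash_in_listed_order(shuffled, order) == manifest_hash(lines))
```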

Additionally, in repos without thin-manifests, the SRC_URI is not enough
to build up the Manifest. This may or may not be an issue depending on
if a repo's metadata/layout.conf is parsed as part of the Manifest
verification process.

> 
> 
> > Here `ALGO1` and 

Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)

2023-07-05 Thread Rich Freeman
On Wed, Jul 5, 2023 at 2:40 PM Gerion Entrup  wrote:
>
> Am Mittwoch, 5. Juli 2023, 01:09:30 CEST schrieb Oskari Pirhonen:
> > On Tue, Jul 04, 2023 at 21:56:26 +, Robin H. Johnson wrote:
> > >
> > > Developing it requires PMS work in addition to package manager
> > > development, because it introduces phases.
> > >
> > > - primary fetch of $SRC_URI per ebuild, including indirect Manifest
> > > - primary validation of distfiles
> > > - secondary fetch of $SRC_URI per indirect Manifest
> > > - secondary validation of additional distfiles
> > >
> > > A significantly impacted use case is "emerge -f", it now needs to run
> > > downloads twice.
> >
> > I'm not sure double downloading is required. Consider a flow similar to
> > this:
> >
> > 1. distfiles are fetched as per the ebuild
> > 2. distfiles are hashed into a temporary Manifest
> > 3. temporary Manifest is hashed and compared with the hashes stored in
> > >    the in-tree Manifest for the direct Manifest
>
> This is exactly, what I meant. A webstorage is not needed. A second
> download process is also not needed. Just an additional Manifest format
> is needed for ebuilds with more than n distfiles.
>

I suspect that Robin was proposing indirect manifests AND src uris, and
not just indirect manifests.  In any case, if he wasn't, then I'd
suggest it would make sense to have that so that we don't need giant
lists of src_uris or go sums or whatever in ebuilds.  Sure, the
manifests are even larger than the original file references, but those
will still be long.  Plus if a file is used by 5 versions of an ebuild
it will be present in the manifests once per hash function, but in the
ebuilds 5 times.

I agree though that if only the manifests are moved to a fetched file
then you could fetch that on the first pass, though you'd still need
the extra logic to parse it.  I'm not sure it really is much of a
difference to the effort involved.

Aren't go sums already content hashes?  It might make even more sense
to create some kind of modular manifest verification logic in portage
so that the same eclass that handles EGO_SUM could tell the package
manager how to check the integrity of the files that are fetched.
Well, assuming we trust whatever hash function they're using (I'm
afraid to check - maybe this isn't such a great idea...).
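
For reference, my understanding (not something stated in this thread) is
that the "h1:" entries in go.sum are SHA-256 based: each file in the module
is hashed with SHA-256, the sorted "digest  name" lines are hashed again
with SHA-256, and the result is base64-encoded. A minimal sketch of that
scheme, with a hypothetical function name and an in-memory mapping standing
in for the module's file tree:

```python
import base64
import hashlib


def go_h1(files):
    """Compute an "h1:"-style hash over a mapping of name -> bytes.

    Assumption: names are the slash-separated paths as they appear in
    the module archive, e.g. "example.com/m@v1.0.0/go.mod".
    """
    summary = hashlib.sha256()
    for name in sorted(files):
        digest = hashlib.sha256(files[name]).hexdigest()
        # One line per file: hex digest, two spaces, file name.
        summary.update(f"{digest}  {name}\n".encode("utf-8"))
    return "h1:" + base64.b64encode(summary.digest()).decode("ascii")
```

If that understanding is right, this is structurally close to the
hash-of-a-Manifest idea discussed elsewhere in the thread, just with a
single fixed hash function.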

-- 
Rich



Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)

2023-07-05 Thread Gerion Entrup
Am Mittwoch, 5. Juli 2023, 01:09:30 CEST schrieb Oskari Pirhonen:
> On Tue, Jul 04, 2023 at 21:56:26 +, Robin H. Johnson wrote:
> > On Tue, Jul 04, 2023 at 12:44:39PM +0200, Gerion Entrup wrote:
> > > just to be curious about the whole discussion. I did not follow in the
> > > deepest detail but what I got is:
> > > - EGO_SUM blows up the Manifest file, since every little Go module needs
> > >   to be respected. A lot of these Manifest files lead to an extremely
> > >   increased Portage tree size. EGO_SUM is just one example (though the
> > >   biggest one). Statically linked languages like Rust etc. have the same
> > >   problem.
> > > - The current solution is to prepackage all modules, put it somewhere on
> > >   a webserver and just manifest that file. This makes the Portage tree
> > >   small in size again, but requires a webserver/mirror and is thus
> > >   unfriendly for overlay devs.
> > > 
> > > I'm not sure if it was mentioned before but has anyone considered hash
> > > trees / Merkle trees for the manifest file? The idea would be to hash
> > > the standard manifest file a second time if it gets too big and write
> > > down that hash as new manifest file and leave EGO_SUM as is.
> > This is out-of-tree/indirect Manifests, that I proposed here, more than
> > a year ago:
> > > https://marc.info/?l=gentoo-dev&m=168280762310716&w=2
> > > https://marc.info/?l=gentoo-dev&m=165472088822215&w=2
> > 
> > Developing it requires PMS work in addition to package manager
> > development, because it introduces phases.
> > 
> > - primary fetch of $SRC_URI per ebuild, including indirect Manifest
> > - primary validation of distfiles
> > - secondary fetch of $SRC_URI per indirect Manifest
> > - secondary validation of additional distfiles
> > 
> > A significantly impacted use case is "emerge -f", it now needs to run
> > downloads twice.
> > 
> 
> I'm not sure double downloading is required. Consider a flow similar to
> this:
> 
> 1. distfiles are fetched as per the ebuild
> 2. distfiles are hashed into a temporary Manifest
> 3. temporary Manifest is hashed and compared with the hashes stored in
>    the in-tree Manifest for the direct Manifest

This is exactly, what I meant. A webstorage is not needed. A second
download process is also not needed. Just an additional Manifest format
is needed for ebuilds with more than n distfiles.


> A new Manifest format would be required in order to differentiate the
> current ones from an indirect one. This may require PMS changes,
> although I suspect amending GLEP 74 may be enough since the PMS seems
> to just refer to the GLEP for a description of Manifests.
> 
> This would also either rely on a stable ordering of Manifest contents
> when generating it or having a separate file listing in the indirect
> Manifest which corresponds to the order in the direct Manifest. For the
> latter, it should also have separate entries for different package
> versions so that every single distfile for every single version of said
> package does not need to be fetched in order to build the direct
> Manifest.
> 
> I'm imagining something along these lines:
> 
> INDIRECT true
> PACKAGE category/package-version distfile1 distfile2 ... ALGO1 hash1 ALGO2 hash2 ...
> PACKAGE ...

Maybe it is reasonable to skip the distfile names at all (or just
provide a hash value of the concatenated file names). Then the manifest
would just contain two/three hashes (for as many distfiles as the ebuild
needs). Since these kind of indirect Manifests should be more rare than
the normal ones, a slightly longer processing time does not have much
impact I would say.



> Here `ALGO1` and `hash1` correspond to the hash of the direct Manifest
> containing the distfiles (and potentially other files if a repo does not
> have thin-manifests enabled) and their hashes in the order specified
> previously.
> 
> The indirect Manifest as described above would be large-ish for a
> package that has lots of distfiles, but likely much smaller than if each
> distfile had its set of hashes stored directly.

Without storing the filenames, the Manifest file would have the same
small size for any amount of distfiles needed.

Gerion


> Please correct me if there's some detail I've overlooked.
> 
> - Oskari
> 
> > The rest of the posts also go into the matter of duplication within
> > EGO_SUM & the indirect Manifests: limiting the growth requires some form
> > of content-addressed layout.
> > 
> > It's absolutely something we should get developed, but it's a lot of
> > work.
> > 
> > The indirect Manifests still provide a hosting challenge for overlays.
> > 
> 
> 
> 





Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)

2023-07-04 Thread Oskari Pirhonen
On Tue, Jul 04, 2023 at 21:56:26 +, Robin H. Johnson wrote:
> On Tue, Jul 04, 2023 at 12:44:39PM +0200, Gerion Entrup wrote:
> > just to be curious about the whole discussion. I did not follow in the
> > deepest detail but what I got is:
> > - EGO_SUM blows up the Manifest file, since every little Go module needs
> >   to be respected. A lot of these Manifest files lead to an extremely
> >   increased Portage tree size. EGO_SUM is just one example (though the
> >   biggest one). Statically linked languages like Rust etc. have the same
> >   problem.
> > - The current solution is to prepackage all modules, put it somewhere on
> >   a webserver and just manifest that file. This makes the Portage tree
> >   small in size again, but requires a webserver/mirror and is thus
> >   unfriendly for overlay devs.
> > 
> > I'm not sure if it was mentioned before but has anyone considered hash
> > trees / Merkle trees for the manifest file? The idea would be to hash
> > the standard manifest file a second time if it gets too big and write
> > down that hash as new manifest file and leave EGO_SUM as is.
> This is out-of-tree/indirect Manifests, that I proposed here, more than
> a year ago:
> > https://marc.info/?l=gentoo-dev&m=168280762310716&w=2
> > https://marc.info/?l=gentoo-dev&m=165472088822215&w=2
> 
> Developing it requires PMS work in addition to package manager
> development, because it introduces phases.
> 
> - primary fetch of $SRC_URI per ebuild, including indirect Manifest
> - primary validation of distfiles
> - secondary fetch of $SRC_URI per indirect Manifest
> - secondary validation of additional distfiles
> 
> A significantly impacted use case is "emerge -f", it now needs to run
> downloads twice.
> 

I'm not sure double downloading is required. Consider a flow similar to
this:

1. distfiles are fetched as per the ebuild
2. distfiles are hashed into a temporary Manifest
3. temporary Manifest is hashed and compared with the hashes stored in
   the in-tree Manifest for the direct Manifest
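
The three steps above could look roughly like this. It is only a sketch:
the function names are hypothetical, the temporary Manifest is sorted to
get a stable order (one of the two options discussed in this thread), the
DIST line layout follows the existing Manifest format, and files are read
into memory whole for brevity.

```python
import hashlib
from pathlib import Path


def dist_line(path):
    """Build one DIST line for an already fetched distfile (step 2)."""
    data = Path(path).read_bytes()
    blake2b = hashlib.blake2b(data).hexdigest()
    sha512 = hashlib.sha512(data).hexdigest()
    return f"DIST {Path(path).name} {len(data)} BLAKE2B {blake2b} SHA512 {sha512}"


def temporary_manifest(paths):
    """Assemble the temporary direct Manifest in a stable (sorted) order."""
    return "\n".join(sorted(dist_line(p) for p in paths)) + "\n"


def manifest_digests(manifest_text):
    """Hash the temporary Manifest itself (first half of step 3)."""
    raw = manifest_text.encode("utf-8")
    return {
        "BLAKE2B": hashlib.blake2b(raw).hexdigest(),
        "SHA512": hashlib.sha512(raw).hexdigest(),
    }


def verify(paths, expected):
    """Compare against the hashes stored in the in-tree indirect Manifest."""
    return manifest_digests(temporary_manifest(paths)) == expected
```

Here `expected` stands for the hashes read from the in-tree Manifest; no
second download is involved, only a second round of hashing.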

A new Manifest format would be required in order to differentiate the
current ones from an indirect one. This may require PMS changes,
although I suspect amending GLEP 74 may be enough since the PMS seems
to just refer to the GLEP for a description of Manifests.

This would also either rely on a stable ordering of Manifest contents
when generating it or having a separate file listing in the indirect
Manifest which corresponds to the order in the direct Manifest. For the
latter, it should also have separate entries for different package
versions so that every single distfile for every single version of said
package does not need to be fetched in order to build the direct
Manifest.

I'm imagining something along these lines:

INDIRECT true
PACKAGE category/package-version distfile1 distfile2 ... ALGO1 hash1 ALGO2 hash2 ...
PACKAGE ...

Here `ALGO1` and `hash1` correspond to the hash of the direct Manifest
containing the distfiles (and potentially other files if a repo does not
have thin-manifests enabled) and their hashes in the order specified
previously.
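
To make the imagined format a bit more concrete, here is a sketch of a
parser for it. Everything in it is hypothetical, since the format is only a
proposal: the class and function names are made up, and the set of
algorithm names is an assumption used to tell where the distfile list ends
and the hash pairs begin.

```python
from dataclasses import dataclass

# Assumption: the hash names used by current Manifests.
KNOWN_ALGOS = {"BLAKE2B", "SHA512"}


@dataclass
class IndirectEntry:
    cpv: str          # category/package-version
    distfiles: list   # distfile names, in the order used for hashing
    hashes: dict      # algorithm name -> hash of the direct Manifest


def parse_indirect_manifest(text):
    """Parse the proposed INDIRECT/PACKAGE layout into entries."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if not rows or rows[0] != ["INDIRECT", "true"]:
        raise ValueError("not an indirect Manifest")
    entries = []
    for tokens in rows[1:]:
        if tokens[0] != "PACKAGE" or len(tokens) < 2:
            raise ValueError(f"unexpected line: {' '.join(tokens)}")
        cpv, rest = tokens[1], tokens[2:]
        # The distfile list runs until the first known algorithm name.
        split = next(
            (i for i, tok in enumerate(rest) if tok in KNOWN_ALGOS), len(rest)
        )
        pairs = rest[split:]
        if len(pairs) % 2:
            raise ValueError(f"dangling hash field for {cpv}")
        hashes = dict(zip(pairs[0::2], pairs[1::2]))
        entries.append(IndirectEntry(cpv, rest[:split], hashes))
    return entries
```

A distfile that happens to be named like an algorithm would confuse this
sketch, so a real format would probably want an explicit separator.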

The indirect Manifest as described above would be large-ish for a
package that has lots of distfiles, but likely much smaller than if each
distfile had its set of hashes stored directly.

Please correct me if there's some detail I've overlooked.

- Oskari

> The rest of the posts also go into the matter of duplication within
> EGO_SUM & the indirect Manifests: limiting the growth requires some form
> of content-addressed layout.
> 
> It's absolutely something we should get developed, but it's a lot of
> work.
> 
> The indirect Manifests still provide a hosting challenge for overlays.
> 
> -- 
> Robin Hugh Johnson
> Gentoo Linux: Dev, Infra Lead, Foundation Treasurer
> E-Mail   : robb...@gentoo.org
> GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
> GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136






Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)

2023-07-04 Thread Robin H. Johnson
On Tue, Jul 04, 2023 at 12:44:39PM +0200, Gerion Entrup wrote:
> just to be curious about the whole discussion. I did not follow in the
> deepest detail but what I got is:
> - EGO_SUM blows up the Manifest file, since every little Go module needs
>   to be respected. A lot of these Manifest files lead to an extremely
>   increased Portage tree size. EGO_SUM is just one example (though the
>   biggest one). Statically linked languages like Rust etc. have the same
>   problem.
> - The current solution is to prepackage all modules, put it somewhere on
>   a webserver and just manifest that file. This makes the Portage tree
>   small in size again, but requires a webserver/mirror and is thus
>   unfriendly for overlay devs.
> 
> I'm not sure if it was mentioned before but has anyone considered hash
> trees / Merkle trees for the manifest file? The idea would be to hash
> the standard manifest file a second time if it gets too big and write
> down that hash as new manifest file and leave EGO_SUM as is.
This is out-of-tree/indirect Manifests, that I proposed here, more than
a year ago:
https://marc.info/?l=gentoo-dev&m=168280762310716&w=2
https://marc.info/?l=gentoo-dev&m=165472088822215&w=2

Developing it requires PMS work in addition to package manager
development, because it introduces phases.

- primary fetch of $SRC_URI per ebuild, including indirect Manifest
- primary validation of distfiles
- secondary fetch of $SRC_URI per indirect Manifest
- secondary validation of additional distfiles

A significantly impacted use case is "emerge -f", it now needs to run
downloads twice.

The rest of the posts also go into the matter of duplication within
EGO_SUM & the indirect Manifests: limiting the growth requires some form
of content-addressed layout.

It's absolutely something we should get developed, but it's a lot of
work.

The indirect Manifests still provide a hosting challenge for overlays.

-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Treasurer
E-Mail   : robb...@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136




Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)

2023-07-04 Thread Gerion Entrup
Am Dienstag, 4. Juli 2023, 09:13:30 CEST schrieb Tim Harder:
> On 2023-07-03 Mon 04:17, Florian Schmaus wrote:
> >On 30/06/2023 13.33, Eray Aslan wrote:
> >>On Fri, Jun 30, 2023 at 03:38:11AM -0600, Tim Harder wrote:
> >>>Why do we have to keep exporting the related variables that generally
> >>>cause these size issues to the environment?
> >>
> >>I really do not want to make a +1 response but this is an excellent
> >>question that we need to answer before implementing EGO_SUM.
> >
> >Could you please discuss why you make the reintroduction of EGO_SUM 
> >dependent on this question?
> 
> Just to be clear, I don't particularly care about EGO_SUM enough to gate
> its reintroduction (and don't have any leverage to do so anyway). I'm
> just tired of the circular discussions around env issues that all seem
> to avoid actual fixes, catering instead to functionality used by a
> vanishingly small subset of ebuilds in the main repo that compels a
> certain design mostly due to how portage functioned before EAPI 0.
> 
> Other than that, supporting EGO_SUM (or any other language ecosystem
> trending towards distro-unfriendly releases) is fine as long as devs are
> cognizant how the related global-scope eclass design affects everyone
> running or working on the raw repo. I hope devs continue leveraging the
> relatively recent benchmark tooling (and perhaps more future support) to
> improve their work. Along those lines, it could be nice to see sample
> benchmark data in commit messages for large, global-scope eclass work
> just to reinforce that it was taken into account.
> 
> Tim

Hi,

just to be curious about the whole discussion. I did not follow in the
deepest detail but what I got is:
- EGO_SUM blows up the Manifest file, since every little Go module needs
  to be respected. A lot of these Manifest files lead to an extremely
  increased Portage tree size. EGO_SUM is just one example (though the
  biggest one). Statically linked languages like Rust etc. have the same
  problem.
- The current solution is to prepackage all modules, put it somewhere on
  a webserver and just manifest that file. This makes the Portage tree
  small in size again, but requires a webserver/mirror and is thus
  unfriendly for overlay devs.

I'm not sure if it was mentioned before, but has anyone considered hash
trees / Merkle trees for the Manifest file? The idea would be to hash
the standard Manifest file a second time if it gets too big, write
down that hash as the new Manifest file, and leave EGO_SUM as is.

When Portage tries to install the package, it can download all modules
and build the "normal" Manifest file as usual, but instead of directly
comparing it to the Manifest in the tree, it can hash it again and
compare that to the provided Manifest. With this, Portage should have
more or less the same guarantees about the validity of the source code,
but the Manifest file consists of just two hashes again.
What one would lose is the direct comparison of file names (they are
included in the "meta" hash, though), or am I missing something?
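
To make the idea concrete, here is a minimal sketch in Python. It is not
how Portage handles Manifests today; the function names and the
simplified one-hash "DIST" line format are assumptions made purely for
illustration.

```python
import hashlib


def manifest_lines(distfiles):
    """Build sorted, Manifest-like DIST lines from a {name: bytes} mapping.

    Simplified on purpose: one SHA512 per file, hypothetical line layout.
    """
    lines = []
    for name in sorted(distfiles):
        data = distfiles[name]
        digest = hashlib.sha512(data).hexdigest()
        lines.append(f"DIST {name} {len(data)} SHA512 {digest}")
    return lines


def meta_hash(lines):
    """Hash the full per-file Manifest text a second time."""
    text = "\n".join(lines) + "\n"
    return hashlib.sha512(text.encode("utf-8")).hexdigest()


def verify(distfiles, expected_meta_hash):
    """Rebuild the per-file Manifest locally and compare only the meta hash."""
    return meta_hash(manifest_lines(distfiles)) == expected_meta_hash
```

Because file names and sizes are part of the hashed text, a renamed or
modified distfile changes the meta hash, but a mismatch no longer tells
you which file is the culprit.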

Gerion




Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)

2023-07-04 Thread Tim Harder

On 2023-07-03 Mon 04:17, Florian Schmaus wrote:
> On 30/06/2023 13.33, Eray Aslan wrote:
>> On Fri, Jun 30, 2023 at 03:38:11AM -0600, Tim Harder wrote:
>>> Why do we have to keep exporting the related variables that generally
>>> cause these size issues to the environment?
>>
>> I really do not want to make a +1 response but this is an excellent
>> question that we need to answer before implementing EGO_SUM.
>
> Could you please discuss why you make the reintroduction of EGO_SUM
> dependent on this question?


Just to be clear, I don't particularly care about EGO_SUM enough to gate
its reintroduction (and don't have any leverage to do so anyway). I'm
just tired of the circular discussions around env issues that all seem
to avoid actual fixes, catering instead to functionality used by a
vanishingly small subset of ebuilds in the main repo that compels a
certain design mostly due to how portage functioned before EAPI 0.

Other than that, supporting EGO_SUM (or any other language ecosystem
trending towards distro-unfriendly releases) is fine as long as devs are
cognizant how the related global-scope eclass design affects everyone
running or working on the raw repo. I hope devs continue leveraging the
relatively recent benchmark tooling (and perhaps more future support) to
improve their work. Along those lines, it could be nice to see sample
benchmark data in commit messages for large, global-scope eclass work
just to reinforce that it was taken into account.

Tim



Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)

2023-07-03 Thread Florian Schmaus

On 30/06/2023 13.33, Eray Aslan wrote:
> On Fri, Jun 30, 2023 at 03:38:11AM -0600, Tim Harder wrote:
>> Why do we have to keep exporting the related variables that generally
>> cause these size issues to the environment?
>
> I really do not want to make a +1 response but this is an excellent
> question that we need to answer before implementing EGO_SUM.

Could you please discuss why you make the reintroduction of EGO_SUM
dependent on this question?


Portage will show you a warning message if the exported environment 
approaches the kernel limit, and it will show a detailed error message 
if executing an ebuild failed due to the limit being reached. There 
seems to be no reason why you should not be able to allow EGO_SUM again 
without first fixing, for example, https://bugs.gentoo.org/721088.


- Flow




Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)

2023-07-03 Thread Florian Schmaus

On 30/06/2023 10.22, Sam James wrote:
> Florian Schmaus writes:
>> [in reply to a gentoo-project@ post, but it was asked to continue this
>> on gentoo-dev@]
>> On 28/06/2023 16.46, Sam James wrote:
>>> and questions remain unanswered on the
>>> ML (why not implement a check in pkgcheck similar to what is in Portage,
>>> for example)?
>>
>> On 2023-05-30 [1], I proposed a limit in the range of 2 to 1.5 MiB for
>> the total package-directory size. I only care a little about the tool
>> that checks this limit, but pkgcheck is an obvious choice. I also
>> suggested that we review this policy once the number of Go packages
>> has doubled or two years after this policy was established (whatever
>> comes first).
>>
>> But I fear you may be referring to another kind of check. You may be
>> talking about a check that forbids EGO_SUM in ::gentoo but allows it
>> in overlays.
>
> My position on this has been consistent: a check is needed to statically
> determine when the environment size is too big. Copying the Portage
> check into pkgcheck (in terms of the metrics) would satisfy this.


It is not as easy as merely copying existing portage code into pkgcheck 
(unless I am missing something).


I've talked to arthurzam, and there appears to be a .environment file 
created by pkgcheck, which we could use to approximate the exported 
environment.


Another option would be to have pkgcheck count the EGO_SUM entries. The 
tree-sitter API for Bash, which pkgcheck already uses, seems to allow 
for that. But that would be different from the check in portage. 
Although, IMHO, counting EGO_SUM entries would be sufficient.
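
As a rough illustration of what counting EGO_SUM entries could look
like, here is a small Python sketch. Note that it swaps in a plain
regular expression instead of the tree-sitter API mentioned above, so it
is not how pkgcheck would do it; the function names and the max_entries
threshold are hypothetical.

```python
import re

# Matches a bash array assignment of the form EGO_SUM=( ... ).
# Assumption: no entry contains a closing parenthesis.
EGO_SUM_RE = re.compile(r"^EGO_SUM=\((.*?)\)", re.MULTILINE | re.DOTALL)


def ego_sum_entries(ebuild_text):
    """Return the double-quoted entries of the first EGO_SUM array."""
    match = EGO_SUM_RE.search(ebuild_text)
    if match is None:
        return []
    return re.findall(r'"([^"]*)"', match.group(1))


def exceeds_limit(ebuild_text, max_entries):
    """True if the ebuild lists more EGO_SUM entries than max_entries.

    max_entries is a hypothetical policy value, not one taken from
    Portage or pkgcheck.
    """
    return len(ego_sum_entries(ebuild_text)) > max_entries
```

A real check would still have to pick the threshold so that it roughly
matches the point where the exported environment gets too large.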




> That is, regardless of raw size, I'm asking for a calculation based on
> the contents of EGO_SUM where, if exceeded, the package will not be
> installable on some systems. You didn't have an issue implementing this
> for Portage and I've mentioned this a bunch of times since, so I thought
> it was clear what I was hoping to see.


So pkgcheck counting EGO_SUM entries would be sufficient for the purpose 
of having a static check that notices if the ebuild would likely run 
into the environment limit?


To find a compromise, I would possibly invest my time in developing
such a check, even though I do not deem it a strict prerequisite for
reintroducing EGO_SUM.




>> Intelligibly, EGO_SUM can be considered ugly. Compared to a
>> traditional Gentoo package, EGO_SUM-based ones are larger. The same is
>> true for Rust packages. However, looking at the bigger picture,
>> EGO_SUM's advantages outweigh its disadvantages.
>
> Again, am on record as being fine with the general EGO_SUM approach,
> even if I wish we didn't need it, as I see it as inevitable for things
> like yarn, .NET, and of course Rust as we already have it.
>
> Just ideally not huge ones, and certainly not huge ones which then
> aren't even reliably installable because of environment size.


Talking about "reliably installable" makes it sound to me as if there
are cases where installing an EGO_SUM-based package sometimes works and
sometimes does not. But the kernel limit is fixed and not even
configurable, short of patching the source (and in the absence of
architectures with a page size below 4 KiB) [1].
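
For reference, the limit defined at [1] is MAX_ARG_STRLEN, which is
PAGE_SIZE * 32 and applies to each single argument or environment
string. The following Python sketch shows the arithmetic; the helper
names are made up for illustration and this is not the check Portage
implements.

```python
import os


def max_arg_strlen(page_size=None):
    """Per-string limit for execve(): 32 pages (MAX_ARG_STRLEN)."""
    if page_size is None:
        page_size = os.sysconf("SC_PAGE_SIZE")
    return 32 * page_size


def oversized_vars(env, limit):
    """Names of variables whose "NAME=value" string plus NUL exceeds limit."""
    return sorted(
        name
        for name, value in env.items()
        if len(name.encode()) + 1 + len(value.encode()) + 1 > limit
    )
```

With 4 KiB pages this gives 131072 bytes, so the outcome for a given
ebuild only depends on the page size, not on anything that varies from
run to run.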


Any developer testing whether an ebuild is installable would
immediately become aware if the ebuild runs into the environment limit.


That said, static code checks are always preferable over dynamic ones.

- Flow


1: https://elixir.bootlin.com/linux/v6.4.1/source/include/uapi/linux/binfmts.h#L15






Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)

2023-06-30 Thread Eray Aslan
On Fri, Jun 30, 2023 at 03:38:11AM -0600, Tim Harder wrote:
> Why do we have to keep exporting the related variables that generally
> cause these size issues to the environment?

I really do not want to make a +1 response but this is an excellent
question that we need to answer before implementing EGO_SUM.

-- 
Eray



Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)

2023-06-30 Thread Tim Harder
On 2023-06-30 Fri 02:22, Sam James wrote:
> My position on this has been consistent: a check is needed to statically
> determine when the environment size is too big. Copying the Portage
> check into pkgcheck (in terms of the metrics) would satisfy this.
> 
> That is, regardless of raw size, I'm asking for a calculation based on
> the contents of EGO_SUM where, if exceeded, the package will not be
> installable on some systems. You didn't have an issue implementing this
> for Portage and I've mentioned this a bunch of times since, so I thought
> it was clear what I was hoping to see.
> 
> I would also like (which is not what I was referring to here) some
> limit on the size, given that we already have a limit on the size of
> ${FILESDIR}, but this is less of a concern for me given it's bounded
> by the aforementioned environment size check.

Why do we have to keep exporting the related variables that generally
cause these size issues to the environment? I've asked as much on IRC
multiple times (nearly every time this discussion has been brought up)
and the answers I've gotten are some variation on "it's always been that
way" or "not exporting them would break using commands as external
programs" (e.g. calling via xargs).

The first response isn't a great argument and the second response, while
more valid, also feels less important than having a more minimalistic,
exported environment that causes fewer issues like this one and others
such as potentially affecting a package's build system in an unexpected
fashion. See bug #721088 for the related discussion on environment
variable exports.

From my stance, the spec should state that the only variables to be
exported are ones already "semi-standard" and used externally of package
manager internals in the expected fashion, which probably only includes
HOME, TMPDIR, and maybe ROOT. This would of course currently break
packages that use `xargs` while calling internal commands depending on
some of those exported variables, but from a cursory glance at the
gentoo repo, there aren't many ebuilds using that functionality and in
general those that are could be written in an easier to understand
fashion without using xargs. It should also be possible to proxy the
required variables to those commands in various fashions without using
the environment if using commands externally is extremely important to
the few ebuild maintainers who make use of that functionality.
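
To illustrate what a minimal exported environment would mean in
practice, here is a small Python sketch. It is not taken from Portage or
pkgcore; the names are hypothetical, and the allow-list simply mirrors
the three variables mentioned above.

```python
# Variables the proposal would keep exporting (HOME, TMPDIR, maybe ROOT).
EXPORTED = frozenset({"HOME", "TMPDIR", "ROOT"})


def minimal_env(full_env, allowed=EXPORTED):
    """Return only the variables from full_env that may be exported."""
    return {name: value for name, value in full_env.items() if name in allowed}


def exported_size(env):
    """Bytes the environment strings occupy: NAME=value plus a NUL each."""
    return sum(
        len(f"{name}={value}".encode("utf-8")) + 1
        for name, value in env.items()
    )
```

The filtered mapping could be handed to a child process, for example via
the env argument of subprocess.run, while everything else stays internal
to the package manager.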

In short, adding checks to portage and pkgcheck feels like an ill-suited
workaround that foists hacking around the error onto users or developers
due to a poor decision made decades ago on environment handling.

Tim



Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)

2023-06-30 Thread Sam James

Florian Schmaus  writes:

> [in reply to a gentoo-project@ post, but it was asked to continue this
> on gentoo-dev@]
>
> On 28/06/2023 16.46, Sam James wrote:
>> Florian Schmaus  writes:
>>> On 17/06/2023 10.37, Arthur Zamarin wrote:
>>>> I also want to nominate people who I feel contribute a lot to Gentoo and
>>>> I have a lot of interaction with (ordered by name, not priority):
>>>> […]
>>>> flow
>>>
>>> I apologize for the late reply, and thank you for the nomination. I am
>>> honored and accept.
>>>
>>> As many of you know, I am spending a lot of time on the EGO_SUM
>>> situation, as it is one of the most critical issues to solve.
>>>
>>> I have used the last few days to carefully consider whether a seat on
>>> the council is more harmful or beneficial to my efforts regarding
>>> EGO_SUM. On the one hand, council work means I have less time to
>>> improve the EGO_SUM situation. On the other hand, a seat in the
>>> council increases the probability of positively influencing Gentoo's
>>> future, also regarding EGO_SUM.
>>>
>> That's fine and it's great to see more people running!
>
> Excellent that we share this view. :)
>
>
>> But with regard to EGO_SUM: you didn't appear at the meeting where we
>> discussed your previous EGO_SUM proposal,
>
> Naive as I am, I expected that the mailing list would be used for
> discussion and that the council meeting would be used chiefly for
> voting and intra-council discussion. And since the request to the
> council to vote on a concrete proposal was preceded by a
> multiple-week, if not month-long, mailing list discussion, I assumed
> that my presence in the council meeting was optional.
>
> Had I known that my presence was required, or that the absence in the
> meeting would be blamed on me afterward, I would have appeared if
> possible.

I'm not blaming you for anything. But you didn't speak in
#gentoo-council before the meeting (a few days before IIRC) when we
were discussing the problem, I pinged you during the meeting, and you
didn't appear there afterwards.

You also didn't seem to respond to the council decision (or
non-decision) in that meeting either, unless I've missed it.

It seems self-evident that discussion would happen in the meeting before
voting...? What am I misunderstanding?

We regularly discuss things before voting on them. Do you normally
observe council meetings? I don't think what we did in this instance
was at all unusual.

(Also: there's the issue of whether or not the council should really
be voting on overriding an eclass maintainer who would then be forced
to keep something working they don't want to. mgorny raised that.)

>
>
>> and questions remain unanswered on the
>> ML (why not implement a check in pkgcheck similar to what is in Portage,
>> for example)?
>
> On 2023-05-30 [1], I proposed a limit in the range of 2 to 1.5 MiB for
> the total package-directory size. I only care a little about the tool
> that checks this limit, but pkgcheck is an obvious choice. I also
> suggested that we review this policy once the number of Go packages
> has doubled or two years after this policy was established (whatever
> comes first).
>
> But I fear you may be referring to another kind of check. You may be
> talking about a check that forbids EGO_SUM in ::gentoo but allows it
> in overlays.

My position on this has been consistent: a check is needed to statically
determine when the environment size is too big. Copying the Portage
check into pkgcheck (in terms of the metrics) would satisfy this.

That is, regardless of raw size, I'm asking for a calculation based on
the contents of EGO_SUM where, if exceeded, the package will not be
installable on some systems. You didn't have an issue implementing this
for Portage and I've mentioned this a bunch of times since, so I thought
it was clear what I was hoping to see.

I would also like (which is not what I was referring to here) some
limit on the size, given that we already have a limit on the size of
${FILESDIR}, but this is less of a concern for me given it's bounded
by the aforementioned environment size check.

>
> Intelligibly, EGO_SUM can be considered ugly. Compared to a
> traditional Gentoo package, EGO_SUM-based ones are larger. The same is
> true for Rust packages. However, looking at the bigger picture,
> EGO_SUM's advantages outweigh its disadvantages.
>

Again, am on record as being fine with the general EGO_SUM approach,
even if I wish we didn't need it, as I see it as inevitable for things
like yarn, .NET, and of course Rust as we already have it.

Just ideally not huge ones, and certainly not huge ones which then
aren't even reliably installable because of environment size.





[gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)

2023-06-30 Thread Florian Schmaus
[in reply to a gentoo-project@ post, but it was asked to continue this 
on gentoo-dev@]


On 28/06/2023 16.46, Sam James wrote:
> Florian Schmaus writes:
>> On 17/06/2023 10.37, Arthur Zamarin wrote:
>>> I also want to nominate people who I feel contribute a lot to Gentoo and
>>> I have a lot of interaction with (ordered by name, not priority):
>>> […]
>>> flow
>>
>> I apologize for the late reply, and thank you for the nomination. I am
>> honored and accept.
>>
>> As many of you know, I am spending a lot of time on the EGO_SUM
>> situation, as it is one of the most critical issues to solve.
>>
>> I have used the last few days to carefully consider whether a seat on
>> the council is more harmful or beneficial to my efforts regarding
>> EGO_SUM. On the one hand, council work means I have less time to
>> improve the EGO_SUM situation. On the other hand, a seat in the
>> council increases the probability of positively influencing Gentoo's
>> future, also regarding EGO_SUM.
>
> That's fine and it's great to see more people running!


Excellent that we share this view. :)



> But with regard to EGO_SUM: you didn't appear at the meeting where we
> discussed your previous EGO_SUM proposal,


Naive as I am, I expected that the mailing list would be used for
discussion and that the council meeting would be used chiefly for voting 
and intra-council discussion. And since the request to the council to 
vote on a concrete proposal was preceded by a multiple-week, if not 
month-long, mailing list discussion, I assumed that my presence in the 
council meeting was optional.


Had I known that my presence was required, or that the absence in the 
meeting would be blamed on me afterward, I would have appeared if possible.




> and questions remain unanswered on the
> ML (why not implement a check in pkgcheck similar to what is in Portage,
> for example)?


On 2023-05-30 [1], I proposed a limit in the range of 2 to 1.5 MiB for 
the total package-directory size. I only care a little about the tool 
that checks this limit, but pkgcheck is an obvious choice. I also 
suggested that we review this policy once the number of Go packages has 
doubled or two years after this policy was established (whatever comes 
first).


But I fear you may be referring to another kind of check. You may be 
talking about a check that forbids EGO_SUM in ::gentoo but allows it
in overlays.


However, as stated before [2], this is not a viable approach. One reason 
why it is not practicable is auditability.




> The blocker is not a council seat, it's about addressing people's
> concerns...


Unfortunately, it appears that I am terrible at convincing everyone that 
the deprecation of EGO_SUM was a mistake. I tried to respond to every 
concern. Often, the response included arguments based on factual data. 
But eventually, I would only expect to convince some, as the EGO_SUM 
question touches the subjective realm of style.


I know that the EGO_SUM situation and the resulting discussion grew huge
and left many understandably bored or confused, who then turned away.
But that is a pity because it is a relevant discussion for Gentoo's
long-term success.


The bottom line is that the EGO_SUM discussion yielded no evidence or 
even a slight indication that EGO_SUM was deprecated based on technical 
issues. Instead, it appears that EGO_SUM was deprecated because some 
deemed it unaesthetic.


Intelligibly, EGO_SUM can be considered ugly. Compared to a traditional 
Gentoo package, EGO_SUM-based ones are larger. The same is true for Rust 
packages. However, looking at the bigger picture, EGO_SUM's advantages 
outweigh its disadvantages.


- Flow


1: https://marc.info/?l=gentoo-dev&m=168546196902731
   <25308876-7ac4-8c90-8641-1034cc67c...@gentoo.org>
2: https://marc.info/?l=gentoo-dev&m=168569387514376
   <012fa74d-2910-ea90-6008-26cc23604...@gentoo.org>

