Re: [gentoo-dev] [Proposal] Split arch keywords for ppc64 & riscv

2024-08-05 Thread Andreas K. Huettel
On Monday, 5 August 2024, 00:39:13 CEST, Robin H. Johnson wrote:

> > Step 2: Formally introduce the new keywords in ebuilds by duplication.
> > Any "ppc64" in keywords becomes "ppc64 ppc64le".
> > Any "riscv" becomes "riscv riscv32 riscv64".
> > No exceptions. Can be done automatically. Until the "lock" is removed,
> > any keywording operations always have to add and remove all of one set.
> How do we identify something that was labelled as ppc64 and was
> pre-split, vs something that is post-split, and ONLY supposed to be PPC64
> big endian, and NOT ppc64le?

By profile selection (ppc64 and ppc64le are already different
profile trees).

> Under this proposal, both variants would have KEYWORDS="ppc64".
> 
> What if ppc64 splits into ppc64be & ppc64le to be extremely clear?
> 
> ...
> > Step 8: Remove all riscv keywords (no 64 or 32)
> > 
> > Step 9: Remove riscv as arch.
> Remove ppc64 without le/be suffixes.

Possible but more work...

-- 
Andreas K. Hüttel
dilfri...@gentoo.org
Gentoo Linux developer
(council, toolchain, base-system, perl, libreoffice)



Re: [gentoo-project] Re: [gentoo-dev] [Proposal] Split arch keywords for ppc64 & riscv

2024-08-04 Thread Ulrich Mueller
> On Sun, 04 Aug 2024, Andreas K Huettel wrote:

> Step 4: Prepare and publish a migration guide for users.
> Right now I assume this will mostly mean "select new profile". However,
> I have no clue how portage reacts when $ARCH changes.

Presumably this will also require eselect to be updated (and stable):
https://gitweb.gentoo.org/proj/eselect.git/tree/libs/package-manager.bash.in#n70

Any good idea how to handle the transitional situation when old and new
arch keywords exist simultaneously?

Ulrich




Re: [gentoo-dev] [Proposal] Split arch keywords for ppc64 & riscv

2024-08-04 Thread Robin H. Johnson
On Sun, Aug 04, 2024 at 08:30:57PM +0200, Andreas K. Huettel wrote:
> Hi Arthur, 
> 
> > a. Splitting ppc64 keyword into ppc64 and ppc64le
> 
> > b. Splitting riscv keyword into riscv(64?) and riscv32
...
> Step 2: Formally introduce the new keywords in ebuilds by duplication.
> Any "ppc64" in keywords becomes "ppc64 ppc64le".
> Any "riscv" becomes "riscv riscv32 riscv64".
> No exceptions. Can be done automatically. Until the "lock" is removed,
> any keywording operations always have to add and remove all of one set.
How do we identify something that was labelled as ppc64 and was
pre-split, vs something that is post-split, and ONLY supposed to be PPC64
big endian, and NOT ppc64le?

Under this proposal, both variants would have KEYWORDS="ppc64".

What if ppc64 splits into ppc64be & ppc64le to be extremely clear?

...
> Step 8: Remove all riscv keywords (no 64 or 32)
> 
> Step 9: Remove riscv as arch.
Remove ppc64 without le/be suffixes.

-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation President & Treasurer
E-Mail   : robb...@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136




Re: [gentoo-dev] [Proposal] Split arch keywords for ppc64 & riscv

2024-08-04 Thread Andreas K. Huettel
Hi Arthur, 

> a. Splitting ppc64 keyword into ppc64 and ppc64le

> b. Splitting riscv keyword into riscv(64?) and riscv32

So in principle these steps both make sense.

The problem is mostly that such an operation on the living Gentoo
has not been attempted in recorded history. There is no precedent in
terms of steps or procedure. Also, it's work.

Which means, we really need to think out the details first and test.

In the following I'm brainstorming a bit, but please see this only as
a very first write-down of incoherent firing of neurons...
In particular, I've not put any thought into whether the tree state is
always formally correct (PMS / CI / ...)

Step 1: Formally introduce the new keywords as "arches".

Step 2: Formally introduce the new keywords in ebuilds by duplication.
Any "ppc64" in keywords becomes "ppc64 ppc64le".
Any "riscv" becomes "riscv riscv32 riscv64".
No exceptions. Can be done automatically. Until the "lock" is removed,
any keywording operations always have to add and remove all of one set.

Step 3: Make new profiles for the new keywords. This is mostly copy-paste, 
I can take care of it.

Step 4: Prepare and publish a migration guide for users.
Right now I assume this will mostly mean "select new profile". However,
I have no clue how portage reacts when $ARCH changes.

Step 5: Deprecate the old profiles, and give people a deadline for migration.
I.e. the LE profiles under ppc64, and all profiles under riscv

Step 6: Remove the old profiles.

Step 7: Lift the "lock" in ebuilds, meaning e.g. ppc64 and ppc64le can be
added and removed independently.

Step 8: Remove all riscv keywords (no 64 or 32)

Step 9: Remove riscv as arch.
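
The duplication in Step 2 and the "lock" it implies could be scripted
roughly as follows. This is only a sketch: the SPLIT table and both
function names are hypothetical, and it operates on a bare KEYWORDS
string rather than on ebuild files.

```python
# Hypothetical mapping taken from Step 2 above; not an existing tool.
SPLIT = {
    "ppc64": ["ppc64", "ppc64le"],
    "riscv": ["riscv", "riscv32", "riscv64"],
}


def _split_prefix(keyword):
    """Return (prefix, arch) where prefix is "~", "-" or ""."""
    prefix = keyword[0] if keyword[0] in "~-" else ""
    return prefix, keyword[len(prefix):]


def duplicate_keywords(keywords):
    """Expand each old keyword into its full set, keeping the ~ or - prefix."""
    out = []
    for kw in keywords.split():
        prefix, arch = _split_prefix(kw)
        for new in SPLIT.get(arch, [arch]):
            token = prefix + new
            if token not in out:  # keeps the operation idempotent
                out.append(token)
    return " ".join(out)


def is_locked_consistent(keywords):
    """While the "lock" holds, all members of a set must share one status."""
    status = dict(_split_prefix(kw)[::-1] for kw in keywords.split())
    for members in SPLIT.values():
        # None means "keyword absent", which also counts as a status.
        if len({status.get(m) for m in members}) > 1:
            return False
    return True


print(duplicate_keywords("amd64 ~ppc64 riscv"))
```

Running the expansion twice gives the same result as running it once,
so an automated pass could be repeated safely until the lock is lifted
in Step 7.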



-- 
Andreas K. Hüttel
dilfri...@gentoo.org
Gentoo Linux developer
(council, toolchain, base-system, perl, libreoffice)



Re: [gentoo-dev] [Proposal] Split arch keywords for ppc64 & riscv

2024-08-02 Thread matoro

On 2024-08-02 15:05, Arthur Zamarin wrote:
> Hi all
>
> As a continuation of the previous arch changes and arch status [1], I want
> to propose the next arch change for the upcoming council meeting:
>
> a. Splitting ppc64 keyword into ppc64 and ppc64le
>
> Currently the ppc64 arch keyword matches both big endian (ppc64ul) and
> little endian (ppc64le). While there are similarities, there is quite a
> big gap in support level across both of them. If I understand the
> history correctly, ppc64le is the "next gen" after ppc64ul, and it is
> seen across upstream support, and as a result in the masks.
>
> We have many masks on the ppc64 profile, which are there for ppc64ul,
> and then unmasks for ppc64le. This split of keywords should make it
> easier for ppc64 maintainers (since less ugliness in profiles), package
> maintainers (simpler to mark ppc64le only), and for ppc64 users (easier
> to request keyword for only one side, so no need to handle issues on the
> other "arch").
>
> I want both arches to be of same state (stable arches, with profiles
> remaining at current state).
>
> b. Splitting riscv keyword into riscv and riscv32
>
> I'm not part of the riscv arch team, but I understood from dilfridge
> that riscv64 and riscv32 are very different, and having both behind the
> same keyword creates various issues. Since I already propose splitting
> ppc64, we can also split riscv on the same wave.
>
> [1]
> https://public-inbox.gentoo.org/gentoo-dev/75654daa-c5fc-45c8-a104-fae43b9ca...@gentoo.org/T/

Agreed here, with the suggestion that riscv -> riscv64/riscv32 for
consistency.




[gentoo-dev] [Proposal] Split arch keywords for ppc64 & riscv

2024-08-02 Thread Arthur Zamarin
Hi all

As a continuation of the previous arch changes and arch status [1], I want
to propose the next arch change for the upcoming council meeting:

a. Splitting ppc64 keyword into ppc64 and ppc64le

Currently the ppc64 arch keyword matches both big endian (ppc64ul) and
little endian (ppc64le). While there are similarities, there is quite a
big gap in support level across both of them. If I understand the
history correctly, ppc64le is the "next gen" after ppc64ul, and it is
seen across upstream support, and as a result in the masks.

We have many masks on the ppc64 profile, which are there for ppc64ul,
and then unmasks for ppc64le. This split of keywords should make it
easier for ppc64 maintainers (since less ugliness in profiles), package
maintainers (simpler to mark ppc64le only), and for ppc64 users (easier
to request keyword for only one side, so no need to handle issues on the
other "arch").

I want both arches to be of same state (stable arches, with profiles
remaining at current state).


b. Splitting riscv keyword into riscv and riscv32

I'm not part of the riscv arch team, but I understood from dilfridge
that riscv64 and riscv32 are very different, and having both behind the
same keyword creates various issues. Since I already propose splitting
ppc64, we can also split riscv on the same wave.


[1]
https://public-inbox.gentoo.org/gentoo-dev/75654daa-c5fc-45c8-a104-fae43b9ca...@gentoo.org/T/

-- 
Arthur Zamarin
arthur...@gentoo.org
Gentoo Linux developer (Council, Python, pkgcore stack, QA, Arch Teams)




Re: [gentoo-dev] Proposal for a Universal Remote-ID File

2023-09-24 Thread Michael Orlitzky
On Sun, 2023-09-24 at 23:14 +0530, Siddhanth Rathod wrote:
> How does modifying the DTD with a git hook sound?

That could work if we put the DTD, XML schema, and RELAX NG schema all
in the repo metadata. The remaining projects are programs and (given
access to ::gentoo) can probably parse the list themselves.

We're all sufficiently clever here to imagine some solution; it just
occurred to me that without a concrete proposal, it's hard to say
whether the end result would actually be simpler than copy/paste.




Re: [gentoo-dev] Proposal for a Universal Remote-ID File

2023-09-24 Thread Siddhanth Rathod

How does modifying the DTD with a git hook sound?

On 9/23/23 20:17, Michael Orlitzky wrote:
> On Sat, 2023-09-23 at 15:39 +0100, Sam James wrote:
>> At the moment, we bundle the DTD in pkgcore. If we just shoved it in
>> metadata/ instead in the main repo, we don't have that kind of problem.
>
> I might be missing something obvious, but what I mean is, suppose we
> have this plain-text mapping of remote-id names to URLs. How do we get
> the list of keys (valid remote-id names) into the DTD? Even if both
> files are inside metadata/, there's another step that needs to happen.



Re: [gentoo-dev] Proposal for a Universal Remote-ID File

2023-09-23 Thread Ionen Wolkens
On Sat, Sep 23, 2023 at 03:39:32PM +0100, Sam James wrote:
> 
> Michael Orlitzky  writes:
> 
> > On Sat, 2023-09-23 at 00:10 +0530, Siddhanth Rathod wrote:
> >> 
> >> By establishing a universal remote-ID file, we can streamline this 
> >> process. Your thoughts and feedback on this proposal would be greatly 
> >> appreciated. Also, any preferences on format?
> >
> > Building the wiki page isn't too hard, but what's the plan to propagate
> > changes into those seven other repositories? If we're still
> > copy/pasting the output of some tool, then we haven't really saved a
> > step, we've only changed what we're copy/pasting.
> 
> At the moment, we bundle the DTD in pkgcore. If we just shoved it in
> metadata/ instead in the main repo, we don't have that kind of problem.

Likewise for iwdevtools, which I'd update for this myself: loading from
::gentoo would be pretty trivial besides a little bit of overhead to find
the repo.

At worst this does mean that remote-ids won't resolve if it didn't manage
to find the repo (it's optional), but that's entirely fine for that tool
and beats updating these manually and constantly making a release almost
just for that (or fetching files at runtime).

If this were stored elsewhere (e.g. api.git), then we wouldn't gain much.
-- 
ionen




Re: [gentoo-dev] Proposal for a Universal Remote-ID File

2023-09-23 Thread Michael Orlitzky
On Sat, 2023-09-23 at 15:39 +0100, Sam James wrote:
> 
> At the moment, we bundle the DTD in pkgcore. If we just shoved it in
> metadata/ instead in the main repo, we don't have that kind of problem.
> 

I might be missing something obvious, but what I mean is, suppose we
have this plain-text mapping of remote-id names to URLs. How do we get
the list of keys (valid remote-id names) into the DTD? Even if both
files are inside metadata/, there's another step that needs to happen.
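
One shape the "other step" could take is a small generator that reads
the plain-text mapping and emits the attribute list for the DTD. The
sketch below assumes a hypothetical table format (one "name url-template"
pair per line, "#" for comment lines) and hypothetical function names;
the two sample rows are illustrative only and not taken from any
existing file.

```python
def parse_remote_ids(text):
    """Parse 'name url-template' lines into a dict, skipping comments."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, url = line.split(None, 1)
        mapping[name] = url
    return mapping


def dtd_attlist(mapping):
    """Render the enumerated 'type' attribute for the remote-id element."""
    names = "|".join(sorted(mapping))
    return f"<!ATTLIST remote-id type ({names}) #REQUIRED>"


SAMPLE = """\
# name    url template (illustrative values)
github    https://github.com/{id}
cpan      https://metacpan.org/dist/{id}
"""

print(dtd_attlist(parse_remote_ids(SAMPLE)))
```

Whether this runs from a git hook, from CI, or by hand is a separate
question; the sketch only shows that the keys can be derived
mechanically once the table exists.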




Re: [gentoo-dev] Proposal for a Universal Remote-ID File

2023-09-23 Thread Sam James


Michael Orlitzky  writes:

> On Sat, 2023-09-23 at 00:10 +0530, Siddhanth Rathod wrote:
>> 
>> By establishing a universal remote-ID file, we can streamline this 
>> process. Your thoughts and feedback on this proposal would be greatly 
>> appreciated. Also, any preferences on format?
>
> Building the wiki page isn't too hard, but what's the plan to propagate
> changes into those seven other repositories? If we're still
> copy/pasting the output of some tool, then we haven't really saved a
> step, we've only changed what we're copy/pasting.

At the moment, we bundle the DTD in pkgcore. If we just shoved it in
metadata/ instead in the main repo, we don't have that kind of problem.



Re: [gentoo-dev] Proposal for a Universal Remote-ID File

2023-09-23 Thread Michael Orlitzky
On Sat, 2023-09-23 at 00:10 +0530, Siddhanth Rathod wrote:
> 
> By establishing a universal remote-ID file, we can streamline this 
> process. Your thoughts and feedback on this proposal would be greatly 
> appreciated. Also, any preferences on format?

Building the wiki page isn't too hard, but what's the plan to propagate
changes into those seven other repositories? If we're still
copy/pasting the output of some tool, then we haven't really saved a
step, we've only changed what we're copy/pasting.




Re: [gentoo-dev] Proposal for a Universal Remote-ID File

2023-09-23 Thread Sam James

Ulrich Mueller  writes:

>> On Fri, 22 Sep 2023, Siddhanth Rathod wrote:
>
>> I'm writing to propose the creation of a universal remote-ID file
>> within the api.git or gentoo.git in the metadata/ directory.
>> Currently, we have eight different locations that require manual
>> updates for any future changes, including my recent commit
>> (https://gitweb.gentoo.org/proj/gentoolkit.git/commit/?id=5146d35eb97e2c1a8f7691e59c755ed14e858dd4)
>> to gentoolkit and the rest seven as mentioned here
>> https://wiki.gentoo.org/wiki/Project:Quality_Assurance/Upstream_remote-id_types.
>
>> By establishing a universal remote-ID file, we can streamline this
>> process. Your thoughts and feedback on this proposal would be greatly
>> appreciated. Also, any preferences on format?
>
> My preference would be a simple text file with a table, similar to
> files/uid-gid.txt in api.git. Then we could just modify the existing
> tooling to generate the wiki page from it, and wouldn't need any special
> tools to create the other files.

Sounds ok.

>
> Alternatively, it could be in XML. While I'm not a large fan of XML, it
> seems a natural choice here, because metadata.xml, the DTD, and the XML
> and Relax-NG schemas are all from the XML world.
>
> Ulrich






Re: [gentoo-dev] Proposal for a Universal Remote-ID File

2023-09-23 Thread Ulrich Mueller
> On Fri, 22 Sep 2023, Siddhanth Rathod wrote:

> I'm writing to propose the creation of a universal remote-ID file
> within the api.git or gentoo.git in the metadata/ directory.
> Currently, we have eight different locations that require manual
> updates for any future changes, including my recent commit
> (https://gitweb.gentoo.org/proj/gentoolkit.git/commit/?id=5146d35eb97e2c1a8f7691e59c755ed14e858dd4)
> to gentoolkit and the rest seven as mentioned here
> https://wiki.gentoo.org/wiki/Project:Quality_Assurance/Upstream_remote-id_types.

> By establishing a universal remote-ID file, we can streamline this
> process. Your thoughts and feedback on this proposal would be greatly
> appreciated. Also, any preferences on format?

My preference would be a simple text file with a table, similar to
files/uid-gid.txt in api.git. Then we could just modify the existing
tooling to generate the wiki page from it, and wouldn't need any special
tools to create the other files.

Alternatively, it could be in XML. While I'm not a large fan of XML, it
seems a natural choice here, because metadata.xml, the DTD, and the XML
and Relax-NG schemas are all from the XML world.

Ulrich




[gentoo-dev] Proposal for a Universal Remote-ID File

2023-09-22 Thread Siddhanth Rathod
I'm writing to propose the creation of a universal remote-ID file within 
the api.git or gentoo.git in the metadata/ directory. Currently, we have 
eight different locations that require manual updates for any future 
changes, including my recent commit 
(https://gitweb.gentoo.org/proj/gentoolkit.git/commit/?id=5146d35eb97e2c1a8f7691e59c755ed14e858dd4) 
to gentoolkit and the remaining seven listed here 
https://wiki.gentoo.org/wiki/Project:Quality_Assurance/Upstream_remote-id_types.


By establishing a universal remote-ID file, we can streamline this 
process. Your thoughts and feedback on this proposal would be greatly 
appreciated. Also, any preferences on format?


Re: [gentoo-dev] Proposal to undeprecate EGO_SUM

2022-10-12 Thread Florian Schmaus

On 17/06/2022 18.27, William Hubbs wrote:

On Mon, Jun 13, 2022 at 12:26:43PM +0200, Ulrich Mueller wrote:

On Mon, 13 Jun 2022, Florian Schmaus wrote:



Judging from the gentoo-dev@ mailing list discussion [1] about EGO_SUM,
where some voices were in agreement that EGO_SUM has its raison d'être,
while there were no arguments in favor of eventually removing EGO_SUM,
I hereby propose to undeprecate EGO_SUM.

1: 
https://archives.gentoo.org/gentoo-dev/message/1a64a8e7694c3ee11cd48a58a95f2faa



Can this be done without requesting changes to package managers?



What is 'this' here?


Undeprecating EGO_SUM.


The patchset does not make changes to any package manager, just the
go-module eclass.



Note that this is not about finding an alternative to dependency
tarballs. It is just about re-allowing EGO_SUM in addition to
dependency tarballs for packaging Go software in Gentoo.


Like I said on my earlier reply, there have been packages that break
using EGO_SUM.


Those packages obviously can't use EGO_SUM, but this should *not* mean 
that we generally ban EGO_SUM.




The most pressing concern about EGO_SUM is that it can make portage
crash because of the size of SRC_URI, so it definitely should not be
preferred over dependency tarballs.


I think an approach like my posted patch, which makes go-module.eclass 
invoke 'die' if A exceeds a certain threshold, should make developers in 
most situations aware that it is time to switch their package to use a 
dependency tarball instead of EGO_SUM.


The remaining situations are the ones where a package initially exceeds 
the MAX_ARG_STRLEN limit, and where a certain USE-flag combination 
causes the limit to be exceeded. The former should not be a real issue, as 
such ebuilds should never have been committed, as they could never work. The 
latter can be solved by exhaustive testing of all possible USE flag 
combinations.
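
To make the threshold idea concrete, here is a minimal sketch of such a
check, written in Python rather than as eclass code. It is not the
posted patch: the function names and the SAFETY_FACTOR value are
hypothetical, and it assumes the relevant limit applies to the single
environment string "A=<names>" including its terminating NUL byte.

```python
MAX_ARG_STRLEN = 128 * 1024  # bytes; the figure quoted in this thread
SAFETY_FACTOR = 0.5          # hypothetical margin, not taken from the patch


def env_entry_size(distfiles):
    """Bytes needed for the environment string "A=<names>" plus its NUL."""
    value = " ".join(distfiles)
    return len(b"A=") + len(value.encode("utf-8")) + 1


def check_a(distfiles, limit=int(MAX_ARG_STRLEN * SAFETY_FACTOR)):
    """Raise (the analogue of 'die') when A would exceed the threshold."""
    size = env_entry_size(distfiles)
    if size > limit:
        raise ValueError(
            f"A needs {size} bytes, above the {limit}-byte threshold; "
            "consider a dependency tarball"
        )
    return size


print(check_a(["example-1.0.tar.gz", "example-deps.tar.xz"]))
```

Checking the byte size rather than the entry count sidesteps the
question of how long the average file name is.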


- Flow



Re: [gentoo-dev] Proposal to undeprecate EGO_SUM

2022-10-01 Thread William Hubbs
On Sat, Oct 01, 2022 at 07:21:13PM +0200, Florian Schmaus wrote:
> On 01/10/2022 18.36, Ulrich Mueller wrote:
> >> On Sat, 01 Oct 2022, Florian Schmaus wrote:
> > 
> >> Bug #719201 was triggered by dev-texlive/texlive-latexextra-2020. It
> >> appears that the ebuild had more than 6000 entries in SRC_URI [1],
> > 
> > That includes double counting and must be divided by the number of
> > developers in TEXLIVE_DEVS. AFAICS that number was two in 2020. So 3000
> > is more realistic as a number there.
> 
> That may very well be the case. I'd appreciate it if you would elaborate on 
> the double counting. If someone knows a good and easy way to compute A 
> for an ebuild, then please let me know. That would help to get more 
> meaningful data.
> 
> 
> >> from which A is generated. Hence even an EGO_SUM limit of 3000
> >> entries should provide enough safety margin to avoid any Golang ebuild
> >> running into this.
> > 
> > See above, with 3000 entries there may be zero safety margin. It also
> > depends on total filename length, because the limit is the Linux
> > kernel's MAX_ARG_STRLEN (which is 128 KiB).
> 
> Of course, this is a rough estimation assuming that the filename length 
> is roughly the same on average. That said, my proposed limit for EGO_SUM 
> is 1500, which is still half of 3000 and should still provide enough 
> safety margin.

Since EGO_SUM_SRC_URI is the variable that gets added to SRC_URI, I
would rather put the limitation there instead of EGO_SUM if we do end up
keeping this.

William





Re: [gentoo-dev] Proposal to undeprecate EGO_SUM

2022-10-01 Thread Florian Schmaus

On 01/10/2022 18.36, Ulrich Mueller wrote:
>> On Sat, 01 Oct 2022, Florian Schmaus wrote:
>
>> Bug #719201 was triggered by dev-texlive/texlive-latexextra-2020. It
>> appears that the ebuild had more than 6000 entries in SRC_URI [1],
>
> That includes double counting and must be divided by the number of
> developers in TEXLIVE_DEVS. AFAICS that number was two in 2020. So 3000
> is more realistic as a number there.

That may very well be the case. I'd appreciate it if you would elaborate on 
the double counting. If someone knows a good and easy way to compute A 
for an ebuild, then please let me know. That would help to get more 
meaningful data.

>> from which A is generated. Hence even an EGO_SUM limit of 3000
>> entries should provide enough safety margin to avoid any Golang ebuild
>> running into this.
>
> See above, with 3000 entries there may be zero safety margin. It also
> depends on total filename length, because the limit is the Linux
> kernel's MAX_ARG_STRLEN (which is 128 KiB).

Of course, this is a rough estimation assuming that the filename length 
is roughly the same on average. That said, my proposed limit for EGO_SUM 
is 1500, which is still half of 3000 and should still provide enough 
safety margin.


- Flow



Re: [gentoo-dev] Proposal to undeprecate EGO_SUM

2022-10-01 Thread Ulrich Mueller
> On Sat, 01 Oct 2022, Florian Schmaus wrote:

> Bug #719201 was triggered by dev-texlive/texlive-latexextra-2020. It
> appears that the ebuild had more than 6000 entries in SRC_URI [1],

That includes double counting and must be divided by the number of
developers in TEXLIVE_DEVS. AFAICS that number was two in 2020. So 3000
is more realistic as a number there.

> from which A is generated. Hence even an EGO_SUM limit of 3000
> entries should provide enough safety margin to avoid any Golang ebuild
> running into this.

See above, with 3000 entries there may be zero safety margin. It also
depends on total filename length, because the limit is the Linux
kernel's MAX_ARG_STRLEN (which is 128 KiB).
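
The dependence on filename length can be put into numbers. The sketch
below is a rough calculation, assuming that A is a space-separated list
of file names passed as the single string "A=<names>" plus a NUL byte,
and that 128 KiB is the only limit in play; the function name is
hypothetical.

```python
MAX_ARG_STRLEN = 128 * 1024  # bytes, as stated above


def max_avg_name_length(entries, limit=MAX_ARG_STRLEN):
    """Largest average file name length (bytes) that still fits the limit."""
    # "A=" prefix, trailing NUL, and one space between consecutive names.
    overhead = len("A=") + 1 + (entries - 1)
    return (limit - overhead) // entries


for n in (1500, 3000, 6000):
    print(n, max_avg_name_length(n))
```

Under these assumptions, 3000 entries fit only if names average at most
42 bytes, 1500 entries allow 86 bytes, and 6000 entries allow 20 bytes.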

Ulrich




Re: [gentoo-dev] Proposal to undeprecate EGO_SUM

2022-10-01 Thread Florian Schmaus

On 30/09/2022 21.49, Alec Warner wrote:
> On Fri, Sep 30, 2022 at 7:53 AM Florian Schmaus  wrote:
>> And quite frankly, I don't see a problem with "large" Manifests and/or
>> ebuilds. Yes, it means our FTPs are hosting many files, in some cases
>> even many small files. And yes, it means that in some cases ebuild
>> parsing takes a bit longer. But I spoke with a few developers in the
>> past few months and was not presented with any real world issues that
>> EGO_SUM caused. If someone wants to fill in here, then now is a good
>> time to speak up. But my impression is that the arguments against
>> EGO_SUM are mostly of cosmetic nature. Again, please correct me if I am
>> wrong.
>
> I thought the problem was that EGO_SUM ends up in SRC_URI, which ends
> up in A. A ends up in the environment, and then exec() fails with
> E2BIG because there is an imposed limit on environment variables (and
> also command line argument length.)
>
> Did this get fixed?
>
> https://bugs.gentoo.org/719202

Bug #719201 was triggered by dev-texlive/texlive-latexextra-2020. It 
appears that the ebuild had more than 6000 entries in SRC_URI [1], from 
which A is generated. Hence even an EGO_SUM limit of 3000 entries 
should provide enough safety margin to avoid any Golang ebuild running 
into this.


- Flow


1: Estimated via
curl https://raw.githubusercontent.com/gentoo-mirror/gentoo/39474128bc64d6d4738c9647dbd3b0d1c1268fc4/metadata/md5-cache/dev-texlive/texlive-latexextra-2020 \
  | grep SRC_URI | awk -F" " '{print NF-1}'
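
The pipeline in [1] counts whitespace-separated fields, so mirrors of
the same file, USE-conditional markers and parentheses all add to the
total. A minimal sketch of a stricter estimate follows, assuming the
md5-cache entry holds a single "SRC_URI=" line and that A is bounded by
the set of distinct distfile names. The helper names and the sample
entry are hypothetical, and USE conditionals are flattened, so the
result is an upper bound rather than the value of A for any particular
USE configuration.

```python
def src_uri_from_cache(cache_text):
    """Return the SRC_URI value from md5-cache text, or "" if absent."""
    for line in cache_text.splitlines():
        if line.startswith("SRC_URI="):
            return line[len("SRC_URI="):]
    return ""


def distfile_names(src_uri):
    """Collect distinct distfile names, honouring 'uri -> name' renames."""
    tokens = src_uri.split()
    names = set()
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in ("(", ")") or tok.endswith("?"):
            i += 1  # grouping or USE-conditional marker, not a file
        elif i + 2 < len(tokens) and tokens[i + 1] == "->":
            names.add(tokens[i + 2])  # explicit rename wins
            i += 3
        else:
            names.add(tok.rsplit("/", 1)[-1])  # basename of the URI
            i += 1
    return names


SAMPLE = (
    "EAPI=7\n"
    "SRC_URI=mirror://gentoo/foo.tar.xz "
    "https://dev.gentoo.org/~a/foo.tar.xz "
    "https://dev.gentoo.org/~b/foo.tar.xz "
    "doc? ( https://example.org/raw/1234 -> foo-doc.tar.xz )\n"
)

names = distfile_names(src_uri_from_cache(SAMPLE))
print(len(names), sorted(names))
```

In the sample, nine whitespace-separated tokens collapse to two distinct
file names, which illustrates how far a plain field count can overshoot.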




Re: [gentoo-dev] Proposal to undeprecate EGO_SUM

2022-09-30 Thread William Hubbs
On Fri, Sep 30, 2022 at 12:49:02PM -0700, Alec Warner wrote:
> On Fri, Sep 30, 2022 at 7:53 AM Florian Schmaus  wrote:
> >
> > On 30/09/2022 02.36, William Hubbs wrote:
> > > On Wed, Sep 28, 2022 at 06:31:39PM +0200, Ulrich Mueller wrote:
> > >>> On Wed, 28 Sep 2022, Florian Schmaus wrote:
> > >>> 2.) the number of EGO_SUM entries exceeds 1000 and a Gentoo developer
> > >>> maintains the package
> > >>> 3.) the number of EGO_SUM entries exceeds 1500 and a proxied
> > >>> maintainer maintains the package
> > >>
> > >> These numbers seem quite large, compared to the mean number of 3.4
> > >> distfiles for packages in the Gentoo repository. (The median and the
> > >> 99-percentile are 1 and 22, respectively.)
> >
> > The numbers may appear large when compared to the whole tree, but I
> > think a fair comparison would be within the related programming language
> > ecosystem, e.g., Golang or Rust.
> >
> > For example, analyzing ::gentoo yields the following histogram for
> > 2022-01-01:
> > https://dev.gentoo.org/~flow/ego_sum_entries_histogram-2020-01-01.png
> >
> >
> > > To stay with your example, restic has a 300k manifest, multiple 30k+
> > > ebuilds and 897 distfiles.
> > >
> > > I'm thinking the limit would have to be much lower. Say, around 256
> > > entries in EGO_SUM_SRC_URI.
> >
> > A limit of 256 appears to be too low to be of any use. It is slightly
> > above the 50th percentile, half of the packages could not use it.
> >
> > We have to realize that programming language ecosystems that only build
> > static binaries tend to produce software projects that have a large
> > number of dependencies. For example, app-misc/broot, a tool written in
> > Rust, currently has 310 entries in its Manifest. Why should we treat
> > one programming language differently from another? Will we see voices that
> > ask for banning Rust packages in ::gentoo in the future? With the rising
> > popularity of Golang and Rust, we will (hopefully) only ever see an
> > increase of such packages in ::gentoo. And most existing packages in
> > this category will at best keep their dependency count constant, but are
> > also likely to accumulate further dependencies over time.
> >
> > And quite frankly, I don't see a problem with "large" Manifests and/or
> > ebuilds. Yes, it means our FTPs are hosting many files, in some cases
> > even many small files. And yes, it means that in some cases ebuild
> > parsing takes a bit longer. But I spoke with a few developers in the
> > past few months and was not presented with any real world issues that
> > EGO_SUM caused. If someone wants to fill in here, then now is a good
> > time to speak up. But my impression is that the arguments against
> > EGO_SUM are mostly of cosmetic nature. Again, please correct me if I am
> > wrong.
> 
> I thought the problem was that EGO_SUM ends up in SRC_URI, which ends
> up in A. A ends up in the environment, and then exec() fails with
> E2BIG because there is an imposed limit on environment variables (and
> also command line argument length.)
> 
> Did this get fixed?
> 
> https://bugs.gentoo.org/719202

You are correct this was part of the issue as well. I don't know what
the status of this bug is.

William




Re: [gentoo-dev] Proposal to undeprecate EGO_SUM

2022-09-30 Thread William Hubbs
On Fri, Sep 30, 2022 at 10:07:44PM +0200, Arsen Arsenović wrote:
> Hey,
> 
> On Friday, 30 September 2022 02:36:05 CEST William Hubbs wrote:
> > I don't know for certain about a vendor tarball, but I do know there
> > are instances where a vendor tarball wouldn't work.
> > app-containers/containerd is a good example of this. That is why the
> > vendor tarball idea was dropped.
> It is indeed not possible to verify vendor tarballs[1].  The proposed 
> solution Go people had would also require network access.
> 
> > Upstream doesn't need to provide a tarball, just an up-to-date
> > "vendor" directory at the top level of the project. Two examples that
> > do this are docker and kubernetes.
> Upstreams doing this sounds like a mess, because then they'd have to 
> maintain multiple source trees in their repositories, if I understand 
> what you mean.

Well, there isn't a lot of work involved in this for upstream, they just
run:

$ go mod vendor

at the top level of their project and keep that directory in sync in
their vcs. The downside is it can be big and some upstreams do not want
to do it.

> 
> An alternative to vendor tarballs is modcache tarballs. These are 
> absolutely massive (~20 times larger IIRC), though, they are verifiable.

The modcache tarballs are what I'm calling dependency tarballs, and yes
they are bigger than vendor tarballs and verifiable.
Also, the go-module eclass sets the GOMODCACHE environment variable to
point to the directory where the contents of the dependency tarball ends
up which makes it easy for the go tooling to just use the information in
that directory.

If we can get bug https://bugs.gentoo.org/833567 to happen in EAPI 9,
that would solve all of this.

The next step after I get that to happen would be to put a shared Go
module cache in, for example, "${DISTDIR}/go-mod", so that all Go
modules from packages would be downloaded there, and they would be
consumed like all distfiles are.

> opinion: I see no way around it. Vendor tarballs are the way to go.  For 
> trivial cases, this can likely be EGO_SUM, but it scales exceedingly 
> poorly, to the point of the trivial case being a very small percentage 
> of Go packages.  I proposed authenticated automation on Gentoo 
> infrastructure as a solution to this, and implemented (a slow and 
> unreliable) proof of concept (posted previously).  The obvious question 
> of "how will proxy maintainers deal with this" is also relatively 
> simple: giving them authorization for a subset of packages that they'd 
> need to work on. This is an obvious increase in the barrier of entry for 
> fresh proxy maintainers, but it's still likely less than needing 
> maintainers to rework ebuilds to use vendor tarballs on dev.g.o.

Vendor tarballs are not complete.  The best example of this I see in the tree is
app-containers/containerd.  If you try to build that with a vendor tarball
instead of a dependency tarball, the build will break, but it works with
a dependency tarball.

William


> 
> 
> [1]: https://github.com/golang/go/issues/27348
> -- 
> Arsen Arsenović






Re: [gentoo-dev] Proposal to undeprecate EGO_SUM

2022-09-30 Thread Arsen Arsenović
Hey,

On Friday, 30 September 2022 02:36:05 CEST William Hubbs wrote:
> I don't know for certain about a vendor tarball, but I do know there
> are instances where a vendor tarball wouldn't work.
> app-containers/containerd is a good example of this. That is why the
> vendor tarball idea was dropped.
It is indeed not possible to verify vendor tarballs[1].  The proposed 
solution Go people had would also require network access.

> Upstream doesn't need to provide a tarball, just an up-to-date
> "vendor" directory at the top level of the project. Two examples that
> do this are docker and kubernetes.
Upstreams doing this sounds like a mess, because then they'd have to 
maintain multiple source trees in their repositories, if I understand 
what you mean.

An alternative to vendor tarballs is modcache tarballs. These are 
absolutely massive (~20 times larger IIRC), though, they are verifiable.

Opinion: I see no way around it. Vendor tarballs are the way to go.  For 
trivial cases, this can likely be EGO_SUM, but it scales exceedingly 
poorly, to the point of the trivial case being a very small percentage 
of Go packages.  I proposed authenticated automation on Gentoo 
infrastructure as a solution to this, and implemented a (slow and 
unreliable) proof of concept (posted previously).  The obvious question 
of "how will proxy maintainers deal with this" is also relatively 
simple: giving them authorization for a subset of packages that they'd 
need to work on. This is an obvious increase in the barrier of entry for 
fresh proxy maintainers, but it's still likely less than needing 
maintainers to rework ebuilds to use vendor tarballs on dev.g.o.


[1]: https://github.com/golang/go/issues/27348
-- 
Arsen Arsenović




Re: [gentoo-dev] Proposal to undeprecate EGO_SUM

2022-09-30 Thread Alec Warner
On Fri, Sep 30, 2022 at 7:53 AM Florian Schmaus  wrote:
>
> On 30/09/2022 02.36, William Hubbs wrote:
> > On Wed, Sep 28, 2022 at 06:31:39PM +0200, Ulrich Mueller wrote:
> >>> On Wed, 28 Sep 2022, Florian Schmaus wrote:
> >>> 2.) the number of EGO_SUM entries exceeds 1000 and a Gentoo developer
> >>> maintains the package
> >>> 3.) the number of EGO_SUM entries exceeds 1500 and a proxied
> >>> maintainer maintains the package
> >>
> >> These numbers seem quite large, compared to the mean number of 3.4
> >> distfiles for packages in the Gentoo repository. (The median and the
> >> 99-percentile are 1 and 22, respectively.)
>
> The numbers may appear large when compared to the whole tree, but I
> think a fair comparison would be within the related programming language
> ecosystem, e.g., Golang or Rust.
>
> For example, analyzing ::gentoo yields the following histogram for
> 2022-01-01:
> https://dev.gentoo.org/~flow/ego_sum_entries_histogram-2020-01-01.png
>
>
> > To stay with your example, restic has a 300k manifest, multiple 30k+
> > ebuilds and 897 distfiles.
> >
> > I'm thinking the limit would have to be much lower. Say, around 256
> > entries in EGO_SUM_SRC_URI.
>
> A limit of 256 appears to be too low to be of any use. It is slightly
> above the 50th percentile, half of the packages could not use it.
>
> We have to realize that programming language ecosystems that only build
> static binaries tend to produce software projects that have a large
> number of dependencies. For example, app-misc/broot, a tool written in
> Rust, has currently 310 entries in its Manifest. Why should we treat
> one programming language differently from another? Will we see voices that
> ask for banning Rust packages in ::gentoo in the future? With the rising
> popularity of Golang and Rust, we will (hopefully) only ever see an
> increase of such packages in ::gentoo. And most existing packages in
> this category will at best keep their dependency count constant, but are
> also likely to accumulate further dependencies over time.
>
> And quite frankly, I don't see a problem with "large" Manifests and/or
> ebuilds. Yes, it means our FTPs are hosting many files, in some cases
> even many small files. And yes, it means that in some cases ebuild
> parsing takes a bit longer. But I spoke with a few developers in the
> past few months and was not presented with any real world issues that
> EGO_SUM caused. If someone wants to fill in here, then now is a good
> time to speak up. But my impression is that the arguments against
> EGO_SUM are mostly of cosmetic nature. Again, please correct me if I am
> wrong.

I thought the problem was that EGO_SUM ends up in SRC_URI, which ends
up in A. A ends up in the environment, and then exec() fails with
E2BIG because there is an imposed limit on environment variables (and
also command line argument length.)

Did this get fixed?

https://bugs.gentoo.org/719202
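
To make the failure mode described above concrete, here is a rough Python sketch. It assumes Linux's per-string limit of 32 pages for a single argument or environment string (131072 bytes with 4 KiB pages). The distfile names and the helper names are made up for illustration; real EGO_SUM distfile names differ in detail, and this is not how portage itself checks anything.

```python
# Assumption: Linux caps a single argv/envp string at 32 pages
# (MAX_ARG_STRLEN); with 4 KiB pages that is 131072 bytes.
MAX_ARG_STRLEN = 32 * 4096


def env_entry_size(name: str, value: str) -> int:
    """Bytes one NAME=value entry occupies, including the trailing NUL."""
    return len(f"{name}={value}".encode()) + 1


def fits_in_environment(distfiles: list[str]) -> bool:
    """True if a space-separated A variable stays under the per-string cap."""
    return env_entry_size("A", " ".join(distfiles)) <= MAX_ARG_STRLEN


# Hypothetical distfile names, roughly the shape Go module files take.
sample = [f"github.com%2Fexample%2Fmod{i}%2F@v%2Fv1.0.{i}.zip" for i in range(3000)]
print(fits_in_environment(sample[:100]), fits_in_environment(sample))
```

With names of this length, a hundred distfiles fit comfortably, while a few thousand push the single A string past the cap, which is the point at which exec() would return E2BIG.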

>
> - Flow



Re: [gentoo-dev] Proposal to undeprecate EGO_SUM

2022-09-30 Thread Sam James


> On 30 Sep 2022, at 15:53, Florian Schmaus  wrote:
> 
> On 30/09/2022 02.36, William Hubbs wrote:
>> On Wed, Sep 28, 2022 at 06:31:39PM +0200, Ulrich Mueller wrote:
>>>> On Wed, 28 Sep 2022, Florian Schmaus wrote:
>>>> 2.) the number of EGO_SUM entries exceeds 1000 and a Gentoo developer
>>>> maintains the package
>>>> 3.) the number of EGO_SUM entries exceeds 1500 and a proxied
>>>> maintainer maintains the package
>>> 
>>> These numbers seem quite large, compared to the mean number of 3.4
>>> distfiles for packages in the Gentoo repository. (The median and the
>>> 99-percentile are 1 and 22, respectively.)
> 
> The numbers may appear large when compared to the whole tree, but I think a 
> fair comparison would be within the related programming language ecosystem, 
> e.g., Golang or Rust.
> 
> For example, analyzing ::gentoo yields the following histogram for 2022-01-01:
> https://dev.gentoo.org/~flow/ego_sum_entries_histogram-2020-01-01.png
> 
> 
>> To stay with your example, restic has a 300k manifest, multiple 30k+
>> ebuilds and 897 distfiles.
>> I'm thinking the limit would have to be much lower. Say, around 256
>> entries in EGO_SUM_SRC_URI.
> 
> A limit of 256 appears to be too low to be of any use. It is slightly above 
> the 50th percentile, half of the packages could not use it.
> 
> We have to realize that programming language ecosystems that only build 
> static binaries tend to produce software projects that have a large number of 
> dependencies. For example, app-misc/broot, a tool written in Rust, has 
> currently 310 entries in its Manifest. Why should we treat one programming 
> language differently from another? Will we see voices that ask for banning Rust 
> packages in ::gentoo in the future? With the rising popularity of Golang and 
> Rust, we will (hopefully) only ever see an increase of such packages in 
> ::gentoo. And most existing packages in this category will at best keep their 
> dependency count constant, but are also likely to accumulate further 
> dependencies over time.
> 
> And quite frankly, I don't see a problem with "large" Manifests and/or 
> ebuilds. Yes, it means our FTPs are hosting many files, in some cases even 
> many small files. And yes, it means that in some cases ebuild parsing takes a 
> bit longer. But I spoke with a few developers in the past few months and was 
> not presented with any real world issues that EGO_SUM caused. If someone 
> wants to fill in here, then now is a good time to speak up. But my impression 
> is that the arguments against EGO_SUM are mostly of cosmetic nature. Again, 
> please correct me if I am wrong.
> 

I need to re-read the whole set of new messages in this thread, but there's 
still the issue of xargs/command length limits from huge variable contents.

Best,
sam




Re: [gentoo-dev] Proposal to undeprecate EGO_SUM

2022-09-30 Thread Georgy Yakovlev
On Wed, 2022-09-28 at 17:28 +0200, Florian Schmaus wrote:
> > I would like to continue discussing whether we should entirely
> > deprecate EGO_SUM without the desire to offend anyone.
> > 
> > We now have a pending GitHub PR that bumps restic to 0.14 [1].
> > Restic is a very popular backup software written in Go. The PR
> > drops EGO_SUM in favor of a vendor tarball created by the proxied
> > maintainer. However, I am unaware of any tool that lets you
> > practically audit the 35 MiB source contained in the tarball. And
> > even if such a tool exists, this would mean another manual step is
> > required, which is, potentially, skipped most of the time,
> > weakening our users' security. This is because I believe neither
> > our tooling, e.g., go-mod.eclass, nor any Golang tooling, does
> > authenticate the contents of the vendor tarball against upstream's
> > go.sum. But please correct me if I am wrong.
> > 
> > I wonder if we can reach consensus around un-deprecating EGO_SUM,
> > but discouraging its usage in certain situations. That is, provide
> > EGO_SUM as option but disallow its use if
> > 1.) *upstream* provides a vendor tarball
> > 2.) the number of EGO_SUM entries exceeds 1000 and a Gentoo
> > developer maintains the package
> > 3.) the number of EGO_SUM entries exceeds 1500 and a proxied
> > maintainer maintains the package
> > 
> > In case of 3, I would encourage proxy maintainers to create and
> > provide the vendor tarball.
> > 
> > The suggested EGO_SUM limits result from a histogram that I created
> > analyzing ::gentoo at 2022-01-01, i.e., a few months before EGO_SUM
> > was deprecated.

I think those numbers are too large, but overall I think bringing back
EGO_SUM in limited form is a good move, because it allows packaging Go
ebuilds in an easy and auditable way.
A vendor tarball is completely opaque until you unpack it.
With EGO_SUM you could parse the ebuilds that use it and scan for vulnerable
Go modules. And of course, hosting vendored sources is a problem as well.
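
As a rough illustration of the auditing idea above, the following Python sketch pulls module/version pairs out of an EGO_SUM array and intersects them with a list of known-bad versions. The ebuild fragment, the module names, and the VULNERABLE set are all invented for this example; a real scanner would need a proper advisory source and a more robust ebuild parser.

```python
import re

# Hypothetical ebuild fragment; the module names are invented for illustration.
EBUILD = '''
EGO_SUM=(
	"github.com/example/alpha v1.2.3"
	"github.com/example/alpha v1.2.3/go.mod"
	"github.com/example/beta v0.4.0/go.mod"
)
'''


def ego_sum_modules(ebuild_text: str) -> set[tuple[str, str]]:
    """Return (module, version) pairs listed in an EGO_SUM array."""
    match = re.search(r"EGO_SUM=\((.*?)\)", ebuild_text, re.DOTALL)
    if match is None:
        return set()
    modules = set()
    for entry in re.findall(r'"([^"]+)"', match.group(1)):
        module, version = entry.split(" ", 1)
        # Entries ending in /go.mod refer to the same module version.
        modules.add((module, version.removesuffix("/go.mod")))
    return modules


# Hypothetical advisory data: module versions known to be vulnerable.
VULNERABLE = {("github.com/example/beta", "v0.4.0")}

print(sorted(ego_sum_modules(EBUILD) & VULNERABLE))
```

Nothing comparable is possible with a vendor tarball without first downloading and unpacking it.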

From the Rust team's perspective (we use CRATES, which was the inspiration
for EGO_SUM, but is _much_ more compact), I'd say take the largest Rust
ebuild and allow as much as that or slightly more.
x11-terms/alacritty is one of the largest, and its CRATES list is
about 210 lines per ebuild.

So I'd say set the maximum EGO_SUM size to 256 for ::gentoo, or maybe 512,
remove the limit for overlays completely, and introduce a hard die() in
the eclass if EGO_SUM is larger than that.
I'm not sure if you can detect the repo name in an eclass.
In that case pkgcheck and CI could enforce that as fat warnings or
errors.

A 256/512 limit will not cap the Manifest size directly, but 5 versions
with at most 256/512 EGO_SUM lines each will be more reasonable than
5 versions with at most 1500 EGO_SUM lines each.

A Rust/cargo ebuild will still produce a more compact Manifest for the same
number of lines though, so it's still not directly comparable.

Currently we have 3 versions of alacritty, which use 407 unique crates
across the 3 versions. The Manifest size is about 120K, which is the 20th
largest in ::gentoo.
It's nothing compared to the 2.5MB Manifests we used to have in some of the
largest Go packages.

> > 
> > - Flow
> > 
> > 1: https://github.com/gentoo/gentoo/pull/27050
> > 





Re: [gentoo-dev] Proposal to undeprecate EGO_SUM

2022-09-30 Thread William Hubbs
On Fri, Sep 30, 2022 at 04:53:39PM +0200, Florian Schmaus wrote:
> On 30/09/2022 02.36, William Hubbs wrote:
> > On Wed, Sep 28, 2022 at 06:31:39PM +0200, Ulrich Mueller wrote:
> >>> On Wed, 28 Sep 2022, Florian Schmaus wrote:
> >>> 2.) the number of EGO_SUM entries exceeds 1000 and a Gentoo developer
> >>> maintains the package
> >>> 3.) the number of EGO_SUM entries exceeds 1500 and a proxied
> >>> maintainer maintains the package
> >>
> >> These numbers seem quite large, compared to the mean number of 3.4
> >> distfiles for packages in the Gentoo repository. (The median and the
> >> 99-percentile are 1 and 22, respectively.)
> 
> The numbers may appear large when compared to the whole tree, but I 
> think a fair comparison would be within the related programming language 
> ecosystem, e.g., Golang or Rust.
> 
> For example, analyzing ::gentoo yields the following histogram for 
> 2022-01-01:
> https://dev.gentoo.org/~flow/ego_sum_entries_histogram-2020-01-01.png
> 
> 
> > To stay with your example, restic has a 300k manifest, multiple 30k+
> > ebuilds and 897 distfiles.
> > 
> > I'm thinking the limit would have to be much lower. Say, around 256
> > entries in EGO_SUM_SRC_URI.
> 
> A limit of 256 appears to be too low to be of any use. It is slightly 
> above the 50th percentile, half of the packages could not use it.
> 
> We have to realize that programming language ecosystems that only build 
> static binaries tend to produce software projects that have a large 
> number of dependencies. For example, app-misc/broot, a tool written in 
> Rust, has currently 310 entries in its Manifest. Why should we treat 
> one programming language differently from another? Will we see voices that 
> ask for banning Rust packages in ::gentoo in the future? With the rising 
> popularity of Golang and Rust, we will (hopefully) only ever see an 
> increase of such packages in ::gentoo. And most existing packages in 
> this category will at best keep their dependency count constant, but are 
> also likely to accumulate further dependencies over time.

I tend to agree with you honestly. I worked with Zac to come up with a
different proposal which would allow upstream tooling for all languages
that do this to work, but so far it is meeting resistance [1].
I will go back and add more information to that bug, but it will be later
today before I can do that. I want to develop a poc to answer the
statement that these would be live ebuilds if we allowed that.

> And quite frankly, I don't see a problem with "large" Manifests and/or 
> ebuilds. Yes, it means our FTPs are hosting many files, in some cases 
> even many small files. And yes, it means that in some cases ebuild 
> parsing takes a bit longer. But I spoke with a few developers in the 
> past few months and was not presented with any real world issues that 
> EGO_SUM caused. If someone wants to fill in here, then now is a good 
> time to speak up. But my impression is that the arguments against 
> EGO_SUM are mostly of cosmetic nature. Again, please correct me if I am 
> wrong.

I can't name any specific examples at the moment, but I have gotten some
complaints about how long it takes to download and build go
packages with hundreds of dependencies.

Other than that, I'm not the one who voiced the problem originally, so
we definitely need others to speak up.

William

[1] https://bugs.gentoo.org/833567




Re: [gentoo-dev] Proposal to undeprecate EGO_SUM

2022-09-30 Thread Zoltan Puskas
Hi,

When the size of the repo is considered too big, maybe we can revisit the option
of having the portage tree distributed as a compressed squashfs image.

$ du -hs /var/db/repos/gentoo
536M.
$ gensquashfs -k -q -b 1M -D /var/db/repos/gentoo -c zstd -X level=22 \
    /tmp/gentoo-current.zstd.sqfs
$ du -h /tmp/gentoo-current.zstd.sqfs
47M /tmp/gentoo-current.zstd.sqfs

Though that would probably open another can of worms around incremental updates
to the portage tree, or more precisely the lack of it (i.e. increased bandwidth
requirements).
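
For reference, the arithmetic behind the two du figures above can be written out as a small Python sketch. The function name is made up; the only inputs are the 536M and 47M values reported in the output above.

```python
def compression_summary(original_mib: float, compressed_mib: float) -> tuple[float, float]:
    """Return (ratio, percent saved) for an uncompressed and compressed size."""
    ratio = original_mib / compressed_mib
    saved = 100.0 * (1.0 - compressed_mib / original_mib)
    return ratio, saved


ratio, saved = compression_summary(536, 47)
print(f"{ratio:.1f}x smaller, {saved:.1f}% saved")
```

That is a reduction of a bit more than a factor of eleven, at the cost of having to ship the whole image again on every update.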

Regardless, as a proxied maintainer I agree with Flow's point of view here (I
think I have expressed these in detail too in the past here) and would prefer
undeprecating EGO_SUM.

Zoltan

On Fri, Sep 30, 2022 at 05:10:10PM +0200, Jaco Kroon wrote:
> Hi,
> 
> On 2022/09/30 16:53, Florian Schmaus wrote:
> >> jkroon@plastiekpoot ~ $ du -sh /var/db/repos/gentoo/
> >> 644M    /var/db/repos/gentoo/
> >>
> >> I'm not against exploding this by another 200 or even 300 MB personally,
> >> but I do agree that pointless bloat is bad, and ideally we want to
> >> shrink the size requirements of the portage tree rather than enlarge.
> >
> > What is the problem if it is 400 MB more? What if we double the
> > size? Would something break for you? Does that mean we should not add
> > more packages to ::gentoo? Where do you draw the line? Would you
> > rather have interested persons contribute to Gentoo or drive them away
> > due to the struggle that the EGO_SUM deprecation causes?
> How long is a piece of string?
> 
> I agree with you entirely.  But if the tree gets to 10GB?
> 
> At some point it may be worthwhile to split the tree similar to what
> Debian does (or did, haven't checked in a while) where there is a core,
> non-core repo etc ... except I suspect it may be better to split into
> classes of packages, eg, x11 (aka desktop) style packages etc, and keep
> ::gentoo primarily to system stuff (which is also getting harder and
> harder to define).  And this also makes it harder for maintainers.  And
> this is really already what separate overlays do, except they don't (as
> far as I know) have the rigorous QA that ::gentoo has.
> 
> But again - at what point do you do this - and this also adds extra
> burden on maintainers and developers alike.
> 
> And of course I could set a filter to not even --sync say /x11-* at
> all.  For example.  Or /dev-go or /dev-php etc ...
> 
> So perhaps you're right, this is a moot discussion.  Perhaps we should
> just say let's solve the problem when (if?) people complain the tree is
> too big.  No, I'm not being sarcastic, just blunt (;
> 
> The majority of Gentoo users (in my experience) are probably of the
> developer oriented mindset either way, or have very specific itches that
> need scratching that's hard to scratch with other distributions.  Let's
> face it, Gentoo to begin with should probably not be considered an
> "easy" distribution.  But it is a highly flexible, pro-choice, extremely
> customizable, rolling release distribution.  Which scratches my itch.
> 
> Incidentally, the only categories currently to individually exceed 10MB
> are these:
> 
> 11M    media-libs
> 11M    net-misc
> 12M    dev-util
> 13M    dev-ruby
> 16M    dev-libs
> 30M    dev-perl
> 31M    dev-python
> 
> And by far the biggest consumer of space:
> 
> 124M    metadata
> 
> Kind Regards,
> Jaco
> 



Re: [gentoo-dev] Proposal to undeprecate EGO_SUM

2022-09-30 Thread Jaco Kroon
Hi,

On 2022/09/30 16:53, Florian Schmaus wrote:
>> jkroon@plastiekpoot ~ $ du -sh /var/db/repos/gentoo/
>> 644M    /var/db/repos/gentoo/
>>
>> I'm not against exploding this by another 200 or even 300 MB personally,
>> but I do agree that pointless bloat is bad, and ideally we want to
>> shrink the size requirements of the portage tree rather than enlarge.
>
> What is the problem if it is 400 MB more? What if we double the
> size? Would something break for you? Does that mean we should not add
> more packages to ::gentoo? Where do you draw the line? Would you
> rather have interested persons contribute to Gentoo or drive them away
> due to the struggle that the EGO_SUM deprecation causes?
How long is a piece of string?

I agree with you entirely.  But if the tree gets to 10GB?

At some point it may be worthwhile to split the tree similar to what
Debian does (or did, haven't checked in a while) where there is a core,
non-core repo etc ... except I suspect it may be better to split into
classes of packages, eg, x11 (aka desktop) style packages etc, and keep
::gentoo primarily to system stuff (which is also getting harder and
harder to define).  And this also makes it harder for maintainers.  And
this is really already what separate overlays do, except they don't (as
far as I know) have the rigorous QA that ::gentoo has.

But again - at what point do you do this - and this also adds extra
burden on maintainers and developers alike.

And of course I could set a filter to not even --sync say /x11-* at
all.  For example.  Or /dev-go or /dev-php etc ...

So perhaps you're right, this is a moot discussion.  Perhaps we should
just say let's solve the problem when (if?) people complain the tree is
too big.  No, I'm not being sarcastic, just blunt (;

The majority of Gentoo users (in my experience) are probably of the
developer oriented mindset either way, or have very specific itches that
need scratching that's hard to scratch with other distributions.  Let's
face it, Gentoo to begin with should probably not be considered an
"easy" distribution.  But it is a highly flexible, pro-choice, extremely
customizable, rolling release distribution.  Which scratches my itch.

Incidentally, the only categories currently to individually exceed 10MB
are these:

11M    media-libs
11M    net-misc
12M    dev-util
13M    dev-ruby
16M    dev-libs
30M    dev-perl
31M    dev-python

And by far the biggest consumer of space:

124M    metadata

Kind Regards,
Jaco



Re: [gentoo-dev] Proposal to undeprecate EGO_SUM

2022-09-30 Thread Florian Schmaus

On 30/09/2022 16.36, Jaco Kroon wrote:

> Hi All,
>
> This doesn't directly affect me. Nor am I familiar with the mechanisms.
>
> Perhaps it's worthwhile to suggest that EGO_SUM itself may be
> externalized.  I don't know what goes in here, and this will likely
> require help from portage itself, so may not be directly viable.
>
> What if portage had a feature whereby a SRC_URI list could be downloaded
> as a SRC_URI itself?  In other words:
>
> SRC_URI_INDIRECT="https://wherever/lists_for_some_go_package.txt";


That idea pops up every time this is discussed. I don't see something 
like that being implemented in portage anytime soon (please correct me 
if I am wrong), and it means that the ebuild development workflow 
requires some adjustments to keep it as convenient as it currently is 
(but nothing that couldn't be abstracted away by good tooling, i.e., pkgdev).




> jkroon@plastiekpoot ~ $ du -sh /var/db/repos/gentoo/
> 644M    /var/db/repos/gentoo/
>
> I'm not against exploding this by another 200 or even 300 MB personally,
> but I do agree that pointless bloat is bad, and ideally we want to
> shrink the size requirements of the portage tree rather than enlarge.


What is the problem if it is 400 MB more? What if we double the size? 
Would something break for you? Does that mean we should not add more 
packages to ::gentoo? Where do you draw the line? Would you rather have 
interested persons contribute to Gentoo or drive them away due to the 
struggle that the EGO_SUM deprecation causes?


- Flow




Re: [gentoo-dev] Proposal to undeprecate EGO_SUM

2022-09-30 Thread Florian Schmaus

On 30/09/2022 02.36, William Hubbs wrote:

> On Wed, Sep 28, 2022 at 06:31:39PM +0200, Ulrich Mueller wrote:
>>> On Wed, 28 Sep 2022, Florian Schmaus wrote:
>>> 2.) the number of EGO_SUM entries exceeds 1000 and a Gentoo developer
>>> maintains the package
>>> 3.) the number of EGO_SUM entries exceeds 1500 and a proxied
>>> maintainer maintains the package
>>
>> These numbers seem quite large, compared to the mean number of 3.4
>> distfiles for packages in the Gentoo repository. (The median and the
>> 99-percentile are 1 and 22, respectively.)


The numbers may appear large when compared to the whole tree, but I 
think a fair comparison would be within the related programming language 
ecosystem, e.g., Golang or Rust.


For example, analyzing ::gentoo yields the following histogram for 
2022-01-01:

https://dev.gentoo.org/~flow/ego_sum_entries_histogram-2020-01-01.png



> To stay with your example, restic has a 300k manifest, multiple 30k+
> ebuilds and 897 distfiles.
>
> I'm thinking the limit would have to be much lower. Say, around 256
> entries in EGO_SUM_SRC_URI.


A limit of 256 appears to be too low to be of any use. It is slightly 
above the 50th percentile, so about half of the packages could not use it.


We have to realize that programming language ecosystems that only build 
static binaries tend to produce software projects that have a large 
number of dependencies. For example, app-misc/broot, a tool written in 
Rust, has currently 310 entries in its Manifest. Why should we treat 
one programming language differently from another? Will we see voices that 
ask for banning Rust packages in ::gentoo in the future? With the rising 
popularity of Golang and Rust, we will (hopefully) only ever see an 
increase of such packages in ::gentoo. And most existing packages in 
this category will at best keep their dependency count constant, but are 
also likely to accumulate further dependencies over time.


And quite frankly, I don't see a problem with "large" Manifests and/or 
ebuilds. Yes, it means our FTPs are hosting many files, in some cases 
even many small files. And yes, it means that in some cases ebuild 
parsing takes a bit longer. But I spoke with a few developers in the 
past few months and was not presented with any real world issues that 
EGO_SUM caused. If someone wants to fill in here, then now is a good 
time to speak up. But my impression is that the arguments against 
EGO_SUM are mostly of cosmetic nature. Again, please correct me if I am 
wrong.


- Flow




Re: [gentoo-dev] Proposal to undeprecate EGO_SUM

2022-09-30 Thread Jaco Kroon
Hi All,

This doesn't directly affect me. Nor am I familiar with the mechanisms.

Perhaps it's worthwhile to suggest that EGO_SUM itself may be
externalized.  I don't know what goes in here, and this will likely
require help from portage itself, so may not be directly viable.

What if portage had a feature whereby a SRC_URI list could be downloaded
as a SRC_URI itself?  In other words:

SRC_URI_INDIRECT="https://wherever/lists_for_some_go_package.txt";

Where that file itself contains lines for entries that would normally go
into SRC_URI (directly or indirectly via EGO_SUM from what I can
deduce).  Something like:

https://www.upstream.com/downloads/package-version.tar.gz =>
fneh.tar.gz|manifest portion goes here

Where manifest portion would assume DIST and fneh.tar.gz, so would start
with the filesize in bytes, followed by checksum value pairs as per
current Manifest files.

Since users may want to know how big the downloads for a specific ebuild
are, some process to generate these external manifests may be in order,
and to subsequently store the size of these indirect downloads
themselves in the local Manifest, so something like:

IDIST lists_for_some_go_package.txt direct_size indirect_size CHECKSUM
value CHECKSUM value.
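
To show what consuming such a list could look like, here is a minimal Python sketch that parses one line of the format proposed above. Everything in it is hypothetical: SRC_URI_INDIRECT does not exist in portage, the exact separators are taken from the example line in this mail, and the size and checksum values are placeholders.

```python
from dataclasses import dataclass


@dataclass
class IndirectEntry:
    url: str
    filename: str
    size: int
    checksums: dict[str, str]


def parse_indirect_line(line: str) -> IndirectEntry:
    """Parse 'URL => FILENAME|SIZE HASH value [HASH value ...]'."""
    url, rest = line.split(" => ", 1)
    filename, manifest = rest.split("|", 1)
    fields = manifest.split()
    # The size comes first, followed by complete HASH/value pairs.
    if not fields or len(fields) % 2 != 1:
        raise ValueError(f"malformed manifest portion: {manifest!r}")
    size, pairs = int(fields[0]), fields[1:]
    checksums = dict(zip(pairs[0::2], pairs[1::2]))
    return IndirectEntry(url.strip(), filename.strip(), size, checksums)


# Hypothetical line; the size and checksum values are placeholders.
entry = parse_indirect_line(
    "https://www.upstream.com/downloads/package-version.tar.gz => "
    "fneh.tar.gz|12345 BLAKE2B 00ff SHA512 abcd"
)
print(entry.filename, entry.size, sorted(entry.checksums))
```

Summing the size field over all lines would give the indirect_size value mentioned for the IDIST entry.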

I realise this idea isn't immediately feasible, and perhaps not at all,
presented here since perhaps it could spark an idea for someone else. 
It sounds like this is the problem that the vendor tarball tries to
solve, but that that introduces a trust issue - not sure this exactly
goes away but at a minimum we're now verifying download locations again
(as per EGO_SUM or just SRC_URI in general) rather than code tarballs
containing many many times more code than download locations.

Given:

jkroon@plastiekpoot ~ $ du -sh /var/db/repos/gentoo/
644M    /var/db/repos/gentoo/

I'm not against exploding this by another 200 or even 300 MB personally,
but I do agree that pointless bloat is bad, and ideally we want to
shrink the size requirements of the portage tree rather than enlarge.

Kind Regards,
Jaco

On 2022/09/30 15:57, Florian Schmaus wrote:

> On 28/09/2022 23.23, John Helmert III wrote:
>> On Wed, Sep 28, 2022 at 05:28:00PM +0200, Florian Schmaus wrote:
>>> I would like to continue discussing whether we should entirely
>>> deprecate
>>> EGO_SUM without the desire to offend anyone.
>>>
>>> We now have a pending GitHub PR that bumps restic to 0.14 [1].
>>> Restic is
>>> a very popular backup software written in Go. The PR drops EGO_SUM in
>>> favor of a vendor tarball created by the proxied maintainer. However, I
>>> am unaware of any tool that lets you practically audit the 35 MiB
>>> source
>>> contained in the tarball. And even if such a tool exists, this would
>>> mean another manual step is required, which is, potentially, skipped
>>> most of the time, weakening our user's security. This is because I
>>> believe neither our tooling, e.g., go-mod.eclass, nor any Golang
>>> tooling, does authenticate the contents of the vendor tarball against
>>> upstream's go.sum. But please correct me if I am wrong.
>>>
>>> I wonder if we can reach consensus around un-deprecating EGO_SUM, but
>>> discouraging its usage in certain situations. That is, provide EGO_SUM
>>> as option but disallow its use if
>>> 1.) *upstream* provides a vendor tarball
>>> 2.) the number of EGO_SUM entries exceeds 1000 and a Gentoo developer
>>> maintains the package
>>> 3.) the number of EGO_SUM entries exceeds 1500 and a proxied maintainer
>>> maintains the package
>>
>> I'm not sure I agree on these limits, given the authenticity problem
>> exists regardless of how many dependencies there are.
>
> It's not really about authentication, you always have to trust
> upstream to some degree (unless you audit every line of code). But I
> believe that code distributed via official channels is viewed by more
> eyes and significantly more secure.
>
> EGO_SUM entries are directly fetched from the official distribution
> channels of Golang. Hence, there is a higher chance that malicious
> code in one of those is detected faster, simply because they are
> consumed by more entities. Compared to the dependency tarball that is
> just used by Gentoo. In contrast to the official sources, "nobody" is
> looking at the code inside the tarball.
>
> For proxied packages, where the dependency tarball is published by the
> proxied maintainer, the tarball also allows another entity to inject
> code into the final result of the package. And compared to a few small
> patches in FILESDIR, such a dependency tarball requires more effort to
> review. This further weakens security in comparison to EGO_SUM.
>
> - Flow



Re: [gentoo-dev] Proposal to undeprecate EGO_SUM

2022-09-30 Thread Florian Schmaus

On 28/09/2022 23.23, John Helmert III wrote:

> On Wed, Sep 28, 2022 at 05:28:00PM +0200, Florian Schmaus wrote:
>> I would like to continue discussing whether we should entirely deprecate
>> EGO_SUM without the desire to offend anyone.
>>
>> We now have a pending GitHub PR that bumps restic to 0.14 [1]. Restic is
>> a very popular backup software written in Go. The PR drops EGO_SUM in
>> favor of a vendor tarball created by the proxied maintainer. However, I
>> am unaware of any tool that lets you practically audit the 35 MiB source
>> contained in the tarball. And even if such a tool exists, this would
>> mean another manual step is required, which is, potentially, skipped
>> most of the time, weakening our users' security. This is because I
>> believe neither our tooling, e.g., go-mod.eclass, nor any Golang
>> tooling, does authenticate the contents of the vendor tarball against
>> upstream's go.sum. But please correct me if I am wrong.
>>
>> I wonder if we can reach consensus around un-deprecating EGO_SUM, but
>> discouraging its usage in certain situations. That is, provide EGO_SUM
>> as option but disallow its use if
>> 1.) *upstream* provides a vendor tarball
>> 2.) the number of EGO_SUM entries exceeds 1000 and a Gentoo developer
>> maintains the package
>> 3.) the number of EGO_SUM entries exceeds 1500 and a proxied maintainer
>> maintains the package
>
> I'm not sure I agree on these limits, given the authenticity problem
> exists regardless of how many dependencies there are.


It's not really about authentication; you always have to trust upstream 
to some degree (unless you audit every line of code). But I believe that 
code distributed via official channels is viewed by more eyes and is 
significantly more secure.


EGO_SUM entries are fetched directly from the official distribution 
channels of Golang. Hence, there is a higher chance that malicious code 
in one of those is detected faster, simply because they are consumed by 
more entities. The dependency tarball, by comparison, is used only by 
Gentoo. In contrast to the official sources, "nobody" is looking at the 
code inside the tarball.


For proxied packages, where the dependency tarball is published by the 
proxied maintainer, the tarball also allows another entity to inject 
code into the final result of the package. And compared to a few small 
patches in FILESDIR, such a dependency tarball requires more effort to 
review. This further weakens security in comparison to EGO_SUM.


- Flow




Re: [gentoo-dev] Proposal to undeprecate EGO_SUM

2022-09-29 Thread William Hubbs
On Wed, Sep 28, 2022 at 06:31:39PM +0200, Ulrich Mueller wrote:
> > On Wed, 28 Sep 2022, Florian Schmaus wrote:
> 
> > I would like to continue discussing whether we should entirely
> > deprecate EGO_SUM without the desire to offend anyone.

Don't worry, I am not offended. I just haven't found a simple way to do
this. Sure, I will continue the discussion.

> > We now have a pending GitHub PR that bumps restic to 0.14 [1]. Restic
> > is a very popular backup software written in Go. The PR drops EGO_SUM
> > in favor of a vendor tarball created by the proxied maintainer.
> > However, I am unaware of any tool that lets you practically audit the
> > 35 MiB source contained in the tarball. And even if such a tool
> > exists, this would mean another manual step is required, which is,
> > potentially, skipped most of the time, weakening our user's security.
> > This is because I believe neither our tooling, e.g., go-mod.eclass,
> > nor any Golang tooling, does authenticate the contents of the vendor
> > tarball against upstream's go.sum. But please correct me if I am
> > wrong.

I don't know for certain about a vendor tarball, but I do know there are
instances where a vendor tarball wouldn't work.
app-containers/containerd is a good example of this. That is why the
vendor tarball idea was dropped.

Go modules are verified by go tooling. That is why I went with a
dependency tarball.

> > I wonder if we can reach consensus around un-deprecating EGO_SUM, but
> > discouraging its usage in certain situations. That is, provide EGO_SUM
> > as option but disallow its use if
> > 1.) *upstream* provides a vendor tarball

Upstream doesn't need to provide a tarball, just an up-to-date "vendor"
directory at the top level of the project. Two examples that do this are
docker and kubernetes.

If the "vendor" directory is in the project, EGO_SUM should not be used.
This is already documented in the eclass.
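
A minimal sketch of the check implied above, assuming that a usable "vendor" directory is one that also contains the modules.txt index written by "go mod vendor". The function name has_vendor_dir is made up for illustration and is not part of the eclass.

```shell
# Succeeds if the unpacked source tree in $1 ships a Go "vendor" directory.
# vendor/modules.txt is the index file written by "go mod vendor".
has_vendor_dir() {
	[ -d "$1/vendor" ] && [ -f "$1/vendor/modules.txt" ]
}
```

A maintainer could run has_vendor_dir on the unpacked upstream sources before deciding whether EGO_SUM or a dependency tarball is needed at all.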

> > 2.) the number of EGO_SUM entries exceeds 1000 and a Gentoo developer
> > maintains the package
> > 3.) the number of EGO_SUM entries exceeds 1500 and a proxied
> > maintainer maintains the package
> 
> These numbers seem quite large, compared to the mean number of 3.4
> distfiles for packages in the Gentoo repository. (The median and the
> 99-percentile are 1 and 22, respectively.)

There is no way from within portage to tell whether a proxied maintainer
or a developer maintains the package, and I don't think we should care.
We don't want different qa standards for packages in the tree based on
who maintains them.

I think we should settle on one limit. I could check for that limit inside
the eclass and make the ebuild process die if the limit is not observed.

The concern, as I understand it, is about the sizes of the ebuilds and
manifests for go software. Since the number of distfiles was mentioned,
I will add it here and show it in my example numbers below.

To stay with your example, restic has a 300k manifest, multiple 30k+
ebuilds and 897 distfiles.

I'm thinking the limit would have to be much lower. Say, around 256
entries in EGO_SUM_SRC_URI.
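
A rough sketch of what such a limit check might look like. The cap of 256 is only the ballpark figure floated above, and the names EGO_SUM_MAX_ENTRIES and ego_sum_check_limit are hypothetical; the die stub exists only so the snippet runs outside Portage.

```shell
# Hypothetical cap; the thread floats "around 256" without settling on a number.
EGO_SUM_MAX_ENTRIES=256

# Stand-in for Portage's die so the sketch runs outside an ebuild environment.
command -v die >/dev/null 2>&1 || die() { echo "die: $*" >&2; return 1; }

# Pass the entries as arguments, e.g. ego_sum_check_limit "${EGO_SUM[@]}".
ego_sum_check_limit() {
	if [ "$#" -gt "$EGO_SUM_MAX_ENTRIES" ]; then
		die "EGO_SUM has $# entries (limit ${EGO_SUM_MAX_ENTRIES}); use a dependency tarball"
		return 1
	fi
}
```

Counting the arguments keeps the check independent of how the entries are stored, so the same function works for any list of entries.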

William



signature.asc
Description: PGP signature


Re: [gentoo-dev] Proposal to undeprecate EGO_SUM

2022-09-28 Thread John Helmert III
On Wed, Sep 28, 2022 at 05:28:00PM +0200, Florian Schmaus wrote:
> I would like to continue discussing whether we should entirely deprecate 
> EGO_SUM without the desire to offend anyone.
> 
> We now have a pending GitHub PR that bumps restic to 0.14 [1]. Restic is 
> a very popular backup software written in Go. The PR drops EGO_SUM in 
> favor of a vendor tarball created by the proxied maintainer. However, I 
> am unaware of any tool that lets you practically audit the 35 MiB source 
> contained in the tarball. And even if such a tool exists, this would 
> mean another manual step is required, which is, potentially, skipped 
> most of the time, weakening our user's security. This is because I 
> believe neither our tooling, e.g., go-mod.eclass, nor any Golang 
> tooling, does authenticate the contents of the vendor tarball against 
> upstream's go.sum. But please correct me if I am wrong.
> 
> I wonder if we can reach consensus around un-deprecating EGO_SUM, but 
> discouraging its usage in certain situations. That is, provide EGO_SUM 
> as option but disallow its use if
> 1.) *upstream* provides a vendor tarball
> 2.) the number of EGO_SUM entries exceeds 1000 and a Gentoo developer 
> maintains the package
> 3.) the number of EGO_SUM entries exceeds 1500 and a proxied maintainer 
> maintains the package

I'm not sure I agree on these limits, given the authenticity problem
exists regardless of how many dependencies there are.

> In case of 3, I would encourage proxy maintainers to create and provide 
> the vendor tarball.
> 
> The suggested EGO_SUM limits result from a histogram that I created 
> analyzing ::gentoo at 2022-01-01, i.e., a few months before EGO_SUM was 
> deprecated.
> 
> - Flow
> 
> 1: https://github.com/gentoo/gentoo/pull/27050
> 


signature.asc
Description: PGP signature


Re: [gentoo-dev] Proposal to undeprecate EGO_SUM

2022-09-28 Thread Ulrich Mueller
> On Wed, 28 Sep 2022, Florian Schmaus wrote:

> I would like to continue discussing whether we should entirely
> deprecate EGO_SUM without the desire to offend anyone.

> We now have a pending GitHub PR that bumps restic to 0.14 [1]. Restic
> is a very popular backup software written in Go. The PR drops EGO_SUM
> in favor of a vendor tarball created by the proxied maintainer.
> However, I am unaware of any tool that lets you practically audit the
> 35 MiB source contained in the tarball. And even if such a tool
> exists, this would mean another manual step is required, which is,
> potentially, skipped most of the time, weakening our user's security.
> This is because I believe neither our tooling, e.g., go-mod.eclass,
> nor any Golang tooling, does authenticate the contents of the vendor
> tarball against upstream's go.sum. But please correct me if I am
> wrong.

> I wonder if we can reach consensus around un-deprecating EGO_SUM, but
> discouraging its usage in certain situations. That is, provide EGO_SUM
> as option but disallow its use if
> 1.) *upstream* provides a vendor tarball
> 2.) the number of EGO_SUM entries exceeds 1000 and a Gentoo developer
> maintains the package
> 3.) the number of EGO_SUM entries exceeds 1500 and a proxied
> maintainer maintains the package

These numbers seem quite large, compared to the mean number of 3.4
distfiles for packages in the Gentoo repository. (The median and the
99-percentile are 1 and 22, respectively.)
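
For reference, a small sketch of how the distfile count of a single package can be read from its Manifest, relying on the fact that every distfile is recorded on a line starting with "DIST ". The function name count_distfiles is made up for illustration.

```shell
# Print the number of distfiles listed in a package Manifest ($1).
# Each distfile is recorded on one line starting with "DIST ".
# Note: grep -c exits non-zero when the count is 0, but still prints 0.
count_distfiles() {
	grep -c '^DIST ' "$1"
}
```

Run against each Manifest in a repository checkout, this yields the per-package numbers from which a mean, median, or percentile can be computed.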

Ulrich


signature.asc
Description: PGP signature


Re: [gentoo-dev] Proposal to undeprecate EGO_SUM

2022-09-28 Thread Florian Schmaus
I would like to continue discussing whether we should entirely deprecate 
EGO_SUM without the desire to offend anyone.


We now have a pending GitHub PR that bumps restic to 0.14 [1]. Restic is 
a very popular backup software written in Go. The PR drops EGO_SUM in 
favor of a vendor tarball created by the proxied maintainer. However, I 
am unaware of any tool that lets you practically audit the 35 MiB source 
contained in the tarball. And even if such a tool exists, this would 
mean another manual step is required, which is, potentially, skipped 
most of the time, weakening our users' security. This is because I 
believe neither our tooling, e.g., go-mod.eclass, nor any Golang 
tooling, does authenticate the contents of the vendor tarball against 
upstream's go.sum. But please correct me if I am wrong.


I wonder if we can reach consensus around un-deprecating EGO_SUM, but 
discouraging its usage in certain situations. That is, provide EGO_SUM 
as an option but disallow its use if

1.) *upstream* provides a vendor tarball
2.) the number of EGO_SUM entries exceeds 1000 and a Gentoo developer 
maintains the package
3.) the number of EGO_SUM entries exceeds 1500 and a proxied maintainer 
maintains the package


In case of 3, I would encourage proxy maintainers to create and provide 
the vendor tarball.


The suggested EGO_SUM limits result from a histogram that I created 
analyzing ::gentoo at 2022-01-01, i.e., a few months before EGO_SUM was 
deprecated.


- Flow

1: https://github.com/gentoo/gentoo/pull/27050



Re: [gentoo-dev] Proposal to undeprecate EGO_SUM

2022-07-16 Thread William Hubbs
On Sat, Jul 16, 2022 at 10:20:01PM +0200, Ulrich Mueller wrote:
> > On Sat, 16 Jul 2022, William Hubbs wrote:
> > The only question is, is there a way to reliably tell whether or not
> > we are  in the main tree?
> 
> An eclass has no legitimate way to find out in which repository it is.
> The rationale is that users should be able to copy ebuilds and eclasses
> to their local overlays, and they should work there in the same way.
> 
> There is an internal (and undocumented) Portage variable, but that
> shouldn't be used.

In that case, I'm left with two options.

1) continue with deprecating and removing EGO_SUM.

2) (suggested on IRC) allow EGO_SUM as long as it has below a certain
low number of entries. It would need to be kept small to keep ebuilds
and manifests from bloating too much.

Thoughts?

William



signature.asc
Description: PGP signature


Re: [gentoo-dev] Proposal to undeprecate EGO_SUM

2022-07-16 Thread Ulrich Mueller
> On Sat, 16 Jul 2022, William Hubbs wrote:

> I could force this in the eclass with the following flow if I know how
> to tell if the ebuild inheriting it is in the main tree or not:

> # in_main_tree is a placeholder for a test of whether the ebuild running
> # this is in the main tree. It is called as a command, outside [[ ]];
> # inside [[ ]] the bare word would be a non-empty string and always true.
> if [[ -n ${EGO_SUM} ]] && in_main_tree; then
> 	eqawarn "EGO_SUM is not allowed in the main tree"
> 	eqawarn "This will become a fatal error in the future"
> fi

>   The only question is, is there a way to reliably tell whether or not
>   we are  in the main tree?

An eclass has no legitimate way to find out in which repository it is.
The rationale is that users should be able to copy ebuilds and eclasses
to their local overlays, and they should work there in the same way.

There is an internal (and undocumented) Portage variable, but that
shouldn't be used.

Ulrich


signature.asc
Description: PGP signature


Re: [gentoo-dev] Proposal to undeprecate EGO_SUM

2022-07-16 Thread William Hubbs
On Sat, Jul 16, 2022 at 06:46:40PM +, Robin H. Johnson wrote:
> On Sat, Jul 16, 2022 at 09:31:35PM +0300, Arthur Zamarin wrote:
> > I want to give another option. Both ways are allowed by eclass, but by
> > QA policy (or some other decision), it is prohibited to use EGO_SUM in
> > main ::gentoo tree.
> > 
> > As a result, overlays and ::guru can use the EGO_SUM or dist distfile
> > (remember, they don't have access to hosting on dev.g.o).
> Yes; this is the option I was trying to propose as an intermediate step
> until we have indirect Manifests that provide the best of both worlds
> (not bloating the tree, and not requiring creation of dep tarballs).

I could force this in the eclass with the following flow if I know how
to tell if the ebuild inheriting it is in the main tree or not:

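# in_main_tree is a placeholder for a test of whether the ebuild running
# this is in the main tree. It is called as a command, outside [[ ]];
# inside [[ ]] the bare word would be a non-empty string and always true.
if [[ -n ${EGO_SUM} ]] && in_main_tree; then
	eqawarn "EGO_SUM is not allowed in the main tree"
	eqawarn "This will become a fatal error in the future"
fi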

The only question is, is there a way to reliably tell whether or not
we are  in the main tree?

William



signature.asc
Description: PGP signature


Re: [gentoo-dev] Proposal to undeprecate EGO_SUM

2022-07-16 Thread Robin H. Johnson
On Sat, Jul 16, 2022 at 09:31:35PM +0300, Arthur Zamarin wrote:
> I want to give another option. Both ways are allowed by eclass, but by
> QA policy (or some other decision), it is prohibited to use EGO_SUM in
> main ::gentoo tree.
> 
> As a result, overlays and ::guru can use the EGO_SUM or dist distfile
> (remember, they don't have access to hosting on dev.g.o).
Yes; this is the option I was trying to propose as an intermediate step
until we have indirect Manifests that provide the best of both worlds
(not bloating the tree, and not requiring creation of dep tarballs).


-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Treasurer
E-Mail   : robb...@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136


signature.asc
Description: PGP signature


Re: [gentoo-dev] Proposal to undeprecate EGO_SUM

2022-07-16 Thread Arthur Zamarin
On 16/07/2022 20.51, William Hubbs wrote:
> On Sat, Jul 16, 2022 at 02:58:04PM +0300, Joonas Niilola wrote:
>> On 16.7.2022 14.24, Florian Schmaus wrote:
>>>
>>
>> ++ this sounds most sensible. This is also how I've understood your
>> proposal.
> 
> Remember that with EGO_SUM all of the bloated manifests and ebuilds are
> on every user's system.
> 
> I added mgorny as a cc to this message because he made it pretty clear
> at some point in the previous discussion that the size of these ebuilds
> and manifests is unacceptable.
> 
> William

I want to give another option. Both ways are allowed by eclass, but by
QA policy (or some other decision), it is prohibited to use EGO_SUM in
main ::gentoo tree.

As a result, overlays and ::guru can use the EGO_SUM or dist distfile
(remember, they don't have access to hosting on dev.g.o).

-- 
Arthur Zamarin
arthur...@gentoo.org
Gentoo Linux developer (Python, Arch Teams, pkgcore stack, GURU)


OpenPGP_signature
Description: OpenPGP digital signature


Re: [gentoo-dev] Proposal to undeprecate EGO_SUM

2022-07-16 Thread William Hubbs
On Sat, Jul 16, 2022 at 02:58:04PM +0300, Joonas Niilola wrote:
> On 16.7.2022 14.24, Florian Schmaus wrote:
> > 
> > That reads as if you wrote it under the assumption that we can only
> > either use dependency tarballs or use EGO_SUM. At the same time, I have
> > not seen an argument why we can not simply do *both*.
> > 
> > EGO_SUM has numerous advantages over dependency tarballs, but can not be
> > used if the size of the EGO_SUM value crosses a threshold. So why not
> > mandate dependency tarballs if a point is crossed and otherwise allow
> > EGO_SUM? That way, we could have the best of both worlds.
> > 
> > - Flow
> > 
> 
> ++ this sounds most sensible. This is also how I've understood your
> proposal.

Remember that with EGO_SUM all of the bloated manifests and ebuilds are
on every user's system.

I added mgorny as a cc to this message because he made it pretty clear
at some point in the previous discussion that the size of these ebuilds
and manifests is unacceptable.

William


signature.asc
Description: PGP signature


Re: [gentoo-dev] Proposal to undeprecate EGO_SUM

2022-07-16 Thread Joonas Niilola
On 16.7.2022 14.24, Florian Schmaus wrote:
> 
> That reads as if you wrote it under the assumption that we can only
> either use dependency tarballs or use EGO_SUM. At the same time, I have
> not seen an argument why we can not simply do *both*.
> 
> EGO_SUM has numerous advantages over dependency tarballs, but can not be
> used if the size of the EGO_SUM value crosses a threshold. So why not
> mandate dependency tarballs if a point is crossed and otherwise allow
> EGO_SUM? That way, we could have the best of both worlds.
> 
> - Flow
> 

++ this sounds most sensible. This is also how I've understood your
proposal.

-- juippis


OpenPGP_signature
Description: OpenPGP digital signature


Re: [gentoo-dev] Proposal to undeprecate EGO_SUM

2022-07-16 Thread Florian Schmaus

On 15/07/2022 23.34, William Hubbs wrote:
> On Mon, Jun 27, 2022 at 01:43:19AM +0200, Zoltan Puskas wrote:
> > In summary, IMHO the EGO_SUM way of handling of go packages has more
> > benefits than drawbacks compared to the vendor tarballs.
>
> EGO_SUM can cause portage to break; that is the primary reason support
> is going away.
>
> We attempted another solution that was refused, so the only option we
> have currently is to build the dependency tarballs.


That reads as if you wrote it under the assumption that we can only 
either use dependency tarballs or use EGO_SUM. At the same time, I have 
not seen an argument why we can not simply do *both*.


EGO_SUM has numerous advantages over dependency tarballs, but can not be 
used if the size of the EGO_SUM value crosses a threshold. So why not 
mandate dependency tarballs if a point is crossed and otherwise allow 
EGO_SUM? That way, we could have the best of both worlds.


- Flow







Re: [gentoo-dev] proposal

2022-07-15 Thread William Hubbs
On Wed, Jul 06, 2022 at 02:42:34PM +0200, Florian Schmaus wrote:
> On 04/07/2022 17.27, David Seifert wrote:
> > Ultimately, all these things really matter when only the defaults
> > change. Turn-right-on-red in the US is such a thing, because unless
> > otherwise stated, it's the norm. Knowing our devbase, with roughly 75%
> > mostly AWOL and barely reading the MLs, I don't think this idea will
> > bring about the desired change.
> 
> This sounds like you assume that the majority of Gentoo devs are OK with 
> other people making changes to their packages. This very well could be 
> true, but without an indication you never know if the maintainer feels 
> this way.

I was on vacation when this thread started, so that's why I'm responding
now.

The default assumption according to the dev manual is that maintainers
are not ok with others touching their packages without permission except
for very trivial changes. IMO this is the safer default.

https://devmanual.gentoo.org/general-concepts/package-maintainers/index.html

> 
> 
> > Instead, we should really just go for
> > the  tag, because my feeling is that
> > the default will be that most maintainers don't mind non-maintainer
> > commits, except a select few territorial ones.
> 
> It appears that we have at least two options here:
> 
> A) Establish that the default is non-maintainer-commits-welcome, and 
> introduce a  metadata element.
 
This would go against the default from the dev manual, so if we go with
it, which I do not recommend, we should fix the dev manual.

> B) Declare the default to be unspecified and introduce two metadata 
> elements:  and 
> .
> 
> I think you are proposing A) here, but please correct me if I am wrong.
> 
> Personally I would tend to B). But I have no strong opinion on this, as 
> long as some kind of signalling is established.
> 
> How do others feel about this?

I would suggest the default be consistent with the dev manual and we add
a  element.

William


signature.asc
Description: PGP signature


Re: [gentoo-dev] Proposal to undeprecate EGO_SUM

2022-07-15 Thread William Hubbs
On Mon, Jun 27, 2022 at 01:43:19AM +0200, Zoltan Puskas wrote:

*snip*

> First of all one of the advantages of Gentoo is that it gets its source 
> code from upstream (yes, I'm aware of mirrors acting as a cache layer), 
> which means that poisoning source code needs to be done at upstream 
> level (effectively means hacking GitHub, PyPi, or some standalone 
> project's Gitea/cgit/gitlab/etc. instance or similar), sources which 
> either have more scrutiny or have a limited blast radius.

I don't quite follow what you mean.
Upstream for go modules is actually proxy.golang.org, or some other
similar proxy, which the go tooling knows how to access [1].
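
A minimal sketch of how an EGO_SUM-style entry ("module version" or "module version/go.mod") maps onto a proxy URL, assuming the GOPROXY layout described in the Go modules reference [1]: <base>/<module>/@v/<version>.zip or .mod, with each uppercase letter escaped as "!" plus its lowercase form. The names GOPROXY_BASE and ego_entry_to_uri are made up for illustration; this is not the eclass code.

```shell
# Assumed proxy base URL; any GOPROXY-compatible server would do.
GOPROXY_BASE=https://proxy.golang.org

# $1: module path, $2: version, optionally with a "/go.mod" suffix.
# Prints the URL of the module zip, or of the go.mod file for "/go.mod" entries.
ego_entry_to_uri() {
	module=$1
	version=$2
	case $version in
		*/go.mod) ext=mod; version=${version%/go.mod} ;;
		*) ext=zip ;;
	esac
	# Case-encode the module path: "BurntSushi" becomes "!burnt!sushi".
	escaped=$(printf '%s\n' "$module" | sed 's/[[:upper:]]/!&/g' | tr '[:upper:]' '[:lower:]')
	printf '%s/%s/@v/%s.%s\n' "$GOPROXY_BASE" "$escaped" "$version" "$ext"
}
```

For example, "ego_entry_to_uri github.com/pkg/errors v0.9.1/go.mod" prints the URL of that module's go.mod file on the proxy.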

> Additionally if an upstream dependency has a security issue it's easier 
> to scan all EGO_SUM content and find packages that potentially depend on 
> a broken dependency and force a re-pinning and rebuild. The tarball 
> magic hides this completely and makes searching very expensive.

I'm not comfortable at all with us changing the dependencies like this
downstream for the same reason the Debian folks ultimately were against
it for kubernetes. If you make these kinds of changes you are effectively
creating a fork, and that would mean we would be building packages with
untested libraries [2].
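
Leaving aside whether re-pinning downstream is desirable, the scan described in the quoted paragraph could be sketched roughly as follows, assuming EGO_SUM entries appear in ebuilds as literal "module version" strings. The function name find_ebuilds_pinning is hypothetical.

```shell
# List ebuilds under directory $1 whose text contains the literal string
# "module version" ($2 and $3). This is a plain substring match, so it can
# over-match versions that merely start with $3. The exit status is not
# meaningful; only the printed file names are.
find_ebuilds_pinning() {
	find "$1" -name '*.ebuild' -exec grep -lF -- "$2 $3" {} +
}
```

No comparable one-liner exists for dependency tarballs, since the module list is only visible after fetching and unpacking each tarball.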

*snip*

> Considering that BTRFS (and possibly other filesystems) support on the 
> fly compression the physical cost of a few inflated ebuilds and 

The problem here is the size of SRC_URI when you add the EGO_SUM_SRC_URI
to it. SRC_URI gets exported to the environment, so it can crash portage
if it is too big.
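
One plausible mechanism, stated as an assumption rather than a confirmed diagnosis: Linux caps a single environment string at MAX_ARG_STRLEN, which is 131072 bytes on systems with 4 KiB pages, and exec fails once an exported variable exceeds it. The sketch below only estimates whether a given value would still fit; MAX_ENV_STRING and src_uri_fits_env are made-up names.

```shell
# Assumed per-string limit: Linux caps a single argv/envp string at
# MAX_ARG_STRLEN, which is 131072 bytes on systems with 4 KiB pages.
MAX_ENV_STRING=131072

# Succeeds if the string in $1 would still fit into one environment entry
# named SRC_URI. The 9 extra bytes are the "SRC_URI=" prefix plus the
# trailing NUL. ${#1} counts characters, so ASCII-only URIs are assumed.
src_uri_fits_env() {
	[ $(( ${#1} + 9 )) -le "$MAX_ENV_STRING" ]
}
```

At roughly 100 bytes per URI, a budget of this size is exhausted after somewhere over a thousand entries, which is the order of magnitude discussed in this thread.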

> Manifests is actually way smaller than the logical size would indicate. 
> Compare that to the huge incompressible tarballs that now we need to 
> store.
> 
> As a proxied maintainer or overlay owner hosting these huge tarballs 
> also becomes problem (i.e. we need some public space with potentially 
> gigabytes of free space and enough bandwidth to push that to users). 
> Pushing toward vendor tarballs creates an extra expense on every level 
> (Gentoo infra, mirrors, proxy maintainers, overlay owners, users).

I agree that creating the dependency tarballs is not ideal. We asked for
another option [3], but as you can see from the bug this was refused by
the PMS team. That refusal is the only reason we have to worry about
dependency tarballs.

> It also breaks reproducibility. With EGO_SUM I can check out an older 
> version of portage tree (well to some extent) and rebuild packages since 
> dependency upstream is very likely to host old versions of their source. 
> With the tarballs this breaks since as soon as an ebuild is dropped from 
> mainline portage the vendor tarballs follow them too. There is no way 
> for the user to roll back a package a few weeks back (e.g. if new 
> version has bugs), unlike with EGO_SUM.

The contents of a dependency tarball are created using "go mod download",
which is controlled by the go.mod/go.sum files in the package. So, it is
possible to recreate the dependency tarball any time.

I do not see any advantage EGO_SUM offers over the dependency tarballs
in this space.

> Finally with EGO_SUM we had a nice tool get-ego-vendor which produced 
> the EGO_SUM for maintainers which has made maintenance easier. However I 
> haven't found any new guidance yet on how to maintain go packages with 
> the new tarball method (e.g. what needs to go into the vendor tarball, 
> what changes are needed in ebuilds). Overall this complicates further 
> ebuild development and verification of PRs.

The documentation for how to build dependency tarballs is in the eclass.
The GOMODCACHE environment variable is used in the eclass to point to
the location where the dependency tarball is unpacked, and that location
is read by the normal go tooling.
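
A sketch of the procedure, modeled on my recollection of the eclass documentation and to be checked against the eclass itself: the go-mod directory name and the -modcacherw flag are taken from there, while the wrapper function make_dep_tarball and its arguments are made up for illustration.

```shell
# Build a dependency tarball for the unpacked Go project in $1 and write it
# to the absolute path $2. The compression is chosen from the suffix of $2
# via tar's -a option. Requires network access and a "go" binary.
make_dep_tarball() {
	(
		cd "$1" || exit 1
		# Download all modules named by go.mod/go.sum into ./go-mod.
		GOMODCACHE="$PWD/go-mod" go mod download -modcacherw || exit 1
		tar -acf "$2" go-mod
	)
}
```

For example, "make_dep_tarball /path/to/project /tmp/project-1.0-deps.tar.xz" would produce a tarball that unpacks to a go-mod directory, which the build then uses as GOMODCACHE.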

> In summary, IMHO the EGO_SUM way of handling of go packages has more 
> benefits than drawbacks compared to the vendor tarballs.

EGO_SUM can cause portage to break; that is the primary reason support
is going away.

We attempted another solution that was refused, so the only option we
have currently is to build the dependency tarballs.

> 
> Cheers,
> Zoltan
> 
> [1] 
> https://blogs.gentoo.org/mgorny/2021/02/19/the-modern-packagers-security-nightmare/
> 

[1] https://go.dev/ref/mod
[2] https://lwn.net/Articles/835599/
[3] https://bugs.gentoo.org/833567


signature.asc
Description: PGP signature


Re: [gentoo-dev] proposal

2022-07-07 Thread Florian Schmaus

On 07.07.22 09:45, Michal Prívozník wrote:
> I think that rejecting a contribution (regardless of the flag) should be
> based on technical merit, rather than individual maintainers' personal
> preferences. I do understand some packages are like your babies, you
> watch them grow, fine tune everything. But in the end, if somebody finds
> a bug in the ebuild/eclass/... and is even willing to provide a fix, we
> should have a discussion about the proposed fix rather than refer to a
> flag (or lack thereof) when closing the MR (unmerged).


It was never the intention to create a scenario where maintainers reject 
a contribution based on such a flag. Gentoo, being free and open source 
software, is fully aligned with the spirit of FOSS, which *everyone* can 
use, study, share and *improve*.


With the replies in mind, I gave this some more thought. I think the 
best default is "everyone can propose changes to the maintainer, on 
which the maintainer has to act within a reasonable amount of time".


However, there are maybe cases where trivial fixes for low-maintenance 
packages are for some reason not merged into ::gentoo and the maintainer 
is unresponsive. If those packages were flagged with 
, then I would be less reluctant to 
commit them to ::gentoo. After committing, I would always inform the 
maintainer.


On the other hand, there is the situation where seemingly innocent 
changes could cause some fallout, because this is the kind of package 
where you assume you know what is going on from reading the ebuild and 
playing with it, but you actually don't. Such packages could carry a 
flag indicating that all changes require review by the maintainer. It 
appears that  gives the wrong 
impression, so maybe ?


- Flow







Re: [gentoo-dev] proposal

2022-07-07 Thread Joonas Niilola
On 7/6/22 18:04, Fabian Groffen wrote:
> - please do not needlessly change style: if you do not "maintain" the
>   ebuild, respect the style of the maintainer, so only add the changes
>   you need, keep it minimal, respect the original even though you don't
>   like it (and don't use QA as an excuse to change style)

QA actually states to honor the maintainer's style as much as possible,
https://devmanual.gentoo.org/general-concepts/package-maintainers/index.html
"Respect developers' coding preferences. Unnecessarily changing the
syntax of an ebuild can cause complications for others. Syntax changes
should only be done if there is a real benefit, such as faster
compilation, improved information for the end user, or compliance with
Gentoo policies. "

Of course there are some "strict" rules, but very much is left to the
maintainer.
https://projects.gentoo.org/qa/policy-guide/ebuild-format.html#pg0101


> - when you make a change, make sure you check for bugs in the following
>   days, so you can cleanup yourself should there be fallout
> 

ago does a good job CCing the commit author too in his bug reports, if
the person is not the maintainer. This only applies to tinderbox bugs
though. Obviously you should manually CC the author if you see the bug
coming from their commit.

-- juippis



OpenPGP_signature
Description: OpenPGP digital signature


Re: [gentoo-dev] proposal

2022-07-07 Thread Michal Prívozník
On 7/4/22 16:19, Florian Schmaus wrote:
> I'd like to propose a new metadata XML element for packages:
> 
>     
> 
> Maintainers can signal to other developers (and of course contributors
> in general) that they are happy with others to make changes to the
> ebuilds without prior consultation of the maintainer.
> 
> Of course, this is not a free ticket to always make changes to packages
> that you do not maintain without prior consultation of the maintainer. I
> would expect people to use their common sense to decide if a change may
> require maintainer attention or not. In general, it is always a good
> idea to communicate changes in every case.
> 
> The absence of the flag does not automatically allow the conclusion that
> the maintainer is opposed to non-maintainer commits. It just means that
> the maintainer's stance is not known. I do not believe that we need a
>  flag, but if the need arises, we
> could always consider adding one. Although, in my experience, people
> mostly like to communicate the "non-maintainer commits welcome" policy
> with others.

I worry that this might send the wrong signal. My understanding is that,
like any OSS project, Gentoo struggles with attracting new contributors,
and telling anybody "hey, your contribution is not welcome" does not help.

I think that rejecting a contribution (regardless of the flag) should be
based on technical merit, rather than individual maintainers' personal
preferences. I do understand some packages are like your babies, you
watch them grow, fine tune everything. But in the end, if somebody finds
a bug in the ebuild/eclass/... and is even willing to provide a fix, we
should have a discussion about the proposed fix rather than refer to a
flag (or lack thereof) when closing the MR (unmerged).

Michal




Re: [gentoo-dev] proposal

2022-07-07 Thread Jaco Kroon
Hi All,

On 2022/07/06 15:50, Michał Górny wrote:

> On Mon, 2022-07-04 at 16:19 +0200, Florian Schmaus wrote:
>> I'd like to propose a new metadata XML element for packages:
>>
>>  
>>
>> Maintainers can signal to other developers (and of course contributors 
>> in general) that they are happy with others to make changes to the 
>> ebuilds without prior consultation of the maintainer.
> I don't think adding such an element is a good idea.  In my opinion,
> this will strengthen the assumption that "unless otherwise noted, you
> don't dare touch anything" (even though that's not your goal).  "Common
> sense" should really be good enough for almost everything.

I agree, but also note that what I consider to be "common sense" isn't
always "your common sense".

I also agree that having some way to indicate the preference on the
specific package may be a good thing.  With various possible levels of
sensitivity.

For example, net-misc/asterisk and net-libs/pjproject are very sensitive
for me.  net-misc/dahdi{,-tools} and x11-wm/evilwm less so.  In all
cases I'd still prefer to be kept in the loop at a minimum.

As such, it looks like there are multiple options, and there are
suggestions for various tags; this is a sensible way to indicate
preference.  E.g., also, what kind of fixes don't require communication,
eg, I've seen drive-by's on the above packages to fix dependencies based
on slots because depended on packages changed their structure, or
because LUA became slotted, or adding := etc ...  This makes sense to
allow these, but if you're going to mess with my ./configure on asterisk
or pjproject without consulting with me I'm going to be upset.  A simple
code fix to fix some compile error (specific to say llvm), probably
fine, but I'd still appreciate communication as I'd like to submit that
upstream kind of thing as well.

If this does go live, then there should be a single tag where the value
indicates the level of "sensitivity", or multiple tags of which only one
is allowed.  Since some of these options may be orthogonal to each
other, a single tag with multiple attributes may be more appropriate, I
don't know, nor do I personally care that much, so far I've been
respected, and the drive-by's that have been made were all either part of
global fixes, or in the one or two cases where it was specific, was put
into the tree as ~ so were all just fine.  In one particular case it was
also masked specifically because the change depended on another change
to happen simultaneously/close together (lua slotting) - the experience
of which was most refreshing.  Obviously nothing is set in stone w.r.t.
specifics, but if the initial course is at least somewhat in the right
direction it's easier to course-correct.

I thus have no strong opinion one way or the other, but just wanted to
state the above.


Kind Regards,
Jaco



Re: [gentoo-dev] proposal

2022-07-06 Thread Fabian Groffen
On 06-07-2022 15:50:30 +0200, Michał Górny wrote:
> On Mon, 2022-07-04 at 16:19 +0200, Florian Schmaus wrote:
> > I'd like to propose a new metadata XML element for packages:
> > 
> >  
> > 
> > Maintainers can signal to other developers (and of course contributors 
> > in general) that they are happy with others to make changes to the 
> > ebuilds without prior consultation of the maintainer.
> 
> I don't think adding such an element is a good idea.  In my opinion,
> this will strengthen the assumption that "unless otherwise noted, you
> don't dare touch anything" (even though that's not your goal).  "Common
> sense" should really be good enough for almost everything.

Right, "common sense".  The problem with that one, is that "common" is
not as "common" as you think it is.  Ask a bunch of people, and you'll
find that what they consider "common" isn't the same.

So, if you do this, then please define clearly what you think is OK.
For example for me:

- feel free to add patches necessary for operation
- feel free to fix constructs (like an if or case that should apply for
  something else/different/mode)
- feel free to fix typos
- please do not needlessly change style: if you do not "maintain" the
  ebuild, respect the style of the maintainer, so only add the changes
  you need, keep it minimal, respect the original even though you don't
  like it (and don't use QA as an excuse to change style)
- when you make a change, make sure you check for bugs in the following
  days, so you can clean up yourself should there be fallout

> I mean, I do realize that 10 years ago, in the golden years of Gentoo,
> it was considered normal for developers to be like "my package, my
> fortress, don't you dare add systemd unit or I will retire" but today I
> think we're aiming for a more welcoming developer base, and I think
> we're actually going in that direction.  What I'm afraid is that some
> people will use this as an excuse to push back once again.

Not sure I have the same memories of how it used to be 10 years ago.  I
actually think it is pretty much the same as it is now, as it was then.
Different and fewer people, but still different
preferences/opinions/common sense.

> Can you really think of a case when common sense really, really wouldn't
> work?  I mean, sure, we all make mistakes but we should be able to learn
> from them and do better next time.  This also implies package
> maintainers learning that they're not the only people who will ever
> touch the package in question and starting to document the pitfalls.

Honestly, I've never been a fan of "maintainership".  It basically is
some sort of "sign" that says "beware of the dog, stay away".  However,
it's true that sometimes people really delve into a package, and thereby
know very much how it works, and what you should/should not do.
Something like LLVM is a good example, maybe.  Anyway, in such
situation, I think extreme care should be taken by non-maintainers.
Dunno how to best indicate that, and/or if that's feasible -- like you
said, it quickly ends up being an excuse for declaring a package to be
off-limits.

Fabian

-- 
Fabian Groffen
Gentoo on a different level




Re: [gentoo-dev] proposal

2022-07-06 Thread Michał Górny
On Mon, 2022-07-04 at 16:19 +0200, Florian Schmaus wrote:
> I'd like to propose a new metadata XML element for packages:
> 
>  
> 
> Maintainers can signal to other developers (and of course contributors 
> in general) that they are happy with others to make changes to the 
> ebuilds without prior consultation of the maintainer.

I don't think adding such an element is a good idea.  In my opinion,
this will strengthen the assumption that "unless otherwise noted, you
don't dare touch anything" (even though that's not your goal).  "Common
sense" should really be good enough for almost everything.

I mean, I do realize that 10 years ago, in the golden years of Gentoo,
it was considered normal for developers to be like "my package, my
fortress, don't you dare add systemd unit or I will retire" but today I
think we're aiming for a more welcoming developer base, and I think
we're actually going in that direction.  What I'm afraid is that some
people will use this as an excuse to push back once again.

Can you really think of a case when common sense really, really wouldn't
work?  I mean, sure, we all make mistakes but we should be able to learn
from them and do better next time.  This also implies package
maintainers learning that they're not the only people who will ever
touch the package in question and starting to document the pitfalls.

-- 
Best regards,
Michał Górny




Re: [gentoo-dev] proposal

2022-07-06 Thread Rich Freeman
On Wed, Jul 6, 2022 at 8:42 AM Florian Schmaus wrote:
>
>
> It appears that we have at least two options here:
>
> A) Establish that the default is non-maintainer-commits-welcome, and
> introduce a  metadata element.
>
> B) Declare the default to be unspecified and introduce two metadata
> elements:  and
> .
>
> I think you are proposing A) here, but please correct me if I am wrong.
>
> Personally I would tend to B). But I have no strong opinion on this, as
> long as some kind of signalling is established.
>
> How do others feel about this?

What about ?  I
guess that is what we're calling "disallowed" but that seems to have a
connotation that devs don't want contributions, when they just want to
be aware of what is going into their packages before it happens.

Deferring maintenance to bumps is the one use case that came up in the
thread that seems likely to be pretty common.

-- 
Rich



Re: [gentoo-dev] proposal

2022-07-06 Thread Florian Schmaus

On 04/07/2022 17.27, David Seifert wrote:

> Ultimately, all these things really matter when only the defaults
> change. Turn-right-on-red in the US is such a thing, because unless
> otherwise stated, it's the norm. Knowing our devbase, with roughly 75%
> mostly AWOL and barely reading the MLs, I don't think this idea will
> bring about the desired change.


This sounds like you assume that the majority of Gentoo devs are OK with 
other people making changes to their packages. This very well could be 
true, but without an indication you never know if the maintainer feels 
this way.




> Instead, we should really just go for
> the  tag, because my feeling is that
> the default will be that most maintainers don't mind non-maintainer
> commits, except a select few territorial ones.


It appears that we have at least two options here:

A) Establish that the default is non-maintainer-commits-welcome, and 
introduce a  metadata element.


B) Declare the default to be unspecified and introduce two metadata 
elements:  and 
.


I think you are proposing A) here, but please correct me if I am wrong.

Personally I would tend to B). But I have no strong opinion on this, as 
long as some kind of signalling is established.


How do others feel about this?

- Flow



Re: [gentoo-dev] proposal

2022-07-04 Thread waebbl-gentoo

On Mon, 4 Jul 2022 22:49:25 -0400
Ionen Wolkens wrote:


> On Mon, Jul 04, 2022 at 04:19:12PM +0200, Florian Schmaus wrote:
> > I'd like to propose a new metadata XML element for packages:
> >
> >  
> >
> > Maintainers can signal to other developers (and of course contributors
> > in general) that they are happy with others to make changes to the
> > ebuilds without prior consultation of the maintainer.
> >
> > Of course, this is not a free ticket to always make changes to packages
> > that you do not maintain without prior consultation of the maintainer. I
> > would expect people to use their common sense to decide if a change may
> > require maintainer attention or not. In general, it is always a good
> > idea to communicate changes in every case.
> >
> > The absence of the flag does not automatically allow the conclusion that
> > the maintainer is opposed to non-maintainer commits. It just means that
> > the maintainer's stance is not known. I do not believe that we need a
> >  flag, but if the need arises, we
> > could always consider adding one. Although, in my experience, people
> > mostly like to communicate the "non-maintainer commits welcome" policy
> > with others.
> >
> > WDYT?
>
> Personally I think something per-maintainer rather than per package
> would be simpler, and allow to say more as needed.
>
> Think like devaway instructions, but something more permanent and
> not for being away, e.g.
> "feel free to touch my packages except this big important one, and
> or do or do not do this to them"



I think it would be more efficient if we use a flag on a per-maintainer
basis. But it adds extra overhead, if the maintainer doesn't want some
special packages to be touched, or if special cases, like bumps need to
be avoided.

We could add a central file, something like the metadata/AUTHORS file
with this information. Possibly in a structured format like xml or json
to make it machine readable as well and the information can be
extracted and shown e.g. on the wiki or p.g.o site.
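
A minimal sketch of what such a machine-readable, per-maintainer file could
look like, assuming JSON. The keys ("default", "note", "exceptions"), the
policy values and the example address are hypothetical and do not come from
any existing Gentoo file format; the point is only that a per-maintainer
default plus per-package exceptions is easy to query.

```python
import json

# Hypothetical layout: one entry per maintainer, with an optional free-form
# note and per-package exceptions. All key names and values are assumptions.
POLICY_JSON = """
{
  "dev@example.org": {
    "default": "welcome",
    "note": "please leave version bumps to me",
    "exceptions": {"app-misc/big-important-one": "ask-first"}
  }
}
"""


def commit_policy(policies, maintainer, package):
    """Return the stated policy, or "unspecified" if nothing was declared."""
    entry = policies.get(maintainer)
    if entry is None:
        return "unspecified"
    exceptions = entry.get("exceptions", {})
    # A per-package exception overrides the maintainer-wide default.
    return exceptions.get(package, entry.get("default", "unspecified"))


policies = json.loads(POLICY_JSON)
print(commit_policy(policies, "dev@example.org", "app-misc/foo"))
print(commit_policy(policies, "dev@example.org", "app-misc/big-important-one"))
```

The same data could be rendered on a wiki page or on p.g.o; a maintainer
without an entry simply stays "unspecified".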

Something like the devaway
instructions could lock out proxy maintainers, which don't have access
to the Gentoo infrastructure.


> >
> > - Flow
> >





Re: [gentoo-dev] proposal

2022-07-04 Thread Sam James


> On 5 Jul 2022, at 00:21, Robin H. Johnson wrote:
> 
> On Mon, Jul 04, 2022 at 05:27:03PM +0200, David Seifert wrote:
>> On Mon, 2022-07-04 at 16:19 +0200, Florian Schmaus wrote:
>>> I'd like to propose a new metadata XML element for packages:
>>> 
>>>  
> ...
>> Ultimately, all these things really matter when only the defaults
>> change. Turn-right-on-red in the US is such a thing, because unless
>> otherwise stated, it's the norm. Knowing our devbase, with roughly 75%
>> mostly AWOL and barely reading the MLs, I don't think this idea will
>> bring about the desired change. Instead, we should really just go for
>> the  tag, because my feeling is that
>> the default will be that most maintainers don't mind non-maintainer
>> commits, except a select few territorial ones.
> 
> I had a rough draft of a similar proposal to this before that was never
> completed into a GLEP.

https://marc.info/?l=gentoo-dev&m=137184067920894&w=2
and possibly a later version too if anyone is interested like I was.

The discussions there are pretty interesting if nothing else.

Best,
sam




Re: [gentoo-dev] proposal

2022-07-04 Thread Sam James


> On 5 Jul 2022, at 03:49, Ionen Wolkens wrote:
> 
> On Mon, Jul 04, 2022 at 04:19:12PM +0200, Florian Schmaus wrote:
>> I'd like to propose a new metadata XML element for packages:
>> 
>> 
>> 
>> Maintainers can signal to other developers (and of course contributors
>> in general) that they are happy with others to make changes to the
>> ebuilds without prior consultation of the maintainer.
>> 
>> Of course, this is not a free ticket to always make changes to packages
>> that you do not maintain without prior consultation of the maintainer. I
>> would expect people to use their common sense to decide if a change may
>> require maintainer attention or not. In general, it is always a good
>> idea to communicate changes in every case.
>> 
>> The absence of the flag does not automatically allow the conclusion that
>> the maintainer is opposed to non-maintainer commits. It just means that
>> the maintainer's stance is not known. I do not believe that we need a
>>  flag, but if the need arises, we
>> could always consider adding one. Although, in my experience, people
>> mostly like to communicate the "non-maintainer commits welcome" policy
>> with others.
>> 
>> WDYT?
> 
> Personally I think something per-maintainer rather than per package
> would be simpler, and allow to say more as needed.
> 
> Think like devaway instructions, but something more permanent and
> not for being away, e.g.
> "feel free to touch my packages except this big important one, and
> or do or do not do this to them"

On this, some prior art from zx2c4:
https://marc.info/?l=gentoo-dev&m=147816820903115&w=2

Best,
sam




Re: [gentoo-dev] proposal

2022-07-04 Thread Sam James


> On 5 Jul 2022, at 04:24, Ionen Wolkens wrote:
> 
> On Mon, Jul 04, 2022 at 10:53:41PM -0400, Ionen Wolkens wrote:
>> On Mon, Jul 04, 2022 at 10:49:25PM -0400, Ionen Wolkens wrote:
>>> On Mon, Jul 04, 2022 at 04:19:12PM +0200, Florian Schmaus wrote:
>>>> I'd like to propose a new metadata XML element for packages:
>>>>
>>>>
>>>>
>>>> Maintainers can signal to other developers (and of course contributors
>>>> in general) that they are happy with others to make changes to the
>>>> ebuilds without prior consultation of the maintainer.
>>>>
>>>> Of course, this is not a free ticket to always make changes to packages
>>>> that you do not maintain without prior consultation of the maintainer. I
>>>> would expect people to use their common sense to decide if a change may
>>>> require maintainer attention or not. In general, it is always a good
>>>> idea to communicate changes in every case.
>>>>
>>>> The absence of the flag does not automatically allow the conclusion that
>>>> the maintainer is opposed to non-maintainer commits. It just means that
>>>> the maintainer's stance is not known. I do not believe that we need a
>>>>  flag, but if the need arises, we
>>>> could always consider adding one. Although, in my experience, people
>>>> mostly like to communicate the "non-maintainer commits welcome" policy
>>>> with others.
>>>>
>>>> WDYT?
>>> 
>>> Personally I think something per-maintainer rather than per package
>>> would be simpler, and allow to say more as needed.
>> 
>> ... and that could also extend to projects so can clarify policy in
>> a single place that's easy to find.
>> 
>> Like base-system@ probably do not want random uninformed commits,
>> but games@, sound@, and such?
> 
> Also, for projects, presenting it more as exception rules makes sense.
> Especially for all these semi-random understaffed projects, there's
> really a handful that would have some "do nots".
> 
>> 
>>> 
>>> Think like devaway instructions, but something more permanent and
>>> not for being away, e.g.
>>> "feel free to touch my packages except this big important one, and
>>> or do or do not do this to them"
> 
> -'or do' :eyes:
> 
> To add more as an example, personally I don't mind non-maintainer commits
> but don't particularly want people to start full on bumping my packages
> when I /do/ intend to handle them in a timely fashion and probably had
> plans for them, perhaps even already a local WIP ebuild and such. So
> I'd likely have something along these lines. A simple tag feels like a
> bit too "free for all".
> 

You've nailed something I was wondering about but couldn't
really articulate.

The only time I really care/don't want someone to do it:
- a package genuinely is snowflakey (which is the exception), like it's
  somehow fragile
- the situation is as you described

Almost makes one wonder about per-package notes again, although
it wouldn't fix the issue wrt projects.

Best,
sam




Re: [gentoo-dev] proposal

2022-07-04 Thread Ionen Wolkens
On Mon, Jul 04, 2022 at 10:53:41PM -0400, Ionen Wolkens wrote:
> On Mon, Jul 04, 2022 at 10:49:25PM -0400, Ionen Wolkens wrote:
> > On Mon, Jul 04, 2022 at 04:19:12PM +0200, Florian Schmaus wrote:
> > > I'd like to propose a new metadata XML element for packages:
> > > 
> > >  
> > > 
> > > Maintainers can signal to other developers (and of course contributors 
> > > in general) that they are happy with others to make changes to the 
> > > ebuilds without prior consultation of the maintainer.
> > > 
> > > Of course, this is not a free ticket to always make changes to packages 
> > > that you do not maintain without prior consultation of the maintainer. I 
> > > would expect people to use their common sense to decide if a change may 
> > > require maintainer attention or not. In general, it is always a good 
> > > idea to communicate changes in every case.
> > > 
> > > The absence of the flag does not automatically allow the conclusion that 
> > > the maintainer is opposed to non-maintainer commits. It just means that 
> > > the maintainer's stance is not known. I do not believe that we need a 
> > >  flag, but if the need arises, we 
> > > could always consider adding one. Although, in my experience, people 
> > > mostly like to communicate the "non-maintainer commits welcome" policy 
> > > with others.
> > > 
> > > WDYT?
> > 
> > Personally I think something per-maintainer rather than per package
> > would be simpler, and allow to say more as needed.
> 
> ... and that could also extend to projects so can clarify policy in
> a single place that's easy to find.
> 
> Like base-system@ probably do not want random uninformed commits,
> but games@, sound@, and such?

Also, for projects, presenting it more as exception rules makes sense.
Especially for all these semi-random understaffed projects, there's
really a handful that would have some "do nots".

> 
> > 
> > Think like devaway instructions, but something more permanent and
> > not for being away, e.g.
> > "feel free to touch my packages except this big important one, and
> > or do or do not do this to them"

-'or do' :eyes:

To add more as an example, personally I don't mind non-maintainer commits
but don't particularly want people to start full on bumping my packages
when I /do/ intend to handle them in a timely fashion and probably had
plans for them, perhaps even already a local WIP ebuild and such. So
I'd likely have something along these lines. A simple tag feels like a
bit too "free for all".

On a related note, perhaps QA team could even be allowed to give
instructions themselves when a maintainer is generally unresponsive
and is giving no instructions to go with that.

> > 
> > > 
> > > - Flow
> > > 
> > 
> > -- 
> > ionen
> 
> 
> 
> -- 
> ionen



-- 
ionen




Re: [gentoo-dev] proposal

2022-07-04 Thread Ionen Wolkens
On Mon, Jul 04, 2022 at 10:49:25PM -0400, Ionen Wolkens wrote:
> On Mon, Jul 04, 2022 at 04:19:12PM +0200, Florian Schmaus wrote:
> > I'd like to propose a new metadata XML element for packages:
> > 
> >  
> > 
> > Maintainers can signal to other developers (and of course contributors 
> > in general) that they are happy with others to make changes to the 
> > ebuilds without prior consultation of the maintainer.
> > 
> > Of course, this is not a free ticket to always make changes to packages 
> > that you do not maintain without prior consultation of the maintainer. I 
> > would expect people to use their common sense to decide if a change may 
> > require maintainer attention or not. In general, it is always a good 
> > idea to communicate changes in every case.
> > 
> > The absence of the flag does not automatically allow the conclusion that 
> > the maintainer is opposed to non-maintainer commits. It just means that 
> > the maintainer's stance is not known. I do not believe that we need a 
> >  flag, but if the need arises, we 
> > could always consider adding one. Although, in my experience, people 
> > mostly like to communicate the "non-maintainer commits welcome" policy 
> > with others.
> > 
> > WDYT?
> 
> Personally I think something per-maintainer rather than per package
> would be simpler, and allow to say more as needed.

... and that could also extend to projects so can clarify policy in
a single place that's easy to find.

Like base-system@ probably do not want random uninformed commits,
but games@, sound@, and such?

> 
> Think like devaway instructions, but something more permanent and
> not for being away, e.g.
> "feel free to touch my packages except this big important one, and
> or do or do not do this to them"
> 
> > 
> > - Flow
> > 
> 
> -- 
> ionen



-- 
ionen




Re: [gentoo-dev] proposal

2022-07-04 Thread Ionen Wolkens
On Mon, Jul 04, 2022 at 04:19:12PM +0200, Florian Schmaus wrote:
> I'd like to propose a new metadata XML element for packages:
> 
>  
> 
> Maintainers can signal to other developers (and of course contributors 
> in general) that they are happy with others to make changes to the 
> ebuilds without prior consultation of the maintainer.
> 
> Of course, this is not a free ticket to always make changes to packages 
> that you do not maintain without prior consultation of the maintainer. I 
> would expect people to use their common sense to decide if a change may 
> require maintainer attention or not. In general, it is always a good 
> idea to communicate changes in every case.
> 
> The absence of the flag does not automatically allow the conclusion that 
> the maintainer is opposed to non-maintainer commits. It just means that 
> the maintainer's stance is not known. I do not believe that we need a 
>  flag, but if the need arises, we 
> could always consider adding one. Although, in my experience, people 
> mostly like to communicate the "non-maintainer commits welcome" policy 
> with others.
> 
> WDYT?

Personally I think something per-maintainer rather than per package
would be simpler, and allow to say more as needed.

Think like devaway instructions, but something more permanent and
not for being away, e.g.
"feel free to touch my packages except this big important one, and
or do or do not do this to them"

> 
> - Flow
> 

-- 
ionen




Re: [gentoo-dev] proposal

2022-07-04 Thread Sam James


> On 5 Jul 2022, at 00:49, Rich Freeman wrote:
> 
> On Mon, Jul 4, 2022 at 7:21 PM Robin H. Johnson wrote:
>> 
>> It had 3 states however:
>> a) go ahead and touch it, no additional approvals needed
>> b) please get a maintainer to approve it
>> c) do not touch it
>> 
> 
> ++
> 
> Though to be fair b is really no different from what just about
> anybody can do via a pull request.  I don't think most maintainers are
> going to be hovering between a vs c.  I suspect most are going to be
> divided between a vs b.  I guess I could see an argument for c if some
> package is really finicky and tends to get a lot of repetitive
> requests for changes that won't work for reasons that might not be
> obvious, but I'm not sure if that is really a concern.
> 

Right. Difference between b and c might make more sense if c became "run it by 
another developer as a sanity-check" (who is not necessarily a maintainer of 
the package, or obviously there's pretty much no point).

> --
> Rich

Best,
sam




Re: [gentoo-dev] proposal

2022-07-04 Thread Rich Freeman
On Mon, Jul 4, 2022 at 7:21 PM Robin H. Johnson wrote:
>
> It had 3 states however:
> a) go ahead and touch it, no additional approvals needed
> b) please get a maintainer to approve it
> c) do not touch it
>

++

Though to be fair b is really no different from what just about
anybody can do via a pull request.  I don't think most maintainers are
going to be hovering between a vs c.  I suspect most are going to be
divided between a vs b.  I guess I could see an argument for c if some
package is really finicky and tends to get a lot of repetitive
requests for changes that won't work for reasons that might not be
obvious, but I'm not sure if that is really a concern.

-- 
Rich



Re: [gentoo-dev] proposal

2022-07-04 Thread Robin H. Johnson
On Mon, Jul 04, 2022 at 05:27:03PM +0200, David Seifert wrote:
> On Mon, 2022-07-04 at 16:19 +0200, Florian Schmaus wrote:
> > I'd like to propose a new metadata XML element for packages:
> > 
> >  
...
> Ultimately, all these things really matter when only the defaults
> change. Turn-right-on-red in the US is such a thing, because unless
> otherwise stated, it's the norm. Knowing our devbase, with roughly 75%
> mostly AWOL and barely reading the MLs, I don't think this idea will
> bring about the desired change. Instead, we should really just go for
> the  tag, because my feeling is that
> the default will be that most maintainers don't mind non-maintainer
> commits, except a select few territorial ones.

I had a rough draft of a similar proposal to this before that was never
completed into a GLEP.

It had 3 states however:
a) go ahead and touch it, no additional approvals needed
b) please get a maintainer to approve it
c) do not touch it

With b) being the proposed default as status-quo at the time.

That however was years ago, and I'll entirely agree that the devbase
isn't as watchful anymore.

With that said, I stand behind the intent of making the default a), with
a migration period.

Something like this for the migration period:
July 1 to Sep 30: default is still b), to allow developers time to update
their metadata.
Oct 1 onwards: default becomes a)
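
The three states and the migration period could be sketched roughly as
follows. This is only an illustration: the names Policy, effective_policy
and CUTOVER are made up, and the year 2022 for the cutover date is an
assumption (the dates above give no year).

```python
from datetime import date
from enum import Enum


class Policy(Enum):
    GO_AHEAD = "a"             # go ahead and touch it, no approvals needed
    MAINTAINER_APPROVAL = "b"  # please get a maintainer to approve it
    DO_NOT_TOUCH = "c"         # do not touch it


# Assumed cutover date; the year is not stated in the proposal.
CUTOVER = date(2022, 10, 1)


def effective_policy(declared, today, cutover=CUTOVER):
    """Policy that applies to a package on a given day.

    `declared` is what the maintainer put in the metadata, or None if
    nothing was declared and the default has to be used.
    """
    if declared is not None:
        return declared
    # Before the cutover the default stays b); from the cutover on it is a).
    return Policy.GO_AHEAD if today >= cutover else Policy.MAINTAINER_APPROVAL


print(effective_policy(None, date(2022, 9, 30)).name)
print(effective_policy(None, date(2022, 10, 1)).name)
```

An explicit declaration always wins, so maintainers who update their
metadata during the migration period are unaffected by the default change.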


-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Treasurer
E-Mail   : robb...@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136




Re: [gentoo-dev] proposal

2022-07-04 Thread David Seifert
On Mon, 2022-07-04 at 16:19 +0200, Florian Schmaus wrote:
> I'd like to propose a new metadata XML element for packages:
> 
>  
> 
> Maintainers can signal to other developers (and of course contributors
> in general) that they are happy with others to make changes to the 
> ebuilds without prior consultation of the maintainer.
> 
> Of course, this is not a free ticket to always make changes to
> packages 
> that you do not maintain without prior consultation of the maintainer.
> I 
> would expect people to use their common sense to decide if a change
> may 
> require maintainer attention or not. In general, it is always a good 
> idea to communicate changes in every case.
> 
> The absence of the flag does not automatically allow the conclusion
> that 
> the maintainer is opposed to non-maintainer commits. It just means
> that 
> the maintainer's stance is not known. I do not believe that we need a 
>  flag, but if the need arises, we 
> could always consider adding one. Although, in my experience, people 
> mostly like to communicate the "non-maintainer commits welcome" policy
> with others.
> 
> WDYT?
> 
> - Flow
> 

Ultimately, all these things really matter when only the defaults
change. Turn-right-on-red in the US is such a thing, because unless
otherwise stated, it's the norm. Knowing our devbase, with roughly 75%
mostly AWOL and barely reading the MLs, I don't think this idea will
bring about the desired change. Instead, we should really just go for
the  tag, because my feeling is that
the default will be that most maintainers don't mind non-maintainer
commits, except a select few territorial ones.

David



[gentoo-dev] proposal

2022-07-04 Thread Florian Schmaus

I'd like to propose a new metadata XML element for packages:



Maintainers can signal to other developers (and of course contributors 
in general) that they are happy with others to make changes to the 
ebuilds without prior consultation of the maintainer.


Of course, this is not a free ticket to always make changes to packages 
that you do not maintain without prior consultation of the maintainer. I 
would expect people to use their common sense to decide if a change may 
require maintainer attention or not. In general, it is always a good 
idea to communicate changes in every case.


The absence of the flag does not automatically allow the conclusion that 
the maintainer is opposed to non-maintainer commits. It just means that 
the maintainer's stance is not known. I do not believe that we need a 
 flag, but if the need arises, we 
could always consider adding one. Although, in my experience, people 
mostly like to communicate the "non-maintainer commits welcome" policy 
with others.


WDYT?

- Flow



Re: [gentoo-dev] Proposal to undeprecate EGO_SUM

2022-06-27 Thread Zoltan Puskas
Hey,

>
> Rephrasing this just to ensure I'm understanding it correctly: you're
> suggesting to move _everything_ that uses Go into its own overlay. Let's
> call it gentoo-go for the sake of the example.
>
> If the above is accurate, then I hard disagree.

Yes, that was the suggestion, you understood it correctly.

>
> The biggest package that I have that uses Go is docker (and accompanying
> tools). Personal distaste of docker aside, it's a very popular piece of
> software, and I don't think it's fair to require all the people who want
> to use it to first enable and sync gentoo-go before they can install it.

It could be enabled by default for everyone, and people would have the choice to
disable it or mask everything except what they are using in that case, so the
extra user toil could be avoided by a careful rollout. I'm not saying it would
be an elegant solution though.

>
> And what about transitive dependencies? Suppose app-misc/cool-package is
> written in some language that isn't Go, but it has a dependency on
> sys-apps/cool-util which has a dependency on something written in Go.
> Should a user wanting to install cool-package have to enable the
> gentoo-go overlay now too? Even though app-misc/cool-package would look
> like it doesn't need the overlay unless you dig into the deps.

This is however a valid point, something I did not consider.

Any reverse dependencies (i.e. packages in main portage tree depending on
gentoo-go) would be antithetical to the overlay philosophy (the other direction
of dependencies is okay though). This invalidates my separate overlay
suggestion, consider it withdrawn.

However I think that my other points still stand, until someone convinces
me otherwise.

>
> Not a dev, just a user who really likes Gentoo :)

Thanks for your perspective, it was a valuable observation. :)

>
> - Oskari
>

Cheers,
Zoltan




Re: [gentoo-dev] Proposal to undeprecate EGO_SUM

2022-06-26 Thread Oskari Pirhonen
On Mon, Jun 27, 2022 at 01:43:19 +0200, Zoltan Puskas wrote:
> Hi,
> 
> I've been working on adding a go based ebuild to Gentoo yesterday and I
> got this warning from portage saying that EGO_SUM is deprecated and
> should be avoided. Since I remember there was an intense discussion
> about this on the ML I went back and have re-read the threads before 
> writing this piece. I'd like to provide my perspective as user, a 
> proxied maintainer, and overlay owner. I also run a private mirror on my 
> LAN to serve my hosts in order to reduce load on external mirrors.
> 
> Before diving in I think it's worth reading mgorny's blog post "The 
> modern packager’s security nightmare"[1] as it's relevant to the 
> discussion, and something I deeply agree with.
> 
> With all that being said, I feel that the tarball idea is a bad one for
> many reasons.
> 
>  From security point of view, I understand that we still have to trust 
> maintainers not to do funky stuff, but I think this issue goes beyond 
> that.
> 
> First of all one of the advantages of Gentoo is that it gets its source
> code from upstream (yes, I'm aware of mirrors acting as a cache layer),
> which means that poisoning source code needs to be done at upstream
> level (effectively means hacking GitHub, PyPI, or some standalone
> project's Gitea/cgit/gitlab/etc. instance or similar), sources which 
> either have more scrutiny or have a limited blast radius.
> 
> Additionally if an upstream dependency has a security issue it's easier 
> to scan all EGO_SUM content and find packages that potentially depend on 
> a broken dependency and force a re-pinning and rebuild. The tarball 
> magic hides this completely and makes searching very expensive.
> 
> In fact using these vendor tarballs is the equivalent of "static 
> linking" in the packaging space. Why are we introducing the same issue 
> in the repository space? This kills the reusability of already 
> downloaded dependencies and bloats storage requirements. This is 
> especially bad on laptops, where SSD free space might be limited, in 
> case the user does not nuke their distfiles after each upgrade.
> 
> Considering that BTRFS (and possibly other filesystems) support on the 
> fly compression the physical cost of a few inflated ebuilds and 
> Manifests is actually way smaller than the logical size would indicate. 
> Compare that to the huge incompressible tarballs that now we need to 
> store.
> 
> As a proxied maintainer or overlay owner hosting these huge tarballs
> also becomes a problem (i.e. we need some public space with potentially
> gigabytes of free space and enough bandwidth to push that to users).
> Pushing toward vendor tarballs creates an extra expense on every level 
> (Gentoo infra, mirrors, proxy maintainers, overlay owners, users).
> 
> If bloating portage is a big issue and we frown upon go stuff anyway (or 
> only a few users need these packages), why not consider moving all go 
> packages into an officially supported go packages only overlay? I 
> understand that this would not solve the kernel buffer issue where we 
> run out of environment variable space, but it would debloat the main 
> portage tree.
> 

Rephrasing this just to ensure I'm understanding it correctly: you're
suggesting to move _everything_ that uses Go into its own overlay. Let's
call it gentoo-go for the sake of the example.

If the above is accurate, then I hard disagree.

The biggest package that I have that uses Go is docker (and accompanying
tools). Personal distaste of docker aside, it's a very popular piece of
software, and I don't think it's fair to require all the people who want
to use it to first enable and sync gentoo-go before they can install it.

And what about transitive dependencies? Suppose app-misc/cool-package is
written in some language that isn't Go, but it has a dependency on
sys-apps/cool-util which has a dependency on something written in Go.
Should a user wanting to install cool-package have to enable the
gentoo-go overlay now too? Even though app-misc/cool-package would look
like it doesn't need the overlay unless you dig into the deps.

Not a dev, just a user who really likes Gentoo :)

- Oskari

> It also breaks reproducibility. With EGO_SUM I can check out an older 
> version of the portage tree (well, to some extent) and rebuild packages, 
> since the dependencies' upstreams are very likely to still host old 
> versions of their source. With the tarballs this breaks, since as soon as 
> an ebuild is dropped from mainline portage the vendor tarballs follow it. 
> There is no way for the user to roll back a package a few weeks (e.g. if 
> the new version has bugs), unlike with EGO_SUM.
> 
> In fact I feel this goes against the spirit of portage too, since 
> instead of "just describing" how to obtain sources and build them, 
> it now depends on essentially ephemeral blobs, which happen to be 
> externalized from the portage tree itself. I'm aware that we have 
> ebuilds that pull in 

Re: [gentoo-dev] Proposal to undeprecate EGO_SUM

2022-06-17 Thread William Hubbs
On Mon, Jun 13, 2022 at 12:26:43PM +0200, Ulrich Mueller wrote:
> > On Mon, 13 Jun 2022, Florian Schmaus wrote:
> 
>  Judging from the gentoo-dev@ mailing list discussion [1] about EGO_SUM,
>  where some voices were in agreement that EGO_SUM has its raison d'être,
>  while there were no arguments in favor of eventually removing EGO_SUM,
>  I hereby propose to undeprecate EGO_SUM.
>  
>  1: 
>  https://archives.gentoo.org/gentoo-dev/message/1a64a8e7694c3ee11cd48a58a95f2faa
> 
> >> Can this be done without requesting changes to package managers?
> 
> > What is 'this' here?
> 
> Undeprecating EGO_SUM.
> 
> > The patchset does not make changes to any package manager, just the
> > go-module eclass.
> 
> > Note that this is not about finding an alternative to dependency
> > tarballs. It is just about re-allowing EGO_SUM in addition to
> > dependency tarballs for packaging Go software in Gentoo.

As I said in my earlier reply, there have been packages that break
when using EGO_SUM. Also, Robin's proposal will not happen, if it happens
at all, for some time, since it will require an EAPI bump and doesn't have
a working implementation.

The most pressing concern about EGO_SUM is that it can make portage
crash because of the size of SRC_URI, so it definitely should not be
preferred over dependency tarballs.

If you want to chat more about this on the list we can, but for now,
let's not undeprecate EGO_SUM in the eclass.

William
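
To make the SRC_URI size concern raised above concrete, here is a rough
sketch. It is not the eclass's actual logic. It assumes 4 KiB pages, under
which Linux limits a single argv/envp string to 32 pages (131072 bytes),
and it uses a made-up URI layout and made-up module names, so the numbers
are only indicative.

```python
# Rough sketch: estimate how large a SRC_URI-like string becomes as the number
# of Go module entries grows, and compare it with the Linux per-string limit.
# Assumptions (not taken from the thread): 4 KiB pages, and a hypothetical URI
# layout; the real go-module eclass may generate different strings.

PAGE_SIZE = 4096                 # assumed page size
MAX_ARG_STRLEN = 32 * PAGE_SIZE  # Linux limit for one argv/envp string


def fake_src_uri(modules):
    """Build a hypothetical SRC_URI from (module, version) pairs."""
    parts = []
    for module, version in modules:
        # Two hypothetical entries per module: the go.mod file and the zip.
        for ext in ("mod", "zip"):
            uri = f"mirror://goproxy/{module}/@v/{version}.{ext}"
            local = f"{module.replace('/', '%2F')}-{version}.{ext}"
            parts.append(f"{uri} -> {local}")
    return " ".join(parts)


def exceeds_env_limit(value, name="SRC_URI"):
    """True if NAME=value plus its trailing NUL would not fit in one string."""
    encoded = f"{name}={value}".encode("utf-8")
    return len(encoded) + 1 > MAX_ARG_STRLEN


def demo_modules(count):
    """Generate placeholder module entries (names are invented)."""
    return [(f"example.org/hypothetical/module{i}", "v1.2.3") for i in range(count)]


if __name__ == "__main__":
    for count in (10, 100, 1000):
        uri = fake_src_uri(demo_modules(count))
        print(count, len(uri), exceeds_env_limit(uri))
```

Under these assumptions, a package with on the order of a thousand module
entries would produce a variable too large to export to a child process,
which is the failure mode the thread refers to.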




Re: [gentoo-dev] Proposal to undeprecate EGO_SUM

2022-06-14 Thread Florian Schmaus

On 14/06/2022 11.37, Michał Górny wrote:
> On Mon, 2022-06-13 at 10:29 +0200, Michał Górny wrote:
> > On Mon, 2022-06-13 at 09:44 +0200, Florian Schmaus wrote:
> > > Judging from the gentoo-dev@ mailing list discussion [1] about EGO_SUM,
> > > where some voices were in agreement that EGO_SUM has its raison d'être,
> > > while there were no arguments in favor of eventually removing EGO_SUM,
> > > I hereby propose to undeprecate EGO_SUM.
> > > 
> > > 1: 
> > > https://archives.gentoo.org/gentoo-dev/message/1a64a8e7694c3ee11cd48a58a95f2faa
> > 
> > "We've been rehashing the discussion until all opposition got tired
> > and stopped replying, then we claim everyone agrees".
> 
> First of all, I am sorry for my tone.

No worries and no offense taken. I can easily see how this could be 
considered rehashing an old discussion, but the truth is simply that the 
deprecation of EGO_SUM caught me by surprise.

> I have been thinking about it and I was wrong to oppose this change.
> I have been conflating two problems: EGO_SUM and Manifest sizes.
> However, while EGO_SUM might be an important factor contributing to
> the latter, I think we shouldn't single it out and instead focus
> on addressing the actual problem.

Exactly my line of thought. Especially since it is not unlikely that we 
will run into this problem with other programming language ecosystems 
too (where the "dependency tarball" solution may not be easily viable).

> That said, I believe it's within maintainer's right to decide what API
> to deprecate and what API to support.  So I'd suggest getting William's
> approval for this rather than changing the supported API of that eclass
> via drive-by commits.

That was never my intention, hence the subject starts with "Proposal to" 
and I explicitly put William in CC. I believed that one week after the 
discussion around my initial gentoo-dev@ post, which gave me the 
impression that un-deprecating EGO_SUM has some supporters and no 
opponents, it was time to post a concrete proposal in the form of a 
suggested code change.

Looking forward to William's take on this. :)

- Flow



Re: [gentoo-dev] Proposal to undeprecate EGO_SUM

2022-06-14 Thread Michał Górny
On Mon, 2022-06-13 at 10:29 +0200, Michał Górny wrote:
> On Mon, 2022-06-13 at 09:44 +0200, Florian Schmaus wrote:
> > Judging from the gentoo-dev@ mailing list discussion [1] about EGO_SUM,
> > where some voices were in agreement that EGO_SUM has its raison d'être,
> > while there were no arguments in favor of eventually removing EGO_SUM,
> > I hereby propose to undeprecate EGO_SUM.
> > 
> > 1: 
> > https://archives.gentoo.org/gentoo-dev/message/1a64a8e7694c3ee11cd48a58a95f2faa
> > 
> 
> "We've been rehashing the discussion until all opposition got tired
> and stopped replying, then we claim everyone agrees".

First of all, I am sorry for my tone.

I have been thinking about it and I was wrong to oppose this change.
I have been conflating two problems: EGO_SUM and Manifest sizes.
However, while EGO_SUM might be an important factor contributing to
the latter, I think we shouldn't single it out and instead focus
on addressing the actual problem.

That said, I believe it's within maintainer's right to decide what API
to deprecate and what API to support.  So I'd suggest getting William's
approval for this rather than changing the supported API of that eclass
via drive-by commits.

-- 
Best regards,
Michał Górny




Re: [gentoo-dev] Proposal to undeprecate EGO_SUM

2022-06-13 Thread Michał Górny
On Mon, 2022-06-13 at 11:30 +0200, Florian Schmaus wrote:
> On 13/06/2022 10.29, Michał Górny wrote:
> > On Mon, 2022-06-13 at 09:44 +0200, Florian Schmaus wrote:
> > > Judging from the gentoo-dev@ mailing list discussion [1] about EGO_SUM,
> > > where some voices were in agreement that EGO_SUM has its raison d'être,
> > > while there were no arguments in favor of eventually removing EGO_SUM,
> > > I hereby propose to undeprecate EGO_SUM.
> > > 
> > > 1: 
> > > https://archives.gentoo.org/gentoo-dev/message/1a64a8e7694c3ee11cd48a58a95f2faa
> > > 
> > 
> > "We've been rehashing the discussion until all opposition got tired
> > and stopped replying, then we claim everyone agrees".
> 
> I take this comment to mean that there was already a discussion about 
> deprecating and removing EGO_SUM. I usually try to follow what's going 
> on in Gentoo and I remember the discussion about introducing dependency 
> tarballs. But I apparently have missed the part where EGO_SUM was slated 
> for removal. And it appears I am not the only one, at least Ionen also 
> wrote "Missed bits and pieces but was never quite sure why this went 
> toward full deprecation, just discouraged may have been fair enough, …".
> 
> In any case, I am sorry for bringing this discussion up again. But since 
> I started rehashing this, no arguments why EGO_SUM should be removed 
> have been provided. And so far, I failed to find the old discussions 
> where I'd hope to find some rationale behind the deprecation of EGO_SUM. :/
> 

I disagree.  Robin has made a pretty complete summary in his mail, with
numbers that prove how bad EGO_SUM is/was [1].  While he may have
disagreed with dependency tarballs, he brought pretty clear arguments
how EGO_SUM is even worse.  Multiplied by all the Gentoo systems that
won't ever install 95% of Go packages, yet all have to carry their
overhead.

[1]
https://archives.gentoo.org/gentoo-dev/message/8e2a4002bfc6258d65dcf725db347cb9

-- 
Best regards,
Michał Górny




Re: [gentoo-dev] Proposal to undeprecate EGO_SUM

2022-06-13 Thread Ulrich Mueller
> On Mon, 13 Jun 2022, Florian Schmaus wrote:

 Judging from the gentoo-dev@ mailing list discussion [1] about EGO_SUM,
 where some voices were in agreement that EGO_SUM has its raison d'être,
 while there were no arguments in favor of eventually removing EGO_SUM,
 I hereby propose to undeprecate EGO_SUM.
 
 1: 
 https://archives.gentoo.org/gentoo-dev/message/1a64a8e7694c3ee11cd48a58a95f2faa

>> Can this be done without requesting changes to package managers?

> What is 'this' here?

Undeprecating EGO_SUM.

> The patchset does not make changes to any package manager, just the
> go-module eclass.

> Note that this is not about finding an alternative to dependency
> tarballs. It is just about re-allowing EGO_SUM in addition to
> dependency tarballs for packaging Go software in Gentoo.

OK. Thanks for the clarification.

Ulrich




Re: [gentoo-dev] Proposal to undeprecate EGO_SUM

2022-06-13 Thread Florian Schmaus

On 13/06/2022 10.49, Ulrich Mueller wrote:
> > On Mon, 13 Jun 2022, Michał Górny wrote:
> 
> > On Mon, 2022-06-13 at 09:44 +0200, Florian Schmaus wrote:
> >> Judging from the gentoo-dev@ mailing list discussion [1] about EGO_SUM,
> >> where some voices were in agreement that EGO_SUM has its raison d'être,
> >> while there were no arguments in favor of eventually removing EGO_SUM,
> >> I hereby propose to undeprecate EGO_SUM.
> >> 
> >> 1: 
> >> https://archives.gentoo.org/gentoo-dev/message/1a64a8e7694c3ee11cd48a58a95f2faa
> 
> Can this be done without requesting changes to package managers?

What is 'this' here? The patchset does not make changes to any package 
manager, just the go-module eclass.

Note that this is not about finding an alternative to dependency 
tarballs. It is just about re-allowing EGO_SUM in addition to dependency 
tarballs for packaging Go software in Gentoo.


- Flow




Re: [gentoo-dev] Proposal to undeprecate EGO_SUM

2022-06-13 Thread Florian Schmaus

On 13/06/2022 10.29, Michał Górny wrote:
> On Mon, 2022-06-13 at 09:44 +0200, Florian Schmaus wrote:
> > Judging from the gentoo-dev@ mailing list discussion [1] about EGO_SUM,
> > where some voices were in agreement that EGO_SUM has its raison d'être,
> > while there were no arguments in favor of eventually removing EGO_SUM,
> > I hereby propose to undeprecate EGO_SUM.
> > 
> > 1: 
> > https://archives.gentoo.org/gentoo-dev/message/1a64a8e7694c3ee11cd48a58a95f2faa
> 
> "We've been rehashing the discussion until all opposition got tired
> and stopped replying, then we claim everyone agrees".

I take this comment to mean that there was already a discussion about 
deprecating and removing EGO_SUM. I usually try to follow what's going 
on in Gentoo and I remember the discussion about introducing dependency 
tarballs. But I apparently have missed the part where EGO_SUM was slated 
for removal. And it appears I am not the only one, at least Ionen also 
wrote "Missed bits and pieces but was never quite sure why this went 
toward full deprecation, just discouraged may have been fair enough, …".

In any case, I am sorry for bringing this discussion up again. But since 
I started rehashing this, no arguments why EGO_SUM should be removed 
have been provided. And so far, I failed to find the old discussions 
where I'd hope to find some rationale behind the deprecation of EGO_SUM. :/


- Flow





Re: [gentoo-dev] Proposal to undeprecate EGO_SUM

2022-06-13 Thread Ulrich Mueller
> On Mon, 13 Jun 2022, Michał Górny wrote:

> On Mon, 2022-06-13 at 09:44 +0200, Florian Schmaus wrote:
>> Judging from the gentoo-dev@ mailing list discussion [1] about EGO_SUM,
>> where some voices were in agreement that EGO_SUM has its raison d'être,
>> while there were no arguments in favor of eventually removing EGO_SUM,
>> I hereby propose to undeprecate EGO_SUM.
>> 
>> 1: 
>> https://archives.gentoo.org/gentoo-dev/message/1a64a8e7694c3ee11cd48a58a95f2faa

Can this be done without requesting changes to package managers?
Previous examples are unexporting variables because their size exceeds
the limit of the Linux kernel [2], or introduction of additional phase
functions that bypass Manifest validation [3].

> "We've been rehashing the discussion until all opposition got tired
> and stopped replying, then we claim everyone agrees".

[2] https://bugs.gentoo.org/721088
[3] https://bugs.gentoo.org/833567




Re: [gentoo-dev] Proposal to undeprecate EGO_SUM

2022-06-13 Thread Michał Górny
On Mon, 2022-06-13 at 09:44 +0200, Florian Schmaus wrote:
> Judging from the gentoo-dev@ mailing list discussion [1] about EGO_SUM,
> where some voices were in agreement that EGO_SUM has its raison d'être,
> while there were no arguments in favor of eventually removing EGO_SUM,
> I hereby propose to undeprecate EGO_SUM.
> 
> 1: 
> https://archives.gentoo.org/gentoo-dev/message/1a64a8e7694c3ee11cd48a58a95f2faa
> 

"We've been rehashing the discussion until all opposition got tired
and stopped replying, then we claim everyone agrees".

-- 
Best regards,
Michał Górny




[gentoo-dev] Proposal to undeprecate EGO_SUM

2022-06-13 Thread Florian Schmaus
Judging from the gentoo-dev@ mailing list discussion [1] about EGO_SUM,
where some voices were in agreement that EGO_SUM has its raison d'être,
while there were no arguments in favor of eventually removing EGO_SUM,
I hereby propose to undeprecate EGO_SUM.

1: 
https://archives.gentoo.org/gentoo-dev/message/1a64a8e7694c3ee11cd48a58a95f2faa





Re: [gentoo-dev] proposal: use only one hash function in manifest files

2022-04-20 Thread Jason A. Donenfeld
Hi Robin,

On Wed, Apr 06, 2022 at 05:31:09PM +, Robin H. Johnson wrote:
> On Wed, Apr 06, 2022 at 07:06:30PM +0200, Jason A. Donenfeld wrote:
> > No, you're still missing the point.
> > 
> > If SHA-512 breaks, the security of the system fails, regardless of
> > what change we make. This is because GnuPG uses SHA-512 for its
> > signatures.
> Question directly for you Jason, because you make a professional study
> of this: does the type of breakage/successful attack against
> SHA-512 matter?
> 
> e.g. is it possible that some type of attack would only work against the
> Manifest entry, but NOT against the GPG signature's embedded SHA-512 (or
> the opposite).
> 
> The best hypothetical idea I had was that there exists some large
> special input that lets an attacker reset the output to an arbitrary
> hash after their malicious payload: but it wouldn't fit in the GPG
> signature space.
 
Generally speaking, the more control an attacker has over the input, the
easier certain types of attacks might be. So maybe in the most general
sense that applies. I wouldn't model a security analysis around that,
though. Rather, the usual way to apply that sort of thinking is to
design algorithms that rely on certain properties of hash functions, but
not others; for example, Ed25519 does not rely on the hash function
being collision resistant due to its construction.

Jason



Re: [gentoo-dev] proposal: use only one hash function in manifest files

2022-04-20 Thread Jason A. Donenfeld
Hey Robin,

Sorry for the delay in getting back to you. As mentioned on IRC, both of
your messages bounced earlier, and I was at a conference all last week.
Catching up with this thread now...

On Wed, Apr 06, 2022 at 05:23:25PM +, Robin H. Johnson wrote:
> On Wed, Apr 06, 2022 at 02:15:02AM +0200, Jason A. Donenfeld wrote:
> > 2) Comparability: other distros use SHA2-512, as well as various
> > upstreams, which means we can compare our hashes to theirs easily.
> Can we expand on this specific thread for a moment?
> 
> I was the author of GLEP59 about changing the Manifest hashes, and I
> noted at the time, with references, that the effective strength of a set
> of hashes is only that of the strongest hash.
> 
> One of my regrets from GLEP59 is that it's made it harder for use cases
> outside of the normal user distfile workflow.
> 
> The use case that impacted me the most was being able to compare what our
> distfiles were over time vs. external sources, esp. if the file goes
> missing or was fetch-restricted and we can't produce a new hash of it.
> Maybe upstream only ever published SHA1/SHA256, and we only ever
> calculated SHA512/BLAKE2b on the file. Since we never had hashes from
> both sides at the same time, we cannot prove it was the same file.
> 
> We need to be able to ship one or more hashes to users, for the specific
> use case of validating the distfiles they download.
> 
> As a developer, I'd like to be able to track the other hashes for a
> file, without forcing ourselves to retain the file. This might be to
> compare with upstream published hashes, or to compare with other
> distros.
> 
> In fact it would be really nice to have a semi-automated pipeline to
> plug in signed upstream hashes to our Manifests, and make it possible to
> prove our new SHA512/BLAKE2B hash was taken over the correct input in
> the first place, and there wasn't any subtle supply-chain attack early
> in the packaging process.
> 
> Where would those hashes go? They don't need to be in the Manifest, or
> at the very least they don't need to be distributed via rsync to users
> (it only costs a small amount of bytes to do so).
> 
> Where else could they go? 
> - Commit messages could work.
> - Git notes to a lesser degree.
> - alternate repos?

Interesting idea. This seems orthogonal to my proposal ("just use one
hash in the manifest and call it a day; make it the same as what gpg
uses for signing to minimize moving pieces"), and so I'm hesitant to
indulge too much in this thread, for fear of it being derailed with this
different thing you want.

With that said, I'm not quite sure I understood everything you're asking
for. You said that you want "to have a semi-automated pipeline to plug
in signed upstream hashes to our Manifests, and make it possible to
prove our new SHA512/BLAKE2B hash was taken over the correct input", but
at the same time you also said that you want "to be able to track the
other hashes for a file, without forcing ourselves to retain the file."
What I'm wondering is: how do you propose that we calculate a SHA-512
hash of a file and "prove it correct" using, e.g., a signed SHA-256
hash, if we don't download the whole file?

It sounds like the thing that would be interesting to you would be for
infra to manage some sort of master hash database collecting all the
hashes from all over the internet of every file that hits distfiles,
verifying and then generating a bunch more hash variants of all kinds,
and then cross-verifying those with the hashes extracted from every
other distro, making for a wild hash verification aggregator machine. I
think I can see the utility of it. It would also unburden manifest
files, as those could then just have a SHA-512 hash and nothing else,
making things a bit lighter.


> > A reason why some people might prefer BLAKE2b over SHA2-512 is a
> > performance improvement. However, seeing as right now we're opening
> > the file, reading it, computing BLAKE2b, closing the file, opening the
> > file again, reading it again, computing SHA2-512, closing the file, I
> > don't think performance is actually something people care about. Seen
> > differently, removing either one of them will already give us a
> > performance "boost" of sorts.
> Or just only verifying the "strongest" hash gives you that boost.
> 
> I do want to check into the code that you pointed out, because I'm
> really sure much older versions of Portage did the CORRECT thing of only
> reading the file in a single pass.

Let me know if your findings are different from mine...

Jason



Re: [gentoo-dev] proposal: use only one hash function in manifest files

2022-04-19 Thread Robin H. Johnson
On Wed, Apr 06, 2022 at 05:23:25PM +, Robin H. Johnson wrote:
> On Wed, Apr 06, 2022 at 02:15:02AM +0200, Jason A. Donenfeld wrote:
> > 2) Comparability: other distros use SHA2-512, as well as various
> > upstreams, which means we can compare our hashes to theirs easily.
> Can we expand on this specific thread for a moment?
> 
> I was the author of GLEP59 about changing the Manifest hashes, and I
> noted at the time, with references, that the effective strength of a set
> of hashes is only that of the strongest hash.
Bump for my parent message; I'm very surprised at the lack of
responses to two messages in this thread.

https://archives.gentoo.org/gentoo-dev/message/18216da0128ee79733fa68bb77fa8b69
https://archives.gentoo.org/gentoo-dev/message/a9974ec34dfb25810dab47e3fa322a52

-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Treasurer
E-Mail   : robb...@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136



Re: [gentoo-dev] proposal: use only one hash function in manifest files

2022-04-12 Thread Mike Gilbert
On Mon, Apr 11, 2022 at 7:14 PM Joshua Kinard  wrote:
>
> On 4/5/2022 17:49, Jason A. Donenfeld wrote:
> > Hi Matt,
> >
> > On Tue, Apr 5, 2022 at 10:38 PM Matt Turner  wrote:
> >>
> >> On Tue, Apr 5, 2022 at 12:30 PM Jason A. Donenfeld  
> >> wrote:
> >>> By the way, we're not currently _checking_ two hash functions during
> >>> src_prepare(), are we?
> >>
> >> I don't know, but the hash-checking is definitely checked before 
> >> src_prepare().
> >
> > Er, during the builtin fetch phase. Anyway, you know what I meant. :)
> >
> > Anyway, looking at the portage source code, to answer my own question,
> > it looks like the file is actually being read twice and both hashes
> > computed. I would have at least expected an optimization like:
> >
> >     hash1_init(&hash1);
> >     hash2_init(&hash2);
> >     for chunk in file:
> >         hash1_update(&hash1, chunk);
> >         hash2_update(&hash2, chunk);
> >     hash1_final(&hash1, out1);
> >     hash2_final(&hash2, out2);
> >
> > But actually what's happening is the even less efficient:
> >
> >     hash1_init(&hash1);
> >     for chunk in file:
> >         hash1_update(&hash1, chunk);
> >     hash1_final(&hash1, out1);
> >     hash2_init(&hash2);
> >     for chunk in file:
> >         hash2_update(&hash2, chunk);
> >     hash2_final(&hash2, out2);
> >
> > So the file winds up being open and read twice. For huge tarballs like
> > chromium or libreoffice...
> >
> > But either way you do it - the missed optimization above or the
> > unoptimized reality below - there's still twice as much work being
> > done. This is all unless I've misread the source code, which is
> > possible, so if somebody knows this code well and I'm wrong here,
> > please do speak up.
>
> Not to go off-topic, but where in Portage's source is this logic at?  It
> seems like an easy fix for a slightly more efficient Portage.

I believe it's the portage.checksum.verify_all() function.

https://gitweb.gentoo.org/proj/portage.git/tree/lib/portage/checksum.py?h=portage-3.0.30#n471
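
For illustration, the single-pass variant from the quoted pseudocode could
look roughly like this in Python. This is a sketch using only the standard
hashlib module, not Portage's code; the function name, the default
algorithm list and the chunk size are illustrative choices.

```python
# Sketch of the single-pass idea: read the file once and feed every chunk to
# all hash objects. Not Portage's implementation; names are illustrative.
import hashlib
import os
import tempfile


def hash_file_once(path, algorithms=("sha512", "blake2b"), chunk_size=1 << 20):
    """Return {algorithm: hexdigest} while reading the file only once."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            # Every hash sees the same chunk, so the file is read one time.
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def demo():
    """Hash a small temporary file and return (data, digests)."""
    data = b"example distfile contents\n" * 1000
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(data)
        path = tmp.name
    try:
        return data, hash_file_once(path, chunk_size=4096)
    finally:
        os.unlink(path)


if __name__ == "__main__":
    _, digests = demo()
    for name, digest in digests.items():
        print(name, digest)
```

The hashing work itself is unchanged by this; what the single pass avoids
is opening and reading the file a second time.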



Re: [gentoo-dev] proposal: use only one hash function in manifest files

2022-04-11 Thread Joshua Kinard
On 4/5/2022 17:49, Jason A. Donenfeld wrote:
> Hi Matt,
> 
> On Tue, Apr 5, 2022 at 10:38 PM Matt Turner  wrote:
>>
>> On Tue, Apr 5, 2022 at 12:30 PM Jason A. Donenfeld  wrote:
>>> By the way, we're not currently _checking_ two hash functions during
>>> src_prepare(), are we?
>>
>> I don't know, but the hash-checking is definitely checked before 
>> src_prepare().
> 
> Er, during the builtin fetch phase. Anyway, you know what I meant. :)
> 
> Anyway, looking at the portage source code, to answer my own question,
> it looks like the file is actually being read twice and both hashes
> computed. I would have at least expected an optimization like:
> 
>     hash1_init(&hash1);
>     hash2_init(&hash2);
>     for chunk in file:
>         hash1_update(&hash1, chunk);
>         hash2_update(&hash2, chunk);
>     hash1_final(&hash1, out1);
>     hash2_final(&hash2, out2);
> 
> But actually what's happening is the even less efficient:
> 
>     hash1_init(&hash1);
>     for chunk in file:
>         hash1_update(&hash1, chunk);
>     hash1_final(&hash1, out1);
>     hash2_init(&hash2);
>     for chunk in file:
>         hash2_update(&hash2, chunk);
>     hash2_final(&hash2, out2);
> 
> So the file winds up being open and read twice. For huge tarballs like
> chromium or libreoffice...
> 
> But either way you do it - the missed optimization above or the
> unoptimized reality below - there's still twice as much work being
> done. This is all unless I've misread the source code, which is
> possible, so if somebody knows this code well and I'm wrong here,
> please do speak up.

Not to go off-topic, but where in Portage's source is this logic at?  It
seems like an easy fix for a slightly more efficient Portage.

-- 
Joshua Kinard
Gentoo/MIPS
ku...@gentoo.org
rsa6144/5C63F4E3F5C6C943 2015-04-27
177C 1972 1FB8 F254 BAD0 3E72 5C63 F4E3 F5C6 C943

"The past tempts us, the present confuses us, the future frightens us.  And
our lives slip away, moment by moment, lost in that vast, terrible in-between."

--Emperor Turhan, Centauri Republic



Re: [gentoo-dev] proposal: use only one hash function in manifest files

2022-04-07 Thread Marek Szuba

On 2022-04-06 19:34, Rich Freeman wrote:


This is one of those low cost, low risk, high reward situations IMO.


*puts on Council hat*

The above pretty much covers my own opinion on the subject.

--
Marecki




Re: [gentoo-dev] proposal: use only one hash function in manifest files

2022-04-06 Thread Rich Freeman
On Wed, Apr 6, 2022 at 1:29 PM Jason A. Donenfeld  wrote:
>
> Sort of. The security between infra and users relies on SHA2-512. The
> security between devs and infra relies on SHA-1. I guess the "full
> system" depends on both, but I've been focused on the more likely
> issue of a community-run mirror serving bogus files.

Well, that depends on how you're syncing the tree.  If you're using
rsync then there is a signed manifest in the root, so I agree in that
case it is just SHA2-512.  If you're syncing using git then the
manifests only reference distfiles, and the only link between the
commit and the tree/objects are their SHA-1 hashes until git adopts a
different hash function.

> Yea I see this argument, but I don't quite buy it. Maintaining two
> sets of hashes for the unlikely event that one gets broken AND we
> absolutely cannot incrementally transition gradually to an unbroken
> one seems rather overblown.

It is very much a hand-waving judgement call.  This is one of those
low cost, low risk, high reward situations IMO.  The cost of
calculating hashes is fairly low (especially if done in a more sane
way).  The odds it will ever have a benefit are low.  If it does have
a benefit, it will be in a situation where the world is on fire and
we'll be very happy to not have to go verify a gazillion distfiles on
top of everything else we have to fix.  I'll defer to those wiser than
me to make the call.  :)

-- 
Rich



Re: [gentoo-dev] proposal: use only one hash function in manifest files

2022-04-06 Thread Ulrich Mueller
> On Wed, 06 Apr 2022, Jason A Donenfeld wrote:

> So I'll spell out the different possibilities:

> 1) GPG uses SHA-512. Manifest uses SHA-512 and BLAKE2b.
> 1a) Possibility: SHA-512 is broken. Result: system broken.
> 1b) Possibility: BLAKE2b is broken. Result: nothing.

> 2) GPG uses SHA-512. Manifest uses SHA-512.
> 2a) Possibility: SHA-512 is broken. Result: system broken.
> 2b) Possibility: BLAKE2b is broken. Result: nothing.

> 3) GPG uses SHA-512. Manifest uses BLAKE2b.
> 3a) Possibility: SHA-512 is broken. Result: system broken.
> 3b) Possibility: BLAKE2b is broken. Result: system broken.

> See how from a security perspective, (2) is not worse than (1), but
> (3) is worse than both (1) and (2)?

No it isn't. We can replace the top-level signature easily, but
replacing all Manifest hashes in the tree is hard (i.e. 1a and 3a are
trivial to fix, but 2a and 3b aren't).

I've said this multiple times now, so I'm out of here.

Ulrich
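
The two readings of the scenarios above can be put side by side with a toy
model. This is only a sketch of the reasoning in this thread, not a
security analysis. It assumes that verification is compromised if the hash
used by the top-level GPG signature is broken or if every Manifest hash is
broken, and that recovery is hard only when no unbroken Manifest hash is
left in the tree. The function and variable names are invented for the
example.

```python
# Toy model of configurations 1-3 and failure cases a/b from the thread.
# The two rules below are assumptions made for illustration only.

GPG_HASH = "SHA-512"

CONFIGS = {
    1: {"SHA-512", "BLAKE2b"},
    2: {"SHA-512"},
    3: {"BLAKE2b"},
}


def compromised(manifest_hashes, broken):
    """Chain fails if the signature hash or all Manifest hashes are broken."""
    return GPG_HASH == broken or manifest_hashes <= {broken}


def needs_tree_rehash(manifest_hashes, broken):
    """Recovery is hard if no unbroken Manifest hash remains."""
    return manifest_hashes <= {broken}


def table():
    """Return {case: (compromised, needs_tree_rehash)} for cases 1a..3b."""
    rows = {}
    for number, hashes in CONFIGS.items():
        for label, broken in (("a", "SHA-512"), ("b", "BLAKE2b")):
            rows[f"{number}{label}"] = (
                compromised(hashes, broken),
                needs_tree_rehash(hashes, broken),
            )
    return rows


if __name__ == "__main__":
    for case, (broken_now, hard_fix) in table().items():
        print(case, "compromised" if broken_now else "ok",
              "tree rehash needed" if hard_fix else "no tree rehash")
```

In this model the first column matches the quoted list of possibilities,
while the second column separates 1a and 3a (only the top-level signature
needs replacing) from 2a and 3b (every Manifest needs new hashes).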




Re: [gentoo-dev] proposal: use only one hash function in manifest files

2022-04-06 Thread Robin H. Johnson
On Wed, Apr 06, 2022 at 07:06:30PM +0200, Jason A. Donenfeld wrote:
> No, you're still missing the point.
> 
> If SHA-512 breaks, the security of the system fails, regardless of
> what change we make. This is because GnuPG uses SHA-512 for its
> signatures.
Question directly for you Jason, because you make a professional study
of this: does the type of breakage/successful attack against
SHA-512 matter?

e.g. is it possible that some type of attack would only work against the
Manifest entry, but NOT against the GPG signature's embedded SHA-512 (or
the opposite).

The best hypothetical idea I had was that there exists some large
special input that lets an attacker reset the output to an arbitrary
hash after their malicious payload: but it wouldn't fit in the GPG
signature space.

> 
> So I'll spell out the different possibilities:
> 1) GPG uses SHA-512. Manifest uses SHA-512 and BLAKE2b.
score -1 + 0 = -1
> 2) GPG uses SHA-512. Manifest uses SHA-512.
score -1 + 0 = -1
> 3) GPG uses SHA-512. Manifest uses BLAKE2b.
score -1 + -1 = -2
> See how from a security perspective, (2) is not worse than (1), but
> (3) is worse than both (1) and (2)?
Yes, (2) is not worse than (1) for the overall security perspective.
That leaves the question of whether (1) has other benefits / value
propositions that make it worth less than (2) (see my other thread).

-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Treasurer
E-Mail   : robb...@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136




Re: [gentoo-dev] proposal: use only one hash function in manifest files

2022-04-06 Thread Jason A. Donenfeld
Hi Rich,

On 4/6/22, Rich Freeman  wrote:
> On Tue, Apr 5, 2022 at 8:05 PM Sam James  wrote:
> Our security fails currently if EITHER SHA2-512 or a hardened version
> of SHA-1 are defeated.  Our top gpg signature is bound to a git commit
> record by SHA2-512, and the git commit record is bound to everything
> else in the repository (including the manifest objects) by SHA-1,
> because git hasn't transitioned away from that (as far as I'm aware it
> is still a work in progress - the SHA-1 algorithm it uses is hardened
> against known attacks).

Sort of. The security between infra and users relies on SHA2-512. The
security between devs and infra relies on SHA-1. I guess the "full
system" depends on both, but I've been focused on the more likely
issue of a community-run mirror serving bogus files.

> I agree that this is an unlikely scenario, so it is a judgement call
> as to whether the ease of recovery in the event of a failure is worth
> the cost to maintain the second hash.  I agree that we'd need double
> algorithms in the whole stack to prevent a failure, but in the current
> state we do have advantages for recovering from a failure after the
> fact.
>
> It seems that the likely scenario is that we get advance warning of
> weaknesses in a hash function, but without a practical exploit being
> readily available.  In that case we could do a more orderly
> transition.  We'd still save time with the double hashed manifests,
> and whether this makes a difference is hard to say.

Yeah, I see this argument, but I don't quite buy it. Maintaining two
sets of hashes for the unlikely event that one gets broken AND we
absolutely cannot transition incrementally to an unbroken one seems
rather overblown.

Jason



Re: [gentoo-dev] proposal: use only one hash function in manifest files

2022-04-06 Thread Robin H. Johnson
On Wed, Apr 06, 2022 at 02:15:02AM +0200, Jason A. Donenfeld wrote:
> 2) Comparability: other distros use SHA2-512, as well as various
> upstreams, which means we can compare our hashes to theirs easily.
Can we expand on this specific thread for a moment?

I was the author of GLEP59 about changing the Manifest hashes, and I
noted at the time, with references, that the effective strength of a set
of hashes is only that of the strongest hash.

One of my regrets from GLEP59 is that it's made it harder for use cases
outside of the normal user distfile workflow.

The use case that impacted me the most was being able to compare our
distfiles over time against external sources, esp. if the file goes
missing or was fetch-restricted and we can't produce a new hash of it.
Maybe upstream only ever published SHA1/SHA256, and we only ever
calculated SHA512/BLAKE2b on the file. Since we never had hashes from
both sides at the same time, we cannot prove it was the same file.

We need to be able to ship one or more hashes to users, for the specific
use case of validating the distfiles they download.

As a developer, I'd like to be able to track the other hashes for a
file, without forcing ourselves to retain the file. This might be to
compare with upstream published hashes, or to compare with other
distros.
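
As a rough illustration of the comparison problem described above, here
is a minimal Python sketch. The function name compare_hash_records and
the sample data are hypothetical, not anything that exists in Portage.
It only shows that two hash records can be compared when they share at
least one algorithm, and are otherwise incomparable.

```python
import hashlib


def compare_hash_records(ours, theirs):
    """Compare two {algorithm: hexdigest} records for the same filename.

    Returns "match" or "mismatch" when at least one algorithm is shared,
    and "incomparable" when the two sides never hashed with the same one.
    """
    common = sorted(set(ours) & set(theirs))
    if not common:
        return "incomparable"
    if all(ours[a].lower() == theirs[a].lower() for a in common):
        return "match"
    return "mismatch"


# Hypothetical file contents, used only to generate sample digests.
data = b"hypothetical upstream tarball"

# What we recorded in the Manifest.
ours = {
    "SHA512": hashlib.sha512(data).hexdigest(),
    "BLAKE2B": hashlib.blake2b(data).hexdigest(),
}
# What upstream published.
upstream = {
    "SHA1": hashlib.sha1(data).hexdigest(),
    "SHA256": hashlib.sha256(data).hexdigest(),
}

# No shared algorithm, so nothing can be proven either way.
without_extra = compare_hash_records(ours, upstream)

# Had we also tracked SHA256 while we still had the file, we could compare.
ours_extended = dict(ours, SHA256=hashlib.sha256(data).hexdigest())
with_extra = compare_hash_records(ours_extended, upstream)

print(without_extra, with_extra)
```

Where such extra hashes would be stored is a separate question; the
sketch only covers the comparison itself.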

In fact it would be really nice to have a semi-automated pipeline to
plug in signed upstream hashes to our Manifests, and make it possible to
prove our new SHA512/BLAKE2B hash was taken over the correct input in
the first place, and there wasn't any subtle supply-chain attack early
in the packaging process.

Where would those hashes go? They don't need to be in the Manifest, or
at the very least they don't need to be distributed via rsync to users
(though it only costs a small number of bytes to do so).

Where else could they go? 
- Commit messages could work.
- Git notes to a lesser degree.
- alternate repos?

> A reason why some people might prefer BLAKE2b over SHA2-512 is a
> performance improvement. However, seeing as right now we're opening
> the file, reading it, computing BLAKE2b, closing the file, opening the
> file again, reading it again, computing SHA2-512, closing the file, I
> don't think performance is actually something people care about. Seen
> differently, removing either one of them will already give us a
> performance "boost" of sorts.
Or just only verifying the "strongest" hash gives you that boost.
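
A minimal Python sketch of that idea, under stated assumptions: the
helper names parse_dist_line and verify_preferred are hypothetical, the
preference order is an arbitrary choice for illustration (the thread
does not settle which hash is "strongest"), and the DIST line layout is
assumed to be "DIST <name> <size> <ALGO> <hex> ...". This is not
Portage's actual code.

```python
import hashlib

# Assumed preference order, most preferred first. Illustrative only.
PREFERENCE = ("SHA512", "BLAKE2B")
HASHLIB_NAMES = {"SHA512": "sha512", "BLAKE2B": "blake2b"}


def parse_dist_line(line):
    """Split a Manifest DIST line into (filename, size, {algo: hexdigest})."""
    fields = line.split()
    # "DIST", name and size are followed by algorithm/digest pairs.
    if len(fields) < 5 or fields[0] != "DIST" or len(fields) % 2 != 1:
        raise ValueError("not a well-formed DIST line")
    name, size = fields[1], int(fields[2])
    hashes = dict(zip(fields[3::2], fields[4::2]))
    return name, size, hashes


def verify_preferred(data, size, hashes):
    """Check the size, then verify only the most preferred hash present."""
    if len(data) != size:
        return False
    for algo in PREFERENCE:
        if algo in hashes:
            digest = hashlib.new(HASHLIB_NAMES[algo], data).hexdigest()
            return digest == hashes[algo]
    raise ValueError("no supported hash in entry")


# Hypothetical distfile and the Manifest line that would describe it.
data = b"hypothetical distfile contents"
line = "DIST foo-1.0.tar.gz {} BLAKE2B {} SHA512 {}".format(
    len(data),
    hashlib.blake2b(data).hexdigest(),
    hashlib.sha512(data).hexdigest(),
)

name, size, hashes = parse_dist_line(line)
print(name, verify_preferred(data, size, hashes))
```

Both hashes stay in the Manifest in this sketch; only the verification
step skips the second one.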

I do want to check into the code that you pointed out, because I'm
really sure much older versions of Portage did the CORRECT thing of only
reading the file in a single pass.

-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Treasurer
E-Mail   : robb...@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136




Re: [gentoo-dev] proposal: use only one hash function in manifest files

2022-04-06 Thread Jason A. Donenfeld
Hi Ulrich,

On Wed, Apr 6, 2022 at 6:38 PM Ulrich Mueller  wrote:
> > Why? Then we're dependent on two things, either of which could break,
> > rather than one.
>
> See? If either of these should happen, then we'll be happy that we still
> have both hashes in our Manifest files.
>
> OTOH, if that argument is not relevant because the probability of both
> is close to zero, then (from a security POV) it doesn't matter which of
> the two hashes we remove.

No, you're still missing the point.

If SHA-512 breaks, the security of the system fails, regardless of
what change we make. This is because GnuPG uses SHA-512 for its
signatures.

So I'll spell out the different possibilities:

1) GPG uses SHA-512. Manifest uses SHA-512 and BLAKE2b.
1a) Possibility: SHA-512 is broken. Result: system broken.
1b) Possibility: BLAKE2b is broken. Result: nothing.

2) GPG uses SHA-512. Manifest uses SHA-512.
2a) Possibility: SHA-512 is broken. Result: system broken.
2b) Possibility: BLAKE2b is broken. Result: nothing.

3) GPG uses SHA-512. Manifest uses BLAKE2b.
3a) Possibility: SHA-512 is broken. Result: system broken.
3b) Possibility: BLAKE2b is broken. Result: system broken.

See how from a security perspective, (2) is not worse than (1), but
(3) is worse than both (1) and (2)?
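
The same enumeration can be written as a small Python sketch. The model
is an assumption made for illustration: the system holds only if the
hash used by the GPG signature is unbroken and at least one hash listed
in the Manifest is unbroken. The names system_secure and OPTIONS are
hypothetical.

```python
GPG_HASH = "SHA512"

# The three configurations discussed above: hashes present in the Manifest.
OPTIONS = {
    "1": ("SHA512", "BLAKE2B"),
    "2": ("SHA512",),
    "3": ("BLAKE2B",),
}


def system_secure(manifest_hashes, broken):
    """Return True if the chain still holds under the assumed model."""
    if GPG_HASH in broken:
        # A forged signature defeats everything below it.
        return False
    # A distfile stays verifiable while any listed hash is unbroken.
    return any(h not in broken for h in manifest_hashes)


results = {
    (option, broken): system_secure(hashes, {broken})
    for option, hashes in OPTIONS.items()
    for broken in ("SHA512", "BLAKE2B")
}

for (option, broken), secure in sorted(results.items()):
    outcome = "nothing" if secure else "system broken"
    print(f"{option}) {broken} is broken. Result: {outcome}")
```

Under this model, options (1) and (2) fail in exactly the same case,
while option (3) fails in both.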

Jason



Re: [gentoo-dev] proposal: use only one hash function in manifest files

2022-04-06 Thread Ulrich Mueller
> On Wed, 06 Apr 2022, Jason A Donenfeld wrote:

> Why? Then we're dependent on two things, either of which could break,
> rather than one.

See? If either of these should happen, then we'll be happy that we still
have both hashes in our Manifest files.

OTOH, if that argument is not relevant because the probability of both
is close to zero, then (from a security POV) it doesn't matter which of
the two hashes we remove.

Ulrich




Re: [gentoo-dev] proposal: use only one hash function in manifest files

2022-04-06 Thread Jason A. Donenfeld
Hi Ulrich,

On 4/6/22, Ulrich Mueller  wrote:
>> On Wed, 06 Apr 2022, Jason A Donenfeld wrote:
>
>> I think actually the argument I'm making this time might be subtly
>> different from the motions that folks went through last year.
>> Specifically, the idea last year was to switch to using BLAKE2b only.
>> I think what the arguments I'm making now point to is switching to
>> SHA2-512 only.
>
> Still, I think that if we drop one of the hashes then we should proceed
> with the original plan. That is, keep the more modern BLAKE2B (which was
> a participant of the SHA-3 competition [1]) and drop the older SHA512.

Why? Then we're dependent on two things, either of which could break,
rather than one.

To be clear, I'm a big fan of BLAKE2 myself and have used it in a
number of projects. And either one breaking would be a big deal. So
maybe it doesn't really matter that much. But strictly formally, it
seems like SHA512 is the most sound decision? I spelled out two
reasons for that to Sam; if you still disagree, maybe you can address
why you think my two reasons aren't very meaningful?

> I also think that the argument about the OpenPGP signature isn't very
> strong, because replacing that signature by another one using a
> different hash is trivial. As I said before, replacing all Manifest
> files in the tree isn't.

I looked into changing gnupg to use BLAKE2b for signatures, but it
doesn't appear to be supported. It's in gcrypt but not gpg. From
--version: `Hash: SHA1, RIPEMD160, SHA256, SHA384, SHA512, SHA224`.
Since my argument rests on minimizing probability of a break, changing
the signature hash algo after it's broken doesn't help with much, so I
think this is something we'd want to happen now, rather than later, if
we're to use BLAKE2b exclusively.

I could potentially send a patch to gnupg for this if you want to take
the long path. But also: don't forget there's also the
interoperability argument that favors SHA512 too.

Jason



Re: [gentoo-dev] proposal: use only one hash function in manifest files

2022-04-05 Thread Ulrich Mueller
> On Wed, 06 Apr 2022, Jason A Donenfeld wrote:

> I think actually the argument I'm making this time might be subtly
> different from the motions that folks went through last year.
> Specifically, the idea last year was to switch to using BLAKE2b only.
> I think what the arguments I'm making now point to is switching to
> SHA2-512 only.

Still, I think that if we drop one of the hashes then we should proceed
with the original plan. That is, keep the more modern BLAKE2B (which was
a participant of the SHA-3 competition [1]) and drop the older SHA512.

Back then, we had the choice between adding SHA3_512 and BLAKE2B, and we
preferred BLAKE2B for performance reasons.

I also think that the argument about the OpenPGP signature isn't very
strong, because replacing that signature by another one using a
different hash is trivial. As I said before, replacing all Manifest
files in the tree isn't.

Ulrich

[1] https://en.wikipedia.org/wiki/NIST_hash_function_competition




Re: [gentoo-dev] proposal: use only one hash function in manifest files

2022-04-05 Thread Rich Freeman
On Tue, Apr 5, 2022 at 8:05 PM Sam James  wrote:
> > On 5 Apr 2022, at 22:13, Jonas Stein  wrote:
> >
> >> In other words, what are we actually getting by having _both_ SHA2-512
> >> and BLAKE2b for every file in every Manifest?
> >
> > Implementations are often broken and we have to expect zero day attacks on 
> > hashes and on signatures. Hence it does not hurt to have a second hash.
>
> I don't think this is the case. They're not broken often, it's a very very 
> big deal when they do, and we'd also have far bigger problems in such a case 
> (as already pointed out, TLS would be an issue, but also GPG signatures, git 
> commit hashes, ...).

Our security currently fails if EITHER SHA2-512 or a hardened version
of SHA-1 is defeated.  Our top gpg signature is bound to a git commit
record by SHA2-512, and the git commit record is bound to everything
else in the repository (including the manifest objects) by SHA-1,
because git hasn't transitioned away from that (as far as I'm aware it
is still a work in progress - the SHA-1 algorithm it uses is hardened
against known attacks).
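
To make the SHA-1 binding concrete, here is a minimal Python sketch of
how git derives the object ID of a file (a "blob"): it hashes a short
header followed by the contents with SHA-1. The helper name git_blob_id
is made up for this example, and it uses plain hashlib SHA-1 rather
than the collision-detecting variant git ships, which is expected to
produce the same digest for ordinary inputs.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID git assigns to a blob with this content."""
    # git hashes "blob <size>\0" followed by the raw bytes.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


# The well-known ID of the empty blob.
EMPTY_BLOB = git_blob_id(b"")
print(EMPTY_BLOB)  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Trees and commits are hashed the same way over records that list the
IDs of the objects below them, which is why a signed commit only vouches
for a Manifest as far as SHA-1 holds.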

That said, I think there is still an argument for having two hashes in
the manifests.  If we have two independent manifests, then if either
SHA-1 or SHA2-512 are defeated all we need to do is update git+gpg to
the patched version (which no doubt would be rushed into a release
quickly), and then do a commit to the repo and sign it with the Gentoo
key.  The new commit would have a full set of new hashes using a
secure hash function, and then a back-reference to the previous commit
using SHA-1 (assuming we didn't rebase the entire tree and lose all
our historical gpg signatures - we might consider creating a new repo
and saving a historical one).   That would have new hashes all the way
from the top commit down to all the objects it references, so the top
commit would now be secure.  When signed with an updated gpg the
signature would be attached with a secure hash.  So now we're secure
again.  If we're concerned about old signatures getting recycled in
preimage attacks we could of course revoke the key and issue a new
one.

What we don't need to do is redo all the manifests, and that is
important because we don't actually have the ability to redo those
centrally.  Anybody can add a commit to the repo and re-sign it, but
we'd need all the maintainers to go through and generate new manifests
for anything that is fetch-restricted, or aggressively treeclean.

So it isn't that having two hashes can't fail, but rather that if it
does fail it is easier to recover.

>
> >
> > It is very likely that we can not trust in X for a while in the next years, 
> > but it is very unlikely that two different implementations are affected.
> >
>
> I don't think it is likely that e.g. SHA512 will be broken in the next few 
> years, no, but if it is going to be, we have far bigger issues and we'd need 
> to have double algorithms in our whole stack, which we don't have.

I agree that this is an unlikely scenario, so it is a judgement call
as to whether the ease of recovery in the event of a failure is worth
the cost to maintain the second hash.  I agree that we'd need double
algorithms in the whole stack to prevent a failure, but in the current
state we do have advantages for recovering from a failure after the
fact.

It seems that the likely scenario is that we get advance warning of
weaknesses in a hash function, but without a practical exploit being
readily available.  In that case we could do a more orderly
transition.  We'd still save time with the double hashed manifests,
and whether this makes a difference is hard to say.

>
> > Additionally calculating a second hash does not cost anything.
>
> It does have a cost at both Manifest-generation time and emerge-time.

This is certainly true, though if the current algorithm is reading the
files twice we could at least fix that.

I don't really have a strong opinion here.  I just wanted to point out
the recovery benefit of having two hashes on just the manifests, given
that it isn't easy to access all the distfiles.  I also wanted to
point out that we have SHA-1 exposure today, at least in git.

-- 
Rich



Re: [gentoo-dev] proposal: use only one hash function in manifest files

2022-04-05 Thread Sam James


> On 6 Apr 2022, at 01:15, Jason A. Donenfeld  wrote:
> 
> Hi Sam,
> 
> On Wed, Apr 6, 2022 at 2:02 AM Sam James  wrote:
>> This matches my views and recollection. We could revisit it
>> if there was a passionate advocate (which it looks like there may well be).
>> 
>> While I wasn't against it before, I was sort of ambivalent given
>> we had no strong reason to, but I'm more willing now given
>> we're also cleaning out other Portage cruft at the same time.
> 
> I think actually the argument I'm making this time might be subtly
> different from the motions that folks went through last year.
> Specifically, the idea last year was to switch to using BLAKE2b only.
> I think what the arguments I'm making now point to is switching to
> SHA2-512 only.

Oh, right. I see!

(Aside: I should've been clearer in my first email, what I meant was: I'm
fine with revisiting this, but I remember us feeling kind of lacklustre because
even the proposer (mgorny) ended up not having the oomph to push it through
given (small) opposition. I don't recall who had the stiff opposition at the
time, but I do recall it was only small, but nobody really felt like it was
worth the hassle.

The overall Council feeling was "meh" without some momentum.)


> There are two reasons for this.
> 
> 1) Security: since the GPG signatures use SHA2-512, then the whole
> system breaks if SHA2-512 breaks. If we choose BLAKE2b as our only
> hash, then if either SHA2-512 or BLAKE2b break, then the system
> breaks. But if we choose SHA2-512 as our only hash, then we only need
> to worry about SHA2-512 breaking.
> 
> 2) Comparability: other distros use SHA2-512, as well as various
> upstreams, which means we can compare our hashes to theirs easily.
> 
> A reason why some people might prefer BLAKE2b over SHA2-512 is a
> performance improvement. However, seeing as right now we're opening
> the file, reading it, computing BLAKE2b, closing the file, opening the
> file again, reading it again, computing SHA2-512, closing the file, I
> don't think performance is actually something people care about. Seen
> differently, removing either one of them will already give us a
> performance "boost" of sorts.
> 

I think this seems pretty reasonable and I don't have any objection to it.

2) is a nice point and it's something Robin raised last time around too.

> Jason

best,
sam





Re: [gentoo-dev] proposal: use only one hash function in manifest files

2022-04-05 Thread Jason A. Donenfeld
Hi Sam,

On Wed, Apr 6, 2022 at 2:02 AM Sam James  wrote:
> This matches my views and recollection. We could revisit it
> if there was a passionate advocate (which it looks like there may well be).
>
> While I wasn't against it before, I was sort of ambivalent given
> we had no strong reason to, but I'm more willing now given
> we're also cleaning out other Portage cruft at the same time.

I think actually the argument I'm making this time might be subtly
different from the motions that folks went through last year.
Specifically, the idea last year was to switch to using BLAKE2b only.
I think what the arguments I'm making now point to is switching to
SHA2-512 only.

There are two reasons for this.

1) Security: since the GPG signatures use SHA2-512, then the whole
system breaks if SHA2-512 breaks. If we choose BLAKE2b as our only
hash, then if either SHA2-512 or BLAKE2b break, then the system
breaks. But if we choose SHA2-512 as our only hash, then we only need
to worry about SHA2-512 breaking.

2) Comparability: other distros use SHA2-512, as well as various
upstreams, which means we can compare our hashes to theirs easily.

A reason why some people might prefer BLAKE2b over SHA2-512 is a
performance improvement. However, seeing as right now we're opening
the file, reading it, computing BLAKE2b, closing the file, opening the
file again, reading it again, computing SHA2-512, closing the file, I
don't think performance is actually something people care about. Seen
differently, removing either one of them will already give us a
performance "boost" of sorts.

Jason



Re: [gentoo-dev] proposal: use only one hash function in manifest files

2022-04-05 Thread Sam James


> On 5 Apr 2022, at 22:13, Jonas Stein  wrote:
> 
> Hi
> 
>> I'd like to propose the following for portage:
>> - Only support one "secure" hash function (such as sha2, sha3, blake2, etc)
>> - Only generate and parse one hash function in Manifest files
>> - Remove support for multiple hash functions
> 
> No, this has no benefit.

Which part has no benefit? I could see a case (although I don't think it's a
super strong one) for keeping support for multiple hash types in Portage, but
only 1 in a Manifest.

I think Jason's made a fair case for dropping it.

> 
>> In other words, what are we actually getting by having _both_ SHA2-512
>> and BLAKE2b for every file in every Manifest?
> 
> Implementations are often broken and we have to expect zero day attacks on 
> hashes and on signatures. Hence it does not hurt to have a second hash.

I don't think this is the case. They're not broken often, it's a very very big
deal when they are, and we'd also have far bigger problems in such a case (as
already pointed out, TLS would be an issue, but also GPG signatures, git commit
hashes, ...).

> 
> It is very likely that we can not trust in X for a while in the next years, 
> but it is very unlikely that two different implementations are affected.
> 

I don't think it is likely that e.g. SHA512 will be broken in the next few
years, no, but if it is going to be, we have far bigger issues and we'd need to
have double algorithms in our whole stack, which we don't have.

> Additionally calculating a second hash does not cost anything.

It does have a cost at both Manifest-generation time and emerge-time.

Thanks,
sam





Re: [gentoo-dev] proposal: use only one hash function in manifest files

2022-04-05 Thread Jason A. Donenfeld
Hi Matt,

On Tue, Apr 5, 2022 at 10:38 PM Matt Turner  wrote:
>
> On Tue, Apr 5, 2022 at 12:30 PM Jason A. Donenfeld  wrote:
> > By the way, we're not currently _checking_ two hash functions during
> > src_prepare(), are we?
>
> I don't know, but the hash-checking is definitely done before
> src_prepare().

Er, during the builtin fetch phase. Anyway, you know what I meant. :)

Anyway, looking at the portage source code, to answer my own question,
it looks like the file is actually being read twice and both hashes
computed. I would have at least expected an optimization like:

hash1_init(&hash1);
hash2_init(&hash2);
for chunk in file:
    hash1_update(&hash1, chunk);
    hash2_update(&hash2, chunk);
hash1_final(&hash1, out1);
hash2_final(&hash2, out2);

But actually what's happening is the even less efficient:

hash1_init(&hash1);
for chunk in file:
    hash1_update(&hash1, chunk);
hash1_final(&hash1, out1);
hash2_init(&hash2);
for chunk in file:
    hash2_update(&hash2, chunk);
hash2_final(&hash2, out2);

So the file winds up being open and read twice. For huge tarballs like
chromium or libreoffice...

But either way you do it - the missed optimization above or the
unoptimized reality below - there's still twice as much work being
done. This is all unless I've misread the source code, which is
possible, so if somebody knows this code well and I'm wrong here,
please do speak up.
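
For reference, the single-pass variant could look roughly like this in
Python. It is a sketch using only the standard hashlib module, not the
actual Portage code; the function name hash_single_pass and the chunk
size are assumptions made for this example.

```python
import hashlib
import io


def hash_single_pass(stream, algorithms=("blake2b", "sha512"), chunk_size=1 << 20):
    """Read the stream once and feed every chunk to all hash objects."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # Each chunk is read from the file once and hashed by every algorithm.
        for hasher in hashers.values():
            hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


# Stand-in for a distfile; a real caller would pass open(path, "rb").
data = b"example distfile contents" * 1000
digests = hash_single_pass(io.BytesIO(data))
print(sorted(digests))
```

This removes the second open and read, but both hash computations still
run over every byte, so the CPU cost of the second hash remains.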

Jason



Re: [gentoo-dev] proposal: use only one hash function in manifest files

2022-04-05 Thread Jason A. Donenfeld
Hi Jonas,

On Tue, Apr 5, 2022 at 11:20 PM Jonas Stein  wrote:
> > In other words, what are we actually getting by having _both_ SHA2-512
> > and BLAKE2b for every file in every Manifest?
>
> Implementations are often broken and we have to expect zero day attacks
> on hashes and on signatures. Hence it does not hurt to have a second hash.
>
> It is very likely that we can not trust in X for a while in the next
> years, but it is very unlikely that two different implementations are
> affected.

This is the part that doesn't really make any sense to me. The
security of the system reduces to the SHA512 used by those GPG
signatures. If SHA512 breaks, the fact that our Manifest files also
use BLAKE2b isn't going to help us, since an attacker could
presumably, in that case, forge the signatures that we're using as a
root of trust. I don't see what a second hash buys us from a security
perspective here. What attack model do you have where it makes sense?

> Additionally calculating a second hash does not cost anything.

How is that possible? Doesn't calculating two things always cost more
than calculating one? If what you actually mean is, "performance is
not important," we can discuss that, but it sounds like you're saying
that there's zero performance impact. How does that work exactly? Is
only one calculated at emerge time or something clever like that?

Jason


