Re: [gentoo-dev] [pre-GLEP r1] Gentoo binary package container format

2018-11-19 Thread Roy Bamford
On 2018.11.19 19:33, Rich Freeman wrote:
> On Mon, Nov 19, 2018 at 2:21 PM Roy Bamford 
> wrote:
> >
> > "The archive members support optional OpenPGP signatures.
> > The implementations must allow the user to specify whether OpenPGP
> > signatures are to be expected in remotely fetched packages."
> >
> > Or can the user specify that only some elements need to be signed?
> >
> > Is it a problem if not all elements are signed with the same key?
> > That could happen if one person makes a  binpackage and someone
> > else updates the metadata.
> >
> 
> IMO this is going a bit into PM details for a GLEP that is about
> container formats.
> 

Rich,

Not really. The GLEP needs to be clear about the signing.
Is it every element or none?
The GLEP hints that a mix is possible with:

If the implementation needs to manipulate archive members, it must
either create a new signature or discard the existing signature.

An individual binpackage could start life with all elements signed
by the same key.

Some element could then be updated and the signature on that element
made with a different key.

Later still, another element could be changed and have its signature
dropped.

Should some combinations have no practical value, they should
not be permitted by the GLEP.
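The combinations described above can be enumerated with a small sketch. It models a package as a mapping from archive member to the key that signed it (or none), classifies the package, and applies one possible all-or-nothing acceptance policy. The data model, the member names and both helpers are hypothetical. The draft only states that members support optional signatures and that the user chooses whether to expect them.

```python
from typing import Dict, Optional, Set

# Hypothetical model: archive member name -> fingerprint of the key that
# signed it, or None if the member carries no signature.
Signatures = Dict[str, Optional[str]]


def classify(sigs: Signatures) -> str:
    """Describe which signing state a package is in."""
    signers = {fpr for fpr in sigs.values() if fpr is not None}
    unsigned = [name for name, fpr in sigs.items() if fpr is None]
    if not signers:
        return "unsigned"
    if unsigned:
        return "partially signed"
    if len(signers) == 1:
        return "fully signed, single key"
    return "fully signed, multiple keys"


def acceptable(sigs: Signatures, trusted: Set[str], expect_sigs: bool) -> bool:
    """All-or-nothing policy: when signatures are expected, every member
    must be signed by a trusted key; otherwise any state is accepted."""
    if not expect_sigs:
        return True
    return all(fpr is not None and fpr in trusted for fpr in sigs.values())


# The lifecycle above: built and signed with key A, metadata later
# re-signed with key B, then the image changed and its signature dropped.
history = [
    {"metadata": "A", "image": "A"},
    {"metadata": "B", "image": "A"},
    {"metadata": "B", "image": None},
]
for sigs in history:
    print(classify(sigs), acceptable(sigs, {"A", "B"}, expect_sigs=True))
```

Under this policy the third state is rejected, which is the kind of combination the GLEP could either forbid outright or leave to the package manager.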

> -- 
> Rich
> 
> 
> 

-- 
Regards,

Roy Bamford
(Neddyseagoon) a member of
elections
gentoo-ops
forum-mods




Re: [gentoo-dev] [pre-GLEP] Gentoo binary package container format

2018-11-19 Thread Kent Fredric
On Sun, 18 Nov 2018 12:00:48 +0100
Fabian Groffen  wrote:

> Your point is that the format is broken (== relies on obscure compressor
> feature).  My point is that the format simply requires a special tool.
> The fact that we prefer to use existing tools doesn't imply in any way
> that the format is broken to me.
> I think you should rewrite your point to mention that you don't want to
> use a tool that doesn't exist in @system (?) to unpack a binpkg.  My
> guess is that you could use some head/tail magic in a script if the
> trailing block is upsetting the decompressor.

To the best of my understanding, the existing design poses problems when
it comes to adding new features: the dependency on a "special tool"
becomes the bottleneck, because the special tool has to be adjusted to
handle every new feature, potentially introducing serious incompatible
changes.

The alternative proposal stated in this pre-GLEP seems infinitely more
extensible, which means more room for 3rd-parties to add their own
features, while retaining basic portage interop.

For instance, I think a "nice" feature that could be added one day
would be the ability for the automated package builder to bundle:

- The ebuild that was used to build it
- All the eclasses that were used by the ebuild
- All the sources and patches that were used

This would create a fat bin/src hybrid, potentially allowing the exact
same package to be rebuilt with minor changes, independently of
portage repository changes.

This may be useful for people who don't want one of the options set in
the binary build, but otherwise want the exact same material in a
different configuration.
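As a rough illustration of the extensibility argument, assume the proposed container is an uncompressed tar whose members are themselves archives. A reader that looks up only the members it knows keeps working when a builder adds new ones, such as a member carrying the ebuild, eclasses and sources. The member names below are placeholders, not names from the draft, and the byte strings merely stand in for real inner archives.

```python
import io
import tarfile

KNOWN = ("metadata.tar", "image.tar")  # placeholder member names


def add_member(outer: tarfile.TarFile, name: str, payload: bytes) -> None:
    """Store raw bytes as a regular-file member of the outer container."""
    info = tarfile.TarInfo(name)
    info.size = len(payload)
    outer.addfile(info, io.BytesIO(payload))


def build_container(members: dict) -> bytes:
    """Write an uncompressed outer tar holding the given members."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as outer:
        for name, payload in members.items():
            add_member(outer, name, payload)
    return buf.getvalue()


def read_known(blob: bytes) -> dict:
    """Return only the members this reader understands; ignore the rest."""
    found = {}
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:") as outer:
        for info in outer.getmembers():
            if info.name in KNOWN:
                found[info.name] = outer.extractfile(info).read()
    return found


blob = build_container({
    "metadata.tar": b"meta",
    "image.tar": b"files",
    "sources.tar": b"ebuild, eclasses, sources",  # hypothetical extension
})
print(sorted(read_known(blob)))  # ['image.tar', 'metadata.tar']
```

An older reader simply never looks at the extra member, which is what allows third parties to experiment without breaking basic interop.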

In terms of user-friendliness, this could empower Gentoo in new ways,
letting it compete with existing binary distributions wherein
upstreams publish .deb files for people to "just install".

Presently, the amount of additional hand-holding required (namely:
install this overlay, make sure you sync it right, etc, etc, etc) makes
it a little too "hands on" for some.

Now, I'm not saying Gentoo *should* do exactly this, but I like that
this approach gives us the *potential* to do this, and as a result,
some downstream derivatives of Gentoo may be motivated to do something
like this, providing usable stand-alone bin-packages which interop nicely
with standard Gentoo installations, while also working nicely with
downstreams' customizations.

Achieving this today requires downstreams to develop their own
format, which is unlikely to work with standard Gentoo installs.




Re: [gentoo-dev] [pre-GLEP r1] Gentoo binary package container format

2018-11-19 Thread Rich Freeman
On Mon, Nov 19, 2018 at 2:40 PM Zac Medico  wrote:
>
> On 11/19/18 11:33 AM, Rich Freeman wrote:
> > On Mon, Nov 19, 2018 at 2:21 PM Roy Bamford  wrote:
> >>
> >> "The archive members support optional OpenPGP signatures.
> >> The implementations must allow the user to specify whether OpenPGP
> >> signatures are to be expected in remotely fetched packages."
> >>
> >> Or can the user specify that only some elements need to be signed?
> >>
> >> Is it a problem if not all elements are signed with the same key?
> >> That could happen if one person makes a  binpackage and someone
> >> else updates the metadata.
> >>
> >
> > IMO this is going a bit into PM details for a GLEP that is about
> > container formats.
> >
> > Presumably any package manager is going to need to figure out what
> > keys are/aren't valid and allow the user to configure this behavior.
> > Users who want to go editing package innards will presumably adjust
> > their package manager settings to accept their modifications, whether
> > it means accepting their own sigs or disabling them.
>
> With the GLEP as it is, the user *must* use a local signing key to sign
> installed packages during the installation process if they want to be
> able to verify signatures for installed packages at some point in the
> future, since the binary package format does not provide a way to use
> binary package signatures for this purpose.

I think we might be talking about different signatures?

I think you're referring to signatures of the package files after they
are installed on the local filesystem, while I'm talking about
verifying the integrity of the package files themselves.

If these signatures are applied to different data then obviously you
couldn't just have the one signature serve double duty (unless you
hung onto the binary package, verified the signature on it, then
verified the package contents against the live filesystem).

The simplest solution would be to do as you seem to be suggesting -
verify the signature on the package before installing it, and then
during installation capture whatever metadata is already supported by
portage and sign that using a user's trusted key.

This seems like the most practical solution in any case since we
aren't likely to ever go down the route of using a single signed
squashfs for /usr like a release-based binary distro might.
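The second half of that approach, checking installed files against metadata captured at install time, might look roughly like the sketch below. It assumes the CONTENTS record Portage keeps for installed packages, with "obj <path> <md5> <mtime>" lines. Signing that record with the user's key is left out, and parse_contents and verify_installed are hypothetical helpers.

```python
import hashlib
import os
import tempfile


def parse_contents(text: str):
    """Yield (path, md5) for every 'obj' entry of a CONTENTS-style record."""
    for line in text.splitlines():
        if line.startswith("obj "):
            # Paths may contain spaces, so split md5 and mtime off the right.
            path, md5, _mtime = line[4:].rsplit(" ", 2)
            yield path, md5


def verify_installed(text: str, root: str = "/") -> list:
    """Return the recorded paths whose current content no longer matches."""
    mismatched = []
    for path, md5 in parse_contents(text):
        full = os.path.join(root, path.lstrip("/"))
        try:
            with open(full, "rb") as f:
                digest = hashlib.md5(f.read()).hexdigest()
        except FileNotFoundError:
            digest = None  # a missing file counts as a mismatch
        if digest != md5:
            mismatched.append(path)
    return mismatched


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "usr/bin"))
    content = b"#!/bin/sh\necho hello\n"
    with open(os.path.join(root, "usr/bin/hello"), "wb") as f:
        f.write(content)
    good = hashlib.md5(content).hexdigest()
    record = (
        "dir /usr/bin\n"
        f"obj /usr/bin/hello {good} 1542650000\n"
        f"obj /usr/bin/missing {good} 1542650000\n"
    )
    result = verify_installed(record, root)
print(result)  # ['/usr/bin/missing']
```

Only the record needs a signature for this to work; the binary package itself can be discarded after its own signature has been checked.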

-- 
Rich



Re: [gentoo-dev] [pre-GLEP r1] Gentoo binary package container format

2018-11-19 Thread Zac Medico
On 11/19/18 11:33 AM, Rich Freeman wrote:
> On Mon, Nov 19, 2018 at 2:21 PM Roy Bamford  wrote:
>>
>> "The archive members support optional OpenPGP signatures.
>> The implementations must allow the user to specify whether OpenPGP
>> signatures are to be expected in remotely fetched packages."
>>
>> Or can the user specify that only some elements need to be signed?
>>
>> Is it a problem if not all elements are signed with the same key?
>> That could happen if one person makes a  binpackage and someone
>> else updates the metadata.
>>
> 
> IMO this is going a bit into PM details for a GLEP that is about
> container formats.
> 
> Presumably any package manager is going to need to figure out what
> keys are/aren't valid and allow the user to configure this behavior.
> Users who want to go editing package innards will presumably adjust
> their package manager settings to accept their modifications, whether
> it means accepting their own sigs or disabling them.

With the GLEP as it is, the user *must* use a local signing key to sign
installed packages during the installation process if they want to be
able to verify signatures for installed packages at some point in the
future, since the binary package format does not provide a way to use
binary package signatures for this purpose.
-- 
Thanks,
Zac





Re: [gentoo-dev] [pre-GLEP r1] Gentoo binary package container format

2018-11-19 Thread Rich Freeman
On Mon, Nov 19, 2018 at 2:21 PM Roy Bamford  wrote:
>
> "The archive members support optional OpenPGP signatures.
> The implementations must allow the user to specify whether OpenPGP
> signatures are to be expected in remotely fetched packages."
>
> Or can the user specify that only some elements need to be signed?
>
> Is it a problem if not all elements are signed with the same key?
> That could happen if one person makes a  binpackage and someone
> else updates the metadata.
>

IMO this is going a bit into PM details for a GLEP that is about
container formats.

Presumably any package manager is going to need to figure out what
keys are/aren't valid and allow the user to configure this behavior.
Users who want to go editing package innards will presumably adjust
their package manager settings to accept their modifications, whether
it means accepting their own sigs or disabling them.

-- 
Rich



Re: [gentoo-dev] [pre-GLEP r1] Gentoo binary package container format

2018-11-19 Thread Roy Bamford
On 2018.11.19 18:35, Michał Górny wrote:
> Hi,
> 
> On Sat, 2018-11-17 at 12:21 +0100, Michał Górny wrote:
> > Here's a pre-GLEP draft based on the earlier discussion on gentoo-
> > portage-dev mailing list.  The specification uses GLEP form as it
> > provides for cleanly specifying the motivation and rationale.
> 
> Changes in -r1: took into account the feedback and restructured
> the motivation into pointing out advantages of the existing format,
> and focusing on the two real issues of non-transparency and OpenPGP
> implementations deficiencies.  Also added a section on why there's no
> explicit version number.
> 
> > Also available via HTTPS:
> > 
> > rst:  https://dev.gentoo.org/~mgorny/tmp/glep-0078.rst
> > html: https://dev.gentoo.org/~mgorny/tmp/glep-0078.html
> > 
> 
[snip]

Team,

Looks good to me. I can manually unpick the binpackage with tar,
choose whether or not to check the signatures, then spray files all
over my broken Gentoo with tar in the same way as I do now.
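That recovery workflow needs nothing beyond a stock tar implementation. As a minimal sketch, Python's standard tarfile module can open an outer uncompressed tar and then read the compressed image archive inside it straight from the stream. The member name image.tar.xz and the image/ prefix are assumptions made for this example, not names taken from the draft, and signature checking is skipped.

```python
import io
import os
import tarfile
import tempfile


def make_demo_package() -> bytes:
    """Build a toy package: an outer tar holding an xz-compressed image tar."""
    inner = io.BytesIO()
    with tarfile.open(fileobj=inner, mode="w:xz") as image:
        data = b"hello\n"
        info = tarfile.TarInfo("image/etc/motd")  # assumed layout
        info.size = len(data)
        image.addfile(info, io.BytesIO(data))
    outer = io.BytesIO()
    with tarfile.open(fileobj=outer, mode="w") as pkg:
        info = tarfile.TarInfo("image.tar.xz")  # assumed member name
        info.size = len(inner.getvalue())
        pkg.addfile(info, io.BytesIO(inner.getvalue()))
    return outer.getvalue()


def unpack_image(package: bytes, dest: str) -> list:
    """Extract the inner image archive into dest using only tarfile."""
    with tarfile.open(fileobj=io.BytesIO(package), mode="r:") as pkg:
        inner = pkg.extractfile("image.tar.xz")
        with tarfile.open(fileobj=inner, mode="r:xz") as image:
            # Use the safe 'data' extraction filter where Python supports it.
            kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
            image.extractall(dest, **kwargs)
            return image.getnames()


with tempfile.TemporaryDirectory() as dest:
    names = unpack_image(make_demo_package(), dest)
    with open(os.path.join(dest, "image/etc/motd")) as f:
        motd = f.read()
print(names, repr(motd))  # ['image/etc/motd'] 'hello\n'
```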

Implementation detail question. 
It appears that all members must be signed, or none of them, since:

"The archive members support optional OpenPGP signatures. 
The implementations must allow the user to specify whether OpenPGP 
signatures are to be expected in remotely fetched packages."

Or can the user specify that only some elements need to be signed?

Is it a problem if not all elements are signed with the same key?
That could happen if one person makes a  binpackage and someone
else updates the metadata.


> -- 
> Best regards,
> Michał Górny
> 

-- 
Regards,

Roy Bamford
(Neddyseagoon) a member of
elections
gentoo-ops
forum-mods




Re: [gentoo-dev] [pre-GLEP] Gentoo binary package container format [gen...@jonesmz.com]

2018-11-19 Thread Zac Medico
On 11/18/18 6:51 PM, Rich Freeman wrote:
> On Sun, Nov 18, 2018 at 5:40 PM Zac Medico  wrote:
>>
>> On 11/18/18 1:55 PM, Rich Freeman wrote:
>>>
>>> My idea is to basically have portage generate a tag with all the info
>>> needed to identify the "right" package, take a hash of it, and then
>>> stick that in the filename.  Then when portage is looking for a binary
>>> package to use at install time it generates the same tag using the
>>> same algorithm and looks for a matching hash.
>>
>> We've already had this handled for a couple years now, via
>> FEATURES=binpkg-multi-instance.
> 
> According to the make.conf manpage this simply numbers builds.  So, if
> you build something twice with the same config you end up with two
> duplicate files (wasteful).  Presumably if you had a large collection
> of these packages portage would have to read the metadata within each
> one to figure out which one is appropriate to install.  That would be
> expensive if IO is slow, such as when fetching packages online
> on-demand.
> 
> But, it obviously is somewhat of an improvement for Roy's use case.
> 
> IMO using a content-hash of certain metadata would eliminate
> duplication, and based on filename alone it would be clear whether the
> sought-after binary package exists or not.  As with the build numbers
> you couldn't tell from filename inspection what packages you have, but
> if you know what you want you could immediately find it.  IMO trying
> to cram all that metadata into a filename to make them more
> transparent isn't a good idea, and using hashes lets the user set
> their own policy regarding flexibility.  Heck, you could auto-gen
> symlinks for subsets of metadata (ie, the same file could be linked
> from a file that specifies its USE flags but not its CFLAGS, so it
> would be found if either an exact hit on CFLAGS was sought or if
> CFLAGS were considered unimportant).
> 
> But, I'm certainly not suggesting that you're not allowed to go to bed
> until you've built it.  :)

The existing ${PKGDIR}/Packages file optimizes metadata access for both
local and remote access, and performs well for reasonable numbers of
packages.

If you insist on mixing binary packages in the same ${PKGDIR} for a
large number of alternative configurations, then it will not scale
unless you create a way to send your local configuration to the server
so that it can select the relevant package list for you.

However, bear in mind that mixing alternative configurations in the same
${PKGDIR} might lead to undesirable results if there is anything
relevant that is unaccounted for in the package metadata. Possible
unaccounted things may include:

1) glibc version the package was built against
2) symbols and/or sonames not accounted for by slot operator dependencies
3) soname dependencies (--usepkgonly + --ignore-soname-deps=n handles this)
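The point about the Packages index can be illustrated with a simplified sketch: a single read of one index, rather than opening every package, is enough to find candidate builds. It assumes a Packages-style layout of a header block followed by blank-line-separated entries of "KEY: value" lines, with USE holding the enabled flags. The sample entries and the parse_index and find helpers are made up for the example, and the real file carries many more keys.

```python
def parse_index(text: str):
    """Split a Packages-style index into a header dict and entry dicts."""
    blocks = []
    for chunk in text.strip().split("\n\n"):
        fields = {}
        for line in chunk.splitlines():
            key, _, value = line.partition(": ")
            fields[key] = value
        blocks.append(fields)
    return blocks[0], blocks[1:]


def find(entries, cpv: str, use: set):
    """Return entries for cpv whose enabled USE flags match exactly."""
    return [e for e in entries
            if e.get("CPV") == cpv and set(e.get("USE", "").split()) == use]


# Made-up sample index: two builds of the same version, different USE.
INDEX = """\
ARCH: amd64
PACKAGES: 2

CPV: app-misc/foo-1.0
BUILD_ID: 1
USE: amd64 ssl

CPV: app-misc/foo-1.0
BUILD_ID: 2
USE: amd64
"""

header, entries = parse_index(INDEX)
matches = find(entries, "app-misc/foo-1.0", {"amd64"})
print(header["ARCH"], [m["BUILD_ID"] for m in matches])  # amd64 ['2']
```

Matching on recorded keys cannot, of course, catch the unrecorded factors listed above, such as the glibc version.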
-- 
Thanks,
Zac





Re: [gentoo-dev] [pre-GLEP r1] Gentoo binary package container format

2018-11-19 Thread Michał Górny
Hi,

On Sat, 2018-11-17 at 12:21 +0100, Michał Górny wrote:
> Here's a pre-GLEP draft based on the earlier discussion on gentoo-
> portage-dev mailing list.  The specification uses GLEP form as it
> provides for cleanly specifying the motivation and rationale.

Changes in -r1: took into account the feedback and restructured
the motivation into pointing out advantages of the existing format,
and focusing on the two real issues of non-transparency and OpenPGP
implementations deficiencies.  Also added a section on why there's no
explicit version number.

> Also available via HTTPS:
> 
> rst:  https://dev.gentoo.org/~mgorny/tmp/glep-0078.rst
> html: https://dev.gentoo.org/~mgorny/tmp/glep-0078.html
> 

---
GLEP: 
Title: Gentoo binary package container format
Author: Michał Górny 
Type: Standards Track
Status: Draft
Version: 1
Created: 2018-11-15
Last-Modified: 2018-11-16
Post-History: 2018-11-17
Content-Type: text/x-rst
---

Abstract
========

This GLEP proposes a new binary package container format for Gentoo.
The current tbz2/XPAK format is briefly described, and its deficiencies
are explained.  Accordingly, the requirements for a new format are set
and a gpkg format satisfying them is proposed.  The rationale for
the design decisions is provided.


Motivation
==========

The current Portage binary package format
-----------------------------------------

The historical ``.tbz2`` binary package format used by Portage is
a concatenation of two distinct formats: a header-oriented compressed
.tar format (used to hold package files) and a trailer-oriented custom
XPAK format (used to hold metadata) [#MAN-XPAK]_.  The format has
already been extended incompatibly twice.

The first time, support for storing multiple successive builds of
a binary package for a single ebuild version was added.  This feature
relies on appending an additional hyphen, followed by an integer, to
the package filename.  It is disabled by default (preserving backwards
compatibility) and controlled by the ``binpkg-multi-instance`` feature.
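The naming rule can be sketched as follows.  The helper is hypothetical
and only mirrors the hyphen-plus-integer rule stated above; the ``.xpak``
suffix and the sample filenames are assumptions for the example.

```python
import re


def next_instance_name(pf: str, existing: list, suffix: str = ".xpak") -> str:
    """Pick the next '<PF>-<N><suffix>' name not used by existing files."""
    # Only names of the exact form <PF>-<digits><suffix> count as instances.
    pattern = re.compile(re.escape(pf) + r"-(\d+)" + re.escape(suffix) + r"$")
    used = [int(m.group(1)) for name in existing if (m := pattern.match(name))]
    return f"{pf}-{max(used, default=0) + 1}{suffix}"


existing = ["foo-1.0-1.xpak", "foo-1.0-2.xpak", "foo-1.0-r1-1.xpak"]
print(next_instance_name("foo-1.0", existing))  # foo-1.0-3.xpak
```

The build number carries no information about the build itself, so
telling instances apart still requires reading their metadata.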

The second time, support for additional compression formats was
added.  When a format other than bzip2 is used, the ``.tbz2`` suffix
is replaced by ``.xpak`` and Portage relies on magic bytes to detect
the compression used.  For backwards compatibility, Portage still
defaults to using bzip2; the compression program can be switched using
the ``BINPKG_COMPRESS`` configuration variable.
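Detecting the compression from magic bytes amounts to comparing the
first few bytes of the file against known signatures.  The table below
is an illustration, not Portage's own detection code, and the list of
formats is not exhaustive.

```python
import bz2
import gzip
import lzma

# Leading magic bytes of some common compressed formats (illustrative only).
MAGIC = [
    (b"BZh", "bzip2"),
    (b"\x1f\x8b", "gzip"),
    (b"\xfd7zXZ\x00", "xz"),
    (b"\x28\xb5\x2f\xfd", "zstd"),
    (b"\x04\x22\x4d\x18", "lz4"),
    (b"LZIP", "lzip"),
]


def detect_compression(head: bytes):
    """Return the format whose magic bytes start head, or None."""
    for magic, name in MAGIC:
        if head.startswith(magic):
            return name
    return None


for blob in (bz2.compress(b"x"), gzip.compress(b"x"), lzma.compress(b"x")):
    print(detect_compression(blob[:8]))  # bzip2, gzip, xz in turn
```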

Additionally, there have been minor changes to the stored metadata
and file storage policies.  In particular, behavior regarding
``INSTALL_MASK``, controllable file compression and stripping has
changed over time.


The advantages of tbz2/XPAK format
----------------------------------

The tbz2/XPAK format used by Portage has three interesting features:

1. **Each binary package is fully contained within a single file.**
   While this might seem unnecessary, it makes it easier for the user
   to transfer binary packages without having to be concerned about
   finding all the necessary files to transfer.

2. **The binary packages are compatible with regular compressed
   tarballs, most of the time.**  With the notable exceptions of
   historical versions of pbzip2 and the recent zstd compressor,
   tbz2/XPAK packages can be extracted using the regular tar utility
   with a compressor implementation that discards trailing garbage.

3. **The metadata is uncompressed, and can be efficiently accessed
   without decompressing package contents.**  This includes
   the possibility of rewriting it (e.g. as a result of package moves)
   without the necessity of repacking the files.
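The second point can be checked with Python's standard library.  The
sketch below builds a small bzip2-compressed tarball, appends an empty
XPAK-style trailer (layout assumed from xpak(5): header, empty index and
data, end marker, big-endian segment length, ``STOP``), and shows that
the bzip2 decompressor stops at the end of its stream while ``tarfile``
still lists the contents.  The file path is made up for the example.

```python
import bz2
import io
import struct
import tarfile


def make_tarball() -> bytes:
    """Build a small bzip2-compressed tarball in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:bz2") as tar:
        data = b"hello\n"
        info = tarfile.TarInfo("usr/share/doc/hello.txt")  # made-up path
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


# Assumed XPAK layout: an empty segment (no index, no data) followed by
# the segment length as a big-endian 32-bit integer and the literal "STOP".
segment = b"XPAKPACK" + struct.pack(">II", 0, 0) + b"XPAKSTOP"
trailer = segment + struct.pack(">I", len(segment)) + b"STOP"
package = make_tarball() + trailer

# The bzip2 decompressor stops at the end of its stream and hands back
# whatever follows it as unused_data instead of failing.
decomp = bz2.BZ2Decompressor()
decomp.decompress(package)
print(decomp.eof, decomp.unused_data == trailer)  # True True

# Listing the package works as for a plain .tar.bz2; the trailer after
# the bzip2 stream does not get in the way.
with tarfile.open(fileobj=io.BytesIO(package), mode="r:bz2") as tar:
    names = tar.getnames()
print(names)  # ['usr/share/doc/hello.txt']
```

Python's own ``gzip.decompress`` is stricter and raises an error on
non-zero trailing bytes, which shows how much this property depends on
the particular implementation.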


Transparency problem with the current binary package format
-----------------------------------------------------------

Notwithstanding its advantages, the tbz2/XPAK format has a significant
design fault that consists of two issues:

1. **The XPAK format is a custom binary format with explicit use
   of binary-encoded file offsets and field lengths.**  As such, it is
   non-trivial to read or edit without specialized tools.  Such tools
   are currently implemented separately from the package manager,
   as part of the portage-utils toolkit, written in C [#PORTAGE-UTILS]_.

2. **The tarball compatibility feature relies on the obscure feature
   of ignoring trailing garbage in compressed files.**  While this is
   implemented consistently in most compressors, it is not really part
   of the specification but rather traditional behavior.  Given that
   the original reasons for this no longer apply, new compressor
   implementations are likely to miss support for this.
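To show what binary-encoded offsets and lengths mean in practice, here
is a minimal sketch of writing and reading the trailer.  It assumes the
layout documented in xpak(5): big-endian 32-bit integers, an
``XPAKPACK`` header with index and data sizes, an index of
name/offset/length entries, the data, ``XPAKSTOP``, then the segment
length and ``STOP``.  It is an illustration, not Portage's
implementation.  ``split_package`` also recovers the bare compressed
tarball, which is what a head/tail workaround would have to do.

```python
import struct


def encode_xpak(metadata: dict) -> bytes:
    """Build an XPAK segment: header, index, data, end marker."""
    index = b""
    data = b""
    for key, value in metadata.items():
        name = key.encode()
        # Index entry: name length, name, data offset, data length.
        index += struct.pack(">I", len(name)) + name
        index += struct.pack(">II", len(data), len(value))
        data += value
    return (b"XPAKPACK" + struct.pack(">II", len(index), len(data))
            + index + data + b"XPAKSTOP")


def append_xpak(tarball: bytes, metadata: dict) -> bytes:
    """Append the segment, its length and the literal STOP."""
    xpak = encode_xpak(metadata)
    return tarball + xpak + struct.pack(">I", len(xpak)) + b"STOP"


def split_package(blob: bytes):
    """Return (tarball bytes, metadata dict) by walking back from the end."""
    if blob[-4:] != b"STOP" or blob[-16:-8] != b"XPAKSTOP":
        raise ValueError("no XPAK trailer")
    (xpak_len,) = struct.unpack(">I", blob[-8:-4])
    start = len(blob) - 8 - xpak_len
    xpak = blob[start:len(blob) - 8]
    if xpak[:8] != b"XPAKPACK":
        raise ValueError("bad XPAK header")
    index_len, data_len = struct.unpack(">II", xpak[8:16])
    index = xpak[16:16 + index_len]
    data = xpak[16 + index_len:16 + index_len + data_len]
    metadata = {}
    pos = 0
    while pos < index_len:
        (name_len,) = struct.unpack(">I", index[pos:pos + 4])
        name = index[pos + 4:pos + 4 + name_len].decode()
        entry = pos + 4 + name_len
        offset, length = struct.unpack(">II", index[entry:entry + 8])
        metadata[name] = data[offset:offset + length]
        pos = entry + 8
    return blob[:start], metadata


pkg = append_xpak(b"<compressed tarball>", {"CATEGORY": b"app-misc\n",
                                            "PF": b"foo-1.0\n"})
tarball, meta = split_package(pkg)
print(tarball, meta["PF"])  # b'<compressed tarball>' b'foo-1.0\n'
```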

Both of the issues make the format hard to use without dedicated tools,
or when the tools misbehave.  This impacts the following scenarios:

A. **Using binary packages for system recovery.**  In case of serious
   breakage, it is really preferable that the format depends on as few
   tools as possible, and especially not on Gentoo-specific tools.

B. **Inspecting binary packages in detail exceeding 

Re: [gentoo-dev] [pre-GLEP] Gentoo binary package container format [gen...@jonesmz.com]

2018-11-19 Thread M. J. Everitt
On 18/11/18 22:40, Zac Medico wrote:
> On 11/18/18 1:55 PM, Rich Freeman wrote:
>> On Sun, Nov 18, 2018 at 4:10 PM Roy Bamford  wrote:
>>> Replying off list because I am not on the whitelist.
>> That seems odd.
>>
>>> 1) append a uuid to each filename. Generated when the bin package file is 
>>> generated.
>>> 2) encode the hostname of the machine that generated the file
>>> 3) encode the use flags in the filename.
>> So, I brought up this same issue in the earlier discussion and it was
>> considered out of scope, and I think this is fair.  The GLEP does not
>> specify filename, and IMO the standard for what goes INSIDE the file
>> will work just fine with any future enhancements that address exactly
>> this use case.
>>
>> Besides your case of building for a cluster, another use case is
>> having a central binary repo that portage could check and utilize when
>> a user's preferences happen to match what is pre-built.
>>
>> I suggest we start a different thread for any additional discussion of
>> this use case.  I was thinking and it probably wouldn't be super-hard
>> to actually start building something like this.  But, I don't want to
>> derail this GLEP as I don't see any reason designing something like
>> this needs to hold up the binary package format.  Both the existing
>> and proposed binary package formats will encode any metadata needed by
>> the package manager inside the file, and the only extension we need is
>> to encode identifying info in the filename.
>>
>> My idea is to basically have portage generate a tag with all the info
>> needed to identify the "right" package, take a hash of it, and then
>> stick that in the filename.  Then when portage is looking for a binary
>> package to use at install time it generates the same tag using the
>> same algorithm and looks for a matching hash.  If a hit is found then
>> it reads the complete metadata in the file and applies all the sanity
>> checks it already does.  Generating of binary packages with the hash
>> could be made optional, and portage could also be configured to first
>> look for the matching hash, then fall back to the existing naming
>> convention, so that it would be compatible with existing generic
>> names.  So, users would get a choice as to whether they want to build
>> up a library of these packages, or just have each build overwrite the
>> last.
>>
>> Then the next step would be to allow these files to be fetched from a
>> binary repo optionally, and then finally we'd need tools to create the
>> repo.  But, this step isn't needed for your use case.  With the proper
>> optional switches you could utilize as much of this scheme as you
>> like.
>>
>> Also, you could optionally choose how much you want portage to encode
>> in the tag and look for.  Are you very fussy and only want a binary
>> package with matching CFLAGS/USE/whatever?  Or is just matching
>> USE/arch/etc enough?  Some of the existing portage options could
>> potentially be re-used here.
> We've already had this handled for a couple years now, via
> FEATURES=binpkg-multi-instance.
Working fine for me for catalyst ARM runs ...
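The hash-in-filename idea quoted above can be sketched as follows. The metadata keys, the JSON canonicalization, SHA-256 truncated to 16 hex digits and the .xpak suffix are all choices made for this illustration, and binpkg_name is a hypothetical helper, not a Portage function. As proposed above, a hit would only be a candidate; the full metadata inside the file would still need the usual sanity checks.

```python
import hashlib
import json


def build_tag(meta: dict, keys: list) -> str:
    """Serialize the selected metadata keys in a canonical form."""
    selected = {}
    for k in sorted(keys):
        value = meta.get(k, "")
        if k == "USE":
            # Flag order must not change the tag.
            value = " ".join(sorted(value.split()))
        selected[k] = value
    return json.dumps(selected, sort_keys=True, separators=(",", ":"))


def binpkg_name(meta: dict, keys: list, suffix: str = ".xpak") -> str:
    """Hypothetical filename: PF, a short hash of the tag, a suffix."""
    digest = hashlib.sha256(build_tag(meta, keys).encode()).hexdigest()[:16]
    return f"{meta['PF']}-{digest}{suffix}"


built = {"PF": "foo-1.0", "ARCH": "arm", "USE": "ssl ipv6", "CFLAGS": "-O2 -pipe"}
wanted = {"PF": "foo-1.0", "ARCH": "arm", "USE": "ipv6 ssl", "CFLAGS": "-O3"}

strict = ["PF", "ARCH", "USE", "CFLAGS"]
relaxed = ["PF", "ARCH", "USE"]
print(binpkg_name(built, strict) == binpkg_name(wanted, strict))    # False
print(binpkg_name(built, relaxed) == binpkg_name(wanted, relaxed))  # True
```

Choosing the key list is where the user decides how fussy the match must be; a relaxed list plays the same role as the symlinks for subsets of metadata suggested earlier in the thread.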


