Re: [gentoo-dev] [pre-GLEP] Gentoo binary package container format

2018-11-27 Thread Fabian Groffen
On 26-11-2018 21:13:53 +, Andrey Utkin wrote:
> On Wed, Nov 21, 2018 at 11:45:54AM +0100, Fabian Groffen wrote:
> > We agree it is hackish, and we agree we can do without.  You simply
> > exaggerate the problem, IMO, which mostly isn't there, because it works
> > fine today.  It can also be solved today using shell tools.
> 
> I am sad that you don't see it as a productivity impediment that the
> user is required to know the custom tooling to do even such a trivial
> non-standard action as manual extraction.

Huh?  tar -jxf doesn't do the trick for you?

> Maybe I will make myself look bad by admitting this, but I'm not meeting
> your expectations. I have used Gentoo for ~11 years, and for about a year I
> have been using my private binpkgs distributed to all my machines (i.e. I
> have read the binary package guide a fair number of times, but I stopped
> rereading it once my needs were satisfied). When in need, I still reached
> for trusty tar, and I did not even know the names of the special tools (a
> toolchain?) qtbz2 and qxpak.
> 
> Just a few days ago I was working with binpkgs for investigation purposes.
> I just wanted to extract a few of them somewhere (definitely not into the
> system root), and read a core dump with GDB, asking it to use those
> extracted files for debug symbols.
> 
> Of course I used `tar xaf`, because what I know is that it's an honest tbz2
> just with metadata appended.
> 
>  # tar xaf boost-1.65.0.tbz2
> 
> bzip2: (stdin): trailing garbage after EOF ignored
> 
> Exit code is 0.
> But the notice is annoying (on a subconscious level), because Silence Is
> Golden - "when a program has nothing interesting or surprising to say,
> it should shut up".

You seem to contradict yourself.  You didn't know the tools, yet you say
you needed to, to unpack the files.  But you show here you just unpacked
the files without said knowledge.

> > % head -c `grep -abo 'XPAKPACK' \
> >     $EPREFIX/usr/portage/packages/sys-apps/sed-4.5.tbz2 | sed 's/:.*$//'` \
> >     $EPREFIX/usr/portage/packages/sys-apps/sed-4.5.tbz2 | tar -jxf -
> > 
> > results in no warnings/errors from bzip about trailing garbage, possible
> > thanks to the spec being smart enough about this.
> 
> Thanks, this is a very concise **custom tool** to handle the current
> binpkg format.

As is tar followed by tar.  The obvious advantage of the latter is that
you don't get a warning which could trigger you into thinking something
is wrong.  So, in my opinion, that is a better way of doing it compared
to the current way.
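
For readers who would rather not rely on shell quoting, here is a minimal Python sketch of the same idea as the head/grep pipeline quoted in this thread: cut the file at the first `XPAKPACK` marker and hand only the leading part to the decompressor. The function names and the in-memory demo package are hypothetical, made up for illustration; like the shell version, the cut is a heuristic and would misfire if the marker happened to occur inside the compressed stream.

```python
import io
import tarfile

XPAK_MAGIC = b"XPAKPACK"  # marker that starts the appended metadata block


def split_tbz2(blob: bytes) -> tuple[bytes, bytes]:
    """Split a binpkg into (compressed tarball, appended xpak bytes).

    Mirrors the grep/head pipeline: cut at the first XPAKPACK marker.
    """
    cut = blob.find(XPAK_MAGIC)
    if cut < 0:
        return blob, b""  # plain tarball, nothing appended
    return blob[:cut], blob[cut:]


def list_members(blob: bytes) -> list[str]:
    """Return the member names of the tarball part of a binpkg."""
    tarball, _ = split_tbz2(blob)
    with tarfile.open(fileobj=io.BytesIO(tarball), mode="r:bz2") as tar:
        return tar.getnames()


def make_demo_binpkg() -> bytes:
    """Build a tiny stand-in package in memory (hypothetical content)."""
    raw = io.BytesIO()
    with tarfile.open(fileobj=raw, mode="w:bz2") as tar:
        payload = b"hello\n"
        info = tarfile.TarInfo("usr/share/demo.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    # Append a fake metadata block; only the leading marker matters here.
    return raw.getvalue() + XPAK_MAGIC + b"fake-metadata" + b"XPAKSTOP"


if __name__ == "__main__":
    print(list_members(make_demo_binpkg()))
```

Because the decompressor only ever sees the tarball part, there is no trailing data left for it to complain about.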

> > Not having to do this, when under stress and pressure to restore a
> > system to get it back into production, is a plus.  Though, in that
> > scenario the trailing garbage warning wouldn't have been that bad
> > either.
> 
> When under stress and pressure, the irrelevant warning is not bad?
> I am sure it is really bad for the operator's attention.

I've been using Gentoo binpkgs for a long while; I think it was something
like 14 years ago that I used them extensively.  Perhaps I'm an exception,
but back then I already knew there was an extra bit attached to the
tars, as did all my colleagues around me.  The fact it comes up now
(as a surprise?) maybe means the knowledge has been lost.  So it's a good
thing we're replacing it with something easier to infer from inspecting
it.

Fabian


-- 
Fabian Groffen
Gentoo on a different level




Re: [gentoo-dev] [pre-GLEP] Gentoo binary package container format

2018-11-26 Thread Andrey Utkin
On Wed, Nov 21, 2018 at 11:45:54AM +0100, Fabian Groffen wrote:
> We agree it is hackish, and we agree we can do without.  You simply
> exaggerate the problem, IMO, which mostly isn't there, because it works
> fine today.  It can also be solved today using shell tools.

I am sad that you don't see it as a productivity impediment that the
user is required to know the custom tooling to do even such a trivial
non-standard action as manual extraction.

Maybe I will make myself look bad by admitting this, but I'm not meeting
your expectations. I have used Gentoo for ~11 years, and for about a year I
have been using my private binpkgs distributed to all my machines (i.e. I
have read the binary package guide a fair number of times, but I stopped
rereading it once my needs were satisfied). When in need, I still reached
for trusty tar, and I did not even know the names of the special tools (a
toolchain?) qtbz2 and qxpak.

Just a few days ago I was working with binpkgs for investigation purposes.
I just wanted to extract a few of them somewhere (definitely not into the
system root), and read a core dump with GDB, asking it to use those
extracted files for debug symbols.

Of course I used `tar xaf`, because what I know is that it's an honest tbz2
just with metadata appended.

 # tar xaf boost-1.65.0.tbz2

bzip2: (stdin): trailing garbage after EOF ignored

Exit code is 0.
But the notice is annoying (on a subconscious level), because Silence Is
Golden - "when a program has nothing interesting or surprising to say,
it should shut up".
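
To make the "trailing garbage" behaviour concrete, here is a small sketch using Python's standard `bz2` module, which stops at the end of the first compressed stream and exposes whatever follows instead of printing a notice. The helper name and the sample bytes are hypothetical; this only illustrates the mechanism and says nothing about how the `bzip2` binary is implemented.

```python
import bz2


def decompress_first_stream(blob: bytes) -> tuple[bytes, bytes]:
    """Decompress the first bzip2 stream; return (data, leftover bytes)."""
    dec = bz2.BZ2Decompressor()
    data = dec.decompress(blob)
    if not dec.eof:
        # The end-of-stream marker was never reached.
        raise ValueError("truncated bzip2 stream")
    # Everything after the end of the stream is kept in unused_data.
    return data, dec.unused_data


if __name__ == "__main__":
    blob = bz2.compress(b"payload") + b"XPAKPACK...appended metadata..."
    data, leftover = decompress_first_stream(blob)
    print(data, len(leftover))
```

A tool built this way can decide for itself whether the leftover bytes deserve a warning, which is exactly the choice that a decompressor rejecting trailing data takes away.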

> % head -c `grep -abo 'XPAKPACK' \
>     $EPREFIX/usr/portage/packages/sys-apps/sed-4.5.tbz2 | sed 's/:.*$//'` \
>     $EPREFIX/usr/portage/packages/sys-apps/sed-4.5.tbz2 | tar -jxf -
> 
> results in no warnings/errors from bzip about trailing garbage, possible
> thanks to the spec being smart enough about this.

Thanks, this is a very concise **custom tool** to handle the current
binpkg format.

> Not having to do this, when under stress and pressure to restore a
> system to get it back into production, is a plus.  Though, in that
> scenario the trailing garbage warning wouldn't have been that bad
> either.

When under stress and pressure, the irrelevant warning is not bad?
I am sure it is really bad for the operator's attention.




Re: [gentoo-dev] [pre-GLEP] Gentoo binary package container format

2018-11-21 Thread Michał Górny
On Wed, 2018-11-21 at 11:45 +0100, Fabian Groffen wrote:
> > > > > > 5. **Metadata is not compressed.**  This is not a significant problem,
> > > > > >    it is just listed for completeness.
> > > > > > 
> > > > > > 
> > > > > > Goals for a new container format
> > > > > > 
> > > > > > 
> > > > > > The following goals have been set for a replacement format:
> > > > > > 
> > > > > > 1. **The packages must remain contained in a single file.**  As a matter
> > > > > >    of user convenience, it should be possible to transfer binary
> > > > > >    packages without having to use multiple files, and to install them
> > > > > >    from any location.
> > > > > > 
> > > > > > 2. **The file format must be entirely based on common file formats,
> > > > > >    respecting best practices, with as little customization as necessary
> > > > > >    to satisfy the requirements.**  In particular, it is unacceptable
> > > > > >    to create new binary formats.
> > > > > 
> > > > > I take this as your personal opinion.  I don't quite get why it is
> > > > > unacceptable to create a new binary format though.  In particular when
> > > > > you're looking for efficiency, such format could serve your purposes.
> > > > > As long as it's clearly defined, I don't see the problem with a binary
> > > > > format either.
> > > > > Could you add why it is you think binary formats are unacceptable here?
> > > > 
> > > > Because custom binary formats require specialized tooling, and are
> > > > a royal PITA when the user wants to do something that the author of
> > > > specialized tooling just happened not to think worthwhile, or when
> > > > the tooling is not available for some reason.  And before you ask really
> > > > silly questions, yes, I did fight binary packages over hex editor
> > > > at some point.
> > > 
> > > Which I still don't understand, to be frank.  I think even Portage
> > > exposes python APIs to get to the data.
> > 
> > Compare the time needed to make a trivial (but unforeseen) change
> > on a format that's transparent vs a format that requires you to learn
> > its spec and/or API, write a program and debug it.
> 
> I was under the impression you could unpack a tbz2 into data and xpak,
> then unpack both, modify the contents with an editor or whatever, and
> then pack the whole stuff back into a tbz2 again.  This can be done
> worst case scenario by emerge -k, modifying the vdb and quickpkg
> afterwards.

In the described example, the whole necessity of modifying the binary
package arises from it being broken, therefore unsuitable for
'emerge -k'.

> I know that with portage-utils you can do this easily with the qtbz2 and
> qxpak commands.  No need to do anything with a hex editor, or know
> anything about how it's done.

Actually, you need to:

a. know that portage-utils has the appropriate tools (it's non-obvious),

b. know how to use portage-utils.

This is non-obvious.  It took me a while to figure out that I need to
use qtbz2 before using qxpak (why would it work only on split data when
the format is explicitly written to be used on top of compressed
archive?!).

> Obvious advantage of your approach is that you don't need q* tools, but
> can use tar instead.  The editing is as trivial though.  In your case
> you need a special procedure to reconstruct the binpkg should you want
> to keep your special properties (label, order) which equates to q* tools
> somewhat.

Except you don't need to keep them.  The spec is quite explicit that
they're optimizations and that the package must work even if they're
lost as part of an editing exercise.
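
For anyone curious what the q* tools have to do under the hood, below is a minimal Python sketch that reads the appended metadata straight from the combined file, without splitting it first. The layout it assumes is my reading of xpak(5) and should be treated as an assumption: the file ends with a 4-byte big-endian length followed by `STOP`, the xpak block is `XPAKPACK`, index length, data length, index, data, `XPAKSTOP`, and each index entry is name length, name, data offset, data length. `build_xpak` and `read_xpak` are hypothetical helper names; `build_xpak` exists only so the sketch can be exercised without a real package.

```python
import struct


def build_xpak(entries: dict[str, bytes]) -> bytes:
    """Serialise entries into an xpak block plus trailer (assumed layout)."""
    index = b""
    data = b""
    for name, value in entries.items():
        raw = name.encode("utf-8")
        index += struct.pack(">I", len(raw)) + raw
        index += struct.pack(">II", len(data), len(value))
        data += value
    body = (b"XPAKPACK" + struct.pack(">II", len(index), len(data))
            + index + data + b"XPAKSTOP")
    return body + struct.pack(">I", len(body)) + b"STOP"


def read_xpak(blob: bytes) -> dict[str, bytes]:
    """Parse the metadata appended to a binpkg held in memory."""
    if blob[-4:] != b"STOP" or blob[-16:-8] != b"XPAKSTOP":
        raise ValueError("no xpak trailer found")
    (xpak_len,) = struct.unpack(">I", blob[-8:-4])
    xpak = blob[-(xpak_len + 8):-8]
    if xpak[:8] != b"XPAKPACK":
        raise ValueError("xpak header missing")
    index_len, data_len = struct.unpack(">II", xpak[8:16])
    index = xpak[16:16 + index_len]
    data = xpak[16 + index_len:16 + index_len + data_len]
    entries = {}
    pos = 0
    while pos < index_len:
        (name_len,) = struct.unpack_from(">I", index, pos)
        pos += 4
        name = index[pos:pos + name_len].decode("utf-8")
        pos += name_len
        offset, length = struct.unpack_from(">II", index, pos)
        pos += 8
        entries[name] = data[offset:offset + length]
    return entries


if __name__ == "__main__":
    blob = b"<tarball bytes>" + build_xpak({"CATEGORY": b"sys-apps\n"})
    print(read_xpak(blob))
```

Even at this size it is a program one has to write and debug, which is the cost being weighed against a format that plain tar can open.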

> 
> > > > The most trivial case is an attempted recovery of a broken system.
> > > > If you don't have Portage working and don't have portage-utils
> > > > installed, do you really prefer a custom format which will require you
> > > > to fetch and compile special tools?  Or is one that can be processed
> > > > with tools you're quite likely to have on every system, like tar?
> > > 
> > > Well, I think the idea behind the original binpkg format was to use tar
> > > directly on the files in emergency scenarios like these...
> > > The assumption was bzip2 decompressor and tar being available.
> > > I think it is an example of how you add something, while still allowing
> > > to fallback on existing tools.
> > 
> > Except progress in compressors has made it work less and less reliably. 
> > It's mostly an example how to be *clever*.  However, being clever
> > usually doesn't pay off in the long term, compared to doing things *in a
> > simple way*.
> 
> We agree it is hackish, and we agree we can do without.  You simply
> exaggerate the problem, IMO, which mostly isn't there, because it works
> fine today.  It can also be solved today using shell tools.
> 
> % head -c `grep -abo 'XPAKPACK' 
> $EPREFIX/usr/portage/packages/sys-apps/sed-4.5.tbz2 | sed 's/:.*$//'` 
> $EPREFIX/usr/portage/package

Re: [gentoo-dev] [pre-GLEP] Gentoo binary package container format

2018-11-21 Thread Fabian Groffen
On 21-11-2018 10:33:18 +0100, Michał Górny wrote:
> > > > > 2. **The format relies on obscure compressor feature of ignoring
> > > > >    trailing garbage**.  While this behavior is traditionally implemented
> > > > >    by many compressors, the original reasons for it have become long
> > > > >    irrelevant and it is not surprising that new compressors do not
> > > > >    support it.  In particular, Portage already hit this problem twice:
> > > > >    once when users replaced bzip2 with parallel-capable pbzip2
> > > > >    implementation [#PBZIP2]_, and the second time when support for zstd
> > > > >    compressor was added [#ZSTD]_.
> > > > 
> > > > I think this is actually the result of a rather opportunistic
> > > > implementation.  The fault is that we chose to use an extension that
> > > > suggests the file is a regular compressed tarball.
> > > > When one detects that a file is xpak padded, it is trivial to feed the
> > > > decompressor just the relevant part of the datastream.  The format
> > > > itself isn't bad, and doesn't rely on obscure behaviour.
> > > 
> > > Except if you don't have the proper tools installed.  In which case
> > > the 'opportunistic' behavior made it possible to extract the contents
> > > without special tools... except when it actually happens not to work
> > > anymore.  Roy's reply indicates that there is actually interest in this
> > > design feature.
> > 
> > Your point is that the format is broken (== relies on obscure compressor
> > feature).  My point is that the format simply requires a special tool.
> > The fact that we prefer to use existing tools doesn't imply in any way
> > that the format is broken to me.
> > I think you should rewrite your point to mention that you don't want to
> > use a tool that doesn't exist in @system (?) to unpack a binpkg.  My
> > guess is that you could use some head/tail magic in a script if the
> > trailing block is upsetting the decompressor.
> > 
> > I'm not saying this may look ugly, I'm just saying that your point seems
> > biased.
> 
> > I've spent a significant effort rewriting those points to make it clear
> what the problem is, and separating it from other changes 'worth doing
> while we're changing stuff'.  Hope that satisfies your nitpicking.

Yes it does, thank you.

> > > > > 3. **Placing metadata at the end of file makes partial fetches
> > > > >    complex.**  While it is technically possible to obtain package
> > > > >    metadata remotely without fetching the whole package, it usually
> > > > >    requires e.g. 2-3 HTTP requests with rather complex driver.  For
> > > > >    comparison, if metadata was placed at the beginning of the file,
> > > > >    early-terminated pipeline with a single fetch request would suffice.
> > > > 
> > > > I think this point needs to be quantified somewhat why it is so
> > > > important.
> > > > I may be wrong, but the average binpkg is small, <1MiB, bigger packages
> > > > are <50MiB.
> > > > So what is the gain to be saved here?  A "few" MiBs for what operation
> > > > exactly?  I say "few" because I know for some users this is actually not
> > > > just a blip before it's downloaded.  So if this is possible to achieve,
> > > > in what scenarios is this going to be used (and is this often?).
> > > 
> > > Last I checked, Gentoo aimed to support more users than the 'majority'
> > > of people with high-throughput Internet access.  If there's no cost
> > > in doing things better, why not do them better?
> > 
> > You didn't address the critical question, but instead just repeated what
> > I said.
> > So again, why do you need to read just the metadata?
> 
> The original idea was to provide the ability of indexing remote packages
> without having a server-side cache available (or up-to-date).  In order
> to do that, the package manager would need to fetch the metadata of all
> packages (but there's no necessity in fetching the whole packages). 
> However, that's merely a possible future idea.  It's not worth debating
> today.
> 
> Today I really understood the point of avoiding premature optimization. 
> Even if the change is practically zero-cost and harmless (as it's simply
> reordering files), it's going to cost you a lot of time because someone
> will keep nitpicking on it, even though any other order will not change
> anything.

Perhaps next time don't put as much emphasis on it.  I can see now what
you aim for, but it simply raises more questions and concerns to me than
it resolves.  There is nothing wrong with putting in such future
possibility though, if easily possible and not colliding with anything
else.
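
To illustrate where the "2-3 HTTP requests" for metadata at the end of the file come from, here is a rough Python sketch. It assumes the trailer layout as I understand it from xpak(5) (the last 8 bytes are a 4-byte big-endian xpak length followed by `STOP`), and it stands in for the network with a hypothetical `read_range(start, length)` callable rather than a real HTTP client.

```python
import struct
from typing import Callable

# (start, length) -> bytes; a stand-in for one HTTP Range request.
RangeReader = Callable[[int, int], bytes]


def fetch_xpak_segment(read_range: RangeReader, size: int) -> bytes:
    """Fetch only the appended metadata block using two ranged reads."""
    trailer = read_range(size - 8, 8)       # read 1: xpak length + "STOP"
    if trailer[4:] != b"STOP":
        raise ValueError("no xpak trailer")
    (xpak_len,) = struct.unpack(">I", trailer[:4])
    start = size - 8 - xpak_len
    return read_range(start, xpak_len)      # read 2: the xpak block itself


if __name__ == "__main__":
    xpak = b"XPAKPACK" + b"\0" * 8 + b"XPAKSTOP"
    blob = b"T" * 1000 + xpak + struct.pack(">I", len(xpak)) + b"STOP"
    calls = []

    def reader(start: int, length: int) -> bytes:
        calls.append((start, length))
        return blob[start:start + length]

    segment = fetch_xpak_segment(reader, len(blob))
    print(len(calls), segment == xpak)
```

The second read cannot be issued until the first has returned, and a client that does not already know the file size needs one more round trip to learn it. With metadata at the front, a single request that is cut off early would do.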

> > > > > 4. **Extending the format with OpenPGP signatures is non-trivial.**
> > > > >    Depending on the implementation details, it either requires fetching
> > > > >    additional detached signature, breaking backwards compatibility or
> > > > >    introducing more custom logic to reassemble OpenPGP packets.
> > > > 
> > > > I think one 

Re: [gentoo-dev] [pre-GLEP] Gentoo binary package container format

2018-11-21 Thread Michał Górny
On Sun, 2018-11-18 at 12:00 +0100, Fabian Groffen wrote:
> On 18-11-2018 10:38:51 +0100, Michał Górny wrote:
> > On Sun, 2018-11-18 at 10:16 +0100, Fabian Groffen wrote:
> > > On 17-11-2018 12:21:40 +0100, Michał Górny wrote:
> > > > Problems with the current binary package format
> > > > ---
> > > > 
> > > > The following problems were identified with the package format currently
> > > > in use:
> > > > 
> > > > 1. **The packages rely on custom binary archive format to store
> > > >    metadata.**  It is entirely Gentoo invented, and requires dedicated
> > > >    tooling to work with it.  In fact, the reference implementation
> > > >    in Portage does not even include a CLI tool to work with tbz2
> > > >    packages; an unofficial implementation is provided as part
> > > >    of portage-utils toolkit [#PORTAGE-UTILS]_.
> > > 
> > > I think you should rewrite this section to the argument that the
> > > metadata is hard to edit, and that there is only one tool to do so
> > > (except a python interface from Portage?).
> > > On a separate note, I don't think portage-utils can be considered
> > > "unofficial", it is a Gentoo official project as far as I am aware.
> > 
> > In this context, Portage is 'official'.  Portage-utils is a project
> > that's developed entirely separately from Portage and doesn't use
> > Portage APIs but instead reinvents everything.  As such, it is easy for
> > the two to go out of sync.  Or for one of them to have bugs that
> > the other one doesn't have (say, with endianness).
> 
> I'm not sure if it's actually true, I was under the impression the same
> author(s) worked on the Portage as well as portage-utils code.  Anyway,
> aren't quickpkg and emerge enough from a user's perspective?

Gentoo users have a wide perspective.  Assuming that you can think of
all things the users need and you don't need to care beyond that
is plain wrong and results in Windows.

> > > > 2. **The format relies on obscure compressor feature of ignoring
> > > >    trailing garbage**.  While this behavior is traditionally implemented
> > > >    by many compressors, the original reasons for it have become long
> > > >    irrelevant and it is not surprising that new compressors do not
> > > >    support it.  In particular, Portage already hit this problem twice:
> > > >    once when users replaced bzip2 with parallel-capable pbzip2
> > > >    implementation [#PBZIP2]_, and the second time when support for zstd
> > > >    compressor was added [#ZSTD]_.
> > > 
> > > I think this is actually the result of a rather opportunistic
> > > implementation.  The fault is that we chose to use an extension that
> > > suggests the file is a regular compressed tarball.
> > > When one detects that a file is xpak padded, it is trivial to feed the
> > > decompressor just the relevant part of the datastream.  The format
> > > itself isn't bad, and doesn't rely on obscure behaviour.
> > 
> > Except if you don't have the proper tools installed.  In which case
> > the 'opportunistic' behavior made it possible to extract the contents
> > without special tools... except when it actually happens not to work
> > anymore.  Roy's reply indicates that there is actually interest in this
> > design feature.
> 
> Your point is that the format is broken (== relies on obscure compressor
> feature).  My point is that the format simply requires a special tool.
> The fact that we prefer to use existing tools doesn't imply in any way
> that the format is broken to me.
> I think you should rewrite your point to mention that you don't want to
> use a tool that doesn't exist in @system (?) to unpack a binpkg.  My
> guess is that you could use some head/tail magic in a script if the
> trailing block is upsetting the decompressor.
> 
> I'm not saying this may look ugly, I'm just saying that your point seems
> biased.

I've spent a significant effort rewriting those points to make it clear
what the problem is, and separating it from other changes 'worth doing
while we're changing stuff'.  Hope that satisfies your nitpicking.

> > > > 3. **Placing metadata at the end of file makes partial fetches
> > > >    complex.**  While it is technically possible to obtain package
> > > >    metadata remotely without fetching the whole package, it usually
> > > >    requires e.g. 2-3 HTTP requests with rather complex driver.  For
> > > >    comparison, if metadata was placed at the beginning of the file,
> > > >    early-terminated pipeline with a single fetch request would suffice.
> > > 
> > > I think this point needs to be quantified somewhat why it is so
> > > important.
> > > I may be wrong, but the average binpkg is small, <1MiB, bigger packages
> > > are <50MiB.
> > > So what is the gain to be saved here?  A "few" MiBs for what operation
> > > exactly?  I say "few" because I know for some users this is actually not
> > > > just a blip before it's downloaded.  So if this is possible to achieve,
> > > in what scenarios

Re: [gentoo-dev] [pre-GLEP] Gentoo binary package container format

2018-11-19 Thread Kent Fredric
On Sun, 18 Nov 2018 12:00:48 +0100
Fabian Groffen  wrote:

> Your point is that the format is broken (== relies on obscure compressor
> feature).  My point is that the format simply requires a special tool.
> The fact that we prefer to use existing tools doesn't imply in any way
> that the format is broken to me.
> I think you should rewrite your point to mention that you don't want to
> use a tool that doesn't exist in @system (?) to unpack a binpkg.  My
> guess is that you could use some head/tail magic in a script if the
> trailing block is upsetting the decompressor.

The existing design, to the best of my understanding, poses problems when
it comes to adding new features: the dependency on a "special tool"
becomes the bottleneck, because in order to add a new feature the special
tool has to be adjusted to handle it, potentially introducing serious
incompatible changes.

The alternative proposal stated in this pre-GLEP seems infinitely more
extensible, which means more room for 3rd-parties to add their own
features, while retaining basic portage interop.

For instance, I think a "nice" feature that could be added one day
would be the ability for the automated package builder to bundle:

- The ebuild that was used to build it
- All the eclasses that were used by the ebuild
- All the sources and patches that were used

This would create a fat bin/src hybrid, potentially allowing the exact
same package to be rebuilt with minor changes, independently of
portage repository changes.

And this may be useful for people who don't want the option set in the
binary build, but otherwise want the exact same material in a different
configuration.

In terms of user-friendliness, this could empower Gentoo in new ways,
in ways that compete with existing binary distributions wherein
upstreams publish .deb files for people to "just install".

Presently, the amount of additional hand-holding required (namely:
install this overlay, make sure you sync it right, etc, etc, etc) makes
it a little too "hands on" for some.

Now, I'm not saying Gentoo *should* do exactly this, but I like that
this approach gives us the *potential* to do this. As a result,
some downstream derivatives of Gentoo may be motivated to do something
like this, providing usable stand-alone bin-packages which interop nicely
with standard Gentoo installations, while also working nicely with
downstream customizations.

Achieving this as things stand requires downstreams to develop their own
format, which is likely not going to work with standard Gentoo installs.




Re: [gentoo-dev] [pre-GLEP] Gentoo binary package container format [gen...@jonesmz.com]

2018-11-19 Thread Zac Medico
On 11/18/18 6:51 PM, Rich Freeman wrote:
> On Sun, Nov 18, 2018 at 5:40 PM Zac Medico  wrote:
>>
>> On 11/18/18 1:55 PM, Rich Freeman wrote:
>>>
>>> My idea is to basically have portage generate a tag with all the info
>>> needed to identify the "right" package, take a hash of it, and then
>>> stick that in the filename.  Then when portage is looking for a binary
>>> package to use at install time it generates the same tag using the
>>> same algorithm and looks for a matching hash.
>>
>> We've already had this handled for a couple years now, via
>> FEATURES=binpkg-multi-instance.
> 
> According to the make.conf manpage this simply numbers builds.  So, if
> you build something twice with the same config you end up with two
> duplicate files (wasteful).  Presumably if you had a large collection
> of these packages portage would have to read the metadata within each
> one to figure out which one is appropriate to install.  That would be
> expensive if IO is slow, such as when fetching packages online
> on-demand.
> 
> But, it obviously is somewhat of an improvement for Roy's use case.
> 
> IMO using a content-hash of certain metadata would eliminate
> duplication, and based on filename alone it would be clear whether the
> sought-after binary package exists or not.  As with the build numbers
> you couldn't tell from filename inspection what packages you have, but
> if you know what you want you could immediately find it.  IMO trying
> to cram all that metadata into a filename to make them more
> transparent isn't a good idea, and using hashes lets the user set
> their own policy regarding flexibility.  Heck, you could auto-gen
> symlinks for subsets of metadata (ie, the same file could be linked
> from a file that specifies its USE flags but not its CFLAGS, so it
> would be found if either an exact hit on CFLAGS was sought or if
> CFLAGS were considered unimportant).
> 
> But, I'm certainly not suggesting that you're not allowed to go to bed
> until you've built it.  :)

The existing ${PKGDIR}/Packages file optimizes metadata access for both
local and remote access, and performs well for reasonable numbers of
packages.

If you insist on mixing binary packages in the same ${PKGDIR} for a
large number of alternative configurations, then it will not scale
unless you create a way to send your local configuration to the server
so that it can select the relevant package list for you.

However, bear in mind that mixing alternative configurations in the same
${PKGDIR} might lead to undesirable results if there is anything
relevant that is unaccounted for in the package metadata. Possible
unaccounted things may include:

1) glibc version the package was built against
2) symbols and/or sonames not accounted for by slot operator dependencies
3) soname dependencies (--usepkgonly + --ignore-soname-deps=n handles this)
-- 
Thanks,
Zac





Re: [gentoo-dev] [pre-GLEP] Gentoo binary package container format [gen...@jonesmz.com]

2018-11-19 Thread M. J. Everitt
On 18/11/18 22:40, Zac Medico wrote:
> On 11/18/18 1:55 PM, Rich Freeman wrote:
>> On Sun, Nov 18, 2018 at 4:10 PM Roy Bamford  wrote:
>>> Replying off list because I am not on the whitelist.
>> That seems odd.
>>
>>> 1) append a uuid to each filename. Generated when the bin package file is generated.
>>> 2) encode the hostname of the machine that generated the file
>>> 3) encode the use flags in the filename.
>> So, I brought up this same issue in the earlier discussion and it was
>> considered out of scope, and I think this is fair.  The GLEP does not
>> specify filename, and IMO the standard for what goes INSIDE the file
>> will work just fine with any future enhancements that address exactly
>> this use case.
>>
>> Besides your case of building for a cluster, another use case is
>> having a central binary repo that portage could check and utilize when
>> a user's preferences happen to match what is pre-built.
>>
>> I suggest we start a different thread for any additional discussion of
>> this use case.  I was thinking and it probably wouldn't be super-hard
>> to actually start building something like this.  But, I don't want to
>> derail this GLEP as I don't see any reason designing something like
>> this needs to hold up the binary package format.  Both the existing
>> and proposed binary package formats will encode any metadata needed by
>> the package manager inside the file, and the only extension we need is
>> to encode identifying info in the filename.
>>
>> My idea is to basically have portage generate a tag with all the info
>> needed to identify the "right" package, take a hash of it, and then
>> stick that in the filename.  Then when portage is looking for a binary
>> package to use at install time it generates the same tag using the
>> same algorithm and looks for a matching hash.  If a hit is found then
>> it reads the complete metadata in the file and applies all the sanity
>> checks it already does.  Generating of binary packages with the hash
>> could be made optional, and portage could also be configured to first
>> look for the matching hash, then fall back to the existing naming
>> convention, so that it would be compatible with existing generic
>> names.  So, users would get a choice as to whether they want to build
>> up a library of these packages, or just have each build overwrite the
>> last.
>>
>> Then the next step would be to allow these files to be fetched from a
>> binary repo optionally, and then finally we'd need tools to create the
>> repo.  But, this step isn't needed for your use case.  With the proper
>> optional switches you could utilize as much of this scheme as you
>> like.
>>
>> Also, you could optionally choose how much you want portage to encode
>> in the tag and look for.  Are you very fussy and only want a binary
>> package with matching CFLAGS/USE/whatever?  Or is just matching
>> USE/arch/etc enough?  Some of the existing portage options could
>> potentially be re-used here.
> We've already had this handled for a couple years now, via
> FEATURES=binpkg-multi-instance.
Working fine for me for catalyst ARM runs ...





Re: [gentoo-dev] [pre-GLEP] Gentoo binary package container format [gen...@jonesmz.com]

2018-11-18 Thread Rich Freeman
On Sun, Nov 18, 2018 at 5:40 PM Zac Medico  wrote:
>
> On 11/18/18 1:55 PM, Rich Freeman wrote:
> >
> > My idea is to basically have portage generate a tag with all the info
> > needed to identify the "right" package, take a hash of it, and then
> > stick that in the filename.  Then when portage is looking for a binary
> > package to use at install time it generates the same tag using the
> > same algorithm and looks for a matching hash.
>
> We've already had this handled for a couple years now, via
> FEATURES=binpkg-multi-instance.

According to the make.conf manpage this simply numbers builds.  So, if
you build something twice with the same config you end up with two
duplicate files (wasteful).  Presumably if you had a large collection
of these packages portage would have to read the metadata within each
one to figure out which one is appropriate to install.  That would be
expensive if IO is slow, such as when fetching packages online
on-demand.

But, it obviously is somewhat of an improvement for Roy's use case.

IMO using a content-hash of certain metadata would eliminate
duplication, and based on filename alone it would be clear whether the
sought-after binary package exists or not.  As with the build numbers
you couldn't tell from filename inspection what packages you have, but
if you know what you want you could immediately find it.  IMO trying
to cram all that metadata into a filename to make them more
transparent isn't a good idea, and using hashes lets the user set
their own policy regarding flexibility.  Heck, you could auto-gen
symlinks for subsets of metadata (ie, the same file could be linked
from a file that specifies its USE flags but not its CFLAGS, so it
would be found if either an exact hit on CFLAGS was sought or if
CFLAGS were considered unimportant).

But, I'm certainly not suggesting that you're not allowed to go to bed
until you've built it.  :)
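
To make the hashed-tag idea a little more concrete, here is a rough Python sketch. Everything in it is hypothetical: the choice of keys, the canonical text form, the truncated SHA-256 digest and the filename pattern are illustrations only, not anything Portage implements.

```python
import hashlib

# Keys whose values are treated as unordered flag sets (an assumption);
# other values, such as CFLAGS, keep their order because it can matter.
UNORDERED_KEYS = {"USE"}


def config_tag(metadata: dict[str, str], keys: tuple[str, ...]) -> str:
    """Build a canonical text tag from the selected metadata keys."""
    lines = []
    for key in sorted(keys):
        words = metadata.get(key, "").split()
        if key in UNORDERED_KEYS:
            words = sorted(words)
        lines.append(f"{key}={' '.join(words)}")
    return "\n".join(lines)


def hashed_name(pf: str, metadata: dict[str, str], keys: tuple[str, ...]) -> str:
    """Derive a package filename that embeds a hash of the tag."""
    tag = config_tag(metadata, keys).encode("utf-8")
    digest = hashlib.sha256(tag).hexdigest()
    return f"{pf}-{digest[:16]}.tbz2"


if __name__ == "__main__":
    meta = {"USE": "nls acl", "CFLAGS": "-O2 -pipe"}
    print(hashed_name("sed-4.5", meta, ("USE", "CFLAGS")))  # strict match
    print(hashed_name("sed-4.5", meta, ("USE",)))           # CFLAGS ignored
```

The second call corresponds to the symlink suggestion: a name computed from a subset of the keys can point at the same file, so a lookup that does not care about CFLAGS still finds it.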

-- 
Rich



Re: [gentoo-dev] [pre-GLEP] Gentoo binary package container format [gen...@jonesmz.com]

2018-11-18 Thread Zac Medico
On 11/18/18 1:55 PM, Rich Freeman wrote:
> On Sun, Nov 18, 2018 at 4:10 PM Roy Bamford  wrote:
>>
>> Replying off list because I am not on the whitelist.
> 
> That seems odd.
> 
>> 1) append a uuid to each filename. Generated when the bin package file is 
>> generated.
>> 2) encode the hostname of the machine that generated the file
>> 3) encode the use flags in the filename.
> 
> So, I brought up this same issue in the earlier discussion and it was
> considered out of scope, and I think this is fair.  The GLEP does not
> specify filename, and IMO the standard for what goes INSIDE the file
> will work just fine with any future enhancements that address exactly
> this use case.
> 
> Besides your case of building for a cluster, another use case is
> having a central binary repo that portage could check and utilize when
> a user's preferences happen to match what is pre-built.
> 
> I suggest we start a different thread for any additional discussion of
> this use case.  I was thinking and it probably wouldn't be super-hard
> to actually start building something like this.  But, I don't want to
> derail this GLEP as I don't see any reason designing something like
> this needs to hold up the binary package format.  Both the existing
> and proposed binary package formats will encode any metadata needed by
> the package manager inside the file, and the only extension we need is
> to encode identifying info in the filename.
> 
> My idea is to basically have portage generate a tag with all the info
> needed to identify the "right" package, take a hash of it, and then
> stick that in the filename.  Then when portage is looking for a binary
> package to use at install time it generates the same tag using the
> same algorithm and looks for a matching hash.  If a hit is found then
> it reads the complete metadata in the file and applies all the sanity
> checks it already does.  Generating of binary packages with the hash
> could be made optional, and portage could also be configured to first
> look for the matching hash, then fall back to the existing naming
> convention, so that it would be compatible with existing generic
> names.  So, users would get a choice as to whether they want to build
> up a library of these packages, or just have each build overwrite the
> last.
> 
> Then the next step would be to allow these files to be fetched from a
> binary repo optionally, and then finally we'd need tools to create the
> repo.  But, this step isn't needed for your use case.  With the proper
> optional switches you could utilize as much of this scheme as you
> like.
> 
> Also, you could optionally choose how much you want portage to encode
> in the tag and look for.  Are you very fussy and only want a binary
> package with matching CFLAGS/USE/whatever?  Or is just matching
> USE/arch/etc enough?  Some of the existing portage options could
> potentially be re-used here.

We've already had this handled for a couple years now, via
FEATURES=binpkg-multi-instance.
-- 
Thanks,
Zac





Re: Re: [gentoo-dev] [pre-GLEP] Gentoo binary package container format [gen...@jonesmz.com]

2018-11-18 Thread Rich Freeman
On Sun, Nov 18, 2018 at 4:10 PM Roy Bamford  wrote:
>
> Replying off list because I am not on the whitelist.

That seems odd.

> 1) append a uuid to each filename. Generated when the bin package file is 
> generated.
> 2) encode the hostname of the machine that generated the file
> 3) encode the use flags in the filename.

So, I brought up this same issue in the earlier discussion and it was
considered out of scope, and I think this is fair.  The GLEP does not
specify filename, and IMO the standard for what goes INSIDE the file
will work just fine with any future enhancements that address exactly
this use case.

Besides your case of building for a cluster, another use case is
having a central binary repo that portage could check and utilize when
a user's preferences happen to match what is pre-built.

I suggest we start a different thread for any additional discussion of
this use case.  I was thinking and it probably wouldn't be super-hard
to actually start building something like this.  But, I don't want to
derail this GLEP as I don't see any reason designing something like
this needs to hold up the binary package format.  Both the existing
and proposed binary package formats will encode any metadata needed by
the package manager inside the file, and the only extension we need is
to encode identifying info in the filename.

My idea is to basically have portage generate a tag with all the info
needed to identify the "right" package, take a hash of it, and then
stick that in the filename.  Then when portage is looking for a binary
package to use at install time it generates the same tag using the
same algorithm and looks for a matching hash.  If a hit is found then
it reads the complete metadata in the file and applies all the sanity
checks it already does.  Generating of binary packages with the hash
could be made optional, and portage could also be configured to first
look for the matching hash, then fall back to the existing naming
convention, so that it would be compatible with existing generic
names.  So, users would get a choice as to whether they want to build
up a library of these packages, or just have each build overwrite the
last.

Then the next step would be to allow these files to be fetched from a
binary repo optionally, and then finally we'd need tools to create the
repo.  But, this step isn't needed for your use case.  With the proper
optional switches you could utilize as much of this scheme as you
like.

Also, you could optionally choose how much you want portage to encode
in the tag and look for.  Are you very fussy and only want a binary
package with matching CFLAGS/USE/whatever?  Or is just matching
USE/arch/etc enough?  Some of the existing portage options could
potentially be re-used here.

Please make any replies in a new thread.

-- 
Rich



Fwd: Re: [gentoo-dev] [pre-GLEP] Gentoo binary package container format [gen...@jonesmz.com]

2018-11-18 Thread Roy Bamford
See attached.

Replying off list because I am not on the whitelist ...

-- 
Regards,

Roy Bamford
(Neddyseagoon) a member of
elections
gentoo-ops
forum-mods
--- Begin Message ---
On Sun, Nov 18, 2018 at 5:04 AM Roy Bamford  wrote:

> On 2018.11.18 09:38, Michał Górny wrote:
> > On Sun, 2018-11-18 at 10:16 +0100, Fabian Groffen wrote:
> > > On 17-11-2018 12:21:40 +0100, Michał Górny wrote:
> > > > Problems with the current binary package format
>
> [snip]
>
> > > > 2. **The format relies on obscure compressor feature of ignoring
> > > >    trailing garbage**.  While this behavior is traditionally implemented
> > > >    by many compressors, the original reasons for it have become long
> > > >    irrelevant and it is not surprising that new compressors do not
> > > >    support it.  In particular, Portage already hit this problem twice:
> > > >    once when users replaced bzip2 with parallel-capable pbzip2
> > > >    implementation [#PBZIP2]_, and the second time when support for zstd
> > > >    compressor was added [#ZSTD]_.
> > >
> > > I think this is actually the result of a rather opportunistic
> > > implementation.  The fault is that we chose to use an extension that
> > > suggests the file is a regular compressed tarball.
> > > When one detects that a file is xpak padded, it is trivial to feed the
> > > decompressor just the relevant part of the datastream.  The format
> > > itself isn't bad, and doesn't rely on obscure behaviour.
> >
> > Except if you don't have the proper tools installed.  In which case
> > the 'opportunistic' behavior made it possible to extract the contents
> > without special tools... except when it actually happens not to work
> > anymore.  Roy's reply indicates that there is actually interest in this
> > design feature.
> >
> [snip]
>
> Team,
>
> I used to post something like https://wiki.gentoo.org/wiki/Fix_My_Gentoo
> with a link to Patrick's binhost on the forums every three or four months.
> It made it worth writing that wiki page anyway.
>
> We still get users removing elements of their toolchain or glibc from time
> to time.  The requirement that I didn't express very well is that it shall
> be possible to install binary packages without the use of any
> Gentoo-specific tooling.
>
> The current tarball of tarballs proposal would satisfy that requirement.
>
> It's unlikely that a custom binary format would.  Of course, this being
> Gentoo, someone would write a run-anywhere script that did the
> unpicking.  We already have deb2targz and rpm2targz.  We have the
> opportunity to design out binpkg2targz before it exists.
>
> --
> Regards,
>
> Roy Bamford
> (Neddyseagoon) a member of
> elections
> gentoo-ops
> forum-mods
>


Replying off list because I am not on the whitelist.

Please also consider my use case:

I have a cluster file system, cephfs, which all of my gentoo machines mount
for access to various shared file resources.

I want to have all of them mount a cephfs path to the folder which portage
is configured to look for binary packages.

This works great if all of the machines have identical portage
configurations, but breaks down as soon as one machine uses a different use
flag.

The reason for this is that the package file names do not encode anything
other than the package name and version number. So if a binpkg already
exists in my binpkg repository, and another machine builds with different
use flags, the binpkg gets overwritten, potentially while a third machine
is reading the binpkg file.

The filename also does not represent compile-time dependencies, or any
number of other possible points of differentiation.

This issue could be (at least partially) solved at least 3 ways.

1) append a uuid to each filename. Generated when the bin package file is
generated.
2) encode the hostname of the machine that generated the file
3) encode the use flags in the filename.

Perhaps a fuller solution is to respect an environment variable
"BINARY_PKG_FILENAME_FORMAT" that accepts a series of variable
substitutions to append after the package name and version number?

This variable would be used only when generating the binary package.
Portage would still use any binary package that it found that matched its
needs, regardless of suffix.
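
To make the suggestion concrete, here is a small Python sketch of how such a
format string might be expanded.  Note that BINARY_PKG_FILENAME_FORMAT is
only the name proposed in this message, not an existing Portage variable;
the ${NAME} placeholder syntax, the function name and the character
sanitising are likewise assumptions for illustration.

```python
import string


def format_binpkg_filename(pf, fmt, variables, suffix=".tbz2"):
    """Build a binary package filename from a user-supplied format string.

    Hypothetical: 'fmt' plays the role of BINARY_PKG_FILENAME_FORMAT and
    'variables' maps placeholder names (HOSTNAME, UUID, USE, ...) to values.
    """
    # string.Template expands ${NAME}; an unknown name raises KeyError.
    extra = string.Template(fmt).substitute(variables)
    # Replace characters that are awkward in filenames (spaces, slashes, ...).
    extra = "".join(c if c.isalnum() or c in "._+-" else "_" for c in extra)
    if extra:
        return f"{pf}-{extra}{suffix}"
    # An empty format keeps today's plain name.
    return f"{pf}{suffix}"
```

With a format of "${HOSTNAME}" and HOSTNAME set to node1, this sketch would
name the package sed-4.5-node1.tbz2, so two machines with different
configurations would no longer overwrite each other's files.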

Thanks for your time.
--- End Message ---




Re: [gentoo-dev] [pre-GLEP] Gentoo binary package container format

2018-11-18 Thread Roy Bamford
On 2018.11.18 09:38, Michał Górny wrote:
> On Sun, 2018-11-18 at 10:16 +0100, Fabian Groffen wrote:
> > On 17-11-2018 12:21:40 +0100, Michał Górny wrote:
> > > Problems with the current binary package format

[snip]

> > > 2. **The format relies on obscure compressor feature of ignoring
> > >    trailing garbage**.  While this behavior is traditionally implemented
> > >    by many compressors, the original reasons for it have become long
> > >    irrelevant and it is not surprising that new compressors do not
> > >    support it.  In particular, Portage already hit this problem twice:
> > >    once when users replaced bzip2 with parallel-capable pbzip2
> > >    implementation [#PBZIP2]_, and the second time when support for zstd
> > >    compressor was added [#ZSTD]_.
> > 
> > I think this is actually the result of a rather opportunistic
> > implementation.  The fault is that we chose to use an extension that
> > suggests the file is a regular compressed tarball.
> > When one detects that a file is xpak padded, it is trivial to feed the
> > decompressor just the relevant part of the datastream.  The format
> > itself isn't bad, and doesn't rely on obscure behaviour.
> 
> Except if you don't have the proper tools installed.  In which case
> the 'opportunistic' behavior made it possible to extract the contents
> without special tools... except when it actually happens not to work
> anymore.  Roy's reply indicates that there is actually interest in this
> design feature.
> 
[snip]

Team,

I used to post something like https://wiki.gentoo.org/wiki/Fix_My_Gentoo
with a link to Patrick's binhost on the forums every three or four months.
It made it worth writing that wiki page anyway.

We still get users removing elements of their toolchain or glibc from time
to time.  The requirement that I didn't express very well is that it shall
be possible to install binary packages without the use of any
Gentoo-specific tooling.

The current tarball of tarballs proposal would satisfy that requirement.

It's unlikely that a custom binary format would.  Of course, this being
Gentoo, someone would write a run-anywhere script that did the
unpicking.  We already have deb2targz and rpm2targz.  We have the
opportunity to design out binpkg2targz before it exists.

-- 
Regards,

Roy Bamford
(Neddyseagoon) a member of
elections
gentoo-ops
forum-mods




Re: [gentoo-dev] [pre-GLEP] Gentoo binary package container format

2018-11-18 Thread Fabian Groffen
On 18-11-2018 10:38:51 +0100, Michał Górny wrote:
> On Sun, 2018-11-18 at 10:16 +0100, Fabian Groffen wrote:
> > On 17-11-2018 12:21:40 +0100, Michał Górny wrote:
> > > > Problems with the current binary package format
> > > > -----------------------------------------------
> > > > 
> > > > The following problems were identified with the package format currently
> > > > in use:
> > > > 
> > > > 1. **The packages rely on custom binary archive format to store
> > > >    metadata.**  It is entirely Gentoo invented, and requires dedicated
> > > >    tooling to work with it.  In fact, the reference implementation
> > > >    in Portage does not even include a CLI tool to work with tbz2
> > > >    packages; an unofficial implementation is provided as part
> > > >    of portage-utils toolkit [#PORTAGE-UTILS]_.
> > 
> > I think you should rewrite this section to the argument that the
> > metadata is hard to edit, and that there is only one tool to do so
> > (except a python interface from Portage?).
> > On a separate note, I don't think portage-utils can be considered
> > "unofficial", it is a Gentoo official project as far as I am aware.
> 
> In this context, Portage is 'official'.  Portage-utils is a project
> that's developed entirely separately from Portage and doesn't use
> Portage APIs but instead reinvents everything.  As such, it is easy for
> the two to go out of sync.  Or for one of them to have bugs that
> the other one doesn't have (say, with endianness).

I'm not sure that's actually true; I was under the impression that the same
author(s) worked on both the Portage and the portage-utils code.  Anyway,
aren't quickpkg and emerge enough from a user's perspective?

> > > 2. **The format relies on obscure compressor feature of ignoring
> > >    trailing garbage**.  While this behavior is traditionally implemented
> > >    by many compressors, the original reasons for it have become long
> > >    irrelevant and it is not surprising that new compressors do not
> > >    support it.  In particular, Portage already hit this problem twice:
> > >    once when users replaced bzip2 with parallel-capable pbzip2
> > >    implementation [#PBZIP2]_, and the second time when support for zstd
> > >    compressor was added [#ZSTD]_.
> > 
> > I think this is actually the result of a rather opportunistic
> > implementation.  The fault is that we chose to use an extension that
> > suggests the file is a regular compressed tarball.
> > When one detects that a file is xpak padded, it is trivial to feed the
> > decompressor just the relevant part of the datastream.  The format
> > itself isn't bad, and doesn't rely on obscure behaviour.
> 
> Except if you don't have the proper tools installed.  In which case
> the 'opportunistic' behavior made it possible to extract the contents
> without special tools... except when it actually happens not to work
> anymore.  Roy's reply indicates that there is actually interest in this
> design feature.

Your point is that the format is broken (== relies on obscure compressor
feature).  My point is that the format simply requires a special tool.
The fact that we prefer to use existing tools doesn't imply in any way
that the format is broken to me.
I think you should rewrite your point to mention that you don't want to
use a tool that doesn't exist in @system (?) to unpack a binpkg.  My
guess is that you could use some head/tail magic in a script if the
trailing block is upsetting the decompressor.

I'm not saying this wouldn't look ugly; I'm just saying that your point
seems biased.
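
For illustration, the "head/tail magic" could look roughly like the Python
sketch below, which cuts the XPAK trailer off before decompressing.  It
assumes the tbz2 layout as I understand it from xpak(5): the compressed
tarball, then the XPAK segment ("XPAKPACK" ... "XPAKSTOP"), then a 4-byte
big-endian length of that segment, then "STOP".  The function name is made
up, and the layout should be checked against the man page before relying on
it.

```python
import bz2
import struct


def split_tbz2(data):
    """Split tbz2 bytes into (compressed_tarball, xpak_segment).

    Assumed layout: <tar.bz2><xpak><4-byte big-endian xpak length>"STOP"
    """
    if len(data) < 8 or data[-4:] != b"STOP":
        raise ValueError("no tbz2 trailer found")
    (xpak_len,) = struct.unpack(">I", data[-8:-4])
    xpak_start = len(data) - 8 - xpak_len
    if xpak_start < 0:
        raise ValueError("XPAK length larger than file")
    xpak = data[xpak_start:-8]
    if not (xpak.startswith(b"XPAKPACK") and xpak.endswith(b"XPAKSTOP")):
        raise ValueError("XPAK markers missing")
    # Everything before the XPAK segment is the plain compressed tarball.
    return data[:xpak_start], xpak


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as f:
        tarball, _ = split_tbz2(f.read())
    # The decompressor only ever sees the bzip2 stream, so it has no
    # trailing data to complain about.
    sys.stdout.buffer.write(bz2.decompress(tarball))
```

Run as a script with a package path, it writes the uncompressed tar stream
to stdout, which can be piped into "tar -x".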

> > > 3. **Placing metadata at the end of file makes partial fetches
> > >    complex.**  While it is technically possible to obtain package
> > >    metadata remotely without fetching the whole package, it usually
> > >    requires e.g. 2-3 HTTP requests with rather complex driver.  For
> > >    comparison, if metadata was placed at the beginning of the file,
> > >    early-terminated pipeline with a single fetch request would suffice.
> > 
> > I think this point needs to be quantified somewhat why it is so
> > important.
> > I may be wrong, but the average binpkg is small, <1MiB, bigger packages
> > are <50MiB.
> > So what is the gain to be saved here?  A "few" MiBs for what operation
> > exactly?  I say "few" because I know for some users this is actually not
> > > just a blip before it's downloaded.  So if this is possible to achieve,
> > in what scenarios is this going to be used (and is this often?).
> 
> Last I checked, Gentoo aimed to support more users than the 'majority'
> of people with high-throughput Internet access.  If there's no cost
> in doing things better, why not do them better?

You didn't address the critical question, but instead just repeated what
I said.
So again, why do you need to read just the metadata?

> > > 4. **Extending the format with OpenPGP signatures is non-trivial.**
> > >    Depending on the implementation details, it either requires fetching
> > >    additional detached signature, breaking backwards compatibi

Re: [gentoo-dev] [pre-GLEP] Gentoo binary package container format

2018-11-18 Thread Michał Górny
On Sun, 2018-11-18 at 10:16 +0100, Fabian Groffen wrote:
> On 17-11-2018 12:21:40 +0100, Michał Górny wrote:
> > Problems with the current binary package format
> > -----------------------------------------------
> > 
> > The following problems were identified with the package format currently
> > in use:
> > 
> > 1. **The packages rely on custom binary archive format to store
> >    metadata.**  It is entirely Gentoo invented, and requires dedicated
> >    tooling to work with it.  In fact, the reference implementation
> >    in Portage does not even include a CLI tool to work with tbz2
> >    packages; an unofficial implementation is provided as part
> >    of portage-utils toolkit [#PORTAGE-UTILS]_.
> 
> I think you should rewrite this section to the argument that the
> metadata is hard to edit, and that there is only one tool to do so
> (except a python interface from Portage?).
> On a separate note, I don't think portage-utils can be considered
> "unofficial", it is a Gentoo official project as far as I am aware.

In this context, Portage is 'official'.  Portage-utils is a project
that's developed entirely separately from Portage and doesn't use
Portage APIs but instead reinvents everything.  As such, it is easy for
the two to go out of sync.  Or for one of them to have bugs that
the other one doesn't have (say, with endianness).

> > 2. **The format relies on obscure compressor feature of ignoring
> >    trailing garbage**.  While this behavior is traditionally implemented
> >    by many compressors, the original reasons for it have become long
> >    irrelevant and it is not surprising that new compressors do not
> >    support it.  In particular, Portage already hit this problem twice:
> >    once when users replaced bzip2 with parallel-capable pbzip2
> >    implementation [#PBZIP2]_, and the second time when support for zstd
> >    compressor was added [#ZSTD]_.
> 
> I think this is actually the result of a rather opportunistic
> implementation.  The fault is that we chose to use an extension that
> suggests the file is a regular compressed tarball.
> When one detects that a file is xpak padded, it is trivial to feed the
> decompressor just the relevant part of the datastream.  The format
> itself isn't bad, and doesn't rely on obscure behaviour.

Except if you don't have the proper tools installed.  In which case
the 'opportunistic' behavior made it possible to extract the contents
without special tools... except when it actually happens not to work
anymore.  Roy's reply indicates that there is actually interest in this
design feature.

> 
> > 3. **Placing metadata at the end of file makes partial fetches
> >    complex.**  While it is technically possible to obtain package
> >    metadata remotely without fetching the whole package, it usually
> >    requires e.g. 2-3 HTTP requests with rather complex driver.  For
> >    comparison, if metadata was placed at the beginning of the file,
> >    early-terminated pipeline with a single fetch request would suffice.
> 
> I think this point needs to be quantified somewhat why it is so
> important.
> I may be wrong, but the average binpkg is small, <1MiB, bigger packages
> are <50MiB.
> So what is the gain to be saved here?  A "few" MiBs for what operation
> exactly?  I say "few" because I know for some users this is actually not
> just a blip before it's downloaded.  So if this is possible to achieve,
> in what scenarios is this going to be used (and is this often?).

Last I checked, Gentoo aimed to support more users than the 'majority'
of people with high-throughput Internet access.  If there's no cost
in doing things better, why not do them better?

> 
> > 4. **Extending the format with OpenPGP signatures is non-trivial.**
> >    Depending on the implementation details, it either requires fetching
> >    additional detached signature, breaking backwards compatibility or
> >    introducing more custom logic to reassemble OpenPGP packets.
> 
> I think one could add an extra key to the xpak that holds a gpg sig or
> something.  Perhaps this point is better phrased as that current binpkgs
> don't have any validation options defined.

...which extra key would mean that the two disjoint implementations
in use would need more custom code that extracts the signature,
reconstructs signed data for verification and verifies it.  Or, in other
words, that user needs even more custom tooling to manually verify
the package he just fetched.

> 
> > 5. **Metadata is not compressed.**  This is not a significant problem,
> >    it is just listed for completeness.
> > 
> > 
> > Goals for a new container format
> > --------------------------------
> > 
> > The following goals have been set for a replacement format:
> > 
> > 1. **The packages must remain contained in a single file.**  As a matter
> >    of user convenience, it should be possible to transfer binary
> >    packages without having to use multiple files, and to install them
> >    from any location.
> > 
> > 

Re: [gentoo-dev] [pre-GLEP] Gentoo binary package container format

2018-11-18 Thread Fabian Groffen
On 17-11-2018 12:21:40 +0100, Michał Górny wrote:
> Problems with the current binary package format
> -----------------------------------------------
> 
> The following problems were identified with the package format currently
> in use:
> 
> 1. **The packages rely on custom binary archive format to store
>    metadata.**  It is entirely Gentoo invented, and requires dedicated
>    tooling to work with it.  In fact, the reference implementation
>    in Portage does not even include a CLI tool to work with tbz2
>    packages; an unofficial implementation is provided as part
>    of portage-utils toolkit [#PORTAGE-UTILS]_.

I think you should rewrite this section to the argument that the
metadata is hard to edit, and that there is only one tool to do so
(except a python interface from Portage?).
On a separate note, I don't think portage-utils can be considered
"unofficial", it is a Gentoo official project as far as I am aware.

> 2. **The format relies on obscure compressor feature of ignoring
>    trailing garbage**.  While this behavior is traditionally implemented
>    by many compressors, the original reasons for it have become long
>    irrelevant and it is not surprising that new compressors do not
>    support it.  In particular, Portage already hit this problem twice:
>    once when users replaced bzip2 with parallel-capable pbzip2
>    implementation [#PBZIP2]_, and the second time when support for zstd
>    compressor was added [#ZSTD]_.

I think this is actually the result of a rather opportunistic
implementation.  The fault is that we chose to use an extension that
suggests the file is a regular compressed tarball.
When one detects that a file is xpak padded, it is trivial to feed the
decompressor just the relevant part of the datastream.  The format
itself isn't bad, and doesn't rely on obscure behaviour.

> 3. **Placing metadata at the end of file makes partial fetches
>    complex.**  While it is technically possible to obtain package
>    metadata remotely without fetching the whole package, it usually
>    requires e.g. 2-3 HTTP requests with rather complex driver.  For
>    comparison, if metadata was placed at the beginning of the file,
>    early-terminated pipeline with a single fetch request would suffice.

I think this point needs to be quantified somewhat why it is so
important.
I may be wrong, but the average binpkg is small, <1MiB, bigger packages
are <50MiB.
So what is the gain to be saved here?  A "few" MiBs for what operation
exactly?  I say "few" because I know for some users this is actually not
just a blip before it's downloaded.  So if this is possible to achieve,
in what scenarios is this going to be used (and is this often?).
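
To make the "2-3 HTTP requests" in point 3 more tangible, here is a hedged
Python sketch of the byte ranges a client might ask for when the metadata
sits at the end of the file.  It assumes the tbz2 trailer is a 4-byte
big-endian XPAK length followed by "STOP" (my reading of xpak(5)) and that
the server honours HTTP Range requests; the names are invented and no
network code is included.

```python
import struct

# Request 1: a suffix range asking for the last 8 bytes of the file.
TRAILER_RANGE = "bytes=-8"


def total_size_from_content_range(value):
    """Extract the full file size from e.g. 'bytes 992-999/1000'."""
    return int(value.rsplit("/", 1)[1])


def xpak_range(total_size, trailer):
    """Return the Range header value covering only the XPAK segment.

    'trailer' is the 8-byte body returned by request 1.
    """
    if len(trailer) != 8 or trailer[4:] != b"STOP":
        raise ValueError("not a tbz2 trailer")
    (xpak_len,) = struct.unpack(">I", trailer[:4])
    start = total_size - 8 - xpak_len
    if start < 0:
        raise ValueError("XPAK length larger than file")
    # HTTP byte ranges are inclusive, so the last XPAK byte sits just
    # before the 8-byte trailer.
    end = total_size - 9
    return f"bytes={start}-{end}"
```

Under these assumptions the client needs at least two dependent round trips
(trailer, then XPAK segment), whereas metadata at the start of the file
could be read from one request that is cut short.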

> 4. **Extending the format with OpenPGP signatures is non-trivial.**
>    Depending on the implementation details, it either requires fetching
>    additional detached signature, breaking backwards compatibility or
>    introducing more custom logic to reassemble OpenPGP packets.

I think one could add an extra key to the xpak that holds a gpg sig or
something.  Perhaps this point is better phrased as that current binpkgs
don't have any validation options defined.

> 5. **Metadata is not compressed.**  This is not a significant problem,
>    it is just listed for completeness.
> 
> 
> Goals for a new container format
> --------------------------------
> 
> The following goals have been set for a replacement format:
> 
> 1. **The packages must remain contained in a single file.**  As a matter
>    of user convenience, it should be possible to transfer binary
>    packages without having to use multiple files, and to install them
>    from any location.
> 
> 2. **The file format must be entirely based on common file formats,
>    respecting best practices, with as little customization as necessary
>    to satisfy the requirements.**  In particular, it is unacceptable
>    to create new binary formats.

I take this as your personal opinion.  I don't quite get why it is
unacceptable to create a new binary format though.  In particular when
you're looking for efficiency, such format could serve your purposes.
As long as it's clearly defined, I don't see the problem with a binary
format either.
Could you add why it is you think binary formats are unacceptable here?

> 3. **The file format should provide for partial fetching of binary
>    packages.**  It should be possible to easily fetch and read
>    the package metadata without having to download the whole package.

Like above, what is the use-case here?  Why would you want this?  I
think I'm missing something here.

> 4. **The file format must provide support for OpenPGP signatures.**
>    Preferably, it should use standard OpenPGP message formats.
> 
> 5. **The file format must allow for efficient metadata updates.**
>    In particular, it should be possible to update the metadata without
>    having to recompress package files.
> 
> 6. **The file format should account for easy recognition both throug

Re: [gentoo-dev] [pre-GLEP] Gentoo binary package container format

2018-11-17 Thread Michał Górny
On Sat, 2018-11-17 at 14:05 +, Roy Bamford wrote:
> On 2018.11.17 11:21, Michał Górny wrote:
> > Hi,
> > 
> > Here's a pre-GLEP draft based on the earlier discussion on gentoo-
> > portage-dev mailing list.  The specification uses GLEP form as it
> > provides for cleanly specifying the motivation and rationale.
> > 
> > [snip glep proposal]
> > -- 
> > Best regards,
> > Michał Górny
> > 
> 
> Team,
>  
> One of the attractions of the existing format is that 
> tar xf /path/to/tarball -C /mnt/gentoo 
> works to fix things like glibc being removed and other
> missing essential portage components.
> 
> In effect, each binary package can be treated as a
> single package stage3 when a user needs a get out of jail
> free card.
> 
> Does this proposal allow for installing the payload without 
> the use of the Gentoo package manager from some random 
> distro being used as a rescue media?

Yes, and it can also be done via one-liner, though it's going to be more
complex than before, e.g.:

tar -xOf mypackage-1.gpkg.tar mypackage-1/image.tar.lz |
  tar --lzip -x -C /mnt/gentoo --strip-components 1

Though I wouldn't recommend using it but instead unpacking it normally
and inspecting the contents first.

-- 
Best regards,
Michał Górny




Re: [gentoo-dev] [pre-GLEP] Gentoo binary package container format

2018-11-17 Thread Rich Freeman
On Sat, Nov 17, 2018 at 9:05 AM Roy Bamford  wrote:
>
> Does this proposal allow for installing the payload without
> the use of the Gentoo package manager from some random
> distro being used as a rescue media?
>

Yes, it is a tarball of tarballs.  There would be an extra step, but a
vanilla tarball containing the files to be extracted could be
extracted as long as you have tar and the appropriate decompressor
(not specified and could change, but I imagine it will remain bzip2
for now).


-- 
Rich



Re: [gentoo-dev] [pre-GLEP] Gentoo binary package container format

2018-11-17 Thread Roy Bamford
On 2018.11.17 11:21, Michał Górny wrote:
> Hi,
> 
> Here's a pre-GLEP draft based on the earlier discussion on gentoo-
> portage-dev mailing list.  The specification uses GLEP form as it
> provides for cleanly specifying the motivation and rationale.
> 
> [snip glep proposal]
> -- 
> Best regards,
> Michał Górny
> 

Team,
 
One of the attractions of the existing format is that 
tar xf /path/to/tarball -C /mnt/gentoo 
works to fix things like glibc being removed and other
missing essential portage components.

In effect, each binary package can be treated as a
single package stage3 when a user needs a get out of jail
free card.

Does this proposal allow for installing the payload without 
the use of the Gentoo package manager from some random 
distro being used as a rescue media?

-- 
Regards,

Roy Bamford
(Neddyseagoon) a member of
elections
gentoo-ops
forum-mods




[gentoo-dev] [pre-GLEP] Gentoo binary package container format

2018-11-17 Thread Michał Górny
Hi,

Here's a pre-GLEP draft based on the earlier discussion on gentoo-
portage-dev mailing list.  The specification uses GLEP form as it
provides for cleanly specifying the motivation and rationale.

(Note: the number assignment is not official, just took the next number
to satisfy the glep converter script)

Also available via HTTPS:

rst:  https://dev.gentoo.org/~mgorny/tmp/glep-0078.rst
html: https://dev.gentoo.org/~mgorny/tmp/glep-0078.html

---
GLEP: 78
Title: Gentoo binary package container format
Author: Michał Górny 
Type: Standards Track
Status: Draft
Version: 1
Created: 2018-11-15
Last-Modified: 2018-11-16
Post-History: 2018-11-17
Content-Type: text/x-rst
---

Abstract
========

This GLEP proposes a new binary package container format for Gentoo.
The current tbz2/XPAK format is briefly described, and its deficiencies
are listed.  Accordingly, the requirements for a new format are set
and a gpkg format satisfying them is proposed.  The rationale for
various design decisions is provided.


Motivation
==========

The current Portage binary package format
-----------------------------------------

The historical ``.tbz2`` binary package format used by Portage is
a concatenation of two distinct formats: a header-oriented compressed
.tar format (used to hold package files) and a trailer-oriented custom
XPAK format (used to hold metadata) [#MAN-XPAK]_.  The format has
already been extended incompatibly twice.

The first time, support for storing multiple successive builds of
a binary package for a single ebuild version was added.  This feature
relies on appending an additional hyphen followed by an integer to
the package filename.  It is disabled by default (preserving backwards
compatibility) and controlled by the ``binpkg-multi-instance`` feature.

The second time, support for additional compression formats was
added.  When a format other than bzip2 is used, the ``.tbz2`` suffix
is replaced by ``.xpak`` and Portage relies on magic bytes to detect
the compression used.  For backwards compatibility, Portage still
defaults to bzip2; the compression program can be switched using
the ``BINPKG_COMPRESS`` configuration variable.
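
Magic-byte detection of this kind can be sketched as follows.  This is
only an illustration, not Portage's own detection code: the function
name ``detect_compression`` and the signature table are assumptions
made for this example, and the table lists only the widely documented
stream signatures of a few compressors.

```python
from typing import Optional

# Well-known leading signatures of common compressed streams.
# Illustrative table only; it is NOT taken from Portage.
MAGIC_SIGNATURES = (
    (b"BZh", "bzip2"),
    (b"\x1f\x8b", "gzip"),
    (b"\xfd7zXZ\x00", "xz"),
    (b"\x28\xb5\x2f\xfd", "zstd"),
    (b"\x04\x22\x4d\x18", "lz4"),
)


def detect_compression(head: bytes) -> Optional[str]:
    """Return the compressor whose signature starts `head`, or None."""
    for magic, name in MAGIC_SIGNATURES:
        if head.startswith(magic):
            return name
    return None
```

Passing the first few bytes of a package file is enough, since every
signature in the table is at most six bytes long.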

Additionally, there have been minor changes to the stored metadata
and file storage policies.  In particular, behavior regarding
``INSTALL_MASK``, controllable file compression and stripping has
changed over time.
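
The concatenated layout described at the start of this section can be
sketched as below.  The byte layout used here is an assumption based on
a reading of xpak(5): the file ends with ``XPAKSTOP``, a 4-byte
big-endian length of the XPAK segment and ``STOP``; the segment itself
starts with ``XPAKPACK``, followed by the index and data lengths, the
index and the data.  The function name ``split_tbz2`` is hypothetical,
and the sketch should be checked against the man page before any real
use.

```python
import struct
from typing import Dict, Tuple


def split_tbz2(blob: bytes) -> Tuple[bytes, Dict[str, bytes]]:
    """Split a tbz2 image into (compressed tar bytes, XPAK metadata)."""
    # Assumed trailer: b"XPAKSTOP" + uint32 XPAK length + b"STOP".
    if len(blob) < 16 or blob[-4:] != b"STOP" or blob[-16:-8] != b"XPAKSTOP":
        raise ValueError("no XPAK trailer found")
    (xpak_len,) = struct.unpack(">I", blob[-8:-4])
    xpak_start = len(blob) - 8 - xpak_len
    if xpak_start < 0:
        raise ValueError("XPAK length exceeds file size")
    xpak = blob[xpak_start:len(blob) - 8]
    if xpak[:8] != b"XPAKPACK":
        raise ValueError("XPAK header missing")

    # Assumed header: uint32 index length, uint32 data length.
    index_len, data_len = struct.unpack(">II", xpak[8:16])
    index = xpak[16:16 + index_len]
    data = xpak[16 + index_len:16 + index_len + data_len]

    # Each index entry: uint32 name length, name, uint32 offset, uint32 size.
    metadata: Dict[str, bytes] = {}
    pos = 0
    while pos < len(index):
        (name_len,) = struct.unpack(">I", index[pos:pos + 4])
        name_end = pos + 4 + name_len
        name = index[pos + 4:name_end].decode("utf-8")
        offset, size = struct.unpack(">II", index[name_end:name_end + 8])
        metadata[name] = data[offset:offset + size]
        pos = name_end + 8
    return blob[:xpak_start], metadata
```

Note that the parser has to start from the last 16 bytes of the file
and work backwards, which is what makes the metadata part
trailer-oriented.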


Problems with the current binary package format
-----------------------------------------------

The following problems were identified with the package format currently
in use:

1. **The packages rely on a custom binary archive format to store
   metadata.**  It is entirely Gentoo-invented and requires dedicated
   tooling to work with.  In fact, the reference implementation
   in Portage does not even include a CLI tool to work with tbz2
   packages; an unofficial implementation is provided as part
   of the portage-utils toolkit [#PORTAGE-UTILS]_.

2. **The format relies on an obscure compressor feature of ignoring
   trailing garbage.**  While this behavior is traditionally implemented
   by many compressors, the original reasons for it have long become
   irrelevant and it is not surprising that new compressors do not
   support it.  In particular, Portage has already hit this problem
   twice: once when users replaced bzip2 with the parallel-capable
   pbzip2 implementation [#PBZIP2]_, and a second time when support
   for the zstd compressor was added [#ZSTD]_.

3. **Placing metadata at the end of file makes partial fetches
   complex.**  While it is technically possible to obtain package
   metadata remotely without fetching the whole package, it usually
   requires 2-3 HTTP requests with a rather complex driver.  For
   comparison, if metadata were placed at the beginning of the file,
   an early-terminated pipeline with a single fetch request would
   suffice.

4. **Extending the format with OpenPGP signatures is non-trivial.**
   Depending on the implementation details, it requires either fetching
   an additional detached signature, breaking backwards compatibility,
   or introducing more custom logic to reassemble OpenPGP packets.

5. **Metadata is not compressed.**  This is not a significant problem;
   it is listed only for completeness.
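
The reliance on trailing garbage (problem 2) can be illustrated with
Python's standard ``bz2`` module.  This is a minimal sketch, not code
from Portage: the helper name ``decompress_first_stream`` is
hypothetical, and the trailer bytes in the usage note merely stand in
for real XPAK metadata.  The decompressor stops at the end of the bzip2
stream and reports whatever follows as unused data, which is the
behavior the tbz2 format depends on.

```python
import bz2
from typing import Tuple


def decompress_first_stream(blob: bytes) -> Tuple[bytes, bytes]:
    """Return (decompressed payload, bytes following the bzip2 stream)."""
    decomp = bz2.BZ2Decompressor()
    payload = decomp.decompress(blob)
    if not decomp.eof:
        raise ValueError("bzip2 stream is truncated")
    # Anything after the end-of-stream marker is "trailing garbage"
    # from the compressor's point of view.
    return payload, decomp.unused_data
```

For a blob built as ``bz2.compress(b"files") + b"XPAKPACK..."``, the
helper returns the original ``b"files"`` together with the appended
bytes.  A compressor that instead rejects input with trailing bytes
cannot unpack such a file without the trailer being stripped first.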


Goals for a new container format
--------------------------------

The following goals have been set for a replacement format:

1. **The packages must remain contained in a single file.**  As a matter
   of user convenience, it should be possible to transfer binary
   packages without having to use multiple files, and to install them
   from any location.

2. **The file format must be entirely based on common file formats,
   respecting best practices, with as little customization as necessary
   to satisfy the requirements.**  In particular, it is unacceptable
   to create new binary formats.

3. **The file format should provide for partial fetching of binary
   packages.**  It should be possible to easily fetch and read
   the package metadata withou