Re: (Re-) Designing extracting-downloader

2022-02-24 Thread Hartmut Goebel

Am 23.02.22 um 13:35 schrieb Maxime Devos:

Nevermind, this benefit is probably undone by the extra unpacking.


Probably.

Anyway, this is worth thinking about, as it would make the additional 
unpacking part of the source, and thus unpacking would be decoupled from 
the build system. (Which was part of the idea behind the proposal.)


After considering this for some time, I actually like your idea: it is 
explicit (which is better than implicit), flexible and simple (no 
extracting downloader required at all). And it also does not lead to any 
problems with content-addressed downloads like SWH. The only downside I 
can see at the moment is that it stores both the outer and the inner 
archive.


Let's see what others think about it.

--
Regards
Hartmut Goebel

| Hartmut Goebel  | h.goe...@crazy-compilers.com   |
| www.crazy-compilers.com | compilers which you thought are impossible |




Re: (Re-) Designing extracting-downloader

2022-02-24 Thread Hartmut Goebel

Am 23.02.22 um 11:52 schrieb pukkamustard:

Why use the source from hex.pm at all?


While issue 51061 is about the hex.pm importer and the rebar build 
system, this thread is only about the extracting downloader :-)



The hex.pm metadata.config file does not seem to exactly specify the
upstream source. We would need some heuristics to figure this out. But
maybe we could find a heuristic that works well enough? This would solve
the double-archive problem.


From my point of view, hex.pm is an important and valid distribution 
point for Erlang and Elixir packages, like PyPI is for Python and CPAN 
is for Perl. So we should support defining it as a package's source, 
which also makes checking for updates much easier than with any git 
repository or git-based forge.


Some of the packages I've investigated so far are easier to build from 
hex.pm than from GitHub. E.g. some GitHub repos contain a „rebar“ binary 
(which needs to be deleted by a snippet when defining the source), while 
the corresponding hex.pm package can be used as-is.


Regarding heuristics: Since builds should be reproducible, a source 
definition must not use any heuristics. Anyhow, heuristics might be 
useful for the hex.pm importer.


--
Regards
Hartmut Goebel





Re: (Re-) Designing extracting-downloader

2022-02-23 Thread Maxime Devos
Maxime Devos schreef op wo 23-02-2022 om 13:30 [+0100]:
> A benefit of delegating the actual downloading to url-fetch is that
> (guix scripts perform-download) would be used, so connections could
> be cached.

Nevermind, this benefit is probably undone by the extra unpacking.
I still recommend delegating the downloading to url-fetch though, such
that if, say, a bug in (guix swh) has been fixed or (guix swh) has been
improved in other ways, then time-travellers to the past can still
benefit from the improved (guix swh).

Greetings,
Maxime.




Re: (Re-) Designing extracting-downloader

2022-02-23 Thread Maxime Devos
Hartmut Goebel schreef op wo 23-02-2022 om 09:57 [+0100]:
> TL;DR: What do you think about the idea of an „extracting downloader“?
> 
> I'm about to pick up work on the „extracting downloader“ and the rebar 
> build system (for Erlang), see  for a first try. In the aforementioned 
> issue some points came up regarding the basic design of the patch. Thus, 
> before starting to write code, I'd like to agree on a basic design.

Could the ‘extracting’ downloader be built on top of the regular
downloader?  More concretely:

(package
  (name "some-package-from-hex")
  (source
    (extract-from-hex
      (origin
        (method url-fetch)
        (uri "http://some-url-pointing-to-a-tarball-wrapped-in-a-tarball")
        (sha256 (base32 )))))
  (build-system ...))

Here, 'extract-from-hex' would turn a file-like object into another
file-like object, which lowers to some derivation extracting the inner
tarball from the outer tarball.  (guix upstream) might need to be
modified to support this.

Also, is there some fundamental reason that hex.pm wraps tars inside
tars and only provides the wrapped tars, or could hex.pm be convinced
to also serve the underlying tars directly?

A benefit of delegating the actual downloading to url-fetch is that
(guix scripts perform-download) would be used, so connections could be
cached.

Greetings,
Maxime.




Re: (Re-) Designing extracting-downloader

2022-02-23 Thread pukkamustard


Hi Hartmut,

Hartmut Goebel  writes:

> I'm about to pick up work on the „extracting downloader“ and the rebar
> build system (for Erlang),

I'm very much looking forward to this!

> The basic idea behind „extracting downloader“ is as follows: Packages
> provided by hex.pm (the distribution repository for erlang and elixir
> packages) are tar-archives containing some meta-data files and the
> actual source (contents.tar.gz), see the example below. So the idea was
> to store only the contents.tar.gz (instead of requiring an additional
> unpacking step).

Why use the source from hex.pm at all? Would it be possible to just
fetch the hex.pm archive when importing a package, read the
metadata.config file and then try to use the upstream source (e.g. GitHub)?

The hex.pm metadata.config file does not seem to exactly specify the
upstream source. We would need some heuristics to figure this out. But
maybe we could find a heuristic that works well enough? This would solve
the double-archive problem.
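Such a heuristic could be sketched in shell. Note that the
metadata.config excerpt and the repository name below are made up for
illustration, and the exact shape of hex.pm's metadata (an Erlang-term
file with a links entry) is an assumption here, not taken from the spec:

```shell
# Mock metadata.config with a hex.pm-style links entry (format and
# repository name are illustrative, not from a real package)
cat > metadata.config <<'EOF'
{<<"links">>,[{<<"GitHub">>,<<"https://github.com/example/getopt">>}]}.
EOF
# Heuristic: take the first GitHub URL mentioned in the metadata
grep -o 'https://github.com/[^">]*' metadata.config | head -n 1
```

A real importer would of course need to handle packages whose metadata
carries no usable link at all, which is where the fallback below comes in.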

For packages where the heuristic fails, we fall back to using the source
as provided by hex.pm (unextracted) and use an additional build phase to
do the double extraction? If this only affects a few packages, then
storing the source double-archived does not seem so bad.

Thanks,
pukkamustard



(Re-) Designing extracting-downloader

2022-02-23 Thread Hartmut Goebel

Hi,

TL;DR: What do you think about the idea of an „extracting downloader“?

I'm about to pick up work on the „extracting downloader“ and the rebar 
build system (for Erlang), see  for a first try. In the aforementioned 
issue some points came up regarding the basic design of the patch. Thus, 
before starting to write code, I'd like to agree on a basic design.


The basic idea behind the „extracting downloader“ is as follows: Packages 
provided by hex.pm (the distribution repository for Erlang and Elixir 
packages) are tar archives containing some meta-data files and the 
actual source (contents.tar.gz), see the example below. So the idea was 
to store only the contents.tar.gz (instead of requiring an additional 
unpacking step).


In an earlier discussion someone mentioned that this could be interesting 
for Ruby gems, too.


Storing only the inner archive would allow using that archive's hash as 
the "source" hash and allow for easy validation of the hash. Anyhow, much 
of the complexity of the current implementation (see issue 51061) is 
caused by this idea, since the code needs to postpone hashing until 
after the download.


Also, in an earlier discussion, Ludo (AFAIR) brought up the question of 
whether e.g. SWH would be able to provide a source package if hashed 
this way.


What do you think about the idea of an „extracting downloader“?


Example for a package from hex.pm:

$ wget https://repo.hex.pm/tarballs/getopt-1.0.2.tar
…
$ tar tvf getopt-1.0.2.tar
-rw-r--r-- 0/0   1 2000-01-01 01:00 VERSION
-rw-r--r-- 0/0  64 2000-01-01 01:00 CHECKSUM
-rw-r--r-- 0/0 451 2000-01-01 01:00 metadata.config
-rw-r--r-- 0/0   14513 2000-01-01 01:00 contents.tar.gz
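The double structure and the postponed hash check can be reproduced with
a mock archive; this is only a sketch (file names mimic the listing
above, contents are made up, and no connection to hex.pm is made):

```shell
set -e
# Build a mock hex.pm-style double archive
mkdir -p demo-src
echo '-module(m).' > demo-src/m.erl
tar -czf contents.tar.gz demo-src/m.erl   # inner archive: the actual source
tar -cf getopt-1.0.2.tar contents.tar.gz  # outer hex.pm-style wrapper
inner_hash=$(sha256sum contents.tar.gz | cut -d' ' -f1)
# A downloader that stores only contents.tar.gz can verify its hash
# only after unpacking the outer tar:
mkdir -p unpack && tar -xf getopt-1.0.2.tar -C unpack
extracted_hash=$(sha256sum unpack/contents.tar.gz | cut -d' ' -f1)
[ "$inner_hash" = "$extracted_hash" ] && echo "inner archive hash verified"
```

This is exactly why hashing has to happen after the download: the hash
of contents.tar.gz is unknown until the outer tar has been unpacked.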


--
Regards
Hartmut Goebel
