Re: Packaging big generated data files?

2022-12-12 Thread zimoun
Hi,

On Wed, 07 Dec 2022 at 11:33, Denis 'GNUtoo' Carikli wrote:

> The issue here is probably the size of the generated files: they are
> huge, so if they are packaged, they will most likely take significant
> resources in the Guix infrastructure.
>
> So what would be the way to go here? Would Guix accept patches to add
> packages for these files in Guix proper?  

From my point of view, the data and the code should be packaged
separately: the data package, built with copy-build-system, would be an
input of the code package.

> If so, does it needs to be done like with the ZFS (kernel module)
> package where "#:substitutable? #f" is used to avoid redistributing
> package builds? Or are other ways better for such use cases?

Yes, ‘#:substitutable? #f’ seems the first way to go.


Cheers,
simon



Re: Packaging big generated data files?

2022-12-11 Thread Ludovic Courtès
Hi,

Denis 'GNUtoo' Carikli skribis:

> On Thu, 08 Dec 2022 14:46:51 +0100
> Csepp wrote:
>> Could ZIM files be downloaded over bittorrent as fixed output
>> derivations?  They can be pretty huge.  Also if the system started
>> seeding them as well, that would be pretty cool.
> I've no idea how to generate fixed output derivations.

Origins are lowered into “fixed-output derivations”.  They’re
“fixed-output” because their content hash is known in advance, and thus,
the method you used to produce them doesn’t matter (info "(guix)
Derivations").

So you could specify an origin with ‘bittorrent-fetch’ (to be written)
instead of ‘url-fetch’.
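A minimal Python sketch of the fixed-output principle described above (this is not Guix code): a hypothetical `fixed_output_fetch` helper runs any fetch method and accepts its result only if the SHA-256 of the content matches a hash declared in advance. The two fetchers are hypothetical stand-ins for ‘url-fetch’ and the not-yet-written ‘bittorrent-fetch’; in Guix itself, the expected hash is declared in the origin's ‘sha256’ field.

```python
import hashlib
from typing import Callable


def fixed_output_fetch(fetch: Callable[[], bytes], expected_sha256: str) -> bytes:
    """Run any fetch method; accept its bytes only if their hash was declared in advance."""
    data = fetch()
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_sha256:
        raise ValueError(f"hash mismatch: expected {expected_sha256}, got {actual}")
    return data


# Hypothetical fetchers: they differ in *how* they obtain the bytes
# (standing in for HTTP vs. BitTorrent), not in *what* they produce.
def fake_http_fetch() -> bytes:
    return b"example ZIM payload"


def fake_torrent_fetch() -> bytes:
    return b"example " + b"ZIM payload"


expected = hashlib.sha256(b"example ZIM payload").hexdigest()
print(fixed_output_fetch(fake_http_fetch, expected)
      == fixed_output_fetch(fake_torrent_fetch, expected))  # prints True
```

The check depends only on the bytes produced, which is roughly why swapping the download method would not change what ends up in the store.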

HTH,
Ludo’.



Re: Packaging big generated data files?

2022-12-10 Thread Denis 'GNUtoo' Carikli
On Wed, 07 Dec 2022 15:45:01 +0100
"pelzflorian (Florian Pelz)" wrote:

> Denis 'GNUtoo' Carikli writes:
> > Is there any policies or past decisions of the Guix project on
> > packaging big generated data files?
> 
> commit 183db725a4e7ef6a0ae5170bfa0967bb2eafded7
> Author: Ricardo Wurmus 
> Date:   Tue May 15 12:55:27 2018 +0200
> 
> gnu: Add r-bsgenome-dmelanogaster-ucsc-dm6.
> 
> * gnu/packages/bioconductor.scm
> (r-bsgenome-dmelanogaster-ucsc-dm6): New variable.

Thanks.

So I assume that we could do something like that for now and later on
see if it makes sense to generate the files.

Denis.




Re: Packaging big generated data files?

2022-12-10 Thread Denis 'GNUtoo' Carikli
On Thu, 08 Dec 2022 14:46:51 +0100
Csepp wrote:
> Could ZIM files be downloaded over bittorrent as fixed output
> derivations?  They can be pretty huge.  Also if the system started
> seeding them as well, that would be pretty cool.
I've no idea how to generate fixed output derivations.

As for BitTorrent, ZIM files provided by Kiwix can be downloaded over
it. As for using that in packages, all I found in Guix (besides
packages) was a Transmission service and its associated test(s), so I
guess that support would need to be added.

Denis.




Re: Packaging big generated data files?

2022-12-08 Thread Csepp


Denis 'GNUtoo' Carikli writes:

> Hi,
>
> Is there any policies or past decisions of the Guix project on
> packaging big generated data files?
>
> [...]
>
> As for kiwix-tools, it can serve offline versions of websites like
> Wikipedia, and there too it needs files to work. The biggest file seems
> to be the complete version of English Wikipedia with scaled down
> pictures[2] and it takes about 89 GiB. I didn't look yet how these files
> were generated but I guess that they somehow can be generated from
> Wikipedia dumps.
>
> [...]
>
> [2]https://mirror.download.kiwix.org/zim/wikipedia/wikipedia_en_all_maxi_2022-05.zim
>
> Denis.

Could ZIM files be downloaded over BitTorrent as fixed-output
derivations?  They can be pretty huge.  Also, if the system started
seeding them as well, that would be pretty cool.



Re: Packaging big generated data files?

2022-12-07 Thread pelzflorian (Florian Pelz)
Denis 'GNUtoo' Carikli writes:
> Is there any policies or past decisions of the Guix project on
> packaging big generated data files?

commit 183db725a4e7ef6a0ae5170bfa0967bb2eafded7
Author: Ricardo Wurmus 
Date:   Tue May 15 12:55:27 2018 +0200

gnu: Add r-bsgenome-dmelanogaster-ucsc-dm6.

* gnu/packages/bioconductor.scm (r-bsgenome-dmelanogaster-ucsc-dm6): New 
variable.

HTH.

Regards,
Florian



Packaging big generated data files?

2022-12-07 Thread Denis 'GNUtoo' Carikli
Hi,

Are there any policies or past decisions of the Guix project on
packaging big generated data files?

I've added packages for software like kiwix-tools and Navit, which both
work offline but also need data files to be useful.

Navit is (car) navigation software that needs maps. The maps can be
generated from OpenStreetMap dumps with a tool available in the Navit
source code (maptool)[1], which is not packaged yet. Binary map files
can also be downloaded directly from various sources.

Right now, the biggest such map file is about 47 GiB (for the whole
planet).

As for kiwix-tools, it can serve offline versions of websites like
Wikipedia, and here too it needs data files to work. The biggest file
seems to be the complete English Wikipedia with scaled-down
pictures[2], which takes about 89 GiB. I haven't looked yet at how these
files are generated, but I guess they can somehow be generated from
Wikipedia dumps.

Packaging the binary files (without generating them) can be useful
because it greatly simplifies maintenance: to update them, one just
updates the package version and checksum. It also keeps the
information (download URL, checksum, license) in one place and enables
easy reuse by Guix services and/or configuration files.
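Updating such a package mostly means recomputing the checksum of a very large download. Below is a minimal Python sketch (not part of Guix) of hashing a file in fixed-size chunks, so that an 89 GiB ZIM file never has to fit in memory; the helper name `sha256_of_file` is hypothetical. Note that Guix package definitions expect the hash in nix-base32 form, as printed by `guix hash` or `guix download`, not the hexadecimal digest shown here.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file chunk by chunk (1 MiB by default) so memory use stays constant."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Self-check: a file spanning several chunks hashes the same as
# hashing all of its bytes in one go.
data = os.urandom(3 * (1 << 20) + 7)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
    path = tmp.name
try:
    print(sha256_of_file(path) == hashlib.sha256(data).hexdigest())  # prints True
finally:
    os.remove(path)
```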

Generating these files in packages would also make it possible to
tweak the data, for instance by adding height data to Navit maps. For
Kiwix-compatible files, it would probably make it possible to decide
when to take the snapshots, or to package additional wikis (like the
LibrePlanet wiki) or websites.

The issue here is probably the size of the generated files: they are
huge, so if they are packaged, they will most likely take significant
resources in the Guix infrastructure.

So what would be the way to go here? Would Guix accept patches to add
packages for these files in Guix proper?  

If so, does it need to be done as with the ZFS (kernel module)
package, where "#:substitutable? #f" is used to avoid redistributing
package builds? Or are there better ways for such use cases?

Note that so far I've only packaged Kiwix-compatible files for various
wikis locally, by just downloading already prepared files. I haven't
looked yet into Navit maps or into generating all these files, so I
might be missing some details about generating them.

References:
---
[1] https://navit.readthedocs.io/en/latest/maps.html#processing-osm-maps-yourself
[2] https://mirror.download.kiwix.org/zim/wikipedia/wikipedia_en_all_maxi_2022-05.zim

Denis.

