Re: Building and caching old Guix derivations for a faster time machine

2024-01-14 Thread Maxim Cournoyer
Hi Simon,

Simon Tournier  writes:

> Hi Maxim,
>
> On Thu, 30 Nov 2023 at 08:28, Maxim Cournoyer  wrote:
>
>> I'd like to have a single archive type as well in the future, but I'd
>> settle on Zstd, not lzip, because it's faster to compress and
>> decompress, and its compression ratio is not that different when using
>> its highest level (19).
>
> When running an inferior (a past revision), the Guile code of that past
> revision is what gets launched.  Hmm, I have never checked: does the
> substitution mechanism depend on the code of the present revision (Guile
> and daemon) or on the past one?
>
> In other words, what are the requirements for backward compatibility?
> Being able to run a past Guix from a recent Guix, somehow.

We're only impacting the future, not the past, I think.  The inferior
mechanism still relies on the same daemon, as far as I know, and the
currently available gzipped nars would remain available according to
their current retention policy (6 months when unused).

>>>  1. Keep, for as long as we can, all the requirements for running Guix
>>>  itself, e.g., “guix time-machine”.  Keep all the dependencies and all
>>>  the outputs of derivations.  At least, for all the ones the build farms
>>>  are already building.
>>>
>>>  2. Keep for 3-5 years all the outputs for specific Guix revisions, such
>>>  as v1.0, v1.1, v1.2, v1.3, v1.4.  And a few others.
>>
>> That'd be nice, but not presently doable as we can't fine tune retention
>> for a particular 'derivation' and its inputs in the Cuirass
>> configuration, unless I've missed it.
>
> That’s an implementation detail, a bug or a feature request, pick the
> one you prefer. ;-)

I'd say it's a feature request :-).

> We could imagine various paths for these next steps, IMHO.  For
> instance, we could move these outputs to some specific stores
> independent of the current ones (ci.guix and bordeaux.guix).  For
> instance, we could have “cold” storage with some machinery for making
> nars hot again on demand, instead of keeping everything hot.  And so
> on. :-)
>
> Well, I have not thought about it much and I am just thinking out loud:
> Cuirass (and the Build Coordinator) are the builders, and I would not
> rely on them for NAR “archiving”; instead maybe “we” could put some love
> into the nar-herder tool.  Somehow, extract the specific NARs that the
> project would like to keep longer than the unpredictable current
> mechanism allows.

It seems the nar-herder would perhaps be well suited for this, if
someone is inclined to implement it, given that it keeps each nar in a
database, which should make it fast to query for all the 'guix' package
substitutes.  Perhaps it even has (or could have) hooks that run when a
new nar is registered and that could define what is done with it (e.g.,
send it to another server).

Otherwise, good old 'find' could be used to rsync the 'guix'-named nars
and their .narinfo metadata files to a different location, but that'd
probably be less efficient (IO-intensive) on the huge multi-terabyte
collection of nars we carry.
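
Something along these lines, untested and with the cache layout and the
destination host as placeholders (not a real setup):

  # Untested sketch: select narinfos whose store item name starts with
  # "guix-", then mirror them; the nar files referenced by their "URL:"
  # field would need the same treatment.
  cd /var/cache/guix/publish
  grep -rlE '^StorePath: /gnu/store/[a-z0-9]{32}-guix-' \
       --include='*.narinfo' . > /tmp/guix-narinfos.list
  rsync -av --files-from=/tmp/guix-narinfos.list . mirror.example.org:/srv/nars/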

-- 
Thanks,
Maxim



Re: Building and caching old Guix derivations for a faster time machine

2024-01-12 Thread Simon Tournier
Hi Maxim,

On Thu, 30 Nov 2023 at 08:28, Maxim Cournoyer  wrote:

> I'd like to have a single archive type as well in the future, but I'd
> settle on Zstd, not lzip, because it's faster to compress and
> decompress, and its compression ratio is not that different when using
> its highest level (19).

When running an inferior (a past revision), the Guile code of that past
revision is what gets launched.  Hmm, I have never checked: does the
substitution mechanism depend on the code of the present revision (Guile
and daemon) or on the past one?

In other words, what are the requirements for backward compatibility?
Being able to run a past Guix from a recent Guix, somehow.


>>  1. Keep, for as long as we can, all the requirements for running Guix
>>  itself, e.g., “guix time-machine”.  Keep all the dependencies and all
>>  the outputs of derivations.  At least, for all the ones the build farms
>>  are already building.
>>
>>  2. Keep for 3-5 years all the outputs for specific Guix revisions, such
>>  as v1.0, v1.1, v1.2, v1.3, v1.4.  And a few others.
>
> That'd be nice, but not presently doable as we can't fine tune retention
> for a particular 'derivation' and its inputs in the Cuirass
> configuration, unless I've missed it.

That’s an implementation detail, a bug or a feature request, pick the
one you prefer. ;-)

We could imagine various paths for these next steps, IMHO.  For
instance, we could move these outputs to some specific stores
independent of the current ones (ci.guix and bordeaux.guix).  For
instance, we could have “cold” storage with some machinery for making
nars hot again on demand, instead of keeping everything hot.  And so
on. :-)

Well, I have not thought about it much and I am just thinking out loud:
Cuirass (and the Build Coordinator) are the builders, and I would not
rely on them for NAR “archiving”; instead maybe “we” could put some love
into the nar-herder tool.  Somehow, extract the specific NARs that the
project would like to keep longer than the unpredictable current
mechanism allows.

Cheers,
simon



Re: Building and caching old Guix derivations for a faster time machine

2023-12-04 Thread Maxim Cournoyer
Hi Guillaume,

Guillaume Le Vaillant  writes:

> Maxim Cournoyer  skribis:
>
>> Hi Simon,
>>
>> Simon Tournier  writes:
>>
>>> Hi,
>>>
>>> On mer., 22 nov. 2023 at 19:27, Ludovic Courtès  wrote:
>>>
 For long-term storage though, we could choose to keep lzip only (because
 it compresses better).  Not something we can really do with the current
 ‘guix publish’ setup though.
>>>
>>> It looks good to me.  For me, the priority list looks like:
>>
>> I'd like to have a single archive type as well in the future, but I'd
>> settle on Zstd, not lzip, because it's faster to compress and
>> decompress, and its compression ratio is not that different when using
>> its highest level (19).
>
> Last time I checked, zstd with max compression (zstd --ultra -22) was
> a little slower and had a little lower compression ratio than lzip with
> max compression (lzip -9).
> Zstd is however much faster for decompression.

I think when we talk about performance of NARs, we mean it in the
context of a Guix user installing them (decompressing) more than in the
context of the CI producing them, so zstd beats lzip here.

> Another thing that could be useful to consider is that lzip was designed
> for long-term storage, so it has some redundancy that allows fixing or
> recovering a corrupt archive (e.g. using lziprecover) if there has been
> some bit rot in the hardware storing the file.  As far as I know, zstd
> will just tell you "error: bad checksum" and will have no way to fix the
> archive.

That's an interesting aspect of lzip, but in this age of checksumming
file systems like Btrfs, we have other means of ensuring data integrity
(and recovery, assuming we have backups available).
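
For example (assuming the nar cache sits on Btrfs, which is an
assumption on my part), a periodic scrub covers the bit-rot case that
lziprecover addresses:

  # Verify all checksummed data on the file system holding the cache;
  # with a redundant Btrfs profile (e.g. RAID1), bad copies get repaired.
  btrfs scrub start -B /var/cache/guix/publish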

I'm still of the opinion that carrying a single set of zstd-only NARs
makes the most sense in the long run.

-- 
Thanks,
Maxim



Re: Building and caching old Guix derivations for a faster time machine

2023-11-30 Thread Guillaume Le Vaillant
Maxim Cournoyer  skribis:

> Hi Simon,
>
> Simon Tournier  writes:
>
>> Hi,
>>
>> On mer., 22 nov. 2023 at 19:27, Ludovic Courtès  wrote:
>>
>>> For long-term storage though, we could choose to keep lzip only (because
>>> it compresses better).  Not something we can really do with the current
>>> ‘guix publish’ setup though.
>>
>> It looks good to me.  For me, the priority list looks like:
>
> I'd like to have a single archive type as well in the future, but I'd
> settle on Zstd, not lzip, because it's faster to compress and
> decompress, and its compression ratio is not that different when using
> its highest level (19).

Last time I checked, zstd with max compression (zstd --ultra -22) was
a little slower and had a little lower compression ratio than lzip with
max compression (lzip -9).
Zstd is however much faster for decompression.

Another thing that could be useful to consider is that lzip was designed
for long-term storage, so it has some redundancy that allows fixing or
recovering a corrupt archive (e.g. using lziprecover) if there has been
some bit rot in the hardware storing the file.  As far as I know, zstd
will just tell you "error: bad checksum" and will have no way to fix the
archive.
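
For what it's worth, this is easy to reproduce on a single nar (the file
name below is just a placeholder; the numbers vary with the input):

  nar=example.nar                          # any uncompressed nar
  time zstd --ultra -22 "$nar"             # writes example.nar.zst, keeps input
  time lzip -9 -k "$nar"                   # writes example.nar.lz, keeps input
  ls -l "$nar.zst" "$nar.lz"               # compare compressed sizes
  time zstd -d -c "$nar.zst" > /dev/null   # decompression speed
  time lzip -d -c "$nar.lz"  > /dev/null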


signature.asc
Description: PGP signature


Re: Building and caching old Guix derivations for a faster time machine

2023-11-30 Thread Maxim Cournoyer
Hi Simon,

Simon Tournier  writes:

> Hi,
>
> On mer., 22 nov. 2023 at 19:27, Ludovic Courtès  wrote:
>
>> For long-term storage though, we could choose to keep lzip only (because
>> it compresses better).  Not something we can really do with the current
>> ‘guix publish’ setup though.
>
> It looks good to me.  For me, the priority list looks like:

I'd like to have a single archive type as well in the future, but I'd
settle on Zstd, not lzip, because it's faster to compress and
decompress, and its compression ratio is not that different when using
its highest level (19).

>  1. Keep, for as long as we can, all the requirements for running Guix
>  itself, e.g., “guix time-machine”.  Keep all the dependencies and all
>  the outputs of derivations.  At least, for all the ones the build farms
>  are already building.
>
>  2. Keep for 3-5 years all the outputs for specific Guix revisions, such
>  as v1.0, v1.1, v1.2, v1.3, v1.4.  And a few others.

That'd be nice, but not presently doable as we can't fine tune retention
for a particular 'derivation' and its inputs in the Cuirass
configuration, unless I've missed it.

-- 
Thanks,
Maxim



Re: Building and caching old Guix derivations for a faster time machine

2023-11-29 Thread Simon Tournier
Hi,

On mer., 22 nov. 2023 at 19:27, Ludovic Courtès  wrote:

> For long-term storage though, we could choose to keep lzip only (because
> it compresses better).  Not something we can really do with the current
> ‘guix publish’ setup though.

It looks good to me.  For me, the priority list looks like:

 1. Keep, for as long as we can, all the requirements for running Guix
 itself, e.g., “guix time-machine”.  Keep all the dependencies and all
 the outputs of derivations.  At least, for all the ones the build farms
 are already building.

 2. Keep for 3-5 years all the outputs for specific Guix revisions, such
 as v1.0, v1.1, v1.2, v1.3, v1.4.  And a few others.

Cheers,
simon



Re: Building and caching old Guix derivations for a faster time machine

2023-11-22 Thread Ludovic Courtès
Hi,

Maxim Cournoyer  skribis:

>> I agree.  The ‘guix publish’ TTL¹ at ci.guix was increased to 180 days
>> following  in 2021.  That’s still not that much these days, and right
>> now we have 84 TiB free at ci.guix.
>>
>> I guess we can afford increasing the TTL, probably starting with, say,
>> 300 days, and monitoring disk usage.
>>
>> WDYT?
>
> While the 84 TiB we have at our disposal is indeed a lot, I'd rather we
> keep the TTL at 180 days, to keep things more manageable for backup/sync
> purposes.  Our current TTL yields 7 TiB of compressed NARs, which fits
> nicely into the hydra-guix-129 10 TiB slice available for
> local/simple redundancy (it's still on my TODO, missing the copy bit).
>
> I've been meaning to document an easy mirroring setup for that
> /var/cache/guix/publish directory, and having 14 TiB instead of 7 TiB
> there would hurt such setups.

Maybe we should learn from what Chris has been doing with the
Nar-Herder, too.  Ideally, the build farm front-end (‘berlin’ in this
case) would be merely a cache for recently-built artifacts, and we’d
have long-term storage elsewhere where we could keep nars for several
years.

The important thing being: we need to decouple the build farm from
(long-term) nar provision.

> Perhaps a compromise would be to drop yet another compression format?
> We carry both Zstd and LZMA for Berlin, which I see little value in; if
> we carried only Zstd archives we could probably continue having < 10 TiB
> of NARs for a TTL of 360 days (although having only 3.5 TiB of NARs to
> sync around for mirrors would be great too!).
>
> What do you think?

For compatibility reasons¹ and performance reasons², I would refrain
from removing lzip or zstd substitutes, at least for “current”
substitutes.

For long-term storage though, we could choose to keep lzip only (because
it compresses better).  Not something we can really do with the current
‘guix publish’ setup though.
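
To make the limitation concrete: today ‘guix publish’ can serve several
compression formats side by side and expire cached nars after a TTL, but
it cannot apply a different (say, lzip-only) policy to a subset of store
items.  Roughly (options from the manual; this is not the actual ci.guix
configuration):

  guix publish --port=8080 \
       --cache=/var/cache/guix/publish \
       --compression=zstd:19 --compression=lzip:9 \
       --ttl=180d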

Thoughts?

Ludo’.

¹ Zstd support was added relatively recently.  Older daemons may support
  lzip but not zstd.

² https://guix.gnu.org/en/blog/2021/getting-bytes-to-disk-more-quickly/