Re: Faster "guix pull" by incremental compilation and non-circular modules?

2022-05-31 Thread Ricardo Wurmus


Maxime Devos  writes:

> [[PGP Signed Part:Undecided]]
> Gábor Boskovits schreef op di 31-05-2022 om 06:54 [+0200]:
>> I was thinking about a bit of a different structure that can also be
>> automated. My original idea was to use the already existing tree
>> structure of the derivations, and split it based on depth. I think
>> that gives a bit more structure, but might require splitting things
>> that now are together (for example iirc sometimes we are defining
>> bootstrap packages inheriting from the fully fledged ones, which
>> introduces a syntactic dependency on something that is  higher up the
>> tree). Wdyt?
>
> The package modules could be split and reorganised a bit to roughly
> follow the derivation DAG, if that's what you mean (*)? Then the "guix"
> package would depend on less -> compute-guix-derivation etc has less to
> do, also good for lowering "guix $do_something" memory footprint. 

This seems like a thing we’d have to do repeatedly as module cycles can
appear due to seemingly innocent package upgrades.

I get the appeal of untangling the module graph, but I suspect it will
turn out to be a lot of busywork leading to repeated disruption (because
packages keep moving to different modules, breaking third party channels
for no good reason) for little gain.

I do hope we can reduce the amount of work that compute-guix-derivation
has to perform, even if it causes some disruption once.

-- 
Ricardo



Re: Faster "guix pull" by incremental compilation and non-circular modules?

2022-05-31 Thread Maxime Devos
Gábor Boskovits schreef op di 31-05-2022 om 06:54 [+0200]:
> I was thinking about a bit of a different structure that can also be
> automated. My original idea was to use the already existing tree
> structure of the derivations, and split it based on depth. I think
> that gives a bit more structure, but might require splitting things
> that now are together (for example iirc sometimes we are defining
> bootstrap packages inheriting from the fully fledged ones, which
> introduces a syntactic dependency on something that is  higher up the
> tree). Wdyt?

The package modules could be split and reorganised a bit to roughly
follow the derivation DAG, if that's what you mean (*)? Then the "guix"
package would depend on less -> compute-guix-derivation etc has less to
do, also good for lowering "guix $do_something" memory footprint. 
Seems orthogonal to my proposal.  I guess we could eventually do both?

Greetings,
Maxime.


signature.asc
Description: This is a digitally signed message part


Re: Faster "guix pull" by incremental compilation and non-circular modules?

2022-05-30 Thread Gábor Boskovits
Hello Maxime,

Maxime Devos  ezt írta (időpont: 2022. febr. 28.,
H, 19:51):

> Ludovic Courtès schreef op ma 28-02-2022 om 14:17 [+0100]:
> > Hi,
> >
> > Maxime Devos  skribis:
> >
> > >   2. Instead of building all of Guix as a single derivation,
> > >  create a DAG of derivations.  More concretely:
> > >
> > >  First read the *.scm files to determine which module imports
> > >  which modules. Then to compile, say, (gnu packages acl),
> > >  a derivation taking gnu/packages/acl.scm and its dependencies
> > >  gnu/packages/attr.go, gnu/packages/base.go, ... is made
> > >  compiling gnu/packages/acl.scm to a gnu/packages/acl.go.
> > >
> > >  Then to build all of Guix, 'union-build' or 'file-union' is used.
> >
> > This is what (guix self), used by ‘guix pull’, is already doing.
> >
> > However, currently, package modules are split in just two groups: the
> > “base” group is the closure of (guix packages base), and the second
> > group has all the rest:
> >
> > [...]
>
> Looking at (guix self), it also has a few groups for non-package
> modules (system tests, scripts, ...).
>
> > At its core though, the situation pretty much reflects the free software
> > situation: there are low-level packages (glibc, GCC, GTK, etc.) that
> > might depend on high-level packages (Python, Pandoc, Rust, etc.).
> >
> > It’s not easy to split this spaghetti ball in smaller groups.
>
> It's not easy to manually split the spaghetti, but we don't have
> to, we could let the computer split the spaghetti for us (at least
> partially, because of the circular imports), by computing the graph of
> strongly-connected components and considering each SCC to be a ‘group’,
> some of which depend on other groups, forming a DAG.
>

I was thinking about a bit of a different structure that can also be
automated. My original idea was to use the already existing tree structure
of the derivations, and split it based on depth. I think that gives a bit
more structure, but might require splitting things that now are together
(for example iirc sometimes we are defining bootstrap packages inheriting
from the fully fledged ones, which introduces a syntactic dependency on
something that is  higher up the tree). Wdyt?

Regards,
g_bor


> I believe Ricardo Wurmus has some script for computing the SCC?
>
> Splitting large SCC in smaller parts can be left as an exercise
> for later, it's a somewhat orthogonal concern.
>
> > Thoughts?
>
> I think it would be nice to let (guix self) automatically determine the
> DAG of groups.  It would reduce the ad-hocness of the *...-modules*
> variables (some care required for patches, guix/man-db.scm, .js ...).
>
> It would also make the node tree wide, which could reduce memory usage
> (which might help with the ‘guix pull segfaults on i686-linux’
> reports).  In case of a crash (*), "guix pull" does not have to start
> over from scratch, which would also help with those reports.
>
> (*) This does not help with "failed to compute the derivation of Guix".
>
> Some work would be required, but I think it will be worth it, and it
> only has to be done once.
>
> TBC, this was just an idea I wanted to share, I won't be working on it
> in the forseeable future.
>
> Greetings,
> Maxime.
>


Re: Faster "guix pull" by incremental compilation and non-circular modules?

2022-02-28 Thread Maxime Devos
Ludovic Courtès schreef op ma 28-02-2022 om 14:17 [+0100]:
> Hi,
> 
> Maxime Devos  skribis:
> 
> >   2. Instead of building all of Guix as a single derivation,
> >  create a DAG of derivations.  More concretely:
> > 
> >  First read the *.scm files to determine which module imports
> >  which modules. Then to compile, say, (gnu packages acl),
> >  a derivation taking gnu/packages/acl.scm and its dependencies
> >  gnu/packages/attr.go, gnu/packages/base.go, ... is made
> >  compiling gnu/packages/acl.scm to a gnu/packages/acl.go.
> > 
> >  Then to build all of Guix, 'union-build' or 'file-union' is used.
> 
> This is what (guix self), used by ‘guix pull’, is already doing.
> 
> However, currently, package modules are split in just two groups: the
> “base” group is the closure of (guix packages base), and the second
> group has all the rest:
> 
> [...]

Looking at (guix self), it also has a few groups for non-package
modules (system tests, scripts, ...).

> At its core though, the situation pretty much reflects the free software
> situation: there are low-level packages (glibc, GCC, GTK, etc.) that
> might depend on high-level packages (Python, Pandoc, Rust, etc.).
> 
> It’s not easy to split this spaghetti ball in smaller groups.

It's not easy to manually split the spaghetti, but we don't have
to, we could let the computer split the spaghetti for us (at least
partially, because of the circular imports), by computing the graph of
strongly-connected components and considering each SCC to be a ‘group’,
some of which depend on other groups, forming a DAG.

I believe Ricardo Wurmus has some script for computing the SCC?

Splitting large SCC in smaller parts can be left as an exercise
for later, it's a somewhat orthogonal concern.

> Thoughts?

I think it would be nice to let (guix self) automatically determine the
DAG of groups.  It would reduce the ad-hocness of the *...-modules*
variables (some care required for patches, guix/man-db.scm, .js ...).

It would also make the node tree wide, which could reduce memory usage
(which might help with the ‘guix pull segfaults on i686-linux’
reports).  In case of a crash (*), "guix pull" does not have to start
over from scratch, which would also help with those reports.

(*) This does not help with "failed to compute the derivation of Guix".

Some work would be required, but I think it will be worth it, and it
only has to be done once.

TBC, this was just an idea I wanted to share, I won't be working on it
in the forseeable future.

Greetings,
Maxime.


signature.asc
Description: This is a digitally signed message part


Re: Faster "guix pull" by incremental compilation and non-circular modules?

2022-02-28 Thread Ludovic Courtès
Hi,

Maxime Devos  skribis:

>   2. Instead of building all of Guix as a single derivation,
>  create a DAG of derivations.  More concretely:
>
>  First read the *.scm files to determine which module imports
>  which modules. Then to compile, say, (gnu packages acl),
>  a derivation taking gnu/packages/acl.scm and its dependencies
>  gnu/packages/attr.go, gnu/packages/base.go, ... is made
>  compiling gnu/packages/acl.scm to a gnu/packages/acl.go.
>
>  Then to build all of Guix, 'union-build' or 'file-union' is used.

This is what (guix self), used by ‘guix pull’, is already doing.

However, currently, package modules are split in just two groups: the
“base” group is the closure of (guix packages base), and the second
group has all the rest:

--8<---cut here---start->8---
scheme@(guile-user)> ,use(guix modules)
scheme@(guile-user)> (length (source-module-closure '((gnu packages base
$4 = 417
scheme@(guile-user)> ,use(ice-9 ftw)
scheme@(guile-user)> (length (scandir "gnu/packages" (lambda (f) 
(string-suffix? ".scm" f
$5 = 537
scheme@(guile-user)> (- $5 $4)
$6 = 120
--8<---cut here---end--->8---

This is clearly uneven, especially considering that the biggest file,
crates-io.scm, is in the first group.

At its core though, the situation pretty much reflects the free software
situation: there are low-level packages (glibc, GCC, GTK, etc.) that
might depend on high-level packages (Python, Pandoc, Rust, etc.).

It’s not easy to split this spaghetti ball in smaller groups.

Thoughts?

Ludo’.



Re: Faster "guix pull" by incremental compilation and non-circular modules?

2022-02-20 Thread Philip McGrath
Hi,

On Sunday, February 20, 2022 7:19:07 AM EST Maxime Devos wrote:
> Ekaitz Zarraga schreef op zo 20-02-2022 om 11:34 [+]:
> > Making a Guix pull is unpredictable too...
> 
> An idea for making "guix pull" faster:
> 
>   1. Make package imports non-circular, breaking up package modules
>  in parts when necessary.
> 
>   2. Instead of building all of Guix as a single derivation,
>  create a DAG of derivations.  More concretely:
> 
>  First read the *.scm files to determine which module imports
>  which modules. Then to compile, say, (gnu packages acl),
>  a derivation taking gnu/packages/acl.scm and its dependencies
>  gnu/packages/attr.go, gnu/packages/base.go, ... is made
>  compiling gnu/packages/acl.scm to a gnu/packages/acl.go.
> 
>  Then to build all of Guix, 'union-build' or 'file-union' is used.
> 
> The benefit is that if, say, gnu/packages/gnunet.scm is changed, then
> the old gnu/packages/acl.go will be reused because its derivation
> doesn't depend on gnu/packages/gnunet.scm (*).
> 
> The need for non-circular imports can be avoided by computing the
> strongly-connected components and compiling all the modules in a
> component as a single unit.
> 
> Greetings,
> Maxime.
> 
> (*) Actually I didn't verify if acl.scm depends on gnunet.scm or not.

I was just (or maybe am still?) dealing with some issues caused by cyclic
imports of package modules while updating Racket to 8.4: see in particular
, as well as #66, #112, and #113 in the
same thread.

As I mentioned there, I am most familiar with the semantics of Racket
modules,[1][2][3][4][5][6] and secondarily with R6RS libraries.[7]

I find the semantics of Guile's cyclic module imports very confusing, and I
don't know of any documentation for what is and isn't supported other
than advice from Ludo’ in —is there any?

Racket's module system fundamentally requires that module imports for a DAG
(see footnote 1 in § 2.2 of [1]), which is necessary to provide “The Separate
Compilation Guarantee”[1][6]:

> Any effects of the instantiation of the module’s phase 1 due to compilation
> on the Racket runtime system are discarded.

This is an extremely useful property, especially if you care about
reproducibility.

I know that R6RS does not provide The Separate Compilation Guarantee, since
§ 7.2 [8] explicitly says:

> An implementation may distinguish instances/visits of a library for
> different phases or to use an instance/visit at any phase as an
> instance/visit at any other phase. An implementation may further expand
> each library form with distinct visits of libraries in any phase and/or
> instances of libraries in phases above 0. An implementation may create
> instances/visits of more libraries at more phases than required to satisfy
> references. When an identifier appears as an expression in a phase that is
> inconsistent with the identifier's level, then an implementation may raise
> an exception either at expand time or run time, or it may allow the
> reference. Thus, a library whose meaning depends on whether the instances
> of a library are distinguished or shared across phases or library
> expansions may be unportable.

I haven't found anything in R6RS that explicitly addresses cyclic library
imports, but I think this passage from § 7.2 [8]:

> If any of a library's definitions are referenced at phase 0 in the expanded
> form of a program, then an instance of the referenced library is created
> for phase 0 before the program's definitions and expressions are evaluated.
> This rule applies transitively: if the expanded form of one library
> references at phase 0 an identifier from another library, then before the
> referencing library is instantiated at phase n, the referenced library must
> be instantiated at phase n. When an identifier is referenced at any phase n
> greater than 0, in contrast, then the defining library is instantiated at
> phase n at some unspecified time before the reference is evaluated.
> Similarly, when a macro keyword is referenced at phase n during the
> expansion of a library, then the defining library is visited at phase n at
> some unspecified time before the reference is evaluated.

combined with the specification in § 7.1 [9] that:

> The expressions of variable definitions are evaluated from left to right, as
> if in an implicit letrec*, and the body expressions are also evaluated from
> left to right after the expressions of the variable definitions. A fresh
> location is created for each exported variable and initialized to the value
> of its local counterpart. The effect of returning twice to the continuation
> of the last body expression is unspecified.

means that they are prohibited. If library (a) and library (b) were each to
import each other and each to refer to the other's definitions at phase 0,
then library (a) must be instantiated *before* library (b), but library (b)
must also be 

Faster "guix pull" by incremental compilation and non-circular modules?

2022-02-20 Thread Maxime Devos
Ekaitz Zarraga schreef op zo 20-02-2022 om 11:34 [+]:
> Making a Guix pull is unpredictable too...

An idea for making "guix pull" faster:

  1. Make package imports non-circular, breaking up package modules
 in parts when necessary.

  2. Instead of building all of Guix as a single derivation,
 create a DAG of derivations.  More concretely:

 First read the *.scm files to determine which module imports
 which modules. Then to compile, say, (gnu packages acl),
 a derivation taking gnu/packages/acl.scm and its dependencies
 gnu/packages/attr.go, gnu/packages/base.go, ... is made
 compiling gnu/packages/acl.scm to a gnu/packages/acl.go.

 Then to build all of Guix, 'union-build' or 'file-union' is used.

The benefit is that if, say, gnu/packages/gnunet.scm is changed, then
the old gnu/packages/acl.go will be reused because its derivation
doesn't depend on gnu/packages/gnunet.scm (*).

The need for non-circular imports can be avoided by computing the
strongly-connected components and compiling all the modules in a
component as a single unit.

Greetings,
Maxime.

(*) Actually I didn't verify if acl.scm depends on gnunet.scm or not.


signature.asc
Description: This is a digitally signed message part