Re: Can we speed it up? Prev: compiling guix is too slow?

2018-02-05 Thread Pjotr Prins
On Mon, Feb 05, 2018 at 10:56:54AM +0100, Konrad Hinsen wrote:
> Pjotr Prins writes:
> 
> >> I wonder if anyone has analyzed the dependency graphs of software
> >> packages (not necessarily for Guix, some big distribution like
> >> Debian would be more interesting), with the goal of identifying good
> >> splits based on simple criteria.
> >
> > Yeah, that would be a neat exercise. Any student here inclined to have
> > a go?
> 
> As was to be expected, a quick search found some promising pointers, e.g.:
> 
>   https://arxiv.org/abs/0905.4226
> 
>   http://ieeexplore.ieee.org/document/7490780/
> 
> Plus a lot more on language-specific dependency analysis, which is less
> directly useful but may contain interesting methods that could be
> generalized.
> 
> So, yes, this would make a good research project (master's level for
> example) with the potential for academic recognition.

Agree. I am thinking we should add a section to the website with such
projects.

Pj.



Re: Can we speed it up? Prev: compiling guix is too slow?

2018-02-05 Thread Ludovic Courtès
Pjotr Prins writes:

> What will it be like with 15K packages? We will get there. We can
> actually try it now by doubling the package tree - does anyone want to
> try and create a simulation? I.e., not just double the tree, but make
> sure there are cross references between the two graphs by modifying
> some of the inputs at random.
>
> This leads to the following thought: why don't we create a 'lazy'
> build? There are multiple ways to go about it (I think). One would be
> to parse Scheme files for package names and only compile those that
> are needed when someone invokes a guix command (and that have not been
> compiled yet). Or generate a meta list for a source tree.
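
[Editorial sketch] The doubling experiment suggested in the first quoted paragraph could be prototyped outside Guix before touching the real package tree. The Python below is only a rough model under stated assumptions: the graph is a plain dictionary from package name to inputs, the "-copy" suffix and the `double_graph` function are made up for illustration, and cross references are only added towards packages that come earlier in a topological order, so the doubled graph stays acyclic.

```python
import random
from graphlib import TopologicalSorter  # Python 3.9+

COPY_SUFFIX = "-copy"  # hypothetical naming scheme for the duplicated half


def double_graph(deps, cross_fraction=0.2, seed=0):
    """Return a graph twice the size of `deps`, with random cross references.

    `deps` maps a package name to the list of its inputs.
    """
    rng = random.Random(seed)
    # Dependencies come first in this order, so an edge from a package to
    # any earlier package (in either half) can never close a cycle.
    order = list(TopologicalSorter(deps).static_order())

    doubled = {}
    for name in order:
        inputs = list(deps.get(name, ()))
        doubled[name] = inputs
        doubled[name + COPY_SUFFIX] = [d + COPY_SUFFIX for d in inputs]

    for rank, name in enumerate(order):
        if rank == 0:
            continue  # nothing earlier to depend on
        # Original -> copy half, and copy -> original half.
        for source, suffix in ((name, COPY_SUFFIX), (name + COPY_SUFFIX, "")):
            if rng.random() < cross_fraction:
                target = rng.choice(order[:rank]) + suffix
                if target not in doubled[source]:
                    doubled[source].append(target)
    return doubled


# Toy data, not taken from the Guix package tree.
toy = {"glibc": [], "zlib": ["glibc"], "samtools": ["zlib", "glibc"]}
big = double_graph(toy, cross_fraction=1.0, seed=1)
```

Timing a build or a lookup over such a synthetic graph would only hint at how the real tree scales, since real packages differ in much more than their input lists.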

Note that there are several things we could do:

  • Not compile gnu/packages/*.scm at all and instead turn on Guile’s
    auto-compilation.  As things are currently, the first ‘guix package’
    invocation would take ages though.

  • Never compile gnu/packages/*.scm and instead interpret it, though
    that’s currently relatively slow and probably more memory-consuming.

I think with ‘wip-pull-reload’ (I’ll resume work on it, I promise!)
things should already be nicer, and then, we should keep improving the
compiler (Andy already significantly improved CPU consumption in Guile
2.2.3).

> Or subcategorize packages so only those packages get included that are
> asked for (assuming there are no deeper dependencies). For example,
> few people need the bioinformatics packages. We could have the sub
> section of the graph split out and have people do:
>
>   guix package --topic=bio -i samtools
>
> for example and compile the contents of the gnu/packages/bio/
> directory when that happens the first time for a specific checkout.

I think that should be the last resort, but yeah.

Another thing we could do to speed up lookup-by-name is to maintain a
cache that maps package names to modules (currently we always traverse
all the package modules with ‘fold-packages’).
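
[Editorial sketch] To make the idea concrete, here is a small Python model of such a name-to-module cache. It is not Guix code: `lookup_by_traversal` merely stands in for a full traversal of the package modules, and the function names, module names and package lists are all invented for illustration.

```python
def lookup_by_traversal(modules, name):
    """Scan every module for `name`; the cost grows with the whole package set."""
    return [module for module, packages in modules.items() if name in packages]


def build_name_index(modules):
    """Build the cache once: package name -> list of modules defining it."""
    index = {}
    for module, packages in modules.items():
        for package in packages:
            index.setdefault(package, []).append(module)
    return index


def lookup_cached(index, name):
    """Answer a lookup from the cache without touching the modules."""
    return index.get(name, [])


# Toy data: hypothetical module names and package lists.
modules = {
    "module-bio": ["samtools", "bwa"],
    "module-compression": ["zlib", "xz"],
    "module-base": ["glibc"],
}
index = build_name_index(modules)
```

The cache would have to be rebuilt whenever the package modules change, for instance once per checkout, otherwise lookups could return stale answers.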

> I think scalability is a good goal and instant compilation another ;).
> A few years back it took just 30 seconds to build Guix.

Yep, scalability is this year’s challenge!

Ludo’.



Re: Can we speed it up? Prev: compiling guix is too slow?

2018-02-05 Thread Konrad Hinsen
Pjotr Prins writes:

>> I wonder if anyone has analyzed the dependency graphs of software
>> packages (not necessarily for Guix, some big distribution like
>> Debian would be more interesting), with the goal of identifying good
>> splits based on simple criteria.
>
> Yeah, that would be a neat exercise. Any student here inclined to have
> a go?

As was to be expected, a quick search found some promising pointers, e.g.:

  https://arxiv.org/abs/0905.4226

  http://ieeexplore.ieee.org/document/7490780/

Plus a lot more on language-specific dependency analysis, which is less
directly useful but may contain interesting methods that could be
generalized.

So, yes, this would make a good research project (master's level for
example) with the potential for academic recognition.

Konrad.



Re: Can we speed it up? Prev: compiling guix is too slow?

2018-02-05 Thread Pjotr Prins
On Mon, Feb 05, 2018 at 09:15:37AM +0100, Konrad Hinsen wrote:
> On 05/02/2018 08:34, Pjotr Prins wrote:
> 
> >compiled yet). Or generate a meta list for a source tree. Or
> >subcategorize packages so only those packages get included that are
> >asked for (assuming there are no deeper dependencies). For example,
> >few people need the bioinformatics packages. We could have the sub
> >section of the graph split out and have people do:
> >
> >   guix package --topic=bio -i samtools
> 
> Or move special-topic packages to separate channels, once they get
> implemented. The hard part is of course *where* to split the graph,
> not how to implement it.

Channels are coming. I agree that bioinformatics could become a
channel - that is what bioconda does for Conda. But other sections of
the graph (say Ruby) we had better not split out into channels. That is
why I am bringing this up as an in-trunk solution.

> >Sectioning the graph may be hard (you'd be inclined to section off
> >languages and window managers), but I think it can be dictated by
> >whether a sub graph can live on its own.
> 
> I wonder if anyone has analyzed the dependency graphs of software
> packages (not necessarily for Guix, some big distribution like
> Debian would be more interesting), with the goal of identifying good
> splits based on simple criteria.

Yeah, that would be a neat exercise. Any student here inclined to have
a go?

Pj.



Re: Can we speed it up? Prev: compiling guix is too slow?

2018-02-05 Thread Konrad Hinsen

On 05/02/2018 08:34, Pjotr Prins wrote:

> compiled yet). Or generate a meta list for a source tree. Or
> subcategorize packages so only those packages get included that are
> asked for (assuming there are no deeper dependencies). For example,
> few people need the bioinformatics packages. We could have the sub
> section of the graph split out and have people do:
>
>    guix package --topic=bio -i samtools

Or move special-topic packages to separate channels, once they get
implemented. The hard part is of course *where* to split the graph, not
how to implement it.

> Sectioning the graph may be hard (you'd be inclined to section off
> languages and window managers), but I think it can be dictated by
> whether a sub graph can live on its own.

I wonder if anyone has analyzed the dependency graphs of software
packages (not necessarily for Guix, some big distribution like Debian
would be more interesting), with the goal of identifying good splits
based on simple criteria.
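
[Editorial sketch] One simple criterion of this kind can be written down in a few lines. The Python below assumes a dependency graph given as a dictionary from package name to inputs, and treats a section as able to "live on its own" when no package outside it depends on a package inside it. That reading, the function names and the toy graph are illustrative assumptions, not an established method.

```python
def boundary(deps, section):
    """Return the edges that cross the section boundary.

    `outgoing` are inputs the section takes from the rest of the graph;
    `incoming` are inputs the rest of the graph takes from the section.
    """
    section = set(section)
    outgoing = sorted(
        (pkg, dep)
        for pkg in section
        for dep in deps.get(pkg, ())
        if dep not in section
    )
    incoming = sorted(
        (pkg, dep)
        for pkg, inputs in deps.items()
        if pkg not in section
        for dep in inputs
        if dep in section
    )
    return outgoing, incoming


def can_split_off(deps, section):
    """True when nothing outside the section depends on it."""
    _, incoming = boundary(deps, section)
    return not incoming


# Toy graph, not real distribution data.
toy = {
    "glibc": [],
    "zlib": ["glibc"],
    "python": ["zlib", "glibc"],
    "htslib": ["zlib"],
    "samtools": ["htslib", "zlib"],
}
bio = {"samtools", "htslib"}
```

On this toy graph the `bio` section can be split off, because it only consumes packages from the rest of the graph, while a section containing just `zlib` cannot. The number of outgoing edges would be one rough measure of how much of the core a section drags in.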


Konrad.