Re: Speeding up “guix pull”: splitting modules

2020-01-20 Thread zimoun
Hi Ludo,

On Sun, 12 Jan 2020 at 00:29, Ludovic Courtès  wrote:

> zimoun  skribis:
>
> > So knowing where the cycles are could help to transform the DaG (not
> > fully acyclic yet) to a DAG. :-)
>
> Unfortunately, the module graph necessarily contains cycles.  The only
> way to avoid them would be to have exactly one module per package, given
> that the package graph is acyclic.

My mind is not clear and I miss a point.

I understand that module graph contains cycles. I am not convinced it
is *necessary* and I am not convinced neither that the only solution
is "one module per package".

Today, the structure of modules is per language and/or topic.
The union of all the package graphs which is big DAG can be
arbitrarily splited into sub-DAGs and these sub-DAGs becomes modules.
And I am not convinced that the size of these sub-DAG is only one
package.


Well, that's another story. :-)
And it will not help for speeding up.

However, working on the graph could help to identify the "hot"
packages and/or modules, IMHO.


Cheers,
simon



Re: Speeding up “guix pull”: splitting modules

2020-01-19 Thread Ludovic Courtès
Hi!

Christopher Baines  skribis:

> Ludovic Courtès  writes:

[...]

>> Another less elegant option would be to resort to a service (Data
>> Service or Cuirass) that would map a Guix commit to its .drv, and then
>> fetch that .drv from substitute servers and build it.
>
> I haven't got around to storing the derivations for `guix pull` in the
> Guix Data Service, but it does provide substitutes for derivations now,
> so support for this is only one step away now.

Sweet!

>> Finally, we could/should also profile that phase and see what can be
>> done.
>>
>> Lastly :-), it would be great to profile this phase over time to see how
>> it evolves.  (Does the Guix Data Service already stores timings for such
>> things?)
>
> There are timings output in the logs for jobs ([1] for example), it's
> not really feasible to compare that across different revisions.

Right, and the timing after “Finished building the channel derivation”
is not directly useful because it depends on the state of the store (the
number of Guix dependencies that had already been built).

> I do want to improve the process of loading data in to the Guix Data
> Service though, and that might involve storing this data in a more
> structured way, so that it can be analysed more easily.

Cool.

Thanks,
Ludo’.



Re: Speeding up “guix pull”: splitting modules

2020-01-16 Thread Christopher Baines

Ludovic Courtès  writes:

> BTW, there’s also the “Compute Guix derivation” phase that could be sped
> up, and that’s mostly independent of Guile.
>
> Things that could help include: “recursive Nix”¹ (the Guix derivation
> would itself be the result of a derivation, thus subject to
> substitutes), or something equivalent without special daemon support
> where we’d simply embed (guix derivations) & co. to compute the
> derivation in memory.
>
> Another less elegant option would be to resort to a service (Data
> Service or Cuirass) that would map a Guix commit to its .drv, and then
> fetch that .drv from substitute servers and build it.

I haven't got around to storing the derivations for `guix pull` in the
Guix Data Service, but it does provide substitutes for derivations now,
so support for this is only one step away now.

> Finally, we could/should also profile that phase and see what can be
> done.
>
> Lastly :-), it would be great to profile this phase over time to see how
> it evolves.  (Does the Guix Data Service already stores timings for such
> things?)

There are timings output in the logs for jobs ([1] for example), it's
not really feasible to compare that across different revisions.

I do want to improve the process of loading data in to the Guix Data
Service though, and that might involve storing this data in a more
structured way, so that it can be analysed more easily.


1: look for ", took ": http://data.guix.gnu.org/job/10722


signature.asc
Description: PGP signature


Re: Speeding up “guix pull”: splitting modules

2020-01-11 Thread Ludovic Courtès
Hello,

zimoun  skribis:

> So knowing where the cycles are could help to transform the DaG (not
> fully acyclic yet) to a DAG. :-)

Unfortunately, the module graph necessarily contains cycles.  The only
way to avoid them would be to have exactly one module per package, given
that the package graph is acyclic.

Thanks,
Ludo’.



Re: Speeding up “guix pull”: splitting modules

2020-01-10 Thread Ricardo Wurmus


zimoun  writes:

> On Wed, 8 Jan 2020 at 22:50, Ludovic Courtès  wrote:
>> Ricardo Wurmus  skribis:
>
>> > On the other hand: this would need to be an ongoing effort.  Newly
>> > introduced packages or even new features might create complex module
>> > cycles.  It sounds tedious to keep track of this and to enforce
>> > boundaries.
>>
>> Yes, I think this is a dead end: glibc could well end up become on
>> Haskell (hi, Pandoc!), and then the whole module split effort collapses.
>
> What kind of metrics could help to detect which modules are going to
> the wrong way?
> For example, would some DAG post-processings help?

I don’t think it’s feasible to clean up the module graph in a way that
can be kept clean long enough.  Ideally we wouldn’t need to think about
Guile modules at all, so the best screw to turn appears to be in the
Guile compiler and not in Guix.

--
Ricardo




Re: Speeding up “guix pull”: splitting modules

2020-01-10 Thread zimoun
Hi Gábor,

On Fri, 10 Jan 2020 at 13:42, Gábor Boskovits  wrote:

> > The modules graph (DAG) is already available. :-)
>
> The main problem here is that the modules do not form a DAG.
> There are circular dependencies between the modules.
> If those were not, then modular build would be possible, but because of
> the spaghetti we are forced to build these together.

Maybe we have a naming problem. :-)

I agree that it is not an Acyclic graph and if I understand you
correctly it is because there are cycles that the mess starts.

Using the Directed properties (but not required in fact, whatever :-),
traversing the graph detects the cycle. It is more or less what it is
done in the function `guix/import/utils.scm (topological-sort)`.

So knowing where the cycles are could help to transform the DaG (not
fully acyclic yet) to a DAG. :-)


All the best,
simon



Re: Speeding up “guix pull”: splitting modules

2020-01-10 Thread Gábor Boskovits
Hello zimoun,

zimoun  ezt írta (időpont: 2020. jan. 10., P, 13:10):
>
> Hi Gábor,
>
> Thank you for the explanations.
>
> Below, I am thinking loudly. :-)
>
> On Tue, 7 Jan 2020 at 21:14, Gábor Boskovits  wrote:
>
> > > > Gábor once suggested an iterative approach of identifying the most
> > > > important nodes in the package graph that should be moved to their own
> > > > modules, so that we would end up with a package graph that looks less
> > > > like a hair ball.  I think it would useful to get a better view on the
> > > > relationships between modules.
>
> [...]
>
> > I was suggesting the following idea:
> > The underlying package graph is actually a DAG, so by splitting modules it 
> > is
> > possible to achieve a state where the module graph also becomes a DAG.
> > The exact heuristic or algorithm of splitting was not defined.
>
> The modules graph (DAG) is already available. :-)

The main problem here is that the modules do not form a DAG.
There are circular dependencies between the modules.
If those were not, then modular build would be possible, but because of
the spaghetti we are forced to build these together.

> Modulo some glue code / tweaks, the tools are already in
> guix/graph.scm and guix/scripts/graph.scm, if I understand correctly.
> And note that the commit  ddd59159004ca73c9449a27945116ff5069c3743
> introduces topological sort -- for another topic: recursive import --
> but could be reused /adapted to detect cycles, IMHO.
>
> Would a weighted DAG help to detect which modules are in "bad shape"?
> Other said, mix somehow the packages DAG and the modules DAG?
>
> For example, the edge between the module m1 and the module m2 should
> be weighted by the ratio between the number of packages defined in the
> module m2 required in the module m1.
> So then, sorting this edges by weight value, it would give a better
> view of the relationship between modules.
>
> What do you think?
>
> We need to detect which "big" modules are imported for "few" packages, right?
>
>
> Aside, a naive question: does '#:select' improve the situation?

I don't know, but it is an interesting question.

>
>
>
> All the best,
> simon


Best regards,
g_bor
-- 
OpenPGP Key Fingerprint: 7988:3B9F:7D6A:4DBF:3719:0367:2506:A96C:CF63:0B21



Re: Speeding up “guix pull”: splitting modules

2020-01-10 Thread zimoun
Hi Ludo,

On Wed, 8 Jan 2020 at 22:50, Ludovic Courtès  wrote:
> Ricardo Wurmus  skribis:

> > On the other hand: this would need to be an ongoing effort.  Newly
> > introduced packages or even new features might create complex module
> > cycles.  It sounds tedious to keep track of this and to enforce
> > boundaries.
>
> Yes, I think this is a dead end: glibc could well end up become on
> Haskell (hi, Pandoc!), and then the whole module split effort collapses.

What kind of metrics could help to detect which modules are going to
the wrong way?
For example, would some DAG post-processings help?


All the best,
simon



Re: Speeding up “guix pull”: splitting modules

2020-01-10 Thread zimoun
Hi Gábor,

Thank you for the explanations.

Below, I am thinking loudly. :-)

On Tue, 7 Jan 2020 at 21:14, Gábor Boskovits  wrote:

> > > Gábor once suggested an iterative approach of identifying the most
> > > important nodes in the package graph that should be moved to their own
> > > modules, so that we would end up with a package graph that looks less
> > > like a hair ball.  I think it would useful to get a better view on the
> > > relationships between modules.

[...]

> I was suggesting the following idea:
> The underlying package graph is actually a DAG, so by splitting modules it is
> possible to achieve a state where the module graph also becomes a DAG.
> The exact heuristic or algorithm of splitting was not defined.

The modules graph (DAG) is already available. :-)
Modulo some glue code / tweaks, the tools are already in
guix/graph.scm and guix/scripts/graph.scm, if I understand correctly.
And note that the commit  ddd59159004ca73c9449a27945116ff5069c3743
introduces topological sort -- for another topic: recursive import --
but could be reused /adapted to detect cycles, IMHO.

Would a weighted DAG help to detect which modules are in "bad shape"?
Other said, mix somehow the packages DAG and the modules DAG?

For example, the edge between the module m1 and the module m2 should
be weighted by the ratio between the number of packages defined in the
module m2 required in the module m1.
So then, sorting this edges by weight value, it would give a better
view of the relationship between modules.

What do you think?

We need to detect which "big" modules are imported for "few" packages, right?


Aside, a naive question: does '#:select' improve the situation?



All the best,
simon



Re: Speeding up “guix pull”: splitting modules

2020-01-08 Thread Ludovic Courtès
BTW, there’s also the “Compute Guix derivation” phase that could be sped
up, and that’s mostly independent of Guile.

Things that could help include: “recursive Nix”¹ (the Guix derivation
would itself be the result of a derivation, thus subject to
substitutes), or something equivalent without special daemon support
where we’d simply embed (guix derivations) & co. to compute the
derivation in memory.

Another less elegant option would be to resort to a service (Data
Service or Cuirass) that would map a Guix commit to its .drv, and then
fetch that .drv from substitute servers and build it.

Finally, we could/should also profile that phase and see what can be
done.

Lastly :-), it would be great to profile this phase over time to see how
it evolves.  (Does the Guix Data Service already stores timings for such
things?)

Ludo’.

¹ https://github.com/NixOS/nix/pull/3205



Re: Speeding up “guix pull”: splitting modules

2020-01-08 Thread Ludovic Courtès
Hello!

Ricardo Wurmus  skribis:

> On the other hand: this would need to be an ongoing effort.  Newly
> introduced packages or even new features might create complex module
> cycles.  It sounds tedious to keep track of this and to enforce
> boundaries.

Yes, I think this is a dead end: glibc could well end up become on
Haskell (hi, Pandoc!), and then the whole module split effort collapses.

Ultimately, I think we need to look into optimizing the compiler.  The
profiling I did a while back¹ suggests pessimal behavior of some of the
compiler phases when given large inputs.

Ludo’.

¹ https://lists.gnu.org/archive/html/guile-devel/2017-10/msg00035.html
  (I’ve been meaning to resume that investigation for a long time, but
  I’d really need to do nothing but that for a couple of weeks…)




Re: Speeding up “guix pull”: splitting modules

2020-01-07 Thread Gábor Boskovits
Hello,

zimoun  ezt írta (időpont: 2020. jan. 7., K, 19:38):
>
> Hi Ricardo,
>
> On Sun, 5 Jan 2020 at 21:38, Ricardo Wurmus  wrote:
>
> > Gábor once suggested an iterative approach of identifying the most
> > important nodes in the package graph that should be moved to their own
> > modules, so that we would end up with a package graph that looks less
> > like a hair ball.  I think it would useful to get a better view on the
> > relationships between modules.
>
> I am not sure to correctly understand. What does it mean "the most
> important nodes in the package graph"?
> Does "important node" mean a package which appears as inputs in a lot
> of other packages?

I was suggesting the following idea:
The underlying package graph is actually a DAG, so by splitting modules it is
possible to achieve a state where the module graph also becomes a DAG.
The exact heuristic or algorithm of splitting was not defined.

One idea which is quite cloes to my heart is to make a one package per module
rewrite, with possibly some grouping modules for convenience on top, but that
was determined not to be feasible performacewise at the time.

>
>
> Cheers,
> simon
>

Best regards,
g_bor
-- 
OpenPGP Key Fingerprint: 7988:3B9F:7D6A:4DBF:3719:0367:2506:A96C:CF63:0B21



Re: Speeding up “guix pull”: splitting modules

2020-01-07 Thread zimoun
Hi Ricardo,

On Sun, 5 Jan 2020 at 21:38, Ricardo Wurmus  wrote:

> Gábor once suggested an iterative approach of identifying the most
> important nodes in the package graph that should be moved to their own
> modules, so that we would end up with a package graph that looks less
> like a hair ball.  I think it would useful to get a better view on the
> relationships between modules.

I am not sure to correctly understand. What does it mean "the most
important nodes in the package graph"?
Does "important node" mean a package which appears as inputs in a lot
of other packages?


Cheers,
simon



Re: Speeding up “guix pull”: splitting modules

2020-01-06 Thread Andy Wingo
Hi,

Just a couple notes here regarding the concrete questions.  I'm sure
there are good solutions but I don't know what they are yet!

On Sun 05 Jan 2020 21:37, Ricardo Wurmus  writes:

> Or could we improve compilation
> times by disabling optimizations?

I think Guix does this already; see %lightweight-optimizations in (guix
build compile).

> Or by marking the modules as declarative and thus unlock more
> optimizations (with Guile 3)?

Not really; the declarative modules optimization only really helps cases
where inlining would help.  For package definitions, there's not much to
inline, I don't think.  Given that the consideration is compile-time
rather than run-time I would not think that declarative modules would
help.

Andy



Speeding up “guix pull”: splitting modules

2020-01-05 Thread Ricardo Wurmus
Hi Guix,

thanks to Ludo’s excellent work on “guix pull”, updating Guix has become
a lot faster than in the old days.  With build farm support one usually
only needs to wait a little while to be able to download all parts of
an up-to-date Guix.

Unfortunately, changes to the repository invalidate expensive
derivations regularly, and waiting for the build farm to get around to
building these derivations gets old quickly.  Running “guix pull”
repeatedly is a drag because the chances of getting a fully remotely
built Guix are pretty slim.

Part of the problem is that there’s just one derivation for the
compilation of all package modules.  This one derivation’s output is an
input to another expensive derivation: the one for the Guix System
(services, etc).

We should aim to split up the guix-packages derivation, so that the
guix-system derivation will not have to be rebuilt when its packages
aren’t affected.  Unfortunately, it’s difficult to build modules
independently from one another because of the many inter-module
dependency cycles.

Gábor once suggested an iterative approach of identifying the most
important nodes in the package graph that should be moved to their own
modules, so that we would end up with a package graph that looks less
like a hair ball.  I think it would useful to get a better view on the
relationships between modules.

On the other hand: this would need to be an ongoing effort.  Newly
introduced packages or even new features might create complex module
cycles.  It sounds tedious to keep track of this and to enforce
boundaries.

Is there an alternative?  Could we, for example, make all lookups of
cross-module package references lazy?  Or could we improve compilation
times by disabling optimizations?  Or by marking the modules as
declarative and thus unlock more optimizations (with Guile 3)?

--
Ricardo