Re: Performance of computing cross derivations

2024-01-11 Thread Efraim Flashner
On Thu, Jan 11, 2024 at 01:26:35PM +, Christopher Baines wrote:
> 
> Ludovic Courtès  writes:
> 
> > Christopher Baines  skribis:
> >
> >> I think you're right, while I sent some other changes in #68266, I think
> >> it's this change around make-rust-sysroot that has pretty much all the
> >> effects on performance.
> >>
> >> I think the tens of thousands of duplicated packages from cross-base
> >> that I was looking at are almost entirely coming from
> >> make-rust-sysroot. As Ludo mentions in [1], maybe this has something to
> >> do with use of cross- procedures in native-inputs, although I'm not sure
> >> that moving those calls out of native-inputs is a correct thing to do.
> >>
> >> I don't know what the correct approach here is, but I think something
> >> needs doing here to address the performance regression.
> >
> > I probably missed it in the thread: what commit caused the regression,
> > and how can I test any changes?  I’m willing to help but I missed some
> > of the context.
> 
> It's not a pure performance regression, more that in its current form,
> rust cross derivations are very expensive to compute. It's been this way
> since cross-compiling was enabled in [1].
> 
> 1: https://git.savannah.gnu.org/cgit/guix.git/patch/?id=e604972d9c697302691aeb22e9c50c933a1a3c72
> 
> I've been looking at data service slowness in processing revisions over
> the last few weeks, and I think it's mostly down to this. Looking at the
> revision prior to the change [2], computing all the derivations took
> around 3 hours, which is ages, but still quick compared to the nearly 9
> hours it took after this change [3].
> 
> 2: https://data.guix.gnu.org/revision/58bbb38c5bd2e42aab9e9408d8c9d8da3409f178
> 3: https://data.guix.gnu.org/revision/c9e1a72cc27925484635ae01bc4de28bf232689d
> 
> Obviously having more derivations is good and that usually means more
> work for the data service, but in this case it seems like things can be
> sped up quite a bit.
> 
> For testing locally, I've been computing all the derivations for
> i586-pc-gnu, but Efraim also posted a concise command to look at
> computing some cross derivations for a subset of rust packages [4].
> 
> 4: https://lists.gnu.org/archive/html/guix-devel/2024-01/msg00053.html

list-all-cargo-build-system-packages is actually a script I have locally
that I should probably put in the etc/teams/rust folder.  I've attached
it in case anyone wants to try it out, or see the speed-up of computing
the cross-derivations.

-- 
Efraim Flashner  אפרים פלשנר
GPG key = A28B F40C 3E55 1372 662D  14F7 41AA E7DC CA3D 8351
Confidentiality cannot be guaranteed on emails sent or received unencrypted
guile -c '(use-modules (gnu packages)(guix packages)(guix build-system)) 
(display (fold-packages (lambda (package lst) (if (eq? (build-system-name 
(package-build-system package)) (quote cargo)) (cons package lst) lst)) 
(list)))' | tr ' ' '\n' | grep \@


signature.asc
Description: PGP signature


Re: Performance of computing cross derivations

2024-01-11 Thread Christopher Baines

Ludovic Courtès  writes:

> Christopher Baines  skribis:
>
>> I think you're right, while I sent some other changes in #68266, I think
>> it's this change around make-rust-sysroot that has pretty much all the
>> effects on performance.
>>
>> I think the tens of thousands of duplicated packages from cross-base
>> that I was looking at are almost entirely coming from
>> make-rust-sysroot. As Ludo mentions in [1], maybe this has something to
>> do with use of cross- procedures in native-inputs, although I'm not sure
>> that moving those calls out of native-inputs is a correct thing to do.
>>
>> I don't know what the correct approach here is, but I think something
>> needs doing here to address the performance regression.
>
> I probably missed it in the thread: what commit caused the regression,
> and how can I test any changes?  I’m willing to help but I missed some
> of the context.

It's not a pure performance regression, more that in its current form,
rust cross derivations are very expensive to compute. It's been this way
since cross-compiling was enabled in [1].

1: https://git.savannah.gnu.org/cgit/guix.git/patch/?id=e604972d9c697302691aeb22e9c50c933a1a3c72

I've been looking at data service slowness in processing revisions over
the last few weeks, and I think it's mostly down to this. Looking at the
revision prior to the change [2], computing all the derivations took
around 3 hours, which is ages, but still quick compared to the nearly 9
hours it took after this change [3].

2: https://data.guix.gnu.org/revision/58bbb38c5bd2e42aab9e9408d8c9d8da3409f178
3: https://data.guix.gnu.org/revision/c9e1a72cc27925484635ae01bc4de28bf232689d

Obviously having more derivations is good and that usually means more
work for the data service, but in this case it seems like things can be
sped up quite a bit.

For testing locally, I've been computing all the derivations for
i586-pc-gnu, but Efraim also posted a concise command to look at
computing some cross derivations for a subset of rust packages [4].

4: https://lists.gnu.org/archive/html/guix-devel/2024-01/msg00053.html




Re: Performance of computing cross derivations

2024-01-11 Thread Ludovic Courtès
Hi,

Christopher Baines  skribis:

> I think you're right, while I sent some other changes in #68266, I think
> it's this change around make-rust-sysroot that has pretty much all the
> effects on performance.
>
> I think the tens of thousands of duplicated packages from cross-base
> that I was looking at are almost entirely coming from
> make-rust-sysroot. As Ludo mentions in [1], maybe this has something to
> do with use of cross- procedures in native-inputs, although I'm not sure
> that moving those calls out of native-inputs is a correct thing to do.
>
> I don't know what the correct approach here is, but I think something
> needs doing here to address the performance regression.

I probably missed it in the thread: what commit caused the regression,
and how can I test any changes?  I’m willing to help but I missed some
of the context.

Thanks,
Ludo’.



Re: Performance of computing cross derivations

2024-01-10 Thread Christopher Baines

Efraim Flashner  writes:

> On Fri, Jan 05, 2024 at 04:41:14PM +, Christopher Baines wrote:
>> 
>> Ludovic Courtès  writes:
>> 
>> > Hi,
>> >
>> > Christopher Baines  skribis:
>> >
>> >> When asked by the data service, it seems to take Guix around 3 minutes
>> >> to compute cross derivations for all packages (to a single
>> >> target). Here's a simple script that replicates this:
>> 
>> ...
>> 
>> > One idiom that defeats caching is:
>> >
>> >   (define (make-me-a-package x y z)
>> >     (package
>> >       …))
>> >
>> > Such a procedure returns a fresh package every time it’s called,
>> > preventing caching from happening (because cache entries are compared
>> > with ‘eq?’).  That typically leads to lower hit rates.
>> >
>> > Anyway, lots of words to say that I don’t see anything immediately
>> > obvious with cross-compilation, yet I wouldn’t be surprised if some of
>> > these cache-defeating idioms were used because we’ve paid less
>> > attention to this.
>> 
>> I've got a feeling that performance has got worse since I looked at this
>> originally, I've finally got around to having a further look.
>> 
>> I spent some time looking at various metrics, but it was most useful to
>> just write the cache keys of various types to files and have a read.
>> 
>> The cross-base module was causing many issues, as all but one of the
>> procedures there produced new package records each time. There is also
>> make-rust-sysroot which showed up.
>> 
>> I've sent some patches as #68266 to add memoization to avoid this, and
>> that seems to speed things up.
>> 
>> Looking at other things in the cache, I think there are some issues with
>> file-append and local-file. The use of file-append in svn-fetch and
>> local-file in the lower procedure in the python build system both bloat
>> the cache for example, although I'm less sure about how to address these
>> cases.
>> 
>> One thing I am sure about though, is that these problems will come
>> back. Maybe we could add some reporting in to Guix to look through the
>> cache at the keys, lower them all and check for equivalence. That way it
>> should be possible to automate saying that having [1] in the cache
>> several thousand times is unhelpful. The data service could then run
>> this reporting and store it.
>> 
>> 1: #<file-append #<package … gnu/packages/version-control.scm:2267 7f294d908840> "/bin/svn">
>
> I grabbed the patch for make-rust-sysroot to try it out:
> Native builds:
> time GUIX_PROFILING="object-cache" ./pre-inst-env guix build --no-grafts \
>   $(./pre-inst-env ~/list-all-cargo-build-system-packages | grep rust- | head -n 100) -d

...

> That's a massive drop in the size of the cache and a big decrease in the
> amount of time it took to calculate those 100 items.

I think you're right, while I sent some other changes in #68266, I think
it's this change around make-rust-sysroot that has pretty much all the
effects on performance.

I think the tens of thousands of duplicated packages from cross-base
that I was looking at are almost entirely coming from
make-rust-sysroot. As Ludo mentions in [1], maybe this has something to
do with use of cross- procedures in native-inputs, although I'm not sure
that moving those calls out of native-inputs is a correct thing to do.

I don't know what the correct approach here is, but I think something
needs doing here to address the performance regression.

1: https://lists.gnu.org/archive/html/guix-patches/2024-01/msg00733.html




Re: Performance of computing cross derivations

2024-01-08 Thread Efraim Flashner
On Fri, Jan 05, 2024 at 04:41:14PM +, Christopher Baines wrote:
> 
> Ludovic Courtès  writes:
> 
> > Hi,
> >
> > Christopher Baines  skribis:
> >
> >> When asked by the data service, it seems to take Guix around 3 minutes
> >> to compute cross derivations for all packages (to a single
> >> target). Here's a simple script that replicates this:
> 
> ...
> 
> > One idiom that defeats caching is:
> >
> >   (define (make-me-a-package x y z)
> >     (package
> >       …))
> >
> > Such a procedure returns a fresh package every time it’s called,
> > preventing caching from happening (because cache entries are compared
> > with ‘eq?’).  That typically leads to lower hit rates.
> >
> > Anyway, lots of words to say that I don’t see anything immediately
> > obvious with cross-compilation, yet I wouldn’t be surprised if some of
> > these cache-defeating idioms were used because we’ve paid less
> > attention to this.
> 
> I've got a feeling that performance has got worse since I looked at this
> originally, I've finally got around to having a further look.
> 
> I spent some time looking at various metrics, but it was most useful to
> just write the cache keys of various types to files and have a read.
> 
> The cross-base module was causing many issues, as all but one of the
> procedures there produced new package records each time. There is also
> make-rust-sysroot which showed up.
> 
> I've sent some patches as #68266 to add memoization to avoid this, and
> that seems to speed things up.
> 
> Looking at other things in the cache, I think there are some issues with
> file-append and local-file. The use of file-append in svn-fetch and
> local-file in the lower procedure in the python build system both bloat
> the cache for example, although I'm less sure about how to address these
> cases.
> 
> One thing I am sure about though, is that these problems will come
> back. Maybe we could add some reporting in to Guix to look through the
> cache at the keys, lower them all and check for equivalence. That way it
> should be possible to automate saying that having [1] in the cache
> several thousand times is unhelpful. The data service could then run
> this reporting and store it.
> 
> 1: #<file-append #<package … gnu/packages/version-control.scm:2267 7f294d908840> "/bin/svn">

I grabbed the patch for make-rust-sysroot to try it out:
Native builds:
time GUIX_PROFILING="object-cache" ./pre-inst-env guix build --no-grafts \
  $(./pre-inst-env ~/list-all-cargo-build-system-packages | grep rust- | head -n 100) -d

Object Cache:
  fresh caches:    21
  lookups:     133146
  hits:        130101 (97.7%)
  cache size:    3044 entries

real    0m7.539s
user    0m10.239s
sys     0m0.327s

Before:
time GUIX_PROFILING="object-cache" ./pre-inst-env guix build --no-grafts \
  $(./pre-inst-env ~/list-all-cargo-build-system-packages | grep rust- | head -n 100) \
  --target=aarch64-linux-gnu -d

Object Cache:
  fresh caches:    20
  lookups:     221189
  hits:        211390 (95.6%)
  cache size:    9798 entries

real    0m18.215s
user    0m14.492s
sys     0m0.469s

After:
time GUIX_PROFILING="object-cache" ./pre-inst-env guix build --no-grafts \
  $(./pre-inst-env ~/list-all-cargo-build-system-packages | grep rust- | head -n 100) \
  --target=aarch64-linux-gnu -d

Object Cache:
  fresh caches:    20
  lookups:     138654
  hits:        135291 (97.6%)
  cache size:    3362 entries

real    0m7.753s
user    0m10.248s
sys     0m0.328s

That's a massive drop in the size of the cache and a big decrease in the
amount of time it took to calculate those 100 items.

-- 
Efraim Flashner  אפרים פלשנר
GPG key = A28B F40C 3E55 1372 662D  14F7 41AA E7DC CA3D 8351
Confidentiality cannot be guaranteed on emails sent or received unencrypted




Re: Performance of computing cross derivations

2024-01-05 Thread Christopher Baines

Ludovic Courtès  writes:

> Hi,
>
> Christopher Baines  skribis:
>
>> When asked by the data service, it seems to take Guix around 3 minutes
>> to compute cross derivations for all packages (to a single
>> target). Here's a simple script that replicates this:

...

> One idiom that defeats caching is:
>
>   (define (make-me-a-package x y z)
>     (package
>       …))
>
> Such a procedure returns a fresh package every time it’s called,
> preventing caching from happening (because cache entries are compared
> with ‘eq?’).  That typically leads to lower hit rates.
>
> Anyway, lots of words to say that I don’t see anything immediately
> obvious with cross-compilation, yet I wouldn’t be surprised if some of
> these cache-defeating idioms were used because we’ve paid less
> attention to this.

I've got a feeling that performance has got worse since I looked at this
originally, I've finally got around to having a further look.

I spent some time looking at various metrics, but it was most useful to
just write the cache keys of various types to files and have a read.

The cross-base module was causing many issues, as all but one of the
procedures there produced new package records each time. There is also
make-rust-sysroot which showed up.

I've sent some patches as #68266 to add memoization to avoid this, and
that seems to speed things up.
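
As an illustration of the general technique (not the content of the patches in #68266), here is a minimal Guile sketch of memoizing a package-producing procedure so that repeated calls with the same arguments return the same object. The <pkg> record, `make-cross-thing` and the local `memoize` helper are hypothetical names made up for this example; Guix has its own (guix memoization) module, which this sketch deliberately does not depend on.

```scheme
(use-modules (srfi srfi-9))

;; Hypothetical stand-in for a package record; not Guix's <package>.
(define-record-type <pkg>
  (make-pkg name target)
  pkg?
  (name pkg-name)
  (target pkg-target))

(define (memoize proc)
  "Return a procedure that caches PROC's results, keyed on the argument
list compared with 'equal?'."
  (let ((table (make-hash-table)))
    (lambda args
      ;; Use a handle so that a cached #f result still counts as a hit.
      (let ((handle (hash-get-handle table args)))
        (if handle
            (cdr handle)
            (let ((result (apply proc args)))
              (hash-set! table args result)
              result))))))

;; Hypothetical package-producing procedure: a fresh record per call.
(define (make-cross-thing target)
  (make-pkg "cross-thing" target))

(define make-cross-thing/memoized
  (memoize make-cross-thing))

;; Two calls to the plain procedure give two distinct objects.
(define fresh-same?
  (eq? (make-cross-thing "aarch64-linux-gnu")
       (make-cross-thing "aarch64-linux-gnu")))

;; Two calls to the memoized procedure give the very same object.
(define memoized-same?
  (eq? (make-cross-thing/memoized "aarch64-linux-gnu")
       (make-cross-thing/memoized "aarch64-linux-gnu")))

(format #t "fresh: ~a, memoized: ~a~%" fresh-same? memoized-same?)
```

Because the memoized variant hands back an 'eq?' object, any cache keyed on object identity further down can hit instead of storing another copy.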

Looking at other things in the cache, I think there are some issues with
file-append and local-file. The use of file-append in svn-fetch and
local-file in the lower procedure in the python build system both bloat
the cache for example, although I'm less sure about how to address these
cases.

One thing I am sure about though, is that these problems will come
back. Maybe we could add some reporting in to Guix to look through the
cache at the keys, lower them all and check for equivalence. That way it
should be possible to automate saying that having [1] in the cache
several thousand times is unhelpful. The data service could then run
this reporting and store it.

1: #<file-append #<package … gnu/packages/version-control.scm:2267 7f294d908840> "/bin/svn">
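
A rough sketch of what such a report could look like, in plain Guile. Everything here is an assumption for illustration: the keys are simple (BASE . SUFFIX) pairs rather than real cache keys, and `lower` is a hypothetical stand-in that just concatenates strings instead of lowering through the store.

```scheme
(use-modules (ice-9 match)
             (srfi srfi-1))

;; Hypothetical lowering step: real lowering would need a store
;; connection; here a key is a (BASE . SUFFIX) pair of strings.
(define (lower key)
  (match key
    ((base . suffix) (string-append base suffix))))

(define (duplicate-report keys)
  "Return an alist of (LOWERED . COUNT) for every lowered form that
occurs more than once among KEYS, most frequent first."
  (let ((counts (make-hash-table)))
    ;; Count how many distinct key objects lower to the same thing.
    (for-each (lambda (key)
                (let ((lowered (lower key)))
                  (hash-set! counts lowered
                             (+ 1 (hash-ref counts lowered 0)))))
              keys)
    (sort (filter (match-lambda ((_ . count) (> count 1)))
                  (hash-map->list cons counts))
          (lambda (a b) (> (cdr a) (cdr b))))))

;; Each 'cons' call allocates a fresh pair, so the first three keys
;; are distinct objects that lower to the same string.
(define keys
  (list (cons "subversion" "/bin/svn")
        (cons "subversion" "/bin/svn")
        (cons "subversion" "/bin/svn")
        (cons "git" "/bin/git")))

(define report (duplicate-report keys))

(for-each (match-lambda
            ((lowered . count)
             (format #t "~a appears ~a times~%" lowered count)))
          report)
```

Grouping by the lowered form rather than by the key itself is the point: keys that are not 'eq?' but lower to the same result are exactly the entries that bloat the cache.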




Re: Performance of computing cross derivations

2023-11-16 Thread Ludovic Courtès
Hi,

Christopher Baines  skribis:

> When asked by the data service, it seems to take Guix around 3 minutes
> to compute cross derivations for all packages (to a single
> target). Here's a simple script that replicates this:

To understand the cost of computing a package’s derivation, I generally
start looking at caches and memoization:

--8<---cut here---start->8---
$ GUIX_PROFILING="object-cache" guix build gcc-toolchain -d --no-grafts
/gnu/store/iwn6frqqcyw808sgsnjv26dn6rq7mijd-gcc-toolchain-13.2.0.drv
Object Cache:
  fresh caches:    19
  lookups:       3667
  hits:          3342 (91.1%)
  cache size:     323 entries
$ GUIX_PROFILING="object-cache" guix build sed -d --no-grafts --target=aarch64-linux-gnu
/gnu/store/yxakl87wizwzcqapx4sdkp56652cxb4m-sed-4.8.drv
Object Cache:
  fresh caches:    20
  lookups:       5420
  hits:          4919 (90.8%)
  cache size:     500 entries
--8<---cut here---end--->8---

Caches are critical: since we’re dealing with huge package graphs, we
need to make sure we don’t end up computing the same thing several
times.  (You can also add “memoization” to the ‘GUIX_PROFILING’ variable
above.)

One idiom that defeats caching is:

  (define (make-me-a-package x y z)
    (package
      …))

Such a procedure returns a fresh package every time it’s called,
preventing caching from happening (because cache entries are compared
with ‘eq?’).  That typically leads to lower hit rates.
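
To make the effect concrete, here is a small self-contained Guile sketch. It is only a model: the <pkg> record, the `cached-lower` procedure and the counters are invented for this example and are not Guix's actual object cache, but the cache is keyed with 'hashq' so lookups compare keys by identity.

```scheme
(use-modules (srfi srfi-9))

;; Hypothetical stand-in for a package record; not Guix's <package>.
(define-record-type <pkg>
  (make-pkg name version)
  pkg?
  (name pkg-name)
  (version pkg-version))

;; A cache keyed by object identity (hashq, i.e. 'eq?').
(define cache (make-hash-table))
(define hits 0)
(define misses 0)

(define (cached-lower pkg)
  "Return a string for PKG, computing it at most once per object."
  (let ((found (hashq-ref cache pkg)))
    (if found
        (begin
          (set! hits (+ hits 1))
          found)
        (let ((result (string-append (pkg-name pkg) "-"
                                     (pkg-version pkg))))
          (set! misses (+ misses 1))
          (hashq-set! cache pkg result)
          result))))

;; Cache-defeating idiom: a fresh record on every call.
(define (make-me-a-package name)
  (make-pkg name "1.0"))

;; Cache-friendly: one record, created once and shared.
(define shared-package (make-pkg "shared" "1.0"))

(cached-lower (make-me-a-package "fresh"))  ;miss
(cached-lower (make-me-a-package "fresh"))  ;same contents, new object: miss
(cached-lower shared-package)               ;miss
(cached-lower shared-package)               ;same object: hit

(format #t "hits: ~a, misses: ~a, cache size: ~a~%"
        hits misses (hash-count (const #t) cache))
```

The two "fresh" packages have identical contents yet occupy two cache entries, which is how both the hit rate drops and the cache size grows.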

Anyway, lots of words to say that I don’t see anything immediately
obvious with cross-compilation, yet I wouldn’t be surprised if some of
these cache-defeating idioms were used because we’ve paid less
attention to this.

An even better thing to start with: compare the timing of ‘guix build -d
--no-grafts $PKG --target=aarch64-linux-gnu’ for all valid values of
$PKG, and investigate those that take the most time.
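
The same "time each item, look at the slowest" approach can also be sketched from Guile rather than the shell. The helper names `seconds-taken`, `slowest` and `count-to` are made up for this example; `count-to` is a dummy workload standing in for whatever computes a derivation, so nothing here calls into Guix.

```scheme
(use-modules (srfi srfi-1))

(define (seconds-taken thunk)
  "Call THUNK and return the wall-clock time it took, in seconds."
  (let ((start (get-internal-real-time)))
    (thunk)
    (/ (- (get-internal-real-time) start)
       (exact->inexact internal-time-units-per-second))))

(define (slowest items work n)
  "Run WORK on each of ITEMS and return up to N (ITEM . SECONDS) pairs,
slowest first."
  (let ((timed (map (lambda (item)
                      (cons item
                            (seconds-taken (lambda () (work item)))))
                    items)))
    (take (sort timed (lambda (a b) (> (cdr a) (cdr b))))
          (min n (length timed)))))

;; Dummy workload (an assumption for this example): count up to N.
(define (count-to n)
  (let loop ((i 0))
    (when (< i n)
      (loop (+ i 1)))))

(define report (slowest '(10 100000 1000) count-to 2))

(for-each (lambda (entry)
            (format #t "~a: ~as~%" (car entry) (cdr entry)))
          report)
```

In a real run WORK would be the per-package derivation computation and ITEMS the list of packages; the timings are then only as meaningful as the state of the caches when each item runs.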

HTH!

Ludo’.



Performance of computing cross derivations

2023-10-30 Thread Christopher Baines
Hey!

When asked by the data service, it seems to take Guix around 3 minutes
to compute cross derivations for all packages (to a single
target). Here's a simple script that replicates this:

  (use-modules (srfi srfi-34)
               (gnu packages)
               (guix grafts)
               (guix packages)
               (guix store)
               (statprof))

  (define (all-cross system target)
    (with-store store
      (%graft? #f)
      (fold-packages
       (lambda (package result)
         (with-exception-handler
             (lambda (exn)
               (unless (package-cross-build-system-error? exn)
                 (peek exn))
               result)
           (lambda ()
             (package-cross-derivation store
                                       package
                                       target
                                       system)
             (+ 1 result))
           #:unwind? #t))
       0)))

  (statprof
   (lambda ()
     (peek "COUNT"
           (all-cross "x86_64-linux"
                      "i586-pc-gnu")))
   #:count-calls? #t)


Here's some relevant output:

  %     cumulative   self
  time   seconds     seconds   calls   procedure
  50.48    126.68     102.40           ice-9/vlist.scm:502:0:vhash-foldq*
  11.49     23.31      23.31           hashq
   5.16     10.52      10.47           write
   2.79     14.28       5.65           ice-9/vlist.scm:494:0:vhash-fold*
   2.28      4.63       4.63           equal?
   2.14      4.35       4.35           hash
   1.85      4.67       3.75           guix/packages.scm:1874:0:input=?
   1.78      3.68       3.61           put-string
   1.77      7.16       3.59           guix/derivations.scm:736:0:derivation/masked-inputs
   0.93      1.90       1.90           get-bytevector-n
   0.78      1.58       1.58           put-char
   0.67      1.36       1.36           search-path

  ...
  
  Total time: 202.872232073 seconds (30.927648399 seconds in GC)


Over 3 minutes seems like a long time for this, especially since it only
computes around 1 derivations.

I don't know how to use statprof, but looking at vhash-foldq* being at
the top of the output, is this suggesting that around half of the CPU
time is being spent looking for things in various caches?

I had a go at using the Guix profiling stuff and I did get some output,
but I couldn't figure out how to get it to show all the caching going
on.

Any ideas?

Chris

