bug#65720: Guile-Git-managed checkouts grow way too much

2023-11-23 Thread Ludovic Courtès
Ludovic Courtès  skribis:

> As reported by Tobias on IRC (in the context of ‘hpcguix-web’),
> checkouts managed by Guile-Git appear to grow beyond reason.  As an
> example, here’s the same ‘.git’ managed with Guile-Git and with Git:
>
> $ du -hs ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
> 6.7G  /home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
> $ du -hs .git
> 517M  .git

Fixed by b150c546b04c9ebb09de9f2c39789221054f5eea.

We still need to update the ‘guix’ package so that tools that rely on
(guix git) such as the Data Service, hpcguix-web, and Cuirass, can
benefit from this change.

Ludo’.





bug#65720: [bug#66650] bug#65720: Guile-Git-managed checkouts grow way too much

2023-11-22 Thread Ludovic Courtès
Hi,

Simon Tournier  skribis:

> Somehow I was expressing: my view probably falls into the “Premature
> optimization is the root of all evil” category.  In other words, I have
> no objection and I will revisit the issue when I am on fire, if I ever
> am, or annoyed for real.

Alright!

Pushed as b150c546b04c9ebb09de9f2c39789221054f5eea.

Let’s see how it behaves and if there are problems we had overlooked…

Ludo’.





bug#65720: Guile-Git-managed checkouts grow way too much

2023-11-22 Thread Simon Tournier
Hi Ludo,

Thanks for explaining.

On Wed, 22 Nov 2023 at 12:17, Ludovic Courtès  wrote:

>   it’s rarely going to fire.

[...]

>> Let move it elsewhere if I am really annoyed.
>
> :-/

Sorry, I poorly worded my last comment. :-)

Somehow I was expressing: my view probably falls into the “Premature
optimization is the root of all evil” category.  In other words, I have
no objection and I will revisit the issue when I am on fire, if I ever
am, or annoyed for real.

Cheers,
simon

PS:

Aside from this patch:

>> So, somehow when 'maybe-run-git-gc' is called appears to me
>> "unpredictable".  But anyway. :-)
>
> Sure, but the way I see it, that’s the nature of caches.

What makes caches unpredictable is their current state.  However, this
does not imply that *all* the actions that move them from one state to
another must also be triggered at unpredictable moments.

For instance, I choose when I wash the family’s clothes; the washing
machine does not start by itself once the unpredictable pile of dirty
clothes is big enough.  Because maybe today it’s rainy, so drying is
difficult, and tomorrow will be sunny, so it will be a better moment. :-)

For me, “guix gc” should be the driver for cleaning all the various Guix
caches.  Anyway. :-D





bug#65720: Guile-Git-managed checkouts grow way too much

2023-10-24 Thread Simon Tournier
Hi,

On Mon, 23 Oct 2023 at 22:27, Tobias Geerinckx-Rice  wrote:

>>Why not trigger it by “guix gc”?
>
> Unless there's a new option I missed, guix gc doesn't handle this.

Maybe I missed something but “guix gc” handles what we implement, no? :-)

Well, I run “guix gc” when I need some space.  And this
“maybe-run-git-gc” does exactly that: it reclaims some space when I need
it.

For me, they are part of “guix gc” and not part of some update.


Aside, re-thinking about other features, I am consistent with other
comments I made when introducing ’maybe-remove-expired-cache-entries’;
see .  And consistent because most
probably I still think the same: cache cleanup should be handled by
“guix gc” and not by the commands themselves.  And maybe we are having
the same discussion. ;-)


>>Well, I expect “guix gc” to take some time and I choose when.  However,
>>I want “guix pull” or “guix time-machine” to be as fast as possible
>
> I don't think that things should be pushed into guix gc merely because
> they are slow.

Maybe I misread, somehow it appears to me that you miss the key part: I
choose when some extra work is done and I keep “guix pull” and “guix
time-machine” as fast as possible.


Cheers,
simon





bug#65720: Guile-Git-managed checkouts grow way too much

2023-09-19 Thread Simon Tournier
Hi Ludo.

On Tue, 19 Sep 2023 at 00:35, Ludovic Courtès  wrote:

> --8<---cut here---start->8---
> scheme@(guile-user)> ,use(git)
> scheme@(guile-user)> ,t (clone "https://git.savannah.gnu.org/git/guix.git" "/tmp/guix")
> $5 = #
> ;; 600.534529s real time, 435.260926s run time.  0.00s spent in GC.
> scheme@(guile-user)> ,t (clone "https://git.savannah.gnu.org/git/guix.git" "/tmp/guix-after-removing-nix-branch")
> $6 = #
> ;; 420.321511s real time, 398.772963s run time.  0.00s spent in GC.
> --8<---cut here---end--->8---

[...]

> --8<---cut here---start->8---
> $ du -hs /tmp/guix/.git
> 373M  /tmp/guix/.git
> $ du -hs /tmp/guix-after-removing-nix-branch/.git
> 362M  /tmp/guix-after-removing-nix-branch/.git
> --8<---cut here---end--->8---

Just to also point out [1] that using a shallow clone, restricted to the
oldest commit reachable by the time-machine, saves about 25% of the bits
to download, and similarly on disk.

--8<---cut here---start->8---
scheme@(guix-user)> ,t (clone "https://git.savannah.gnu.org/git/guix.git" "/tmp/guix-guile")
$1 = #
;; 383.186818s real time, 278.060733s run time.  0.00s spent in GC.

$ time git clone https://git.savannah.gnu.org/git/guix.git guix-full
Receiving objects: 100% (693699/693699), 342.14 MiB | 2.87 MiB/s, done.
real    2m40,830s
user    3m4,683s
sys     0m8,189s

$ time git clone --shallow-since=2019-04-30 https://git.savannah.gnu.org/git/guix.git guix-oldest
Receiving objects: 100% (428646/428646), 259.41 MiB | 3.87 MiB/s, done.
real    1m45,604s
user    2m32,370s
sys     0m5,916s

$ du -sh guix-*/.git
362M  guix-full/.git
362M  guix-guile/.git
272M  guix-oldest/.git
--8<---cut here---end--->8---
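As a quick check of the “25%” figure, the relative saving can be recomputed from the numbers above. This is only a sketch: the helper name `saving` is made up for illustration, and the inputs are the download sizes reported by `git clone` and the on-disk sizes reported by `du`.

```shell
# Hypothetical helper: relative saving, in percent, of SHALLOW over FULL.
# Usage: saving FULL SHALLOW
saving() {
    LC_ALL=C awk -v full="$1" -v shallow="$2" \
        'BEGIN { printf "%.1f\n", 100 * (full - shallow) / full }'
}

saving 342.14 259.41   # download size in MiB: prints 24.2
saving 362 272         # size of .git on disk in MiB: prints 24.9
```

Both figures land just under 25%, which matches the estimate given in the message.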

Cheers,
simon


1: Re: hard dependency on Git? (was bug#65866: [PATCH 0/8] Add built-in builder 
for Git checkouts)
Simon Tournier 
Mon, 11 Sep 2023 19:52:34 +0200
id:871qf4ha1p@gmail.com
https://lists.gnu.org/archive/html/guix-devel/2023-09
https://yhetil.org/guix/871qf4ha1p@gmail.com






bug#65720: Guile-Git-managed checkouts grow way too much

2023-09-18 Thread Ludovic Courtès
Ludovic Courtès  skribis:

> As reported by Tobias on IRC (in the context of ‘hpcguix-web’),
> checkouts managed by Guile-Git appear to grow beyond reason.  As an
> example, here’s the same ‘.git’ managed with Guile-Git and with Git:
>
> $ du -hs ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
> 6.7G  /home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
> $ du -hs .git
> 517M  .git

More data…  The biggest file in that repo is a pack that was created
when that repo was first cloned (Aug. 2021):

--8<---cut here---start->8---
$ du /home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/.git/objects/pack/* | sort -k1 -n | tail -3
44272   /home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/.git/objects/pack/pack-3c2f1857501b01c321bc67ba1f30704deb9e18e9.pack
47272   /home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/.git/objects/pack/pack-30d5b35ad14a8398464e49e224811b162f673d66.pack
191492  /home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/.git/objects/pack/pack-d39507858782209d1ad87e389e4dffd4b6ff7ea2.pack
$ ls -l /home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/.git/objects/pack/pack-d39507858782209d1ad87e389e4dffd4b6ff7ea2.pack
-r--r--r-- 1 ludo users 196079671 Aug  9  2021 /home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/.git/objects/pack/pack-d39507858782209d1ad87e389e4dffd4b6ff7ea2.pack
$ ls -ld /home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/.git/config
-rw-r--r-- 1 ludo users 266 Aug  9  2021 /home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/.git/config
--8<---cut here---end--->8---

The pack starts with things from Aug. 2021:

--8<---cut here---start->8---
$ git show-index < pack-d39507858782209d1ad87e389e4dffd4b6ff7ea2.idx | sort -k1 -n | head -3
12 30289f4d4638452520f52c1a36240220d0d940ff (852d8cb3)
927 d7ffc535c52f49177a8e5553569cdb1e321b5bc6 (2007c5d0)
1800 0a379de3249d5e9ff66fb404f7e5aa8ce2cb3d24 (b1e69aa4)
$ git show 30289f4d4638452520f52c1a36240220d0d940ff
commit 30289f4d4638452520f52c1a36240220d0d940ff
Author: Milkey Mouse 
Date:   Sun Aug 8 22:15:40 2021 -0700

[…]
--8<---cut here---end--->8---

… and at the bottom (large offsets) it contains very old blobs from the
Nix repo that somehow made it here.

I figured we still had a ‘nix’ branch from the early days, that contains
the history of Nix.  I’ve now removed it, which helps a bit:

--8<---cut here---start->8---
scheme@(guile-user)> ,use(git)
scheme@(guile-user)> ,t (clone "https://git.savannah.gnu.org/git/guix.git" "/tmp/guix")
$5 = #
;; 600.534529s real time, 435.260926s run time.  0.00s spent in GC.
scheme@(guile-user)> ,t (clone "https://git.savannah.gnu.org/git/guix.git" "/tmp/guix-after-removing-nix-branch")
$6 = #
;; 420.321511s real time, 398.772963s run time.  0.00s spent in GC.
--8<---cut here---end--->8---

… and more importantly:

--8<---cut here---start->8---
$ du -hs /tmp/guix/.git
373M  /tmp/guix/.git
$ du -hs /tmp/guix-after-removing-nix-branch/.git
362M  /tmp/guix-after-removing-nix-branch/.git
--8<---cut here---end--->8---

Anyway, what seems to happen is that every pull (every call to
‘remote-fetch’) creates a new pack (see ‘git_fetch_download_pack’ in
libgit2), which becomes inefficient in the long run (lots of small
poorly-compressed packs).  That’s at least one possible explanation.
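One way to test the “many small packs” hypothesis on a given checkout is to count its pack files and add up their sizes. The following is a minimal sketch; the function name `pack_stats` is made up, and it only assumes the standard `objects/pack` layout of a Git directory.

```shell
# Hypothetical helper: summarize the pack files under a Git directory.
# Usage: pack_stats /path/to/checkout/.git
pack_stats() {
    pack_dir="$1/objects/pack"
    count=0
    total_kib=0
    for pack in "$pack_dir"/*.pack; do
        [ -e "$pack" ] || continue            # the glob matched nothing
        size_kib=$(du -k "$pack" | cut -f1)   # size of this pack in KiB
        count=$((count + 1))
        total_kib=$((total_kib + size_kib))
    done
    echo "$count packs, $total_kib KiB"
}
```

If the explanation above holds, a long-lived Guile-Git checkout should show many packs, whereas a repository freshly repacked by ‘git gc’ would typically show one or very few.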

To be continued…

Ludo’.





bug#65720: Guile-Git-managed checkouts grow way too much

2023-09-13 Thread Simon Tournier
Hi Ludo,

On Wed, 13 Sep 2023 at 20:10, Ludovic Courtès  wrote:

> ‘get-internal-run-time’ returns “units of processor time” used by the
> current process (info "(guile) Time").  When shelling out, the process
> calls waitpid(2) and does nothing, so naturally its processor time is
> close to zero.
>
> ‘get-internal-real-time’ should give something closer to elapsed time.

Well, let's avoid mixing unrelated discussions. :-)  For discussing that
specific part, I reported on guix-devel my timing using ,time.

comparing commit-relation using Scheme+libgit2 vs shellout plumbing Git
Simon Tournier 
Tue, 12 Sep 2023 00:48:30 +0200
id:865y4gz5q9@gmail.com
https://lists.gnu.org/archive/html/guix-devel/2023-09
https://yhetil.org/guix/865y4gz5q9@gmail.com

The result is still significantly less and discussion is welcome
over there. :-)

Cheers,
simon





bug#65720: Guile-Git-managed checkouts grow way too much

2023-09-13 Thread Ludovic Courtès
Hi,

wolf  skribis:

> (define (time proc)
>   (let* ((start (get-internal-run-time))
>          (_ (proc))
>          (end   (get-internal-run-time)))
>     (exact->inexact (* 1000 (/ (- end start)
>                                internal-time-units-per-second)))))
>
> (format #t "Guix: ~ams\nGit:  ~ams\n"
>         (time (λ () (commit-relation c1 c2)))
>         (time (λ () (shelling-commit-relation c1 c2))))

‘get-internal-run-time’ returns “units of processor time” used by the
current process (info "(guile) Time").  When shelling out, the process
calls waitpid(2) and does nothing, so naturally its processor time is
close to zero.

‘get-internal-real-time’ should give something closer to elapsed time.
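The same distinction can be observed from a shell, as a rough analogy rather than a reproduction of the Guile measurement: while the shell waits for a child process, wall-clock time passes but the shell itself consumes almost no CPU time. The sketch assumes a `date` that supports `+%s`.

```shell
# Wall-clock time versus the shell's own CPU time while it waits for a child.
start=$(date +%s)
sleep 1                     # the shell only waits here, much like waitpid(2)
end=$(date +%s)
echo "elapsed: $((end - start))s"

# 'times' prints two lines: the user and system CPU time of the shell
# itself, then the same two figures for its child processes.
times
```

The elapsed figure is at least one second, while the first line printed by `times` is expected to stay close to zero.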

Ludo’.





bug#65720: Guile-Git-managed checkouts grow way too much

2023-09-11 Thread wolf
On 2023-09-08 19:08:05 +0200, Ludovic Courtès wrote:
> Hello!
> 
> Josselin Poiret  skribis:
> 
> > Right, although I wouldn't necessarily say that the former doesn't have
> > a proper API, but rather that it has a Unix-oriented API.  That leads to
> > performance issues on e.g. Windows but on Linux I'm not sure there's
> > much of a difference.
> 
> [...]
> 
> > We could consider replacing the guile-git dependency with another
> > library built directly on top of git-minimal, and have this be a
> > dependency of Guix.  Not ideal though, and not really scalable either:
> > we can't just add every VCS as direct dependencies.
> 
> I cannot imagine a viable implementation of things like ‘commit-closure’
> and ‘commit-relation’ from (guix git) done by shelling out to ‘git’.

I am sure I must be missing some part of the contract of the function, but at
least the commit-relation seems fairly straightforward:

(define (shelling-commit-relation old new)
  (let ((h-old (oid->string (commit-id old)))
        (h-new (oid->string (commit-id new))))
    (cond ((eq? old new)
           'self)
          ((zero? (git-C %repo "merge-base" "--is-ancestor" h-old h-new))
           'ancestor)
          ((zero? (git-C %repo "merge-base" "--is-ancestor" h-new h-old))
           'descendant)
          (else
           'unrelated))))

I would argue it is even somewhat more readable than the current implementation.
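For comparison, the same logic can be sketched directly in shell. This is an illustration only: the function name `commit_relation` and its return-code convention are made up. It relies on the documented behaviour of `git merge-base --is-ancestor`, which exits with 0 when the first commit is an ancestor of the second, 1 when it is not, and another status on error.

```shell
# Hypothetical shell counterpart of 'shelling-commit-relation'.
# Usage: commit_relation REPO OLD NEW
# Prints self, ancestor, descendant or unrelated; returns 2 on error.
commit_relation() {
    repo=$1
    old_id=$(git -C "$repo" rev-parse --verify --quiet "$2^{commit}") || return 2
    new_id=$(git -C "$repo" rev-parse --verify --quiet "$3^{commit}") || return 2
    if [ "$old_id" = "$new_id" ]; then
        echo self
        return 0
    fi
    status=0
    git -C "$repo" merge-base --is-ancestor "$old_id" "$new_id" || status=$?
    case $status in
        0) echo ancestor; return 0 ;;
        1) ;;                       # not an ancestor: try the other direction
        *) return 2 ;;              # genuine error, not a "no" answer
    esac
    status=0
    git -C "$repo" merge-base --is-ancestor "$new_id" "$old_id" || status=$?
    case $status in
        0) echo descendant ;;
        1) echo unrelated ;;
        *) return 2 ;;
    esac
}
```

Unlike the Scheme version, this sketch treats exit statuses other than 0 and 1 as errors instead of folding them into “unrelated”.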

> I’m quite confident this would be slow

My version is ~2000x faster compared to (guix git):

Guix: 1048.620992ms
Git:  0.532143ms

Again, I am sure I must have missed something, either in the implementation or in
the measurements, because it is pretty hard to believe there is so much room for
improvement.

The full script I used is attached to this email.

> and brittle.

In general, git plumbing commands are designed to have a stable CLI interface in
order to be usable in scripting.  So I am not sure where the brittleness would
come from.

> 
> It looks like there’s no option other than carrying the two
> implementations.

Assuming I made no mistake (hard to believe), it is probably worth exploring the
feasibility of just shelling out to the git binary some more.

> 
> ~~~
> 
> Years ago, Andy Wingo sketched a plan for GNU hackers to implement Git
> in pure Scheme.  That was on April 1st though, so people mistakenly
> assumed it was a joke and the project was never carried out.
> 
> I digress, but I wonder: is there not even a viable Haskell or OCaml
> implementation of Git?
> 
> Thanks,
> Ludo’.
>

W.

-- 
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.
#!/bin/sh
# -*-scheme-*-
exec guile -s "$0" "$@"
!#

(use-modules (git)
 (guix git))

(define %repo "/tmp/guix-fork")

(define h1 "72745172d155e489936f694d6b9013cb76272370")
(define h2 "6d60d7ccba5a8e06c17d55a1772fa7f4529b5eff")
(define h3 "c3db650680f995f0556d3ddce567cdc1c33e4603")

;;; r has to still be defined when the commit-relation is called.  There is *no*
;;; error, but it always returns 'unrelated.  Quite a footgun.
(define r (repository-open %repo))
(define c1 (commit-lookup r (string->oid h1)))
(define c2 (commit-lookup r (string->oid h2)))
(define c3 (commit-lookup r (string->oid h3)))

(define (git-C dir . args)
  (apply system* "git" "-C" dir args))

(define (shelling-commit-relation old new)
  (let ((h-old (oid->string (commit-id old)))
        (h-new (oid->string (commit-id new))))
    (cond ((eq? old new)
           'self)
          ;; In real code, git-C should probably return #t (for 0), #f (for 1)
          ;; or raise (for anything else).
          ((zero? (git-C %repo "merge-base" "--is-ancestor" h-old h-new))
           'ancestor)
          ((zero? (git-C %repo "merge-base" "--is-ancestor" h-new h-old))
           'descendant)
          (else
           'unrelated))))

;;; Make sure it actually works.
(let ((tests `((,c1 . ,c1)
               (,c1 . ,c2)
               (,c2 . ,c1)
               (,c1 . ,c3))))
  (for-each (λ (c)
              (format #t "Guix: ~a\nGit:  ~a\n\n"
                      (commit-relation (car c) (cdr c))
                      (shelling-commit-relation (car c) (cdr c))))
            tests))

(define (time proc)
  (let* ((start (get-internal-run-time))
         (_ (proc))
         (end   (get-internal-run-time)))
    (exact->inexact (* 1000 (/ (- end start) internal-time-units-per-second)))))

(format #t "Guix: ~ams\nGit:  ~ams\n"
        (time (λ () (commit-relation c1 c2)))
        (time (λ () (shelling-commit-relation c1 c2))))




bug#65720: Guile-Git-managed checkouts grow way too much

2023-09-11 Thread Ludovic Courtès
Ludovic Courtès  skribis:

> It would also be pretty bad for closure size:
>
> $ guix size guile-git | tail -1
> total: 106.6 MiB
> $ guix size guile-git git-minimal | tail -1
> total: 169.8 MiB
>
> It’s also not clear concretely how we’d add that dependency.  Try
> invoking ‘git’ from $PATH and print a warning if it doesn’t work?

A solution to this particular problem is coming:

  https://issues.guix.gnu.org/65866

Ludo’.





bug#65720: Digression about Git implementations (was Re: bug#65720: Guile-Git-managed checkouts grow way too much)

2023-09-11 Thread Simon Tournier
Hi Ludo,

On Fri, 08 Sep 2023 at 19:08, Ludovic Courtès  wrote:

> Years ago, Andy Wingo sketched a plan for GNU hackers to implement Git
> in pure Scheme.  That was on April 1st though, so people mistakenly
> assumed it was a joke and the project was never carried out.

Well, that is a piece of work. :-)

Maybe there is a hope with: git-std-lib.

Subject: Proposal/Discussion: Turning parts of Git into libraries
From: Emily Shaffer 
To: Git List 
Date: Fri, 17 Feb 2023 13:12:23 -0800   

https://lore.kernel.org/git/CAJoAoZ=Cig_kLocxKGax31sU7Xe4==bgzc__bg2_pr7krnq...@mail.gmail.com/

And some patches are starting to float around.
https://public-inbox.org/git/20230810163346.274132-1-calvin...@google.com/


> I digress, but I wonder: is there not even a viable Haskell or OCaml
> implementation of Git?

It depends on what “viable” means. :-)

https://github.com/mirage/ocaml-git
https://hackage.haskell.org/package/git

Irmin [1] is an OCaml library for building mergeable, branchable
distributed data stores – A Distributed Database Built on the Same
Principles as Git.  And irmin relies on ocaml-git.

1: https://github.com/mirage/irmin

Then there is a pure Go implementation and another using Java.

https://git-scm.com/book/en/v2/Appendix-B%3A-Embedding-Git-in-your-Applications-go-git
https://git-scm.com/book/en/v2/Appendix-B%3A-Embedding-Git-in-your-Applications-JGit

I do not know whether all of these are “viable”.  Well, I do not know if
’git gc’ is implemented.  And I do not know which plumbing is implemented
and which porcelain is available.

Last, SWH uses dulwich [2] which is a pure Python implementation of Git.

2: https://www.dulwich.io/

To my knowledge, there is no “dulwich gc” but they implement “dulwich
fsck” and “dulwich repack”.

Back at 10 Years of Guix or at UNESCO in February – I do not remember
exactly when – we were discussing implementations of Git.  And we
mentioned an implementation in Rust.  Maybe this one:

https://github.com/Byron/gitoxide

Cheers,
simon






bug#65720: Guile-Git-managed checkouts grow way too much

2023-09-11 Thread Csepp


Simon Tournier  writes:

> Hi,
>
> On Fri, 08 Sep 2023 at 19:09, Ludovic Courtès  wrote:
>
>>>> It would also be pretty bad for closure size:
>>>>
>>>> --8<---cut here---start->8---
>>>> $ guix size guile-git | tail -1
>>>> total: 106.6 MiB
>>>> $ guix size guile-git git-minimal | tail -1
>>>> total: 169.8 MiB
>>>> --8<---cut here---end--->8---
>>>>
>>>> It’s also not clear concretely how we’d add that dependency.  Try
>>>> invoking ‘git’ from $PATH and print a warning if it doesn’t work?
>>>> But then, what about applications like Cuirass and hpcguix-web?
>>>
>>> I think we can rely on something like,
>>>
>>> guix shell -C git-minimal -- git gc
>>
>> We’re talking about the implementation of a cache (meant to speed up
>> operations), that would actually fill said cache plus do a whole bunch
>> of expensive operations?  Nah.  :-)
>
> I do not think so.  If I understand correctly, we need to run “git gc” at
> some point, therefore git-minimal needs to be around.  The question is
> how and when.
>
> Well, maybe I am missing what the bug is about.  For me, it is about
> running ‘git gc’ for cleaning the Git checkout cache, no?
>
>
> Solution #1.  Add git-minimal as inputs.  It increases the closure and
> the extra load (on average) is about the ratio between the rate of “guix
> pull” and the rate of the git-minimal changes.
>
> Assume that people run “guix pull” once per week and that, say, “git
> gc” is run after 50 pulls.  (Both numbers are totally arbitrary and
> based on my personal estimate.)
>
> Data Service [1] tells:
>
> 2023-07-07 15:45:22 2023-09-08 21:22:08
> 2023-05-11 16:10:48 2023-07-07 14:21:45
> 2023-05-01 16:40:08 2023-05-11 14:36:16
> 2023-04-25 13:34:54 2023-05-01 15:19:55
> 2023-04-25 13:34:54 2023-09-08 21:22:08
> 2023-03-06 17:22:28 2023-04-25 12:27:33
> 2023-01-17 23:49:19 2023-03-06 16:48:43
> 2022-11-08 13:06:42 2023-01-17 15:11:47
> 2022-10-08 05:14:46 2022-11-08 09:56:31
> 2022-09-06 15:00:08 2022-10-08 04:15:43
> 2022-08-13 22:02:31 2022-09-06 12:58:52
> …
>
> It means that a user will download git-minimal ~10 times for nothing.
>
>
> Solution #2.  The one I am proposing. :-)  Download git-minimal only
> when Guix needs it for running “git gc”.  Yeah, there is probably a
> small overload with some operations.  But, I bet this overload is much
> smaller than the one of solution #1.
>
> Well, it depends on the number of times people are updating the cache vs
> the rate of change of git-minimal.
>
> For sure, if one updates the cache 100 times per week, having
> git-minimal as an input is far better.  But I do not think that is the
> regular usage on average. :-)
>
> That’s why I am proposing to have an option for turning off this “git
> gc“ operation.
>
> Well, we have lived for years without running ‘git gc’ so running it
> once per year on average is probably enough to keep the cache size
> reasonable.  And git-minimal is changing every month.
>
>
> Maybe, there is some solution #3. ;-)
>
> Cheers,
> simon
>
>
> 1: 
> https://data.guix.gnu.org/repository/1/branch/master/package/git-minimal/output-history

Please don't create another situation like with guix system roll-back,
where a crucial sysadmin operation doesn't work without network access.
Or at least make it configurable, so things that are likely to be needed
for future operations are pre-fetched.





bug#65720: Guile-Git-managed checkouts grow way too much

2023-09-11 Thread Csepp


Ludovic Courtès  writes:

> Hello!
>
> Josselin Poiret  skribis:
>
>> Right, although I wouldn't necessarily say that the former doesn't have
>> a proper API, but rather that it has a Unix-oriented API.  That leads to
>> performance issues on e.g. Windows but on Linux I'm not sure there's
>> much of a difference.
>
> [...]
>
>> We could consider replacing the guile-git dependency with another
>> library built directly on top of git-minimal, and have this be a
>> dependency of Guix.  Not ideal though, and not really scalable either:
>> we can't just add every VCS as direct dependencies.
>
> I cannot imagine a viable implementation of things like ‘commit-closure’
> and ‘commit-relation’ from (guix git) done by shelling out to ‘git’.
> I’m quite confident this would be slow and brittle.
>
> It looks like there’s no option other than carrying the two
> implementations.
>
> ~~~
>
> Years ago, Andy Wingo sketched a plan for GNU hackers to implement Git
> in pure Scheme.  That was on April 1st though, so people mistakenly
> assumed it was a joke and the project was never carried out.
>
> I digress, but I wonder: is there not even a viable Haskell or OCaml
> implementation of Git?
>
> Thanks,
> Ludo’.

For sake of completeness:
There is an alternative implementation in C for Plan 9 that I've used and
that is now mature enough that the 9front project switched to it from
Mercurial.
It might be possible to compile it with the plan9port compiler wrapper.

There is also a Git implementation in OCaml that some MirageOS
unikernels use to serve static content from a git repository.
Also the Irmin "database" is based on git and is written in OCaml.





bug#65720: Guile-Git-managed checkouts grow way too much

2023-09-09 Thread Simon Tournier
Hi,

On Fri, 08 Sep 2023 at 19:09, Ludovic Courtès  wrote:

>>> It would also be pretty bad for closure size:
>>>
>>> --8<---cut here---start->8---
>>> $ guix size guile-git | tail -1
>>> total: 106.6 MiB
>>> $ guix size guile-git git-minimal | tail -1
>>> total: 169.8 MiB
>>> --8<---cut here---end--->8---
>>>
>>> It’s also not clear concretely how we’d add that dependency.  Try
>>> invoking ‘git’ from $PATH and print a warning if it doesn’t work?
>>> But then, what about applications like Cuirass and hpcguix-web?
>>
>> I think we can rely on something like,
>>
>> guix shell -C git-minimal -- git gc
>
> We’re talking about the implementation of a cache (meant to speed up
> operations), that would actually fill said cache plus do a whole bunch
> of expensive operations?  Nah.  :-)

I do not think so.  If I understand correctly, we need to run “git gc” at
some point, therefore git-minimal needs to be around.  The question is
how and when.

Well, maybe I am missing what the bug is about.  For me, it is about
running ‘git gc’ for cleaning the Git checkout cache, no?


Solution #1.  Add git-minimal as inputs.  It increases the closure and
the extra load (on average) is about the ratio between the rate of “guix
pull” and the rate of the git-minimal changes.

Assume that people run “guix pull” once per week and that, say, “git
gc” is run after 50 pulls.  (Both numbers are totally arbitrary and
based on my personal estimate.)

Data Service [1] tells:

2023-07-07 15:45:22 2023-09-08 21:22:08
2023-05-11 16:10:48 2023-07-07 14:21:45
2023-05-01 16:40:08 2023-05-11 14:36:16
2023-04-25 13:34:54 2023-05-01 15:19:55
2023-04-25 13:34:54 2023-09-08 21:22:08
2023-03-06 17:22:28 2023-04-25 12:27:33
2023-01-17 23:49:19 2023-03-06 16:48:43
2022-11-08 13:06:42 2023-01-17 15:11:47
2022-10-08 05:14:46 2022-11-08 09:56:31
2022-09-06 15:00:08 2022-10-08 04:15:43
2022-08-13 22:02:31 2022-09-06 12:58:52
…

It means that a user will download git-minimal ~10 times for nothing.


Solution #2.  The one I am proposing. :-)  Download git-minimal only
when Guix needs it for running “git gc”.  Yeah, there is probably a
small overload with some operations.  But, I bet this overload is much
smaller than the one of solution #1.

Well, it depends on the number of times people are updating the cache vs
the rate of change of git-minimal.

For sure, if one updates the cache 100 times per week, having
git-minimal as an input is far better.  But I do not think that is the
regular usage on average. :-)

That’s why I am proposing to have an option for turning off this “git
gc“ operation.

Well, we have lived for years without running ‘git gc’ so running it
once per year on average is probably enough to keep the cache size
reasonable.  And git-minimal is changing every month.
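The back-of-the-envelope estimate behind the “~10 times” figure can be written down as follows. It is a sketch under the message's own assumptions (one pull per week, ‘git gc’ every 50 pulls) plus one added assumption: roughly 10 git-minimal updates per year, read off the Data Service list. The function name is made up.

```shell
# Hypothetical helper: number of git-minimal updates between two 'git gc' runs.
# Usage: updates_between_gc PULLS_PER_WEEK PULLS_PER_GC UPDATES_PER_YEAR
updates_between_gc() {
    LC_ALL=C awk -v pulls_per_week="$1" -v pulls_per_gc="$2" \
        -v updates_per_year="$3" \
        'BEGIN { weeks = pulls_per_gc / pulls_per_week   # weeks between gc runs
                 printf "%.1f\n", updates_per_year * weeks / 52 }'
}

updates_between_gc 1 50 10     # weekly pulls: prints 9.6
updates_between_gc 100 50 10   # 100 pulls per week: prints 0.1
```

With weekly pulls, git-minimal changes about ten times between two collections; with 100 pulls per week it rarely changes in between, which is the case where keeping it as an input would pay off.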


Maybe, there is some solution #3. ;-)

Cheers,
simon


1: 
https://data.guix.gnu.org/repository/1/branch/master/package/git-minimal/output-history





bug#65720: Guile-Git-managed checkouts grow way too much

2023-09-08 Thread Ludovic Courtès
Hi!

Simon Tournier  skribis:

> On Tue, 05 Sep 2023 at 16:18, Ludovic Courtès  wrote:
>
>> It would also be pretty bad for closure size:
>>
>> --8<---cut here---start->8---
>> $ guix size guile-git | tail -1
>> total: 106.6 MiB
>> $ guix size guile-git git-minimal | tail -1
>> total: 169.8 MiB
>> --8<---cut here---end--->8---
>>
>> It’s also not clear concretely how we’d add that dependency.  Try
>> invoking ‘git’ from $PATH and print a warning if it doesn’t work?
>> But then, what about applications like Cuirass and hpcguix-web?
>
> I think we can rely on something like,
>
> guix shell -C git-minimal -- git gc

We’re talking about the implementation of a cache (meant to speed up
operations), that would actually fill said cache plus do a whole bunch
of expensive operations?  Nah.  :-)

Ludo’.





bug#65720: Guile-Git-managed checkouts grow way too much

2023-09-08 Thread Ludovic Courtès
Hello!

Josselin Poiret  skribis:

> Right, although I wouldn't necessarily say that the former doesn't have
> a proper API, but rather that it has a Unix-oriented API.  That leads to
> performance issues on e.g. Windows but on Linux I'm not sure there's
> much of a difference.

[...]

> We could consider replacing the guile-git dependency with another
> library built directly on top of git-minimal, and have this be a
> dependency of Guix.  Not ideal though, and not really scalable either:
> we can't just add every VCS as direct dependencies.

I cannot imagine a viable implementation of things like ‘commit-closure’
and ‘commit-relation’ from (guix git) done by shelling out to ‘git’.
I’m quite confident this would be slow and brittle.

It looks like there’s no option other than carrying the two
implementations.

~~~

Years ago, Andy Wingo sketched a plan for GNU hackers to implement Git
in pure Scheme.  That was on April 1st though, so people mistakenly
assumed it was a joke and the project was never carried out.

I digress, but I wonder: is there not even a viable Haskell or OCaml
implementation of Git?

Thanks,
Ludo’.





bug#65720: Guile-Git-managed checkouts grow way too much

2023-09-06 Thread Simon Tournier
Hi,

On Tue, 05 Sep 2023 at 16:18, Ludovic Courtès  wrote:

> It would also be pretty bad for closure size:
>
> --8<---cut here---start->8---
> $ guix size guile-git | tail -1
> total: 106.6 MiB
> $ guix size guile-git git-minimal | tail -1
> total: 169.8 MiB
> --8<---cut here---end--->8---
>
> It’s also not clear concretely how we’d add that dependency.  Try
> invoking ‘git’ from $PATH and print a warning if it doesn’t work?
> But then, what about applications like Cuirass and hpcguix-web?

I think we can rely on something like,

guix shell -C git-minimal -- git gc

It would be invoked internally using the Scheme API for inferiors and
friends.  Doing so, it would add nothing to the closure size.

It appears to me safe to assume that this command can be run from any
Guix installation.  Since the Git GC would only be done once every X Git
fetches, the overhead would be much lower.

Hum, am I repeating myself [1]? :-)

And I would run this “git gc” via “guix gc”, not via “guix pull”.  Well,
I do not like all these automatic removals happening based on date
(last-expiry-cleanup) with some usual commands.  It always happens when
I do not want. ;-) Contrary to “guix gc”.  Bah, another story. :-)

Cheers,
simon


1: bug#65720: Guile-Git-managed checkouts grow way too much
Simon Tournier 
Tue, 05 Sep 2023 20:59:07 +0200
id:86edjcqwec@gmail.com
https://issues.guix.gnu.org//65720
https://issues.guix.gnu.org/msgid/86edjcqwec@gmail.com
https://yhetil.org/guix/86edjcqwec@gmail.com







bug#65720: Guile-Git-managed checkouts grow way too much

2023-09-06 Thread Jelle Licht


Hi Ludo,

> 
> On 4 Sep 2023, at 23:49, Ludovic Courtès  wrote:
> 
> Of course having to re-clone entire repositories every 9 months is
> ridiculous, but storing gigabytes of packs is worse IMO (I’m
> specifically thinking about the Guix repo, which every user copies via
> ‘guix pull’).

Please ignore if it doesn’t make sense, or would not make a practical 
difference for the current issue, but wouldn’t a local clone do the trick here? 
As in, clone from the ‘clogged’ local repo, move over fresh clone to old 
location.

Kr, Jelle





bug#65720: Guile-Git-managed checkouts grow way too much

2023-09-06 Thread Josselin Poiret via Bug reports for GNU Guix
Hi Ludo,

Ludovic Courtès  writes:

> Surely you’d agree that it would suck though: depending on two Git
> implementations because one doesn’t have a proper API and the other one
> lacks a bunch of features.

Right, although I wouldn't necessarily say that the former doesn't have
a proper API, but rather that it has a Unix-oriented API.  That leads to
performance issues on e.g. Windows but on Linux I'm not sure there's
much of a difference.

> It would also be pretty bad for closure size:
>
> --8<---cut here---start->8---
> $ guix size guile-git | tail -1
> total: 106.6 MiB
> $ guix size guile-git git-minimal | tail -1
> total: 169.8 MiB
> --8<---cut here---end--->8---
>
> It’s also not clear concretely how we’d add that dependency.  Try
> invoking ‘git’ from $PATH and print a warning if it doesn’t work?
> But then, what about applications like Cuirass and hpcguix-web?
>
> Tricky, tricky.

We could consider replacing the guile-git dependency with another
library built directly on top of git-minimal, and have this be a
dependency of Guix.  Not ideal though, and not really scalable either:
we can't just add every VCS as direct dependencies.

From what I've seen, people are now scaling back on their use of
libgit2 because of the impedance mismatch and are resorting more and
more to git plumbing.  From a pragmatic point of view, I'd prefer the
latter, since it is more stable and feature-complete.

Best,
-- 
Josselin Poiret




bug#65720: Guile-Git-managed checkouts grow way too much

2023-09-05 Thread Simon Tournier
Hi,

On Mon, 04 Sep 2023 at 23:47, Ludovic Courtès  wrote:

>> It would seem that libgit2 doesn’t do the equivalent of ‘git gc’.
>
> Confirmed: .

Ouch!

The goals of the project haven't changed, and neither have the
tradeoffs. If one were to rewrite git-gc on top of libgit2, the
best-case scenario is ending up with what we already had.

If you want to use regular maintenance on some repositories, use
git gc, that's what it's there for.

https://github.com/libgit2/libgit2/issues/3247#issuecomment-152508040

> My inclination for the short term would be to work around this
> limitation by (1) finding a heuristic to determine if a checkout has
> likely accumulated too much cruft, and (2) considering such checkouts
> as expired (thereby forcing a re-clone) or running ‘git gc’ on them if
> ‘git’ is available.

About (1), maybe we could add a “counter” and, after X updates of
the checkout, trigger (2).  Well, I guess the amount of cruft is
more or less proportional to the number of checkout updates; that’s
the heuristic I would use.

The most annoying part is (2): forcing a re-clone does not look like a
solution to me; I would rather waste disk space (and probably run
‘git gc’ manually myself) than re-clone…  Somehow this re-clone would
always happen when I am on a poor network.

Moreover, assuming this clean-up (2) runs only once in a while, we
could imagine invoking something like,

guix shell -C git-minimal \
     -- git \
     -C ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq \
     gc

when the checkout is updated.  And maybe we could provide another “guix
pull” command-line option for turning this off and marking it as done
(resetting the “counter”).
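
A minimal shell sketch of the counter idea described above, under stated
assumptions: the counter file name ‘guix-update-count’, the default
threshold of 50 updates, and the function name are all hypothetical, and
the GC command is passed in by the caller rather than hard-coded.

```shell
#!/bin/sh
# Sketch only: none of these names exist in Guix.

# Record one update of the checkout given as $1.  The remaining arguments
# form the GC command, which is run once the counter reaches the threshold
# (GC_THRESHOLD, default 50).  The counter is reset only if that command
# succeeds, so a failed GC is retried at the next update.
note_update_and_maybe_gc () {
    checkout=$1
    shift
    counter_file="$checkout/.git/guix-update-count"   # hypothetical file name
    threshold=${GC_THRESHOLD:-50}                     # hypothetical default

    count=0
    if [ -f "$counter_file" ]; then
        count=$(cat "$counter_file")
    fi
    count=$((count + 1))

    if [ "$count" -ge "$threshold" ]; then
        "$@" && count=0
    fi
    echo "$count" > "$counter_file"
}

# Possible use after each update of a checkout:
#   note_update_and_maybe_gc "$checkout" git -C "$checkout" gc
```

Passing the command as arguments keeps the sketch neutral about where
‘git’ comes from (the user's profile, “guix shell git-minimal”, or
something else).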

Well, that’s a poor solution, but we can assume that git-minimal is at
worst available using “guix shell git-minimal”.  Note that the closure
of git-minimal is far smaller than a full clone of the Guix repository.

Cheers,
simon





bug#65720: Guile-Git-managed checkouts grow way too much

2023-09-05 Thread Ludovic Courtès
Hello,

Jelle Licht  skribis:

>> On 4 Sep 2023, at 23:49, Ludovic Courtès  wrote:
>> 
>> Of course having to re-clone entire repositories every 9 months is
>> ridiculous, but storing gigabytes of packs is worse IMO (I’m
>> specifically thinking about the Guix repo, which every user copies via
>> ‘guix pull’).
>
> Please ignore if it doesn’t make sense, or would not make a practical 
> difference for the current issue, but wouldn’t a local clone do the trick 
> here? As in, clone from the ‘clogged’ local repo, move over fresh clone to 
> old location.

Good question.

--8<---cut here---start->8---
scheme@(guix git)> ,use(git)
scheme@(guix git)> (clone "/home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/" "/tmp/fresh-clone")
$7 = #
scheme@(guix git)> (system* "du" "-hs" "/tmp/fresh-clone")
6.7G	/tmp/fresh-clone
$8 = 0
scheme@(guix git)> (system* "du" "-hs" "/tmp/fresh-clone/.git")
6.6G	/tmp/fresh-clone/.git
$9 = 0
scheme@(guix git)> (system* "du" "-hs" "/home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/")
6.7G	/home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/
$10 = 0
--8<---cut here---end--->8---

Conclusion: it makes no difference.

Ludo’.





bug#65720: Guile-Git-managed checkouts grow way too much

2023-09-05 Thread Ludovic Courtès
Hi,

Josselin Poiret  skribis:

> I think using the git binary instead of libgit2 as a workaround is a
> good idea.  We can consider building it directly as well, so that people
> who don't have it in their profiles can still benefit from it.  We could
> even consider using git commands in most places and using libgit2 only
> where we really need the tight coupling.

Surely you’d agree that it would suck though: depending on two Git
implementations because one doesn’t have a proper API and the other one
lacks a bunch of features.

It would also be pretty bad for closure size:

--8<---cut here---start->8---
$ guix size guile-git | tail -1
total: 106.6 MiB
$ guix size guile-git git-minimal | tail -1
total: 169.8 MiB
--8<---cut here---end--->8---

It’s also not clear concretely how we’d add that dependency.  Try
invoking ‘git’ from $PATH and print a warning if it doesn’t work?
But then, what about applications like Cuirass and hpcguix-web?
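
The “try ‘git’ from $PATH and warn” option mentioned above could look
roughly like this in shell.  This is only a sketch: the function name
‘gc_if_git_available’ is hypothetical, and it says nothing about how
Cuirass or hpcguix-web would get ‘git’ into their environment.

```shell
#!/bin/sh
# Sketch only: hypothetical helper, not part of Guix.
gc_if_git_available () {
    checkout=$1
    if command -v git >/dev/null 2>&1; then
        # 'git' was found in $PATH: let it repack and prune the checkout.
        git -C "$checkout" gc --quiet
    else
        # No 'git': warn on stderr and leave the checkout untouched.
        echo "warning: 'git' not found in \$PATH; not running 'git gc' on $checkout" >&2
    fi
}
```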

Tricky, tricky.

Ludo’.





bug#65720: Guile-Git-managed checkouts grow way too much

2023-09-05 Thread Ludovic Courtès
Ludovic Courtès  skribis:

> $ du -hs ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
> 6.7G	/home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq

Another data point, with Cuirass instances:

--8<---cut here---start->8---
ludo@berlin ~$ sudo du -hs /var/lib/cuirass/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
65G	/var/lib/cuirass/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
ludo@berlin ~$ sudo stat /var/lib/cuirass/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq | tail -1
 Birth: 2022-07-30 23:15:45.582559879 +0200
--8<---cut here---end--->8---

… and:

--8<---cut here---start->8---
ludo@guix-hpc4 ~$ sudo du -hs /var/lib/cuirass/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
86G	/var/lib/cuirass/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
ludo@guix-hpc4 ~$ sudo stat /var/lib/cuirass/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq | tail -1
  Créé : 2021-06-01 11:48:48.854669310 +0200
--8<---cut here---end--->8---

So yeah, problem we have.

Ludo’.





bug#65720: Guile-Git-managed checkouts grow way too much

2023-09-05 Thread Josselin Poiret via Bug reports for GNU Guix
Hi Ludo,

Ludovic Courtès  writes:

> My inclination for the short term would be to work around this
> limitation by (1) finding a heuristic to determine if a checkout has
> likely accumulated too much cruft, and (2) considering such checkouts as
> expired (thereby forcing a re-clone) or running ‘git gc’ on them if
> ‘git’ is available.

I think using the git binary instead of libgit2 as a workaround is a
good idea.  We can consider building it directly as well, so that people
who don't have it in their profiles can still benefit from it.  We could
even consider using git commands in most places and using libgit2 only
where we really need the tight coupling.  IIUC, libgit2 is eternally
trying to catch up to git and often performs in a counter-intuitive way
(I expect the various bugs with stale deleted files in checkouts to be
caused by this).  Maybe it could also let us use bare repository and
directly extract the refs we want without having to mess with checkouts?

Best,
-- 
Josselin Poiret




bug#65720: Guile-Git-managed checkouts grow way too much

2023-09-04 Thread Ludovic Courtès
Ludovic Courtès  skribis:

> As reported by Tobias on IRC (in the context of ‘hpcguix-web’),
> checkouts managed by Guile-Git appear to grow beyond reason.  As an
> example, here’s the same ‘.git’ managed with Guile-Git and with Git:
>
> $ du -hs ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
> 6.7G	/home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
> $ du -hs .git
> 517M	.git

Unsurprisingly, GC makes a big difference:

--8<---cut here---start->8---
$ cp -r ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq /tmp/checkout
$ (cd /tmp/checkout/; git gc)
Enumerating objects: 717785, done.
Counting objects: 100% (717785/717785), done.
Delta compression using up to 4 threads
Compressing objects: 100% (154644/154644), done.
Writing objects: 100% (717785/717785), done.
Total 717785 (delta 569440), reused 710535 (delta 562274), pack-reused 0
Enumerating cruft objects: 103412, done.
Traversing cruft objects: 81753, done.
Counting objects: 100% (64171/64171), done.
Delta compression using up to 4 threads
Compressing objects: 100% (17379/17379), done.
Writing objects: 100% (64171/64171), done.
Total 64171 (delta 52330), reused 58296 (delta 46792), pack-reused 0
Expanding reachable commits in commit graph: 133730, done.
$ du -hs /tmp/checkout
539M	/tmp/checkout
--8<---cut here---end--->8---

> It would seem that libgit2 doesn’t do the equivalent of ‘git gc’.

Confirmed: .

My inclination for the short term would be to work around this
limitation by (1) finding a heuristic to determine if a checkout has
likely accumulated too much cruft, and (2) considering such checkouts as
expired (thereby forcing a re-clone) or running ‘git gc’ on them if
‘git’ is available.

I can’t think of a good heuristic for (1).  Birth time could be one, but
we’d need statx(2):

--8<---cut here---start->8---
$ stat ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq | tail -4
Access: 2023-09-04 23:13:54.668279105 +0200
Modify: 2023-09-04 11:34:41.665385000 +0200
Change: 2023-09-04 11:34:41.661629102 +0200
 Birth: 2021-08-09 10:48:17.748722151 +0200
--8<---cut here---end--->8---

Lacking statx(2), we can approximate creation time by looking at
‘.git/config’:

--8<---cut here---start->8---
$ stat ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/.git/config | tail -3
Modify: 2021-08-09 10:50:28.031760953 +0200
Change: 2021-08-09 10:50:28.031760953 +0200
 Birth: 2021-08-09 10:50:28.031760953 +0200
--8<---cut here---end--->8---

This strategy can be implemented like this:

diff --git a/guix/git.scm b/guix/git.scm
index ebe2600209..ed3fa56bc8 100644
--- a/guix/git.scm
+++ b/guix/git.scm
@@ -405,7 +405,16 @@ (define cached-checkout-expiration
 
   ;; Use the mtime rather than the atime to cope with file systems mounted
   ;; with 'noatime'.
-  (file-expiration-time (* 90 24 3600) stat:mtime))
+  (let ((ttl (* 90 24 3600))
+        (max-checkout-retention (* 9 30 24 3600)))
+    (lambda (file)
+      (match (false-if-exception (lstat file))
+        (#f 0)                         ;FILE may have been deleted in the meantime
+        (st (min (pk 'ttl (+ (stat:mtime st) ttl))
+                 (pk 'maxttl (match (false-if-exception
+                                     (lstat (in-vicinity file ".git/config")))
+                               (#f +inf.0)
+                               (st (+ (stat:mtime st) max-checkout-retention))))))))))
 
 (define %checkout-cache-cleanup-period
   ;; Period for the removal of expired cached checkouts.

Namely, a cached checkout is considered “expired” after 9 months.  In
my case, it gives this:

--8<---cut here---start->8---
scheme@(guix git)> (cached-checkout-expiration "/home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/")

;;; (ttl 1701596081)

;;; (maxttl 1651827028)
$6 = 1651827028
--8<---cut here---end--->8---
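
To make the arithmetic in the patch easier to follow, here is a minimal
shell sketch of the same rule: the expiration time is the earlier of the
checkout's mtime plus 90 days and the mtime of ‘.git/config’ plus nine
30-day months.  The function name ‘expiration_time’ and its two-argument
interface are illustrative assumptions, not part of (guix git).

```shell
#!/bin/sh
# Sketch only: 'expiration_time' is a hypothetical helper, not part of Guix.
# Both arguments are mtimes in seconds since the epoch (for example from
# GNU 'stat -c %Y'); pass an empty second argument if '.git/config' is
# missing.

TTL=$((90 * 24 * 3600))                  # 90 days since the last update
MAX_RETENTION=$((9 * 30 * 24 * 3600))    # about 9 months since the clone

expiration_time () {
    checkout_mtime=$1
    config_mtime=$2
    by_ttl=$((checkout_mtime + TTL))
    if [ -z "$config_mtime" ]; then
        # No '.git/config': only the 90-day rule applies.
        echo "$by_ttl"
        return
    fi
    by_retention=$((config_mtime + MAX_RETENTION))
    # Whichever deadline comes first wins.
    if [ "$by_ttl" -lt "$by_retention" ]; then
        echo "$by_ttl"
    else
        echo "$by_retention"
    fi
}
```

Subtracting the two constants from the ‘ttl’ and ‘maxttl’ values printed
above gives mtimes of 1693820081 for the checkout and 1628499028 for
‘.git/config’; ‘expiration_time 1693820081 1628499028’ then prints
1651827028, the same value as ‘maxttl’.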

Of course having to re-clone entire repositories every 9 months is
ridiculous, but storing gigabytes of packs is worse IMO (I’m
specifically thinking about the Guix repo, which every user copies via
‘guix pull’).

Thoughts?

Thanks,
Ludo’.


bug#65720: Guile-Git-managed checkouts grow way too much

2023-09-03 Thread Ludovic Courtès
Hello!

As reported by Tobias on IRC (in the context of ‘hpcguix-web’),
checkouts managed by Guile-Git appear to grow beyond reason.  As an
example, here’s the same ‘.git’ managed with Guile-Git and with Git:

--8<---cut here---start->8---
$ du -hs ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
6.7G	/home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
$ du -hs .git
517M	.git
--8<---cut here---end--->8---

It would seem that libgit2 doesn’t do the equivalent of ‘git gc’.

Ludo’.