Re: comparing commit-relation using Scheme+libgit2 vs shellout plumbing Git

2023-09-14 Thread Simon Tournier
Hi Ludo,

On Thu, 14 Sep 2023 at 12:30, Ludovic Courtès  wrote:

> but I don’t think
> we can get a decent throughput if we shell out for all these things
> (assuming ‘git’ can even give us raw data).

Do you consider that Magit does not have a decent throughput?
Do you consider that Git-Annex does not have a decent throughput?

To my knowledge, both shell out to Git plumbing commands, one from Emacs
Lisp and the other from Haskell.

And some porcelain Git commands that we all use daily are Bash scripts
calling plumbing Git commands (or were Bash scripts before being
replaced by C builtins).

For example, git-rebase, git-pull, git-log, etc.

https://github.com/git/git/commit/55071ea248ef8040e4b29575376273e4dd061683
https://github.com/git/git/commit/b1456605c26eb6bd991b70b0ca0a3ce0f02473e9
https://github.com/git/git/commit/e3a125a94d34d22a8ca53e84949a1bb38cd6e425

The task is probably complex and boring, I agree.  However, I am not
convinced the issue is about a “decent throughput”.  The best would be
to compare the performance of libgit2 and of plumbing Git commands on
one example, say ’commit-relation’.  Or another one. :-)

Otherwise, I believe what I am seeing. ;-)

Cheers,
simon



Re: comparing commit-relation using Scheme+libgit2 vs shellout plumbing Git

2023-09-14 Thread Ludovic Courtès
Simon Tournier  skribis:

> On my machine, I get something less spectacular for a history with 1000
> commits in between.
>
> scheme@(guix-user)> ,time (commit-relation* 1000th newest)
> $1 = ancestor
> ;; 0.128948s real time, 0.082921s run time.  0.046578s spent in GC.
> scheme@(guix-user)> ,time (commit-relation 1000th newest)
> $2 = ancestor
> ;; 4.588075s real time, 5.521358s run time.  1.404764s spent in GC.
>
> I did something very similar to what wolf is proposing and named it
> ’commit-relation*’.

That’s an order of magnitude.  The gap could probably be narrowed a bit
with some effort (‘commit-relation’ is implemented in a fairly naive
way).

That said, ‘commit-relation’ is just one example.  I’d encourage
interested people to look at (guix git-authenticate) to get a feel of
what we need.  Most of it is quite pedestrian, like
‘load-keyring-from-reference’ or ‘commit-signing-key’, but I don’t think
we can get a decent throughput if we shell out for all these things
(assuming ‘git’ can even give us raw data).

Ludo’.



Re: comparing commit-relation using Scheme+libgit2 vs shellout plumbing Git

2023-09-12 Thread Attila Lendvai
is the decision between libgit2 and invoking git really such a big commitment?

let's make sure the entire guix codebase uses a single git-related API, and
then we can easily switch back and forth between the two.

on another note, i'm surprised that the reference implementation of git itself
doesn't have a lib, and that libgit2 even had to be written. this may even
change in the future.
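A minimal sketch of what such a single entry point could look like, assuming the backend is selected through a Guile parameter. The names ‘current-commit-relation’ and ‘relation’ are hypothetical and not part of (guix git); either the libgit2-based ‘commit-relation’ or a shell-out variant could be installed as the backend without touching callers.

```scheme
;; Hypothetical single entry point: the active backend is a parameter
;; holding a procedure of two commits.
(define current-commit-relation
  (make-parameter
   (lambda (old new)
     (error "no commit-relation backend configured" old new))))

;; Callers only ever use this procedure, whatever the backend is.
(define (relation old new)
  "Return 'self, 'ancestor, 'descendant or 'unrelated for OLD and NEW."
  ((current-commit-relation) old new))
```

Switching backends then stays local, for example (parameterize ((current-commit-relation shelling-commit-relation)) (relation old new)), which would also make it easy to benchmark both implementations on the same calls.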

-- 
• attila lendvai
• PGP: 963F 5D5F 45C7 DFCD 0A39
--
“I learned long ago, never to wrestle with a pig. You get dirty, and besides, 
the pig likes it.”
— George Bernard Shaw (1856–1950)




comparing commit-relation using Scheme+libgit2 vs shellout plumbing Git

2023-09-11 Thread Simon Tournier
Hi,

On Mon, 11 Sep 2023 at 14:26, Maxim Cournoyer  wrote:

> In the grand scheme of things (pun intended), we'd like every
> programming to be feasible via nice Scheme APIs, which is what Guile-Git
> provides to work with git repositories.  The appeal is to have a single
> language to rule them all, reducing friction among Guix contributors.
> The alternative here is to have an API reduced to invoking system
> commands with string arguments, which is less expressive and lacks
> elegance.

As Maxim noted in the message I am proposing to revisit, libgit2 seems
to come with some performance penalties.  As wolf illustrates in the
message:

bug#65720: Guile-Git-managed checkouts grow way too much
wolf 
Mon, 11 Sep 2023 16:42:59 +0200
id:ZP8nc1m8rN_34XV-@ws
https://issues.guix.gnu.org//65720
https://issues.guix.gnu.org/msgid/ZP8nc1m8rN_34XV-@ws
https://yhetil.org/guix/ZP8nc1m8rN_34XV-@ws

it might be possible to invoke a plain Git command instead, which is
much faster in this case.  Well, that needs to be investigated, IMHO.

For instance, instead of the current ’commit-relation’ implementation,

(define (commit-relation old new)
  "Return a symbol denoting the relation between OLD and NEW, two commit
objects: 'ancestor (meaning that OLD is an ancestor of NEW), 'descendant, or
'unrelated, or 'self (OLD and NEW are the same commit)."
  (if (eq? old new)
      'self
      ;; Collect NEW and all of its ancestors into a set.
      (let ((newest (commit-closure new)))
        (if (set-contains? newest old)
            'ancestor
            ;; Walk OLD's ancestors with NEW's parents pre-marked as
            ;; visited, so the walk does not descend below NEW.
            (let* ((seen   (list->setq (commit-parents new)))
                   (oldest (commit-closure old seen)))
              (if (set-contains? oldest new)
                  'descendant
                  'unrelated))))))

which relies on ’commit-closure’, they propose to use plumbing Git
commands, as in:

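;; `git-C' and `%repo' are not shown here; git-C presumably runs
;; "git -C %repo ARGS..." and returns the exit status.
(define (shelling-commit-relation old new)
  (let ((h-old (oid->string (commit-id old)))
        (h-new (oid->string (commit-id new))))
    (cond ((eq? old new)
           'self)
          ;; "git merge-base --is-ancestor A B" exits with 0 when A is
          ;; an ancestor of B.
          ((zero? (git-C %repo "merge-base" "--is-ancestor" h-old h-new))
           'ancestor)
          ((zero? (git-C %repo "merge-base" "--is-ancestor" h-new h-old))
           'descendant)
          (else
           'unrelated))))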

Well, this needs to be checked (by reading the Git documentation, which
is probably harder than reading some Scheme implementation ;-)) to see
whether these invocations do the same thing.
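‘git-C’ and ‘%repo’ are not defined in the snippet above.  A minimal sketch of what ‘git-C’ might look like, assuming it simply runs "git -C DIRECTORY ARGS..." and returns the exit code through Guile's ‘system*’ and ‘status:exit-val’; this is a hypothetical helper, not wolf's actual code.  According to the git-merge-base documentation, ‘--is-ancestor’ exits with 0 when the first commit is an ancestor of the second (a commit counts as its own ancestor), with 1 when it is not, and with another non-zero status on error.

```scheme
;; Hypothetical helper: run "git -C DIRECTORY ARGS..." and return its
;; exit code (#f if git was killed by a signal).
(define (git-C directory . args)
  (status:exit-val (apply system* "git" "-C" directory args)))

;; Hypothetical wrapper: #t if OLD is an ancestor of NEW, #f if not,
;; and an error for any other exit code (e.g. an unknown commit).
(define (git-ancestor? directory old new)
  (case (git-C directory "merge-base" "--is-ancestor" old new)
    ((0) #t)
    ((1) #f)
    (else (error "git merge-base failed" old new))))
```

The ‘zero?’ test in ‘shelling-commit-relation’ treats any failure as "not an ancestor", so an error such as an unknown commit would silently yield 'unrelated; a wrapper like ‘git-ancestor?’ keeps the two cases apart.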


>> I’m quite confident this would be slow
>
> My version is ~2000x faster compared to (guix git):
>
> Guix: 1048.620992ms
> Git:  0.532143ms

On my machine, I get something less spectacular for a history with 1000
commits in between.

--8<---cut here---start->8---
scheme@(guix-user)> ,time (commit-relation* 1000th newest)
$1 = ancestor
;; 0.128948s real time, 0.082921s run time.  0.046578s spent in GC.
scheme@(guix-user)> ,time (commit-relation 1000th newest)
$2 = ancestor
;; 4.588075s real time, 5.521358s run time.  1.404764s spent in GC.
--8<---cut here---end--->8---

I did something very similar to what wolf is proposing and named it
’commit-relation*’.

Well, given the implementation of ’commit-relation’, I think the
slowness compared to the plain plumbing Git command is expected.
Basically, ’commit-closure’ walks the Git history, and that Scheme loop
surely cannot be as efficient as Git's own optimized implementation.

Hum, I think the most annoying part is the time spent in GC.  Basically,
’commit-closure’ builds a set of many visited elements, and that set
must be garbage-collected.  This GC time is far from negligible compared
to the total (about 1.4s out of 4.6s of real time above), IMHO.
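To illustrate where the allocation comes from, here is a minimal sketch of the closure-based idea on a toy in-memory history, assuming commits are plain symbols and parents are stored in an association list (both hypothetical, not Guile-Git's API).  Every visited commit is recorded in the ‘seen’ table, so memory grows with the amount of history walked; unlike the real ’commit-relation’, this sketch does not pre-seed the second walk with NEW's parents.

```scheme
;; Hypothetical toy history: each entry is (COMMIT PARENT ...).
(define %toy-history
  '((a)          ; root commit
    (b a)
    (c b)
    (d b)        ; branches off b
    (e c d)))    ; merges c and d

(define (toy-parents commit)
  "Return the parents of COMMIT in %toy-history."
  (or (assq-ref %toy-history commit) '()))

(define (toy-closure commit)
  "Return a hash table whose keys are COMMIT and all of its ancestors."
  (let ((seen (make-hash-table)))
    (let loop ((todo (list commit)))
      (cond ((null? todo)
             seen)
            ((hashq-ref seen (car todo))      ; already visited
             (loop (cdr todo)))
            (else
             (hashq-set! seen (car todo) #t)  ; every visited commit is kept
             (loop (append (toy-parents (car todo)) (cdr todo))))))))

(define (toy-relation old new)
  (cond ((eq? old new) 'self)
        ((hashq-ref (toy-closure new) old) 'ancestor)
        ((hashq-ref (toy-closure old) new) 'descendant)
        (else 'unrelated)))
```

For instance, (toy-relation 'a 'e) walks all five commits before answering 'ancestor, whereas "git merge-base --is-ancestor" performs the equivalent walk inside Git's C code, without allocating Scheme objects.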

I agree with the grand scheme of things, and that’s why I started this
thread. :-) However, for what it is worth, today I am less convinced
that going through libgit2 can provide “not-much-worse” performance
compared to what plain plumbing Git commands could offer.

Cheers,
simon