Re: [PATCH 00/14] Serialized Commit Graph

2018-01-26 Thread Derrick Stolee

On 1/25/2018 6:06 PM, Ævar Arnfjörð Bjarmason wrote:

On Thu, Jan 25 2018, Derrick Stolee jotted:

Oops! This is my mistake. The correct command should be:

  git show-ref -s | git graph --write --update-head --stdin-commits

Without "--stdin-commits" the command will walk all packed objects
to look for commits and then build the graph. That's why it's taking
so long. That method takes several minutes on the Linux repo, but with
--stdin-commits it should take as long as "git log >/dev/null".

Thanks, it took around 15m to finish with the command I initially ran on
my test repo.

Then the `merge-base --is-ancestor` performance problem I was
complaining about in
https://public-inbox.org/git/87608bawoa@evledraar.gmail.com/ takes
around 1s with your series, 5s without it. Nice.


Thanks for testing this! May I ask how many commits are in your repo? 
One way to find out is to run 'git graph --read' and it will tell you 
how many commits are in the serialized graph.


Thanks,
-Stolee


Re: [PATCH 00/14] Serialized Commit Graph

2018-01-25 Thread Ævar Arnfjörð Bjarmason

On Thu, Jan 25 2018, Derrick Stolee jotted:

> On 1/25/2018 10:46 AM, Ævar Arnfjörð Bjarmason wrote:
>> On Thu, Jan 25 2018, Derrick Stolee jotted:
>>
>>> * 'git log --topo-order -1000' walks all reachable commits to avoid
>>>incorrect topological orders, but only needs the commit message for
>>>the top 1000 commits.
>>>
>>> * 'git merge-base  ' may walk many commits to find the correct
>>>boundary between the commits reachable from A and those reachable
>>>from B. No commit messages are needed.
>>>
>>> * 'git branch -vv' checks ahead/behind status for all local branches
>>>compared to their upstream remote branches. This is essentially as
>>>hard as computing merge bases for each.
>> This is great, spotted / questions so far:
>>
>> * git graph --blah says you need to enable the config, should say
>>"unknown option --blah ". I.e. overzelous config guard.
>
> This is a good point.
>
>> * On a big repo (git show-ref -s | ~/g/git/git-graph --write
>>--update-head) is as of writing this still hanging for me, but strace
>>shows it's brk()-ing. Presumably just still busy, a progress bar would
>>be very nice.
>
> Oops! This is my mistake. The correct command should be:
>
>  git show-ref -s | git graph --write --update-head --stdin-commits
>
> Without "--stdin-commits" the command will walk all packed objects
> to look for commits and then build the graph. That's why it's taking
> so long. That method takes several minutes on the Linux repo, but with
> --stdin-commits it should take as long as "git log >/dev/null".

Thanks, it took around 15m to finish with the command I initially ran on
my test repo.

Then the `merge-base --is-ancestor` performance problem I was
complaining about in
https://public-inbox.org/git/87608bawoa@evledraar.gmail.com/ takes
around 1s with your series, 5s without it. Nice.


Re: [PATCH 00/14] Serialized Commit Graph

2018-01-25 Thread Derrick Stolee

On 1/25/2018 10:46 AM, Ævar Arnfjörð Bjarmason wrote:

On Thu, Jan 25 2018, Derrick Stolee jotted:


* 'git log --topo-order -1000' walks all reachable commits to avoid
   incorrect topological orders, but only needs the commit message for
   the top 1000 commits.

* 'git merge-base  ' may walk many commits to find the correct
   boundary between the commits reachable from A and those reachable
   from B. No commit messages are needed.

* 'git branch -vv' checks ahead/behind status for all local branches
   compared to their upstream remote branches. This is essentially as
   hard as computing merge bases for each.

This is great, spotted / questions so far:

* git graph --blah says you need to enable the config, should say
   "unknown option --blah ". I.e. overzelous config guard.


This is a good point.


* On a big repo (git show-ref -s | ~/g/git/git-graph --write
   --update-head) is as of writing this still hanging for me, but strace
   shows it's brk()-ing. Presumably just still busy, a progress bar would
   be very nice.


Oops! This is my mistake. The correct command should be:

    git show-ref -s | git graph --write --update-head --stdin-commits

Without "--stdin-commits" the command will walk all packed objects
to look for commits and then build the graph. That's why it's taking
so long. That method takes several minutes on the Linux repo, but with
--stdin-commits it should take as long as "git log >/dev/null".


* Shouldn't there be a pack.useGraph option so this gets auto-updated on
   repack? I understand this series is a WIP, so that's more a "is that
   the UI" than "it needs now".


This will definitely be part of a follow-up patch.

Thanks,
-Stolee


Re: [PATCH 00/14] Serialized Commit Graph

2018-01-25 Thread Ævar Arnfjörð Bjarmason

On Thu, Jan 25 2018, Derrick Stolee jotted:

> * 'git log --topo-order -1000' walks all reachable commits to avoid
>   incorrect topological orders, but only needs the commit message for
>   the top 1000 commits.
>
> * 'git merge-base  ' may walk many commits to find the correct
>   boundary between the commits reachable from A and those reachable
>   from B. No commit messages are needed.
>
> * 'git branch -vv' checks ahead/behind status for all local branches
>   compared to their upstream remote branches. This is essentially as
>   hard as computing merge bases for each.

This is great, spotted / questions so far:

* git graph --blah says you need to enable the config, should say
  "unknown option --blah ". I.e. overzelous config guard.

* On a big repo (git show-ref -s | ~/g/git/git-graph --write
  --update-head) is as of writing this still hanging for me, but strace
  shows it's brk()-ing. Presumably just still busy, a progress bar would
  be very nice.

* Shouldn't there be a pack.useGraph option so this gets auto-updated on
  repack? I understand this series is a WIP, so that's more a "is that
  the UI" than "it needs now".


[PATCH 00/14] Serialized Commit Graph

2018-01-25 Thread Derrick Stolee
As promised [1], this patch contains a way to serialize the commit graph.
The current implementation defines a new file format to store the graph
structure (parent relationships) and basic commit metadata (commit date,
root tree OID) in order to prevent parsing raw commits while performing
basic graph walks. For example, we do not need to parse the full commit
when performing these walks:

* 'git log --topo-order -1000' walks all reachable commits to avoid
  incorrect topological orders, but only needs the commit message for
  the top 1000 commits.

* 'git merge-base  ' may walk many commits to find the correct
  boundary between the commits reachable from A and those reachable
  from B. No commit messages are needed.

* 'git branch -vv' checks ahead/behind status for all local branches
  compared to their upstream remote branches. This is essentially as
  hard as computing merge bases for each.

The current patch speeds up these calculations by injecting a check in
parse_commit_gently() to check if there is a graph file and using that
to provide the required metadata to the struct commit.

The file format has room to store generation numbers, which will be
provided as a patch after this framework is merged. Generation numbers
are referenced by the design document but not implemented in order to
make the current patch focus on the graph construction process. Once
that is stable, it will be easier to add generation numbers and make
graph walks aware of generation numbers one-by-one.

Here are some performance results for a copy of the Linux repository
where 'master' has 704,766 reachable commits and is behind 'origin/master'
by 19,610 commits.

| Command  | Before | After  | Rel % |
|--|||---|
| log --oneline --topo-order -1000 |  5.9s  |  0.7s  | -88%  |
| branch -vv   |  0.42s |  0.27s | -35%  |
| rev-list --all   |  6.4s  |  1.0s  | -84%  |
| rev-list --all --objects | 32.6s  | 27.6s  | -15%  |

To test this yourself, run the following on your repo:

git config core.graph true
git show-ref -s | git graph --write --update-head

The second command writes a commit graph file containing every commit
reachable from your refs. Now, all git commands that walk commits will
check your graph first before consulting the ODB. You can run your own
performance comparisions by toggling the 'core.graph' setting.

[1] 
https://public-inbox.org/git/d154319e-bb9e-b300-7c37-27b1dcd2a...@jeffhostetler.com/
Re: What's cooking in git.git (Jan 2018, #03; Tue, 23)

[2] https://github.com/derrickstolee/git/pull/2
A GitHub pull request containing the latest version of this patch.

P.S. I'm sending this patch from my gmail address to avoid Outlook
munging the URLs included in the design document.

Derrick Stolee (14):
  graph: add packed graph design document
  packed-graph: add core.graph setting
  packed-graph: create git-graph builtin
  packed-graph: add format document
  packed-graph: implement construct_graph()
  packed-graph: implement git-graph --write
  packed-graph: implement git-graph --read
  graph: implement git-graph --update-head
  packed-graph: implement git-graph --clear
  packed-graph: teach git-graph --delete-expired
  commit: integrate packed graph with commit parsing
  packed-graph: read only from specific pack-indexes
  packed-graph: close under reachability
  packed-graph: teach git-graph to read commits

 Documentation/config.txt |   3 +
 Documentation/git-graph.txt  | 102 
 Documentation/technical/graph-format.txt |  88 
 Documentation/technical/packed-graph.txt | 185 +++
 Makefile |   2 +
 alloc.c  |   1 +
 builtin.h|   1 +
 builtin/graph.c  | 231 +
 cache.h  |   1 +
 command-list.txt |   1 +
 commit.c |  20 +-
 commit.h |   2 +
 config.c |   5 +
 environment.c|   1 +
 git.c|   1 +
 log-tree.c   |   3 +-
 packed-graph.c   | 840 +++
 packed-graph.h   |  65 +++
 packfile.c   |   4 +-
 packfile.h   |   2 +
 t/t5319-graph.sh | 271 ++
 21 files changed, 1822 insertions(+), 7 deletions(-)
 create mode 100644 Documentation/git-graph.txt
 create mode 100644 Documentation/technical/graph-format.txt
 create mode 100644 Documentation/technical/packed-graph.txt
 create mode 100644 builtin/graph.c
 create mode 100644 packed-graph.c
 create mode 100644 packed-graph.h
 create mode 100755 t/t5319-graph.sh

--