Re: [PATCH v6 00/14] Serialized Git Commit Graph
On 3/19/2018 8:55 AM, Derrick Stolee wrote: Thanks for this! Fixing this performance problem is very important to me, as we will use the "--stdin-packs" mechanism in the GVFS scenario (we will walk all commits in the prefetch packs full of commits and trees instead of relying on refs). This speedup is very valuable! Thanks, -Stolee Also, for those interested in this series, I plan to do a rebase onto 2.17.0, when available, as my re-roll. I pushed my responses to the current feedback at the GitHub PR for the series [1]. If you are planning to provide more feedback to the series, then please let me know and I'll delay my re-roll so you have a chance to review. Thanks, -Stolee [1] https://github.com/derrickstolee/git/pull/2
Re: [PATCH v6 00/14] Serialized Git Commit Graph
On 3/16/2018 12:28 PM, Lars Schneider wrote: On 14 Mar 2018, at 21:43, Junio C Hamanowrote: Derrick Stolee writes: Hopefully this version is ready to merge. I have several follow-up topics in mind to submit soon after, including: A few patches add trailing blank lines and other whitespace breakages, which will stop my "git merge" later to 'next' and down, as I have a pre-commit hook to catch them. @stolee: I run "git --no-pager diff --check $BASE_HASH...$HEAD_HASH" to detect these kinds of things. I run this as part of my "prepare patch" [1] script which is inspired by a similar script originally written by Dscho. Do you think it would make sense to mention (or even recommend) such a script in your awesome GfW CONTRIBUTING.md? - Lars [1] https://github.com/larsxschneider/git-list-helper/blob/master/prepare-patch.sh#L71 Thanks for the suggestions. Somehow I got extra whitespace doing copy/paste in vim and I never re-opened that file in my normal editor (VS Code with an extension that shows trailing whitespace). On 3/15/2018 1:23 PM, Johannes Schindelin wrote: git log --check` was introduced to show you whitespace problems). If all of those whitespace issues are unintentional, you can fix them using `git rebase --whitespace=fix` in the most efficient way. Thanks for both of the suggestions. The `rebase` check was already in the document, so I put the checks immediately above that line. PR is available now [1]. Thanks, -Stolee [1] https://github.com/git-for-windows/git/pull/1567
Re: [PATCH v6 00/14] Serialized Git Commit Graph
On 3/16/2018 4:19 PM, Jeff King wrote: On Fri, Mar 16, 2018 at 04:06:39PM -0400, Jeff King wrote: Furthermore, in order to look at an object it has to be zlib inflated first, and since commit objects tend to be much smaller than trees and especially blobs, there are a lot less bytes to inflate: $ grep ^commit type-size |cut -d' ' -f2 |avg 34395730 / 53754 = 639 $ cat type-size |cut -d' ' -f2 |avg 3866685744 / 244723 = 15800 So a simple revision walk inflates less than 1% of the bytes that the "enumerate objects packfiles" approach has to inflate. I don't think this is quite accurate. It's true that we have to _consider_ every object, but Git is smart enough not to inflate each one to find its type. For loose objects we just inflate the header. For packed objects, we either pick the type directly out of the packfile header (for a non-delta) or can walk the delta chain (without actually looking at the data bytes!) until we hit the base. Hmm, so that's a big part of the problem with this patch series. It actually _does_ unpack every object with --stdin-packs to get the type, which is just silly. With the patch below, my time for "commit-graph write --stdin-packs" on linux.git goes from over 5 minutes (I got bored and killed it) to 17 seconds. diff --git a/commit-graph.c b/commit-graph.c index 6348bab82b..cf1da2e8c1 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -491,11 +491,12 @@ static int add_packed_commits(const struct object_id *oid, { struct packed_oid_list *list = (struct packed_oid_list*)data; enum object_type type; - unsigned long size; - void *inner_data; off_t offset = nth_packed_object_offset(pack, pos); - inner_data = unpack_entry(pack, offset, , ); - FREE_AND_NULL(inner_data); + struct object_info oi = OBJECT_INFO_INIT; + + oi.typep = + if (packed_object_info(pack, offset, ) < 0) + die("unable to get type of object %s", oid_to_hex(oid)); if (type != OBJ_COMMIT) return 0; -Peff Thanks for this! Fixing this performance problem is very important to me, as we will use the "--stdin-packs" mechanism in the GVFS scenario (we will walk all commits in the prefetch packs full of commits and trees instead of relying on refs). This speedup is very valuable! Thanks, -Stolee
Re: [PATCH v6 00/14] Serialized Git Commit Graph
Hi Junio, On Fri, 16 Mar 2018, Junio C Hamano wrote: > Johannes Schindelinwrites: > > > I understand that it is a trade-off between time you have to spend and > > that others have to spend, and since you do not scale, that trade-off > > has to be in your favor. > > That tradeoff may exist, but it does not weigh in the picture above > at all. It does, however. You frequently do not even tell the original contributor what changes you made while applying the patches. I know because I have been surprised by some of those changes, long after you merged them into `master`. And quite honestly, my time is valuable, too. So you should stop assuming that I, for one (and probably other contributors, too) compare carefully what differences exist between the local topic branch and what you chose to make of the patches. I cannot make you stop suggesting that, but I can tell you right away that I won't do that unless I *have* to. It is an inefficient use of my time. I wish you would also realize that it invariably leads to your having to "touch up" iteration after iteration because your touch-ups had not been picked up. Having said that, I can live with the status quo. I have a track record of being able to. Ciao, Dscho
Re: [PATCH v6 00/14] Serialized Git Commit Graph
On Fri, Mar 16, 2018 at 11:33:55AM -0700, Junio C Hamano wrote: > It is not so surprising that history walking runs rings around > enumerating objects in packfiles, if packfiles are built well. > > A well-built packfile tends to has newer objects in base form and > has delta that goes in backward direction (older objects are > represented as delta against newer ones). This helps warlking from > the tips of the history quite a bit, because your delta base cache > will tend to have the base object (i.e. objects in the newer part of > the history you just walked) that will be required to access the > "next" older part of the history more often than not. > > Trying to read the objects in the pack in their object name order > would essentially mean reading them in a cryptgraphically random > order. Half the time you will end up wanting to access an object > that is near the tip of a very deep delta chain even before you've > accessed any of the base objects in the delta chain. I coincidentally was doing some experiments in this area a few weeks ago, and found a few things: 1. The ordering makes a _huge_ difference for accessing trees and blobs. 2. Pack order (not pack-idx order) is actually the best order, since it tends to follow the delta patterns (it's close to traversal order, but packs delta families more tightly). 3. None of this really matters for commits, since we almost never store them as deltas anyway. Here are a few experiments people can do themselves to demonstrate (my numbers here are all from linux.git, which is sort of a wort-case for bad ordering because its size stresses the default delta cache): [every object in sha1 order: slow] $ time git cat-file --batch-all-objects --batch >/dev/null real 8m44.041s user 8m31.359s sys 0m12.262s [every object from a traversal: faster, but --objects traversals are actually CPU heavy due to all of the hash lookups for each tree. Note not just wall-clock time but the CPU since it's split across two processes] $ time git rev-list --objects --all | cut -d' ' -f2 | git cat-file --batch >/dev/null real 1m2.667s user 0m58.537s sys 0m32.392s [every object in pack order: fastest. This is due to skipping the traversal overhead, and should use our delta cache quite efficiently. I'm assuming a single pack and no loose objects here, but the performance should generalize since accessing the "big" pack dominates] $ time git show-index <$(ls .git/objects/pack/*.idx) | sort -n | cut -d' ' -f2 | git cat-file --batch >/dev/null real 0m51.718s user 0m50.963s sys 0m7.068s [just commits, sha1 order: not horrible] $ time git cat-file --batch-all-objects --batch-check='%(objecttype) %(objectname)' | grep ^commit | cut -d' ' -f2 | git cat-file --batch >/dev/null real 0m8.115s user 0m14.033s sys 0m1.170s [just commits, pack order: slightly worse due to the extra piping, but obviously that could be done more quickly internally] $ time git show-index <$(ls .git/objects/pack/*.idx) | sort -n | cut -d' ' -f2 | git cat-file --batch-check='%(objecttype) %(objectname)' | grep ^commit | cut -d' ' -f2 | git cat-file --batch >/dev/null real 0m21.670s user 0m24.867s sys 0m9.600s [and the reason is that hardly any commits get deltas] $ git cat-file --batch-all-objects --batch-check='%(objecttype) %(deltabase)' | grep ^commit >commits $ wc -l commits 692596 $ grep -v '' commits | wc -l 18856 For the purposes of this patch series, I don't think the order matters much, since we're only dealing with commits. For doing --batch-check, I think the sha1 ordering given by "cat-file --batch-all-objects" is convenient, and doesn't have a big impact on performance. But it's _awful_ for --batch. I think we may want to add a sorting option to just return the objects in the original packfile order. -Peff
Re: [PATCH v6 00/14] Serialized Git Commit Graph
On Fri, Mar 16, 2018 at 04:06:39PM -0400, Jeff King wrote: > > Furthermore, in order to look at an object it has to be zlib inflated > > first, and since commit objects tend to be much smaller than trees and > > especially blobs, there are a lot less bytes to inflate: > > > > $ grep ^commit type-size |cut -d' ' -f2 |avg > > 34395730 / 53754 = 639 > > $ cat type-size |cut -d' ' -f2 |avg > > 3866685744 / 244723 = 15800 > > > > So a simple revision walk inflates less than 1% of the bytes that the > > "enumerate objects packfiles" approach has to inflate. > > I don't think this is quite accurate. It's true that we have to > _consider_ every object, but Git is smart enough not to inflate each one > to find its type. For loose objects we just inflate the header. For > packed objects, we either pick the type directly out of the packfile > header (for a non-delta) or can walk the delta chain (without actually > looking at the data bytes!) until we hit the base. Hmm, so that's a big part of the problem with this patch series. It actually _does_ unpack every object with --stdin-packs to get the type, which is just silly. With the patch below, my time for "commit-graph write --stdin-packs" on linux.git goes from over 5 minutes (I got bored and killed it) to 17 seconds. diff --git a/commit-graph.c b/commit-graph.c index 6348bab82b..cf1da2e8c1 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -491,11 +491,12 @@ static int add_packed_commits(const struct object_id *oid, { struct packed_oid_list *list = (struct packed_oid_list*)data; enum object_type type; - unsigned long size; - void *inner_data; off_t offset = nth_packed_object_offset(pack, pos); - inner_data = unpack_entry(pack, offset, , ); - FREE_AND_NULL(inner_data); + struct object_info oi = OBJECT_INFO_INIT; + + oi.typep = + if (packed_object_info(pack, offset, ) < 0) + die("unable to get type of object %s", oid_to_hex(oid)); if (type != OBJ_COMMIT) return 0; -Peff
Re: [PATCH v6 00/14] Serialized Git Commit Graph
On Fri, Mar 16, 2018 at 7:33 PM, Junio C Hamanowrote: > SZEDER Gábor writes: > >> You should forget '--stdin-packs' and use '--stdin-commits' to generate >> the initial graph, it's much faster even without '--additive'[1]. See >> >> >> https://public-inbox.org/git/CAM0VKj=wmkBNH=pscrztxfrc13rig1easw89q6ljansdjde...@mail.gmail.com/ >> >> I still think that the default behaviour for 'git commit-graph write' >> should simply walk history from all refs instead of enumerating all >> objects in all packfiles. > > Somehow I missed that one. Thanks for the link to it. > > It is not so surprising that history walking runs rings around > enumerating objects in packfiles, if packfiles are built well. > > A well-built packfile tends to has newer objects in base form and > has delta that goes in backward direction (older objects are > represented as delta against newer ones). This helps warlking from > the tips of the history quite a bit, because your delta base cache > will tend to have the base object (i.e. objects in the newer part of > the history you just walked) that will be required to access the > "next" older part of the history more often than not. > > Trying to read the objects in the pack in their object name order > would essentially mean reading them in a cryptgraphically random > order. Half the time you will end up wanting to access an object > that is near the tip of a very deep delta chain even before you've > accessed any of the base objects in the delta chain. I came up with a different explanation back then: we are only interested in commit objects when creating the commit graph, and only a small-ish fraction of all objects are commit objects, so the "enumerate objects in packfiles" approach has to look at a lot more objects: # in my git fork $ git rev-list --all --objects |cut -d' ' -f1 |\ git cat-file --batch-check='%(objecttype) %(objectsize)' >type-size $ grep -c ^commit type-size 53754 $ wc -l type-size 244723 type-size I.e. only about 20% of all objects are commit objects. Furthermore, in order to look at an object it has to be zlib inflated first, and since commit objects tend to be much smaller than trees and especially blobs, there are a lot less bytes to inflate: $ grep ^commit type-size |cut -d' ' -f2 |avg 34395730 / 53754 = 639 $ cat type-size |cut -d' ' -f2 |avg 3866685744 / 244723 = 15800 So a simple revision walk inflates less than 1% of the bytes that the "enumerate objects packfiles" approach has to inflate. >> [1] - Please excuse the bikeshed: '--additive' is such a strange >> sounding option name, at least for me. '--append', perhaps? > > Yeah, I think "fetch --append" is probably a precedence.
Re: [PATCH v6 00/14] Serialized Git Commit Graph
On Fri, Mar 16, 2018 at 08:48:49PM +0100, SZEDER Gábor wrote: > I came up with a different explanation back then: we are only interested > in commit objects when creating the commit graph, and only a small-ish > fraction of all objects are commit objects, so the "enumerate objects in > packfiles" approach has to look at a lot more objects: > > # in my git fork > $ git rev-list --all --objects |cut -d' ' -f1 |\ > git cat-file --batch-check='%(objecttype) %(objectsize)' >type-size > $ grep -c ^commit type-size > 53754 > $ wc -l type-size > 244723 type-size > > I.e. only about 20% of all objects are commit objects. > > Furthermore, in order to look at an object it has to be zlib inflated > first, and since commit objects tend to be much smaller than trees and > especially blobs, there are a lot less bytes to inflate: > > $ grep ^commit type-size |cut -d' ' -f2 |avg > 34395730 / 53754 = 639 > $ cat type-size |cut -d' ' -f2 |avg > 3866685744 / 244723 = 15800 > > So a simple revision walk inflates less than 1% of the bytes that the > "enumerate objects packfiles" approach has to inflate. I don't think this is quite accurate. It's true that we have to _consider_ every object, but Git is smart enough not to inflate each one to find its type. For loose objects we just inflate the header. For packed objects, we either pick the type directly out of the packfile header (for a non-delta) or can walk the delta chain (without actually looking at the data bytes!) until we hit the base. So starting from scratch: git cat-file --batch-all-objects --batch-check='%(objecttype) %(objectname)' | grep ^commit | cut -d' ' -f2 | git cat-file --batch is in the same ballpark for most repos as: git rev-list --all | git cat-file --batch though in my timings the traversal is a little bit faster (and I'd expect that to remain the case when doing it all in a single process, since the traversal only follows commit links, whereas processing the object list has to do the type lookup for each object before deciding whether to inflate it). I'm not sure, though, if that edge would remain for incremental updates. For instance, after we take in some new objects via "fetch", the traversal strategy would want to do something like: git rev-list $new_tips --not --all | git cat-file --batch whose performance will depend on the refs _currently_ in the repository, as we load them as UNINTERESTING tips for the walk. Whereas doing: git show-index <.git/objects/pack/the-one-new-packfile.idx | cut -d' ' -f2 | git cat-file --batch-check='%(objecttype) %(objectname)' | grep ^commit | cut -d' ' -f2 | git cat-file --batch always scales exactly with the size of the new objects (obviously that's kind of baroque and this would all be done internally, but I'm trying to demonstrate the algorithmic complexity). I'm not sure what the plan would be if we explode loose objects, though. ;) -Peff
Re: [PATCH v6 00/14] Serialized Git Commit Graph
SZEDER Gáborwrites: > You should forget '--stdin-packs' and use '--stdin-commits' to generate > the initial graph, it's much faster even without '--additive'[1]. See > > > https://public-inbox.org/git/CAM0VKj=wmkBNH=pscrztxfrc13rig1easw89q6ljansdjde...@mail.gmail.com/ > > I still think that the default behaviour for 'git commit-graph write' > should simply walk history from all refs instead of enumerating all > objects in all packfiles. Somehow I missed that one. Thanks for the link to it. It is not so surprising that history walking runs rings around enumerating objects in packfiles, if packfiles are built well. A well-built packfile tends to has newer objects in base form and has delta that goes in backward direction (older objects are represented as delta against newer ones). This helps warlking from the tips of the history quite a bit, because your delta base cache will tend to have the base object (i.e. objects in the newer part of the history you just walked) that will be required to access the "next" older part of the history more often than not. Trying to read the objects in the pack in their object name order would essentially mean reading them in a cryptgraphically random order. Half the time you will end up wanting to access an object that is near the tip of a very deep delta chain even before you've accessed any of the base objects in the delta chain. > [1] - Please excuse the bikeshed: '--additive' is such a strange > sounding option name, at least for me. '--append', perhaps? Yeah, I think "fetch --append" is probably a precedence.
Re: [PATCH v6 00/14] Serialized Git Commit Graph
Johannes Schindelinwrites: >> > Stolee, you definitely want to inspect those changes (`git log --check` >> > was introduced to show you whitespace problems). If all of those >> > whitespace issues are unintentional, you can fix them using `git rebase >> > --whitespace=fix` in the most efficient way. >> >> Another way that may be easier (depending on the way Derrick works) >> is to fetch from me and start working from there, as if they were >> the last set of commits that were sent to the list. "git log >> --first-parent --oneline master..pu" would show where the tip of the >> topic is. > > That is not really easier. We had that discussion before. Stolee would > have to remove your Signed-off-by: lines *manually*. In return, all the whitespace fixes (and other fixes if any) I did on my end can be reused free by the submitter, instead of having to redo it *manually*. If a reroll of the series does not touch one specific commit, that commit can be left as-is; I do not see a need to remove anybody's sign-off or add yet another of your own, if the last two sign-offs are from you and your upstream maintainer, if you did not change anythning in what you got from the latter. This depends on what tool is used to work on refinement, but with "rebase -i", you'd leave "pick" as "pick" and not "edit" or "reword" and it would do the right thing. If you did refine, you get an editor when you record that refinement, so it is just a few key strokes, either "dd" or \C-k, to do that removal *manually*. So I am not sure why you are making a mountain out of this molehill. If you do want to remove the last two sign-off (i.e. penultimate one by the author done during the initial submission, plus the last one by me), well, "rebase -i" is open source. We can add features to the tool to help everybody collaborate better. Extending changes like planned addition of --signoff by Phillip, it is not all that far-fetched to add a mechanism that notices a project-specific trailer rewrite rules in-tree and uses that in between each step to rewrite the trailer block of the commit message, for example, and the rule > I understand that it is a trade-off between time you have to spend and > that others have to spend, and since you do not scale, that trade-off has > to be in your favor. That tradeoff may exist, but it does not weigh in the picture above at all. Perhaps it is better to try to actually think of a way to work together better, instead of just whining.
Re: [PATCH v6 00/14] Serialized Git Commit Graph
On Fri, Mar 16, 2018 at 4:06 PM, Ævar Arnfjörð Bjarmasonwrote: > I noticed that it takes a *long* time to generate the graph, on a bigger > repo I have it takes 20 minutes, and this is a repo where repack -A -d > itself takes 5-8 minutes, probably on the upper end of that with the > bitmap, but once you do that it's relatively snappy with --stdin-commits > --additive when I feed it the new commits. > > I don't have any need really to make this run in 10m instead of 20m, > just something I found interesting, i.e. how it compares to the repack > itself. You should forget '--stdin-packs' and use '--stdin-commits' to generate the initial graph, it's much faster even without '--additive'[1]. See https://public-inbox.org/git/CAM0VKj=wmkBNH=pscrztxfrc13rig1easw89q6ljansdjde...@mail.gmail.com/ I still think that the default behaviour for 'git commit-graph write' should simply walk history from all refs instead of enumerating all objects in all packfiles. [1] - Please excuse the bikeshed: '--additive' is such a strange sounding option name, at least for me. '--append', perhaps?
Re: [PATCH v6 00/14] Serialized Git Commit Graph
> On 14 Mar 2018, at 21:43, Junio C Hamanowrote: > > Derrick Stolee writes: > >> This v6 includes feedback around csum-file.c and the rename of hashclose() >> to finalize_hashfile(). These are the first two commits of the series, so >> they could be pulled out independently. >> >> The only other change since v5 is that I re-ran the performance numbers >> in "commit: integrate commit graph with commit parsing". > > Thanks. > >> Hopefully this version is ready to merge. I have several follow-up topics >> in mind to submit soon after, including: > > A few patches add trailing blank lines and other whitespace > breakages, which will stop my "git merge" later to 'next' and down, > as I have a pre-commit hook to catch them. @stolee: I run "git --no-pager diff --check $BASE_HASH...$HEAD_HASH" to detect these kinds of things. I run this as part of my "prepare patch" [1] script which is inspired by a similar script originally written by Dscho. Do you think it would make sense to mention (or even recommend) such a script in your awesome GfW CONTRIBUTING.md? - Lars [1] https://github.com/larsxschneider/git-list-helper/blob/master/prepare-patch.sh#L71
Re: [PATCH v6 00/14] Serialized Git Commit Graph
On Wed, Mar 14 2018, Derrick Stolee jotted: > Hopefully this version is ready to merge. I have several follow-up topics > in mind to submit soon after, including: I've been doing some preliminary testing of this internally, all good so far, on a relatively small repo (~100k commits) I was using for testing: - git -c core.commitGraph=true -C rev-list --all: - /mnt/ext4_graph => min:273mean:279max:389-- (273 274 275 276 277 279 282 282 345 389) -/mnt/ext4 => min:1087 mean:1123 max:1175 -- (1087 1092 1092 1104 1117 1123 1126 1136 1143 1175) This is on a fresh clone with one giant pack and where the commit graph data was generated afterwards with "git commit-graph write" for the *_graph repo, so it contains all the commits. So less than 25% of the mean time it spent before. Nice. Those are times in milliseconds over 10 runs, for this particular one I didn't get much of an improvement in --graph, but still ~10%: - git -c core.commitGraph=true -C log --oneline --graph: - /mnt/ext4_graph => min:1420 mean:1449 max:1586 -- (1420 1423 1428 1434 1449 1449 1490 1548 1567 1586) -/mnt/ext4 => min:1547 mean:1616 max:2136 -- (1547 1557 1581 1585 1598 1616 1621 1740 1964 2136) I noticed that it takes a *long* time to generate the graph, on a bigger repo I have it takes 20 minutes, and this is a repo where repack -A -d itself takes 5-8 minutes, probably on the upper end of that with the bitmap, but once you do that it's relatively snappy with --stdin-commits --additive when I feed it the new commits. I don't have any need really to make this run in 10m instead of 20m, just something I found interesting, i.e. how it compares to the repack itself.
Re: [PATCH v6 00/14] Serialized Git Commit Graph
Hi Junio, On Thu, 15 Mar 2018, Junio C Hamano wrote: > Johannes Schindelinwrites: > > > Stolee, you definitely want to inspect those changes (`git log --check` > > was introduced to show you whitespace problems). If all of those > > whitespace issues are unintentional, you can fix them using `git rebase > > --whitespace=fix` in the most efficient way. > > Another way that may be easier (depending on the way Derrick works) > is to fetch from me and start working from there, as if they were > the last set of commits that were sent to the list. "git log > --first-parent --oneline master..pu" would show where the tip of the > topic is. That is not really easier. We had that discussion before. Stolee would have to remove your Signed-off-by: lines *manually*. I understand that it is a trade-off between time you have to spend and that others have to spend, and since you do not scale, that trade-off has to be in your favor. My hope is that we will eventually collaborate more effectively using Git itself, then those trade-offs will become a lot less involved because the overall cost will be a lot smaller. Ciao, Dscho
Re: [PATCH v6 00/14] Serialized Git Commit Graph
On 15/03/18 18:41, Junio C Hamano wrote: > Johannes Schindelinwrites: > >> Stolee, you definitely want to inspect those changes (`git log --check` >> was introduced to show you whitespace problems). If all of those >> whitespace issues are unintentional, you can fix them using `git rebase >> --whitespace=fix` in the most efficient way. > > Another way that may be easier (depending on the way Derrick works) > is to fetch from me and start working from there, as if they were > the last set of commits that were sent to the list. "git log > --first-parent --oneline master..pu" would show where the tip of the > topic is. BTW, thanks for adding the 'SQUASH??? sparse fixes' on top of that branch - sparse is now quiet on the 'pu' branch. (The same can't be said of static-check.pl, but that is a different issue. ;-) ). Thanks! ATB, Ramsay Jones
Re: [PATCH v6 00/14] Serialized Git Commit Graph
Johannes Schindelinwrites: > Stolee, you definitely want to inspect those changes (`git log --check` > was introduced to show you whitespace problems). If all of those > whitespace issues are unintentional, you can fix them using `git rebase > --whitespace=fix` in the most efficient way. Another way that may be easier (depending on the way Derrick works) is to fetch from me and start working from there, as if they were the last set of commits that were sent to the list. "git log --first-parent --oneline master..pu" would show where the tip of the topic is.
Re: [PATCH v6 00/14] Serialized Git Commit Graph
Hi Junio, On Wed, 14 Mar 2018, Junio C Hamano wrote: > A few patches add trailing blank lines and other whitespace > breakages, which will stop my "git merge" later to 'next' and down, > as I have a pre-commit hook to catch them. I wonder how you cope with the intentional "whitespace breakage" caused by a TAB after HS in my recreate-merges patch series... > Here is the output from my "git am -s" session. > > Applying: csum-file: rename hashclose() to finalize_hashfile() > Applying: csum-file: refactor finalize_hashfile() method > .git/rebase-apply/patch:109: new blank line at EOF. Stolee, you definitely want to inspect those changes (`git log --check` was introduced to show you whitespace problems). If all of those whitespace issues are unintentional, you can fix them using `git rebase --whitespace=fix` in the most efficient way. Ciao, Dscho
Re: [PATCH v6 00/14] Serialized Git Commit Graph
Derrick Stoleewrites: > This v6 includes feedback around csum-file.c and the rename of hashclose() > to finalize_hashfile(). These are the first two commits of the series, so > they could be pulled out independently. > > The only other change since v5 is that I re-ran the performance numbers > in "commit: integrate commit graph with commit parsing". Thanks. > Hopefully this version is ready to merge. I have several follow-up topics > in mind to submit soon after, including: A few patches add trailing blank lines and other whitespace breakages, which will stop my "git merge" later to 'next' and down, as I have a pre-commit hook to catch them. Here is the output from my "git am -s" session. Applying: csum-file: rename hashclose() to finalize_hashfile() Applying: csum-file: refactor finalize_hashfile() method .git/rebase-apply/patch:109: new blank line at EOF. + warning: 1 line adds whitespace errors. Applying: commit-graph: add format document .git/rebase-apply/patch:175: new blank line at EOF. + warning: 1 line adds whitespace errors. Applying: graph: add commit graph design document .git/rebase-apply/patch:42: new blank line at EOF. + .git/rebase-apply/patch:109: new blank line at EOF. + warning: 2 lines add whitespace errors. Applying: commit-graph: create git-commit-graph builtin .git/rebase-apply/patch:323: space before tab in indent. fd = hold_lock_file_for_update(, graph_name, 0); .git/rebase-apply/patch:334: space before tab in indent. fd = hold_lock_file_for_update(, graph_name, LOCK_DIE_ON_ERROR); .git/rebase-apply/patch:385: new blank line at EOF. + .git/rebase-apply/patch:398: new blank line at EOF. + warning: 2 lines applied after fixing whitespace errors. Applying: commit-graph: implement write_commit_graph() .git/rebase-apply/patch:138: indent with spaces. cd "$TRASH_DIRECTORY/full" && .git/rebase-apply/patch:144: indent with spaces. cd "$TRASH_DIRECTORY/full" && .git/rebase-apply/patch:154: indent with spaces. cd "$TRASH_DIRECTORY/full" && .git/rebase-apply/patch:160: indent with spaces. cd "$TRASH_DIRECTORY/full" && .git/rebase-apply/patch:197: indent with spaces. cd "$TRASH_DIRECTORY/full" && warning: squelched 6 whitespace errors warning: 10 lines applied after fixing whitespace errors. Applying: commit-graph: implement 'git-commit-graph write' Test number t5318 already taken .git/rebase-apply/patch:346: indent with spaces. cd "$TRASH_DIRECTORY/full" && .git/rebase-apply/patch:356: indent with spaces. cd "$TRASH_DIRECTORY/full" && .git/rebase-apply/patch:366: indent with spaces. cd "$TRASH_DIRECTORY/full" && .git/rebase-apply/patch:374: indent with spaces. cd "$TRASH_DIRECTORY/full" && .git/rebase-apply/patch:384: indent with spaces. cd "$TRASH_DIRECTORY/bare" && warning: 5 lines add whitespace errors. Applying: commit-graph: implement git commit-graph read Applying: commit-graph: add core.commitGraph setting Applying: commit-graph: close under reachability .git/rebase-apply/patch:302: indent with spaces. cd "$TRASH_DIRECTORY/full" && .git/rebase-apply/patch:310: indent with spaces. cd "$TRASH_DIRECTORY/full" && .git/rebase-apply/patch:321: indent with spaces. cd "$TRASH_DIRECTORY/full" && .git/rebase-apply/patch:331: indent with spaces. cd "$TRASH_DIRECTORY/full" && .git/rebase-apply/patch:341: indent with spaces. cd "$TRASH_DIRECTORY/full" && warning: squelched 2 whitespace errors warning: 7 lines add whitespace errors. Applying: commit: integrate commit graph with commit parsing .git/rebase-apply/patch:224: indent with spaces. cd "$TRASH_DIRECTORY/full" && .git/rebase-apply/patch:227: trailing whitespace. graph_read_expect "9" "large_edges" .git/rebase-apply/patch:234: indent with spaces. cd "$TRASH_DIRECTORY" && warning: 2 lines applied after fixing whitespace errors. Applying: commit-graph: read only from specific pack-indexes .git/rebase-apply/patch:196: indent with spaces. cd "$TRASH_DIRECTORY/full" && .git/rebase-apply/patch:209: indent with spaces. cd "$TRASH_DIRECTORY" && warning: 1 line applied after fixing whitespace errors. Applying: commit-graph: build graph from starting commits .git/rebase-apply/patch:148: indent with spaces. cd "$TRASH_DIRECTORY/full" && .git/rebase-apply/patch:158: indent with spaces. cd "$TRASH_DIRECTORY" && warning: 1 line applied after fixing whitespace errors. Applying: commit-graph: implement "--additive" option
Re: [PATCH v6 00/14] Serialized Git Commit Graph
On 14/03/18 19:27, Derrick Stolee wrote: > This v6 includes feedback around csum-file.c and the rename of hashclose() > to finalize_hashfile(). These are the first two commits of the series, so > they could be pulled out independently. > > The only other change since v5 is that I re-ran the performance numbers > in "commit: integrate commit graph with commit parsing". I haven't looked at v6 (I will wait for it to hit pu), but v5 is still causing sparse to complain. The diff given below (on top of current pu @9e418c7c9), fixes it for me. (Using a plain integer as a NULL pointer, in builtin/commit- graph.c, and the 'commit_graph' symbol should be file-local, in commit-graph.c). Thanks! ATB, Ramsay Jones -- >8 -- diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c index 62ac26e44..855df66bd 100644 --- a/builtin/commit-graph.c +++ b/builtin/commit-graph.c @@ -31,7 +31,7 @@ static struct opts_commit_graph { static int graph_read(int argc, const char **argv) { - struct commit_graph *graph = 0; + struct commit_graph *graph = NULL; char *graph_name; static struct option builtin_commit_graph_read_options[] = { diff --git a/commit-graph.c b/commit-graph.c index 631edac4c..7b45fe85d 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -182,7 +182,7 @@ struct commit_graph *load_commit_graph_one(const char *graph_file) } /* global storage */ -struct commit_graph *commit_graph = NULL; +static struct commit_graph *commit_graph = NULL; static void prepare_commit_graph_one(const char *obj_dir) {
[PATCH v6 00/14] Serialized Git Commit Graph
This v6 includes feedback around csum-file.c and the rename of hashclose() to finalize_hashfile(). These are the first two commits of the series, so they could be pulled out independently. The only other change since v5 is that I re-ran the performance numbers in "commit: integrate commit graph with commit parsing". Hopefully this version is ready to merge. I have several follow-up topics in mind to submit soon after, including: * Auto-generate the commit graph as the repo changes: i. teach git-commit-graph an "fsck" subcommand and integrate with git-fsck ii. teach git-repack to call git-commit-graph * Generation numbers: i. teach git-commit-graph to compute generation numbers ii. consume generation numbers in paint_down_to_common() * Move globals from commit-graph.c to the_repository The three bullets (*) are relatively independent but have sub-items that appear in priority order. Derrick Stolee (14): csum-file: rename hashclose() to finalize_hashfile() csum-file: refactor finalize_hashfile() method commit-graph: add format document graph: add commit graph design document commit-graph: create git-commit-graph builtin commit-graph: implement write_commit_graph() commit-graph: implement 'git-commit-graph write' commit-graph: implement git commit-graph read commit-graph: add core.commitGraph setting commit-graph: close under reachability commit: integrate commit graph with commit parsing commit-graph: read only from specific pack-indexes commit-graph: build graph from starting commits commit-graph: implement "--additive" option .gitignore | 1 + Documentation/config.txt| 3 + Documentation/git-commit-graph.txt | 93 +++ Documentation/technical/commit-graph-format.txt | 98 Documentation/technical/commit-graph.txt| 164 ++ Makefile| 2 + alloc.c | 1 + builtin.h | 1 + builtin/commit-graph.c | 172 ++ builtin/index-pack.c| 2 +- builtin/pack-objects.c | 6 +- bulk-checkin.c | 4 +- cache.h | 1 + command-list.txt| 1 + commit-graph.c | 719 commit-graph.h | 47 ++ commit.c| 3 + commit.h| 3 + config.c| 5 + contrib/completion/git-completion.bash | 2 + csum-file.c | 10 +- csum-file.h | 9 +- environment.c | 1 + fast-import.c | 2 +- git.c | 1 + pack-bitmap-write.c | 2 +- pack-write.c| 5 +- packfile.c | 4 +- packfile.h | 2 + t/t5318-commit-graph.sh | 225 30 files changed, 1568 insertions(+), 21 deletions(-) create mode 100644 Documentation/git-commit-graph.txt create mode 100644 Documentation/technical/commit-graph-format.txt create mode 100644 Documentation/technical/commit-graph.txt create mode 100644 builtin/commit-graph.c create mode 100644 commit-graph.c create mode 100644 commit-graph.h create mode 100755 t/t5318-commit-graph.sh base-commit: d0db9edba0050ada6f6eac68061599690d2a4333 -- 2.14.1