Re: [PATCH 00/83] libify apply and use lib in am
Hi Chris, On Sun, 24 Apr 2016, Christian Couder wrote: > On Sun, Apr 24, 2016 at 6:27 PM, Christian Couder >wrote: > > On Sun, Apr 24, 2016 at 5:23 PM, Ramsay Jones > > wrote: > >> > >> > >> On 24/04/16 14:33, Christian Couder wrote: > >>> This is a patch series about libifying `git apply` functionality, and > >>> using this libified functionality in `git am`, so that no 'git apply' > >>> process is spawn anymore. This makes `git am` significantly faster, so > >>> `git rebase`, when it uses the am backend, is also significantly > >>> faster. > >> > >> I just tried to git-am these patches, but patch #78 did not make it > >> to the list. > > > > That's strange because gmail tells me it has been sent, and it made it > > to chrisc...@tuxfamily.org. > > Instead of waiting for the patch to appear on the list, you might want > to use branch libify-apply-use-in-am25 in my GitHub repo: > > https://github.com/chriscool/git/commits/libify-apply-use-in-am25 Thanks for this. In particular with longer patch series, I find mail-based patch submissions *really* cumbersome, not only on the submitter's side but also on the consumers' side. I wonder, however, why you use numbers in the branch name to version things? I thought Git allowed for more advanced versioning than that... :-) Ciao, Dscho -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/83] libify apply and use lib in am
On Mon, Apr 25, 2016 at 4:57 PM, Christian Couderwrote: >> But why write so many times when nobody reads it? We only need to >> write before git-apply exits, > > You mean `git am` here I think. > >> either after successfully applying the >> whole series, or after it stops at conflicts, and maybe even at die() >> and SIGINT. Yes if git-apply segfaults, > > Here too. Yep it's git-am. I didn't read the series, I simply ran and misread the traces a bit. >> then the index update is lost, >> but in such a case, it's usually a good idea to restart fresh anyway. >> When you only write index once (or twice) it won't matter if >> split-index is used. > > Yeah I agree, but it would need further work, that can be done after > this series is merged. Sure. > And I am not sure if the potential gains on a typical rebase would be worth > it. I didn't point it out, but in pathological cases where your patch series touches a lot of (or even every) files in the worktree, the gain from split-index lowers and could even disappear. I don't know how often that can happen in real life though. Also, if you start to use split-index often, please note that I haven't addressed the sharedindex.* pruning part (it's labeled "experimental" for a reason), you may have to un-split the index and rm $GIT_DIR/sharedindex.* manually from time to time to keep disk usage down. -- Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/83] libify apply and use lib in am
On Mon, Apr 25, 2016 at 11:02 AM, Duy Nguyenwrote: > On Sun, Apr 24, 2016 at 8:33 PM, Christian Couder > wrote: >> Sorry if this patch series is a bit long. I can split it into two or >> more series if it is prefered. > > I suspect you deliberately made the series long just to show off how > good new git-rebase is ;-) Yeah, and `git am` too :-) >> Performance numbers: >> >> - A few days ago Ævar did a huge many-hundred commit rebase on the >> kernel with untracked cache. >> >> command: git rebase --onto 1993b17 52bef0c 29dde7c >> >> Vanilla "next" without split index:1m54.953s >> Vanilla "next" with split index: 1m22.476s >> This series on top of "next" without split index: 1m12.034s >> This series on top of "next" with split index: 0m15.678s > > I was a bit puzzled why split-index helped so much. It shouldn't have > in my opinion unless you paired it with index-helper, to cut down both > read and write time. So I ran some tests. Long story short, I think we > can achieve this gain (and a little more) even without split-index. Yeah, perhaps. For now though Ævar and myself would be happy to just have a config option for split-index :-) The other performance numbers I mentioned show that now the `git am` part of a 13 commit long rebase is reduced from 58% to 19% of the whole rebase time. So further improvements in speeding up `git am` could only make such a rebase at most 19% faster. > I ran my measurement patch [1] with and without your series used this > series as rebase material. Without the series, the picture is not so > surprising. We run git-apply 80+ times, each consists of this sequence > > read index > write index (cache tree updates only) > read index again > optionally initialize name hash (when new entries are added, I guess) > read packed-refs > write index > > With this series, we run a single git-apply which does > > read index (and sharedindex too if in split-index mode) > initialize name hash > write index 80+ times > > This explains why I guessed it wrong: when you write only, not read > back, of course index-helper is not required. And with split-index you > only write as many entries as you touch (which is usually a small > number compared to worktree's size), so writing 80+ times with > split-index is a lot faster than write 80+ times with whole index. Yeah, I tried to explain in the cover letter and in the last patch why this series enables split-index to give great results, but I think you are much better at explaining it. > But why write so many times when nobody reads it? We only need to > write before git-apply exits, You mean `git am` here I think. > either after successfully applying the > whole series, or after it stops at conflicts, and maybe even at die() > and SIGINT. Yes if git-apply segfaults, Here too. > then the index update is lost, > but in such a case, it's usually a good idea to restart fresh anyway. > When you only write index once (or twice) it won't matter if > split-index is used. Yeah I agree, but it would need further work, that can be done after this series is merged. And I am not sure if the potential gains on a typical rebase would be worth it. Thanks, Christian. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/83] libify apply and use lib in am
On Mon, Apr 25, 2016 at 2:14 AM, Duy Nguyenwrote: > On Mon, Apr 25, 2016 at 12:42 AM, Ramsay Jones > wrote: >> >> >> On 24/04/16 17:56, Christian Couder wrote: >>> On Sun, Apr 24, 2016 at 6:27 PM, Christian Couder >>> wrote: On Sun, Apr 24, 2016 at 5:23 PM, Ramsay Jones wrote: > > I just tried to git-am these patches, but patch #78 did not make it > to the list. That's strange because gmail tells me it has been sent, and it made it to chrisc...@tuxfamily.org. > > #78 moves 4000k lines around and ends up ~260k in size, I think it may > have hit vger limit. Yeah probably. Thanks for looking at this. I think I will have to split it into smaller patches in a v2. >>> Instead of waiting for the patch to appear on the list, you might want >>> to use branch libify-apply-use-in-am25 in my GitHub repo: >>> >>> https://github.com/chriscool/git/commits/libify-apply-use-in-am25 >> >> Hmm, that branch doesn't correspond directly to the patches you sent >> out (there are 86 commits, some marked with draft. I think commit d13d2ac >> corresponds kinda to patch #83, but ). >> >> I think I'll wait to see the patches as you intend them to be seen. ;-) > > I git-am'd the series then compared with the rebased version of > libify-apply-use-in-am25 on master. 33198a1 (i.e. > libify-apply-use-in-am25^) matches what was sent in content (didn't > compare commit messages). Thanks for checking. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/83] libify apply and use lib in am
On Sun, Apr 24, 2016 at 8:33 PM, Christian Couderwrote: > Sorry if this patch series is a bit long. I can split it into two or > more series if it is prefered. I suspect you deliberately made the series long just to show off how good new git-rebase is ;-) > Performance numbers: > > - A few days ago Ævar did a huge many-hundred commit rebase on the > kernel with untracked cache. > > command: git rebase --onto 1993b17 52bef0c 29dde7c > > Vanilla "next" without split index:1m54.953s > Vanilla "next" with split index: 1m22.476s > This series on top of "next" without split index: 1m12.034s > This series on top of "next" with split index: 0m15.678s I was a bit puzzled why split-index helped so much. It shouldn't have in my opinion unless you paired it with index-helper, to cut down both read and write time. So I ran some tests. Long story short, I think we can achieve this gain (and a little more) even without split-index. I ran my measurement patch [1] with and without your series used this series as rebase material. Without the series, the picture is not so surprising. We run git-apply 80+ times, each consists of this sequence read index write index (cache tree updates only) read index again optionally initialize name hash (when new entries are added, I guess) read packed-refs write index With this series, we run a single git-apply which does read index (and sharedindex too if in split-index mode) initialize name hash write index 80+ times This explains why I guessed it wrong: when you write only, not read back, of course index-helper is not required. And with split-index you only write as many entries as you touch (which is usually a small number compared to worktree's size), so writing 80+ times with split-index is a lot faster than write 80+ times with whole index. But why write so many times when nobody reads it? We only need to write before git-apply exits, either after successfully applying the whole series, or after it stops at conflicts, and maybe even at die() and SIGINT. Yes if git-apply segfaults, then the index update is lost, but in such a case, it's usually a good idea to restart fresh anyway. When you only write index once (or twice) it won't matter if split-index is used. [1] http://article.gmane.org/gmane.comp.version-control.git/292001 -- Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/83] libify apply and use lib in am
On Mon, Apr 25, 2016 at 12:42 AM, Ramsay Joneswrote: > > > On 24/04/16 17:56, Christian Couder wrote: >> On Sun, Apr 24, 2016 at 6:27 PM, Christian Couder >> wrote: >>> On Sun, Apr 24, 2016 at 5:23 PM, Ramsay Jones >>> wrote: On 24/04/16 14:33, Christian Couder wrote: > This is a patch series about libifying `git apply` functionality, and > using this libified functionality in `git am`, so that no 'git apply' > process is spawn anymore. This makes `git am` significantly faster, so > `git rebase`, when it uses the am backend, is also significantly > faster. I just tried to git-am these patches, but patch #78 did not make it to the list. >>> >>> That's strange because gmail tells me it has been sent, and it made it >>> to chrisc...@tuxfamily.org. #78 moves 4000k lines around and ends up ~260k in size, I think it may have hit vger limit. >> Instead of waiting for the patch to appear on the list, you might want >> to use branch libify-apply-use-in-am25 in my GitHub repo: >> >> https://github.com/chriscool/git/commits/libify-apply-use-in-am25 > > Hmm, that branch doesn't correspond directly to the patches you sent > out (there are 86 commits, some marked with draft. I think commit d13d2ac > corresponds kinda to patch #83, but ). > > I think I'll wait to see the patches as you intend them to be seen. ;-) I git-am'd the series then compared with the rebased version of libify-apply-use-in-am25 on master. 33198a1 (i.e. libify-apply-use-in-am25^) matches what was sent in content (didn't compare commit messages). -- Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/83] libify apply and use lib in am
On 24/04/16 17:56, Christian Couder wrote: > On Sun, Apr 24, 2016 at 6:27 PM, Christian Couder >wrote: >> On Sun, Apr 24, 2016 at 5:23 PM, Ramsay Jones >> wrote: >>> >>> >>> On 24/04/16 14:33, Christian Couder wrote: This is a patch series about libifying `git apply` functionality, and using this libified functionality in `git am`, so that no 'git apply' process is spawn anymore. This makes `git am` significantly faster, so `git rebase`, when it uses the am backend, is also significantly faster. >>> >>> I just tried to git-am these patches, but patch #78 did not make it >>> to the list. >> >> That's strange because gmail tells me it has been sent, and it made it >> to chrisc...@tuxfamily.org. > > Instead of waiting for the patch to appear on the list, you might want > to use branch libify-apply-use-in-am25 in my GitHub repo: > > https://github.com/chriscool/git/commits/libify-apply-use-in-am25 Hmm, that branch doesn't correspond directly to the patches you sent out (there are 86 commits, some marked with draft. I think commit d13d2ac corresponds kinda to patch #83, but ). I think I'll wait to see the patches as you intend them to be seen. ;-) ATB, Ramsay Jones -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/83] libify apply and use lib in am
On Sun, Apr 24, 2016 at 6:27 PM, Christian Couderwrote: > On Sun, Apr 24, 2016 at 5:23 PM, Ramsay Jones > wrote: >> >> >> On 24/04/16 14:33, Christian Couder wrote: >>> This is a patch series about libifying `git apply` functionality, and >>> using this libified functionality in `git am`, so that no 'git apply' >>> process is spawn anymore. This makes `git am` significantly faster, so >>> `git rebase`, when it uses the am backend, is also significantly >>> faster. >> >> I just tried to git-am these patches, but patch #78 did not make it >> to the list. > > That's strange because gmail tells me it has been sent, and it made it > to chrisc...@tuxfamily.org. Instead of waiting for the patch to appear on the list, you might want to use branch libify-apply-use-in-am25 in my GitHub repo: https://github.com/chriscool/git/commits/libify-apply-use-in-am25 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/83] libify apply and use lib in am
On Sun, Apr 24, 2016 at 5:23 PM, Ramsay Joneswrote: > > > On 24/04/16 14:33, Christian Couder wrote: >> This is a patch series about libifying `git apply` functionality, and >> using this libified functionality in `git am`, so that no 'git apply' >> process is spawn anymore. This makes `git am` significantly faster, so >> `git rebase`, when it uses the am backend, is also significantly >> faster. > > I just tried to git-am these patches, but patch #78 did not make it > to the list. That's strange because gmail tells me it has been sent, and it made it to chrisc...@tuxfamily.org. I use send-email and smtp.gmail.com. Just after sending patch #78 the connection to smtp.gmail.com was closed with: 4.7.0 Try again later, closing connection. (MAIL) j6sm6717101wjb.29 - gsmtp Then I had to wait a few minutes before I could send the last patches. > (Also, patches #59 and #62 both issue a 'new blank line at EOF' warning). Ok, I will take a look at that. Thanks, Christian. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/83] libify apply and use lib in am
On 24/04/16 14:33, Christian Couder wrote: > This is a patch series about libifying `git apply` functionality, and > using this libified functionality in `git am`, so that no 'git apply' > process is spawn anymore. This makes `git am` significantly faster, so > `git rebase`, when it uses the am backend, is also significantly > faster. I just tried to git-am these patches, but patch #78 did not make it to the list. (Also, patches #59 and #62 both issue a 'new blank line at EOF' warning). ATB, Ramsay Jones -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 00/83] libify apply and use lib in am
This is a patch series about libifying `git apply` functionality, and using this libified functionality in `git am`, so that no 'git apply' process is spawn anymore. This makes `git am` significantly faster, so `git rebase`, when it uses the am backend, is also significantly faster. This has initially been discussed in the following thread: http://thread.gmane.org/gmane.comp.version-control.git/287236/ A RFC patch series was sent previously that only got rid of the global variables and refactored the code around a bit: http://thread.gmane.org/gmane.comp.version-control.git/288489/ This new patch series is built on top of that previous work, so patches 1/83 to 47/83 are from the previous RFC patch series and patches after that are new. Sorry if this patch series is a bit long. I can split it into two or more series if it is prefered. The benefits are not just related to not creating new processes. When `git am` launched a `git apply` process, this new process had to read the index from disk. Then after the `git apply`process had terminated, `git am` dropped its index and read the index from disk to get the index that had been modified by the `git apply`process. This was inefficient and also prevented the split-index mechanism to provide many performance benefits. Performance numbers: - A few days ago Ævar did a huge many-hundred commit rebase on the kernel with untracked cache. command: git rebase --onto 1993b17 52bef0c 29dde7c Vanilla "next" without split index:1m54.953s Vanilla "next" with split index: 1m22.476s This series on top of "next" without split index: 1m12.034s This series on top of "next" with split index: 0m15.678s Ævar used his Debian laptop with SSD. - Some days ago I tested rebasing 13 commits in Booking.com's monorepo on a Red Hat 6.5 server with split-index and GIT_TRACE_PERFORMANCE=1. With Git v2.8.0, the rebase took 6.375888383 s, with the git am command launched by the rebase command taking 3.705677431 s. With this series on top of next, the rebase took 3.044529494 s, with the git am command launched by the rebase command taking 0.583521168 s. No tests on Windows have been performed, but it could be interesting to test on this platform. Christian Couder (83): builtin/apply: make gitdiff_verify_name() return void builtin/apply: avoid parameter shadowing 'p_value' global builtin/apply: avoid parameter shadowing 'linenr' global builtin/apply: avoid local variable shadowing 'len' parameter builtin/apply: extract line_by_line_fuzzy_match() from match_fragment() builtin/apply: move 'options' variable into cmd_apply() builtin/apply: introduce 'struct apply_state' to start libifying builtin/apply: move 'unidiff_zero' global into 'struct apply_state' builtin/apply: move 'check' global into 'struct apply_state' builtin/apply: move 'check_index' global into 'struct apply_state' builtin/apply: move 'apply_in_reverse' global into 'struct apply_state' builtin/apply: move 'apply_with_reject' global into 'struct apply_state' builtin/apply: move 'apply_verbosely' global into 'struct apply_state' builtin/apply: move 'update_index' global into 'struct apply_state' builtin/apply: move 'allow_overlap' global into 'struct apply_state' builtin/apply: move 'cached' global into 'struct apply_state' builtin/apply: move 'diffstat' global into 'struct apply_state' builtin/apply: move 'numstat' global into 'struct apply_state' builtin/apply: move 'summary' global into 'struct apply_state' builtin/apply: move 'threeway' global into 'struct apply_state' builtin/apply: move 'no-add' global into 'struct apply_state' builtin/apply: move 'unsafe_paths' global into 'struct apply_state' builtin/apply: move 'line_termination' global into 'struct apply_state' builtin/apply: move 'fake_ancestor' global into 'struct apply_state' builtin/apply: move 'p_context' global into 'struct apply_state' builtin/apply: move 'apply' global into 'struct apply_state' builtin/apply: move 'read_stdin' global into cmd_apply() builtin/apply: move 'patch_input_file' global into 'struct apply_state' builtin/apply: move 'limit_by_name' global into 'struct apply_state' builtin/apply: move 'has_include' global into 'struct apply_state' builtin/apply: move 'p_value' global into 'struct apply_state' builtin/apply: move 'p_value_known' global into 'struct apply_state' builtin/apply: move 'root' global into 'struct apply_state' builtin/apply: move 'whitespace_error' global into 'struct apply_state' builtin/apply: move 'whitespace_option' into 'struct apply_state' builtin/apply: remove whitespace_option arg from set_default_whitespace_mode() builtin/apply: move 'squelch_whitespace_errors' into 'struct apply_state' builtin/apply: move 'applied_after_fixing_ws' into 'struct apply_state' builtin/apply: move 'ws_error_action' into 'struct apply_state' builtin/apply: move