Re: [PATCH 00/83] libify apply and use lib in am

2016-04-25 Thread Johannes Schindelin
Hi Chris,

On Sun, 24 Apr 2016, Christian Couder wrote:

> On Sun, Apr 24, 2016 at 6:27 PM, Christian Couder
>  wrote:
> > On Sun, Apr 24, 2016 at 5:23 PM, Ramsay Jones
> >  wrote:
> >>
> >>
> >> On 24/04/16 14:33, Christian Couder wrote:
> >>> This is a patch series about libifying `git apply` functionality, and
> >>> using this libified functionality in `git am`, so that no 'git apply'
> >>> process is spawn anymore. This makes `git am` significantly faster, so
> >>> `git rebase`, when it uses the am backend, is also significantly
> >>> faster.
> >>
> >> I just tried to git-am these patches, but patch #78 did not make it
> >> to the list.
> >
> > That's strange because gmail tells me it has been sent, and it made it
> > to chrisc...@tuxfamily.org.
> 
> Instead of waiting for the patch to appear on the list, you might want
> to use branch libify-apply-use-in-am25 in my GitHub repo:
> 
> https://github.com/chriscool/git/commits/libify-apply-use-in-am25

Thanks for this. In particular with longer patch series, I find mail-based
patch submissions *really* cumbersome, not only on the submitter's side
but also on the consumers' side.

I wonder, however, why you use numbers in the branch name to version
things? I thought Git allowed for more advanced versioning than that...

:-)

Ciao,
Dscho
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/83] libify apply and use lib in am

2016-04-25 Thread Duy Nguyen
On Mon, Apr 25, 2016 at 4:57 PM, Christian Couder
 wrote:
>> But why write so many times when nobody reads it? We only need to
>> write before git-apply exits,
>
> You mean `git am` here I think.
>
>> either after successfully applying the
>> whole series, or after it stops at conflicts, and maybe even at die()
>> and SIGINT. Yes if git-apply segfaults,
>
> Here too.

Yep it's git-am. I didn't read the series, I simply ran and misread
the traces a bit.

>> then the index update is lost,
>> but in such a case, it's usually a good idea to restart fresh anyway.
>> When you only write index once (or twice) it won't matter if
>> split-index is used.
>
> Yeah I agree, but it would need further work, that can be done after
> this series is merged.

Sure.

> And I am not sure if the potential gains on a typical rebase would be worth 
> it.

I didn't point it out, but in pathological cases where your patch
series touches a lot of (or even every) files in the worktree, the
gain from split-index lowers and could even disappear. I don't know
how often that can happen in real life though.

Also, if you start to use split-index often, please note that I
haven't addressed the sharedindex.* pruning part (it's labeled
"experimental" for a reason), you may have to un-split the index and
rm $GIT_DIR/sharedindex.* manually from time to time to keep disk
usage down.
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/83] libify apply and use lib in am

2016-04-25 Thread Christian Couder
On Mon, Apr 25, 2016 at 11:02 AM, Duy Nguyen  wrote:
> On Sun, Apr 24, 2016 at 8:33 PM, Christian Couder
>  wrote:
>> Sorry if this patch series is a bit long. I can split it into two or
>> more series if it is prefered.
>
> I suspect you deliberately made the series long just to show off how
> good new git-rebase is ;-)

Yeah, and `git am` too :-)

>> Performance numbers:
>>
>>   - A few days ago Ævar did a huge many-hundred commit rebase on the
>> kernel with untracked cache.
>>
>> command: git rebase --onto 1993b17 52bef0c 29dde7c
>>
>> Vanilla "next" without split index:1m54.953s
>> Vanilla "next" with split index:   1m22.476s
>> This series on top of "next" without split index:  1m12.034s
>> This series on top of "next" with split index: 0m15.678s
>
> I was a bit puzzled why split-index helped so much. It shouldn't have
> in my opinion unless you paired it with index-helper, to cut down both
> read and write time. So I ran some tests. Long story short, I think we
> can achieve this gain (and a little more) even without split-index.

Yeah, perhaps. For now though Ævar and myself would be happy to just
have a config option for split-index :-)

The other performance numbers I mentioned show that now the `git am`
part of a 13 commit long rebase is reduced from 58% to 19% of the
whole rebase time. So further improvements in speeding up `git am`
could only make such a rebase at most 19% faster.

> I ran my measurement patch [1] with and without your series used this
> series as rebase material. Without the series, the picture is not so
> surprising. We run git-apply 80+ times, each consists of this sequence
>
> read index
> write index (cache tree updates only)
> read index again
> optionally initialize name hash (when new entries are added, I guess)
> read packed-refs
> write index
>
> With this series, we run a single git-apply which does
>
> read index (and sharedindex too if in split-index mode)
> initialize name hash
> write index 80+ times
>
> This explains why I guessed it wrong: when you write only, not read
> back, of course index-helper is not required. And with split-index you
> only write as many entries as you touch (which is usually a small
> number compared to worktree's size), so writing 80+ times with
> split-index is a lot faster than write 80+ times with whole index.

Yeah, I tried to explain in the cover letter and in the last patch why
this series enables split-index to give great results, but I think you
are much better at explaining it.

> But why write so many times when nobody reads it? We only need to
> write before git-apply exits,

You mean `git am` here I think.

> either after successfully applying the
> whole series, or after it stops at conflicts, and maybe even at die()
> and SIGINT. Yes if git-apply segfaults,

Here too.

> then the index update is lost,
> but in such a case, it's usually a good idea to restart fresh anyway.
> When you only write index once (or twice) it won't matter if
> split-index is used.

Yeah I agree, but it would need further work, that can be done after
this series is merged.
And I am not sure if the potential gains on a typical rebase would be worth it.

Thanks,
Christian.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/83] libify apply and use lib in am

2016-04-25 Thread Christian Couder
On Mon, Apr 25, 2016 at 2:14 AM, Duy Nguyen  wrote:
> On Mon, Apr 25, 2016 at 12:42 AM, Ramsay Jones
>  wrote:
>>
>>
>> On 24/04/16 17:56, Christian Couder wrote:
>>> On Sun, Apr 24, 2016 at 6:27 PM, Christian Couder
>>>  wrote:
 On Sun, Apr 24, 2016 at 5:23 PM, Ramsay Jones
  wrote:
>
> I just tried to git-am these patches, but patch #78 did not make it
> to the list.

 That's strange because gmail tells me it has been sent, and it made it
 to chrisc...@tuxfamily.org.
>
> #78 moves 4000k lines around and ends up ~260k in size, I think it may
> have hit vger limit.

Yeah probably. Thanks for looking at this.

I think I will have to split it into smaller patches in a v2.

>>> Instead of waiting for the patch to appear on the list, you might want
>>> to use branch libify-apply-use-in-am25 in my GitHub repo:
>>>
>>> https://github.com/chriscool/git/commits/libify-apply-use-in-am25
>>
>> Hmm, that branch doesn't correspond directly to the patches you sent
>> out (there are 86 commits, some marked with draft. I think commit d13d2ac
>> corresponds kinda to patch #83, but  ).
>>
>> I think I'll wait to see the patches as you intend them to be seen. ;-)
>
> I git-am'd the series then compared with the rebased version of
> libify-apply-use-in-am25 on master. 33198a1 (i.e.
> libify-apply-use-in-am25^) matches what was sent in content (didn't
> compare commit messages).

Thanks for checking.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/83] libify apply and use lib in am

2016-04-25 Thread Duy Nguyen
On Sun, Apr 24, 2016 at 8:33 PM, Christian Couder
 wrote:
> Sorry if this patch series is a bit long. I can split it into two or
> more series if it is prefered.

I suspect you deliberately made the series long just to show off how
good new git-rebase is ;-)

> Performance numbers:
>
>   - A few days ago Ævar did a huge many-hundred commit rebase on the
> kernel with untracked cache.
>
> command: git rebase --onto 1993b17 52bef0c 29dde7c
>
> Vanilla "next" without split index:1m54.953s
> Vanilla "next" with split index:   1m22.476s
> This series on top of "next" without split index:  1m12.034s
> This series on top of "next" with split index: 0m15.678s

I was a bit puzzled why split-index helped so much. It shouldn't have
in my opinion unless you paired it with index-helper, to cut down both
read and write time. So I ran some tests. Long story short, I think we
can achieve this gain (and a little more) even without split-index.

I ran my measurement patch [1] with and without your series used this
series as rebase material. Without the series, the picture is not so
surprising. We run git-apply 80+ times, each consists of this sequence

read index
write index (cache tree updates only)
read index again
optionally initialize name hash (when new entries are added, I guess)
read packed-refs
write index

With this series, we run a single git-apply which does

read index (and sharedindex too if in split-index mode)
initialize name hash
write index 80+ times

This explains why I guessed it wrong: when you write only, not read
back, of course index-helper is not required. And with split-index you
only write as many entries as you touch (which is usually a small
number compared to worktree's size), so writing 80+ times with
split-index is a lot faster than write 80+ times with whole index.

But why write so many times when nobody reads it? We only need to
write before git-apply exits, either after successfully applying the
whole series, or after it stops at conflicts, and maybe even at die()
and SIGINT. Yes if git-apply segfaults, then the index update is lost,
but in such a case, it's usually a good idea to restart fresh anyway.
When you only write index once (or twice) it won't matter if
split-index is used.

[1] http://article.gmane.org/gmane.comp.version-control.git/292001
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/83] libify apply and use lib in am

2016-04-24 Thread Duy Nguyen
On Mon, Apr 25, 2016 at 12:42 AM, Ramsay Jones
 wrote:
>
>
> On 24/04/16 17:56, Christian Couder wrote:
>> On Sun, Apr 24, 2016 at 6:27 PM, Christian Couder
>>  wrote:
>>> On Sun, Apr 24, 2016 at 5:23 PM, Ramsay Jones
>>>  wrote:


 On 24/04/16 14:33, Christian Couder wrote:
> This is a patch series about libifying `git apply` functionality, and
> using this libified functionality in `git am`, so that no 'git apply'
> process is spawn anymore. This makes `git am` significantly faster, so
> `git rebase`, when it uses the am backend, is also significantly
> faster.

 I just tried to git-am these patches, but patch #78 did not make it
 to the list.
>>>
>>> That's strange because gmail tells me it has been sent, and it made it
>>> to chrisc...@tuxfamily.org.

#78 moves 4000k lines around and ends up ~260k in size, I think it may
have hit vger limit.

>> Instead of waiting for the patch to appear on the list, you might want
>> to use branch libify-apply-use-in-am25 in my GitHub repo:
>>
>> https://github.com/chriscool/git/commits/libify-apply-use-in-am25
>
> Hmm, that branch doesn't correspond directly to the patches you sent
> out (there are 86 commits, some marked with draft. I think commit d13d2ac
> corresponds kinda to patch #83, but  ).
>
> I think I'll wait to see the patches as you intend them to be seen. ;-)

I git-am'd the series then compared with the rebased version of
libify-apply-use-in-am25 on master. 33198a1 (i.e.
libify-apply-use-in-am25^) matches what was sent in content (didn't
compare commit messages).
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/83] libify apply and use lib in am

2016-04-24 Thread Ramsay Jones


On 24/04/16 17:56, Christian Couder wrote:
> On Sun, Apr 24, 2016 at 6:27 PM, Christian Couder
>  wrote:
>> On Sun, Apr 24, 2016 at 5:23 PM, Ramsay Jones
>>  wrote:
>>>
>>>
>>> On 24/04/16 14:33, Christian Couder wrote:
 This is a patch series about libifying `git apply` functionality, and
 using this libified functionality in `git am`, so that no 'git apply'
 process is spawn anymore. This makes `git am` significantly faster, so
 `git rebase`, when it uses the am backend, is also significantly
 faster.
>>>
>>> I just tried to git-am these patches, but patch #78 did not make it
>>> to the list.
>>
>> That's strange because gmail tells me it has been sent, and it made it
>> to chrisc...@tuxfamily.org.
> 
> Instead of waiting for the patch to appear on the list, you might want
> to use branch libify-apply-use-in-am25 in my GitHub repo:
> 
> https://github.com/chriscool/git/commits/libify-apply-use-in-am25

Hmm, that branch doesn't correspond directly to the patches you sent
out (there are 86 commits, some marked with draft. I think commit d13d2ac
corresponds kinda to patch #83, but  ).

I think I'll wait to see the patches as you intend them to be seen. ;-)

ATB,
Ramsay Jones


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/83] libify apply and use lib in am

2016-04-24 Thread Christian Couder
On Sun, Apr 24, 2016 at 6:27 PM, Christian Couder
 wrote:
> On Sun, Apr 24, 2016 at 5:23 PM, Ramsay Jones
>  wrote:
>>
>>
>> On 24/04/16 14:33, Christian Couder wrote:
>>> This is a patch series about libifying `git apply` functionality, and
>>> using this libified functionality in `git am`, so that no 'git apply'
>>> process is spawn anymore. This makes `git am` significantly faster, so
>>> `git rebase`, when it uses the am backend, is also significantly
>>> faster.
>>
>> I just tried to git-am these patches, but patch #78 did not make it
>> to the list.
>
> That's strange because gmail tells me it has been sent, and it made it
> to chrisc...@tuxfamily.org.

Instead of waiting for the patch to appear on the list, you might want
to use branch libify-apply-use-in-am25 in my GitHub repo:

https://github.com/chriscool/git/commits/libify-apply-use-in-am25
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/83] libify apply and use lib in am

2016-04-24 Thread Christian Couder
On Sun, Apr 24, 2016 at 5:23 PM, Ramsay Jones
 wrote:
>
>
> On 24/04/16 14:33, Christian Couder wrote:
>> This is a patch series about libifying `git apply` functionality, and
>> using this libified functionality in `git am`, so that no 'git apply'
>> process is spawn anymore. This makes `git am` significantly faster, so
>> `git rebase`, when it uses the am backend, is also significantly
>> faster.
>
> I just tried to git-am these patches, but patch #78 did not make it
> to the list.

That's strange because gmail tells me it has been sent, and it made it
to chrisc...@tuxfamily.org.

I use send-email and smtp.gmail.com. Just after sending patch #78 the
connection to smtp.gmail.com was closed with:

4.7.0 Try again later, closing connection. (MAIL) j6sm6717101wjb.29 - gsmtp

Then I had to wait a few minutes before I could send the last patches.

> (Also, patches #59 and #62 both issue a 'new blank line at EOF' warning).

Ok, I will take a look at that.

Thanks,
Christian.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/83] libify apply and use lib in am

2016-04-24 Thread Ramsay Jones


On 24/04/16 14:33, Christian Couder wrote:
> This is a patch series about libifying `git apply` functionality, and
> using this libified functionality in `git am`, so that no 'git apply'
> process is spawn anymore. This makes `git am` significantly faster, so
> `git rebase`, when it uses the am backend, is also significantly
> faster.

I just tried to git-am these patches, but patch #78 did not make it
to the list.

(Also, patches #59 and #62 both issue a 'new blank line at EOF' warning).

ATB,
Ramsay Jones


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 00/83] libify apply and use lib in am

2016-04-24 Thread Christian Couder
This is a patch series about libifying `git apply` functionality, and
using this libified functionality in `git am`, so that no 'git apply'
process is spawn anymore. This makes `git am` significantly faster, so
`git rebase`, when it uses the am backend, is also significantly
faster.

This has initially been discussed in the following thread:

http://thread.gmane.org/gmane.comp.version-control.git/287236/

A RFC patch series was sent previously that only got rid of the global
variables and refactored the code around a bit:

http://thread.gmane.org/gmane.comp.version-control.git/288489/

This new patch series is built on top of that previous work, so
patches 1/83 to 47/83 are from the previous RFC patch series and
patches after that are new.

Sorry if this patch series is a bit long. I can split it into two or
more series if it is prefered.

The benefits are not just related to not creating new processes. When
`git am` launched a `git apply` process, this new process had to read
the index from disk. Then after the `git apply`process had terminated,
`git am` dropped its index and read the index from disk to get the
index that had been modified by the `git apply`process. This was
inefficient and also prevented the split-index mechanism to provide
many performance benefits.

Performance numbers:

  - A few days ago Ævar did a huge many-hundred commit rebase on the
kernel with untracked cache.

command: git rebase --onto 1993b17 52bef0c 29dde7c

Vanilla "next" without split index:1m54.953s
Vanilla "next" with split index:   1m22.476s
This series on top of "next" without split index:  1m12.034s
This series on top of "next" with split index: 0m15.678s

Ævar used his Debian laptop with SSD.

  - Some days ago I tested rebasing 13 commits in Booking.com's
monorepo on a Red Hat 6.5 server with split-index and
GIT_TRACE_PERFORMANCE=1.

With Git v2.8.0, the rebase took 6.375888383 s, with the git am
command launched by the rebase command taking 3.705677431 s.

With this series on top of next, the rebase took 3.044529494 s, with
the git am command launched by the rebase command taking 0.583521168
s.

No tests on Windows have been performed, but it could be interesting
to test on this platform.

Christian Couder (83):
  builtin/apply: make gitdiff_verify_name() return void
  builtin/apply: avoid parameter shadowing 'p_value' global
  builtin/apply: avoid parameter shadowing 'linenr' global
  builtin/apply: avoid local variable shadowing 'len' parameter
  builtin/apply: extract line_by_line_fuzzy_match() from
match_fragment()
  builtin/apply: move 'options' variable into cmd_apply()
  builtin/apply: introduce 'struct apply_state' to start libifying
  builtin/apply: move 'unidiff_zero' global into 'struct apply_state'
  builtin/apply: move 'check' global into 'struct apply_state'
  builtin/apply: move 'check_index' global into 'struct apply_state'
  builtin/apply: move 'apply_in_reverse' global into 'struct
apply_state'
  builtin/apply: move 'apply_with_reject' global into 'struct
apply_state'
  builtin/apply: move 'apply_verbosely' global into 'struct apply_state'
  builtin/apply: move 'update_index' global into 'struct apply_state'
  builtin/apply: move 'allow_overlap' global into 'struct apply_state'
  builtin/apply: move 'cached' global into 'struct apply_state'
  builtin/apply: move 'diffstat' global into 'struct apply_state'
  builtin/apply: move 'numstat' global into 'struct apply_state'
  builtin/apply: move 'summary' global into 'struct apply_state'
  builtin/apply: move 'threeway' global into 'struct apply_state'
  builtin/apply: move 'no-add' global into 'struct apply_state'
  builtin/apply: move 'unsafe_paths' global into 'struct apply_state'
  builtin/apply: move 'line_termination' global into 'struct
apply_state'
  builtin/apply: move 'fake_ancestor' global into 'struct apply_state'
  builtin/apply: move 'p_context' global into 'struct apply_state'
  builtin/apply: move 'apply' global into 'struct apply_state'
  builtin/apply: move 'read_stdin' global into cmd_apply()
  builtin/apply: move 'patch_input_file' global into 'struct
apply_state'
  builtin/apply: move 'limit_by_name' global into 'struct apply_state'
  builtin/apply: move 'has_include' global into 'struct apply_state'
  builtin/apply: move 'p_value' global into 'struct apply_state'
  builtin/apply: move 'p_value_known' global into 'struct apply_state'
  builtin/apply: move 'root' global into 'struct apply_state'
  builtin/apply: move 'whitespace_error' global into 'struct
apply_state'
  builtin/apply: move 'whitespace_option' into 'struct apply_state'
  builtin/apply: remove whitespace_option arg from
set_default_whitespace_mode()
  builtin/apply: move 'squelch_whitespace_errors' into 'struct
apply_state'
  builtin/apply: move 'applied_after_fixing_ws' into 'struct
apply_state'
  builtin/apply: move 'ws_error_action' into 'struct apply_state'
  builtin/apply: move