Re: [PATCH 0/5] A series of performance enhancements in the memihash and name-cache area

2017-03-02 Thread Junio C Hamano
Jeff Hostetler  writes:

> Sorry,  $DAYJOB got in the way (again).
>
> This is still on my short-list of things to take care of.
> I should have something for you next week.

That's perfectly OK.  I just wanted a newer articule in my
newsreader I can bookmark so that I won't forget ;-) No hurries.



RE: [PATCH 0/5] A series of performance enhancements in the memihash and name-cache area

2017-03-02 Thread Jeff Hostetler
Sorry,  $DAYJOB got in the way (again).

This is still on my short-list of things to take care of.
I should have something for you next week.

Thanks again,
Jeff


-Original Message-
From: Junio C Hamano [mailto:gits...@pobox.com] 
Sent: Thursday, March 2, 2017 4:12 PM
To: Jeff Hostetler 
Cc: Jeff King ; Johannes Schindelin 
; git@vger.kernel.org; Jeff Hostetler 

Subject: Re: [PATCH 0/5] A series of performance enhancements in the memihash 
and name-cache area

Jeff Hostetler  writes:

> On 2/14/2017 5:03 PM, Jeff King wrote:
>> On Tue, Feb 14, 2017 at 12:31:46PM +0100, Johannes Schindelin wrote:
>>
>>> On Windows, calls to memihash() and maintaining the istate.name_hash and
>>> istate.dir_hash HashMaps take significant time on very large
>>> repositories. This series of changes reduces the overall time taken for
>>> various operations by reducing the number calls to memihash(), moving
>>> some of them into multi-threaded code, and etc.
>>>
>>> Note: one commenter in 
>>> https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fgit-for-windows%2Fgit%2Fpull%2F964=02%7C01%7CJeff.Hostetler%40microsoft.com%7C1d493f3031f74657f29308d461b0be80%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636240859121403139=16RQH1%2BrDonsanClb3%2Fwue7pcy9l7cUq9lDJenqCgbE%3D=0
>>> pointed out that memihash() only handles ASCII correctly. That is true.
>>> And fixing this is outside the purview of this patch series.
>> Out of curiosity, do you have numbers? Bonus points if the speedup can
>> be shown via a t/perf script.
>>
>> We have a read-cache perf-test already, but I suspect you'd want
>> something more like "git status" or "ls-files -o" that calls into
>> read_directory().
>
> I have some informal numbers in a spreadsheet.  I was seeing
> a 8-9% speed up on a status on my gigantic repo.
>
> I'll try to put together a before/after perf-test to better
> demonstrate this.

Ping?  I do not think there is anything wrong with the changes from
correctness point of view, but as the series is about performance,
it somewhat feels pointless to merge to 'next' without mentioning
the actual numbers.  

It might be sufficient to mention the rough numbers in the log
messages, if additions to t/perf/ are too cumbersome to arrange.




Re: [PATCH 0/5] A series of performance enhancements in the memihash and name-cache area

2017-03-02 Thread Junio C Hamano
Jeff Hostetler  writes:

> On 2/14/2017 5:03 PM, Jeff King wrote:
>> On Tue, Feb 14, 2017 at 12:31:46PM +0100, Johannes Schindelin wrote:
>>
>>> On Windows, calls to memihash() and maintaining the istate.name_hash and
>>> istate.dir_hash HashMaps take significant time on very large
>>> repositories. This series of changes reduces the overall time taken for
>>> various operations by reducing the number calls to memihash(), moving
>>> some of them into multi-threaded code, and etc.
>>>
>>> Note: one commenter in https://github.com/git-for-windows/git/pull/964
>>> pointed out that memihash() only handles ASCII correctly. That is true.
>>> And fixing this is outside the purview of this patch series.
>> Out of curiosity, do you have numbers? Bonus points if the speedup can
>> be shown via a t/perf script.
>>
>> We have a read-cache perf-test already, but I suspect you'd want
>> something more like "git status" or "ls-files -o" that calls into
>> read_directory().
>
> I have some informal numbers in a spreadsheet.  I was seeing
> a 8-9% speed up on a status on my gigantic repo.
>
> I'll try to put together a before/after perf-test to better
> demonstrate this.

Ping?  I do not think there is anything wrong with the changes from
correctness point of view, but as the series is about performance,
it somewhat feels pointless to merge to 'next' without mentioning
the actual numbers.  

It might be sufficient to mention the rough numbers in the log
messages, if additions to t/perf/ are too cumbersome to arrange.




Re: [PATCH 0/5] A series of performance enhancements in the memihash and name-cache area

2017-02-19 Thread Junio C Hamano
Jeff Hostetler  writes:

>> But the other Jeff sounded like a follow-up was to follow shortly if
>> not imminent so I decided to allocate my time on other topics still
>> only on the list first while waiting to see what happens.
>
> Sorry, I was out of the office for a family emergency on Thursday
> and Friday.  Add to that the long weekend, and I won't get back around
> to this until Tuesday or Wednesday at the earliest.

The open source process makes progress at the pace of its
participants, and it is expected that some topics come fast while
others don't.

Hope things are all OK for you and your family now.

Thanks.



RE: [PATCH 0/5] A series of performance enhancements in the memihash and name-cache area

2017-02-18 Thread Jeff Hostetler


From: Junio C Hamano [mailto:jch2...@gmail.com] On Behalf Of Junio C Hamano
> Jeff King  writes:
>> On Wed, Feb 15, 2017 at 09:27:53AM -0500, Jeff Hostetler wrote:
>>
>>> I have some informal numbers in a spreadsheet.  I was seeing
>>> a 8-9% speed up on a status on my gigantic repo.
>>> 
>>> I'll try to put together a before/after perf-test to better
>>> demonstrate this.
>>
>> Thanks. What I'm mostly curious about is how much each individual step
>> buys. Sometimes when doing a long optimization series, I find that some
>> of the optimizations make other ones somewhat redundant (e.g., if patch
>> 2 causes us to call the optimized code from patch 3 less often).
>
> I am curious too.
>
> To me 1/5 (reduction of redundant calls), 4/5 (correctly size the
> hash that would grow to a known size anyway) and 5/5 (take advantage
> of the fact that adjacent cache entries are often in the same
> directory) look like no brainers to take, regardless of the others
> (including themselves). 

agreed.

> It is not clear to me if 3/5 (preload-index uses available cores to
> compute hashes) is an unconditional win (an operation that is
> pathspec limited may need hashes for only a small fraction of the
> index---would it still be a win to compute the hash for all entries
> upon loading the index, even if we are using otherwise-idel cores?).

I'm not sure about pathspec cases.  What I was seeing was that during
the call to lazy_name_init_hash() was taking 30% of the time in
"git status" and 40% in "git add ".  (Again this was on my
giant repo with a 450MB index).
 
> Of course 2/5 is a prerequisite step for 3/5 and 5/5, so if we want
> either of the latter two, we cannot avoid it.

jeff



RE: [PATCH 0/5] A series of performance enhancements in the memihash and name-cache area

2017-02-18 Thread Jeff Hostetler
> Jeff King  writes:
> 
>> On Fri, Feb 17, 2017 at 09:58:21PM -0800, Junio C Hamano wrote:
>>
>>> Jeff Hostetler  writes:
>>> 
>>> > I'll try to put together a before/after perf-test to better
>>> > demonstrate this.
>>> 
>>> I didn't pick up the series while watching these exchanges, as I
>>> didn't know how quick your turnaround would be, but now a few days
>>> have passed.  Just to make sure we won't forget this topic, I'll
>>> pick up these 5 patches in the meantime.
>>
>> Yeah, to be clear my question was not an objection, but mostly
>> curiosity and interest.
>
> Yes, it was very clear that it wasn't an objection.
> 
> But the other Jeff sounded like a follow-up was to follow shortly if
> not imminent so I decided to allocate my time on other topics still
> only on the list first while waiting to see what happens.

Sorry, I was out of the office for a family emergency on Thursday
and Friday.  Add to that the long weekend, and I won't get back around
to this until Tuesday or Wednesday at the earliest.

Jeff



 


Re: [PATCH 0/5] A series of performance enhancements in the memihash and name-cache area

2017-02-18 Thread Junio C Hamano
Jeff King  writes:

> On Fri, Feb 17, 2017 at 09:58:21PM -0800, Junio C Hamano wrote:
>
>> Jeff Hostetler  writes:
>> 
>> > I'll try to put together a before/after perf-test to better
>> > demonstrate this.
>> 
>> I didn't pick up the series while watching these exchanges, as I
>> didn't know how quick your turnaround would be, but now a few days
>> have passed.  Just to make sure we won't forget this topic, I'll
>> pick up these 5 patches in the meantime.
>
> Yeah, to be clear my question was not an objection, but mostly
> curiosity and interest.

Yes, it was very clear that it wasn't an objection.

But the other Jeff sounded like a follow-up was to follow shortly if
not imminent so I decided to allocate my time on other topics still
only on the list first while waiting to see what happens.



Re: [PATCH 0/5] A series of performance enhancements in the memihash and name-cache area

2017-02-17 Thread Jeff King
On Fri, Feb 17, 2017 at 09:58:21PM -0800, Junio C Hamano wrote:

> Jeff Hostetler  writes:
> 
> > I'll try to put together a before/after perf-test to better
> > demonstrate this.
> 
> I didn't pick up the series while watching these exchanges, as I
> didn't know how quick your turnaround would be, but now a few days
> have passed.  Just to make sure we won't forget this topic, I'll
> pick up these 5 patches in the meantime.

Yeah, to be clear my question was not an objection, but mostly
curiosity and interest.

-Peff


Re: [PATCH 0/5] A series of performance enhancements in the memihash and name-cache area

2017-02-17 Thread Junio C Hamano
Jeff Hostetler  writes:

> I'll try to put together a before/after perf-test to better
> demonstrate this.

I didn't pick up the series while watching these exchanges, as I
didn't know how quick your turnaround would be, but now a few days
have passed.  Just to make sure we won't forget this topic, I'll
pick up these 5 patches in the meantime.

Thanks.


Re: [PATCH 0/5] A series of performance enhancements in the memihash and name-cache area

2017-02-17 Thread Junio C Hamano
Jeff King  writes:

> On Wed, Feb 15, 2017 at 09:27:53AM -0500, Jeff Hostetler wrote:
>
>> I have some informal numbers in a spreadsheet.  I was seeing
>> a 8-9% speed up on a status on my gigantic repo.
>> 
>> I'll try to put together a before/after perf-test to better
>> demonstrate this.
>
> Thanks. What I'm mostly curious about is how much each individual step
> buys. Sometimes when doing a long optimization series, I find that some
> of the optimizations make other ones somewhat redundant (e.g., if patch
> 2 causes us to call the optimized code from patch 3 less often).

I am curious too.

To me 1/5 (reduction of redundant calls), 4/5 (correctly size the
hash that would grow to a known size anyway) and 5/5 (take advantage
of the fact that adjacent cache entries are often in the same
directory) look like no brainers to take, regardless of the others
(including themselves).

It is not clear to me if 3/5 (preload-index uses available cores to
compute hashes) is an unconditional win (an operation that is
pathspec limited may need hashes for only a small fraction of the
index---would it still be a win to compute the hash for all entries
upon loading the index, even if we are using otherwise-idel cores?).

Of course 2/5 is a prerequisite step for 3/5 and 5/5, so if we want
either of the latter two, we cannot avoid it.


Re: [PATCH 0/5] A series of performance enhancements in the memihash and name-cache area

2017-02-15 Thread Jeff King
On Wed, Feb 15, 2017 at 09:27:53AM -0500, Jeff Hostetler wrote:

> I have some informal numbers in a spreadsheet.  I was seeing
> a 8-9% speed up on a status on my gigantic repo.
> 
> I'll try to put together a before/after perf-test to better
> demonstrate this.

Thanks. What I'm mostly curious about is how much each individual step
buys. Sometimes when doing a long optimization series, I find that some
of the optimizations make other ones somewhat redundant (e.g., if patch
2 causes us to call the optimized code from patch 3 less often).

-Peff


Re: [PATCH 0/5] A series of performance enhancements in the memihash and name-cache area

2017-02-15 Thread Jeff Hostetler



On 2/14/2017 5:03 PM, Jeff King wrote:

On Tue, Feb 14, 2017 at 12:31:46PM +0100, Johannes Schindelin wrote:


On Windows, calls to memihash() and maintaining the istate.name_hash and
istate.dir_hash HashMaps take significant time on very large
repositories. This series of changes reduces the overall time taken for
various operations by reducing the number calls to memihash(), moving
some of them into multi-threaded code, and etc.

Note: one commenter in https://github.com/git-for-windows/git/pull/964
pointed out that memihash() only handles ASCII correctly. That is true.
And fixing this is outside the purview of this patch series.

Out of curiosity, do you have numbers? Bonus points if the speedup can
be shown via a t/perf script.

We have a read-cache perf-test already, but I suspect you'd want
something more like "git status" or "ls-files -o" that calls into
read_directory().


I have some informal numbers in a spreadsheet.  I was seeing
a 8-9% speed up on a status on my gigantic repo.

I'll try to put together a before/after perf-test to better demonstrate 
this.


Jeff



Re: [PATCH 0/5] A series of performance enhancements in the memihash and name-cache area

2017-02-14 Thread Jeff King
On Tue, Feb 14, 2017 at 12:31:46PM +0100, Johannes Schindelin wrote:

> On Windows, calls to memihash() and maintaining the istate.name_hash and
> istate.dir_hash HashMaps take significant time on very large
> repositories. This series of changes reduces the overall time taken for
> various operations by reducing the number calls to memihash(), moving
> some of them into multi-threaded code, and etc.
> 
> Note: one commenter in https://github.com/git-for-windows/git/pull/964
> pointed out that memihash() only handles ASCII correctly. That is true.
> And fixing this is outside the purview of this patch series.

Out of curiosity, do you have numbers? Bonus points if the speedup can
be shown via a t/perf script.

We have a read-cache perf-test already, but I suspect you'd want
something more like "git status" or "ls-files -o" that calls into
read_directory().

-Peff