Re: Index files autocompletion too slow in big repositories (w / suggestion for improvement)

2017-04-17 Thread Johannes Sixt

On 17.04.2017 at 06:05, Junio C Hamano wrote:
> Johannes Sixt writes:
>
>> This is about command line completion. We go a long way to avoid
>> forking processes there. What is 10x faster on Linux despite
>> forking a process may not be so on Windows.
>
> Doesn't this depend on how many paths there are?  If there are only
> a few paths, the loop in shell would beat a pipe into sed even on
> Linux, I suspect, and if there are tons of paths, at some number,
> loop in shell would become slower than a single spawning of sed on
> platforms with slower fork, no?

Absolutely. I just want to make sure a suggested change takes into
account the situation on Windows, not only the "YE!" and "VERY
WELL!" votes of Linux users ;)


-- Hannes



Re: Index files autocompletion too slow in big repositories (w / suggestion for improvement)

2017-04-16 Thread Junio C Hamano
Johannes Sixt writes:

> Cc Gábor.
>
> On 15.04.2017 at 00:33, Ævar Arnfjörð Bjarmason wrote:
>> On Sat, Apr 15, 2017 at 12:08 AM, Carlos Pita wrote:
>>> This is much faster (below 0.1s):
>>>
>>> __git_index_files ()
>>> {
>>>     local dir="$(__gitdir)" root="${2-.}" file;
>>>     if [ -d "$dir" ]; then
>>>         __git_ls_files_helper "$root" "$1" | \
>>>             sed -r 's@/.*@@' | uniq | sort | uniq
>>>     fi
>>> }
>>>
>>> time __git_index_files
>>>
>>> real    0m0.075s
>>> user    0m0.083s
>>> sys     0m0.010s
>>>
>>> Most of the improvement is due to the simpler, non-grouping, regex.
>>> Since I expect most of the common prefixes to arrive consecutively,
>>> running uniq before sort also improves things a bit. I'm not removing
>>> leading double quotes anymore (this isn't being done by the current
>>> version, anyway) but this doesn't seem to hurt.
>>>
>>> Despite the dependence on sed this is ten times faster than the
>>> original, maybe an option to enable fast index completion or something
>>> like that might be desirable.
>>
>> It's fine to depend on sed, these shell-scripts are POSIX compatible,
>> and so is sed, we use sed in a lot of the built-in shellscripts.
>
> This is about command line completion. We go a long way to avoid
> forking processes there. What is 10x faster on Linux despite
> forking a process may not be so on Windows.

Doesn't this depend on how many paths there are?  If there are only
a few paths, the loop in shell would beat a pipe into sed even on
Linux, I suspect, and if there are tons of paths, at some number,
loop in shell would become slower than a single spawning of sed on
platforms with slower fork, no?
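
A minimal sketch of the two approaches being compared, so the crossover point can be measured on a given platform. The function names and the path generator are hypothetical and only illustrate the question; they are not taken from the completion script:

```shell
# Shell-only variant: no extra process, but one loop iteration per path.
top_level_loop () {
	while IFS= read -r file; do
		printf '%s\n' "${file%%/*}"
	done
}

# sed variant: one extra process, however many paths there are.
top_level_sed () {
	sed -e 's@/.*@@'
}

# Hypothetical generator: prints $1 paths spread over ten directories.
make_paths () {
	i=0
	while [ "$i" -lt "$1" ]; do
		printf 'dir%s/file%s\n' "$((i % 10))" "$i"
		i=$((i + 1))
	done
}

# Both variants reduce the same input to the same top-level names.
make_paths 25 | top_level_loop | sort -u
make_paths 25 | top_level_sed | sort -u
```

To compare timings, one could write the output of make_paths to a temporary file and run each function against it under bash's `time` keyword for several path counts, on both Linux and Windows.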


Re: Index files autocompletion too slow in big repositories (w / suggestion for improvement)

2017-04-15 Thread Jacob Keller
On Sat, Apr 15, 2017 at 4:59 AM, Johannes Sixt wrote:
> Cc Gábor.
>
> On 15.04.2017 at 00:33, Ævar Arnfjörð Bjarmason wrote:
>>
>> On Sat, Apr 15, 2017 at 12:08 AM, Carlos Pita wrote:
>>>
>>> This is much faster (below 0.1s):
>>>
>>> __git_index_files ()
>>> {
>>>     local dir="$(__gitdir)" root="${2-.}" file;
>>>     if [ -d "$dir" ]; then
>>>         __git_ls_files_helper "$root" "$1" | \
>>>             sed -r 's@/.*@@' | uniq | sort | uniq
>>>     fi
>>> }
>>>
>>> time __git_index_files
>>>
>>> real    0m0.075s
>>> user    0m0.083s
>>> sys     0m0.010s
>>>
>>> Most of the improvement is due to the simpler, non-grouping, regex.
>>> Since I expect most of the common prefixes to arrive consecutively,
>>> running uniq before sort also improves things a bit. I'm not removing
>>> leading double quotes anymore (this isn't being done by the current
>>> version, anyway) but this doesn't seem to hurt.
>>>
>>> Despite the dependence on sed this is ten times faster than the
>>> original, maybe an option to enable fast index completion or something
>>> like that might be desirable.
>>
>>
>> It's fine to depend on sed, these shell-scripts are POSIX compatible,
>> and so is sed, we use sed in a lot of the built-in shellscripts.
>
>
> This is about command line completion. We go a long way to avoid forking
> processes there. What is 10x faster on Linux despite forking a process
> may not be so on Windows.
>
> (I'm not using bash command line completion on Windows, so I can't tell what
> the effect of your suggested change is on Windows. I hope Gábor can comment
> on it.)
>
> -- Hannes
>

In cases like this, might it be worth somehow splitting it so that Linux
can use the best thing and Windows can continue using what's best for
it? It is a pretty significant advantage on Linux.

Thanks,
Jake
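
One way such a split could look, as a rough sketch only: the implementation is chosen once, when the script is loaded, based on `uname -s`. The function name `top_level` and the uname patterns are assumptions for illustration; nothing in this thread specifies how the platform would be detected.

```shell
# Choose an implementation once at load time (hypothetical helper name).
# The patterns are assumed to cover Git for Windows, MSYS2 and Cygwin.
case "$(uname -s)" in
MINGW*|MSYS*|CYGWIN*)
	# Process creation is assumed to be expensive here: stay in the shell.
	top_level () {
		while IFS= read -r file; do
			printf '%s\n' "${file%%/*}"
		done
	}
	;;
*)
	# Elsewhere, a single sed process handles all paths.
	top_level () {
		sed -e 's@/.*@@'
	}
	;;
esac

printf '%s\n' 'src/main.c' 'notes.txt' | top_level
```

The single `uname` call costs one fork when the script is sourced, not on every completion.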


Re: Index files autocompletion too slow in big repositories (w / suggestion for improvement)

2017-04-15 Thread Johannes Sixt
Cc Gábor, resent with working email (hopefully); please follow-up on 
this mail.


On 15.04.2017 at 00:33, Ævar Arnfjörð Bjarmason wrote:
> On Sat, Apr 15, 2017 at 12:08 AM, Carlos Pita wrote:
>> This is much faster (below 0.1s):
>>
>> __git_index_files ()
>> {
>>     local dir="$(__gitdir)" root="${2-.}" file;
>>     if [ -d "$dir" ]; then
>>         __git_ls_files_helper "$root" "$1" | \
>>             sed -r 's@/.*@@' | uniq | sort | uniq
>>     fi
>> }
>>
>> time __git_index_files
>>
>> real    0m0.075s
>> user    0m0.083s
>> sys     0m0.010s
>>
>> Most of the improvement is due to the simpler, non-grouping, regex.
>> Since I expect most of the common prefixes to arrive consecutively,
>> running uniq before sort also improves things a bit. I'm not removing
>> leading double quotes anymore (this isn't being done by the current
>> version, anyway) but this doesn't seem to hurt.
>>
>> Despite the dependence on sed this is ten times faster than the
>> original, maybe an option to enable fast index completion or something
>> like that might be desirable.
>
> It's fine to depend on sed, these shell-scripts are POSIX compatible,
> and so is sed, we use sed in a lot of the built-in shellscripts.

This is about command line completion. We go a long way to avoid forking
processes there. What is 10x faster on Linux despite forking a
process may not be so on Windows.


(I'm not using bash command line completion on Windows, so I can't tell 
what the effect of your suggested change is on Windows. I hope Gábor can 
comment on it.)


-- Hannes


Re: Index files autocompletion too slow in big repositories (w / suggestion for improvement)

2017-04-15 Thread Junio C Hamano
Jacob Keller writes:

> On Fri, Apr 14, 2017 at 3:33 PM, Ævar Arnfjörð Bjarmason wrote:
>> On Sat, Apr 15, 2017 at 12:08 AM, Carlos Pita wrote:
>>> This is much faster (below 0.1s):
>>>
>>> __git_index_files ()
>>> {
>>>     local dir="$(__gitdir)" root="${2-.}" file;
>>>     if [ -d "$dir" ]; then
>>>         __git_ls_files_helper "$root" "$1" | \
>>>             sed -r 's@/.*@@' | uniq | sort | uniq
>>>     fi
>>> }
>>>
>>> time __git_index_files
>>>
>>> real    0m0.075s
>>> user    0m0.083s
>>> sys     0m0.010s
>>>
>>> Most of the improvement is due to the simpler, non-grouping, regex.
>>> Since I expect most of the common prefixes to arrive consecutively,
>>> running uniq before sort also improves things a bit. I'm not removing
>>> leading double quotes anymore (this isn't being done by the current
>>> version, anyway) but this doesn't seem to hurt.
>>>
>>> Despite the dependence on sed this is ten times faster than the
>>> original, maybe an option to enable fast index completion or something
>>> like that might be desirable.
>>>
>>> Best regards
>>
>> It's fine to depend on sed, these shell-scripts are POSIX compatible,
>> and so is sed, we use sed in a lot of the built-in shellscripts.
>>
>> I think you should submit this as a patch, see 
>> Documentation/SubmittingPatches.
>
> Yea it should be fine to use sed.

As long as the use of "sed" is in line with POSIX.1.  I do not think
you need the non-portable "-r" merely to strip out everything that
follows the first slash, so perhaps with "s|-r|-e|" applied to the
above (and do not write a backslash after the pipe at the end of the
line---the shell knows you haven't finished talking to it yet if you
end a line with a pipe, so there is no need for the backslash), you'd
be golden.
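
A minimal sketch of the pipeline with both suggestions applied. `list_paths` is a hypothetical stand-in for `__git_ls_files_helper`, with made-up sample paths, so that the snippet runs outside the completion script:

```shell
# Hypothetical stand-in for __git_ls_files_helper: prints a few index paths.
list_paths () {
	printf '%s\n' src/main.c src/util.c notes.txt docs/guide.txt docs/api.txt
}

# Portable variant: "-e" instead of the non-portable "-r", and the line
# ending in a pipe continues onto the next one without a backslash.
top_level_entries () {
	list_paths |
	sed -e 's@/.*@@' | uniq | sort | uniq
}

top_level_entries
```

With the sample paths above this prints docs, notes.txt and src, one per line.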


Re: Index files autocompletion too slow in big repositories (w / suggestion for improvement)

2017-04-14 Thread Jacob Keller
On Fri, Apr 14, 2017 at 3:33 PM, Ævar Arnfjörð Bjarmason wrote:
> On Sat, Apr 15, 2017 at 12:08 AM, Carlos Pita wrote:
>> This is much faster (below 0.1s):
>>
>> __git_index_files ()
>> {
>>     local dir="$(__gitdir)" root="${2-.}" file;
>>     if [ -d "$dir" ]; then
>>         __git_ls_files_helper "$root" "$1" | \
>>             sed -r 's@/.*@@' | uniq | sort | uniq
>>     fi
>> }
>>
>> time __git_index_files
>>
>> real    0m0.075s
>> user    0m0.083s
>> sys     0m0.010s
>>
>> Most of the improvement is due to the simpler, non-grouping, regex.
>> Since I expect most of the common prefixes to arrive consecutively,
>> running uniq before sort also improves things a bit. I'm not removing
>> leading double quotes anymore (this isn't being done by the current
>> version, anyway) but this doesn't seem to hurt.
>>
>> Despite the dependence on sed this is ten times faster than the
>> original, maybe an option to enable fast index completion or something
>> like that might be desirable.
>>
>> Best regards
>
> It's fine to depend on sed, these shell-scripts are POSIX compatible,
> and so is sed, we use sed in a lot of the built-in shellscripts.
>
> I think you should submit this as a patch, see 
> Documentation/SubmittingPatches.

Yea it should be fine to use sed.

Thanks,
Jake


Re: Index files autocompletion too slow in big repositories (w / suggestion for improvement)

2017-04-14 Thread Ævar Arnfjörð Bjarmason
On Sat, Apr 15, 2017 at 12:08 AM, Carlos Pita wrote:
> This is much faster (below 0.1s):
>
> __git_index_files ()
> {
>     local dir="$(__gitdir)" root="${2-.}" file;
>     if [ -d "$dir" ]; then
>         __git_ls_files_helper "$root" "$1" | \
>             sed -r 's@/.*@@' | uniq | sort | uniq
>     fi
> }
>
> time __git_index_files
>
> real    0m0.075s
> user    0m0.083s
> sys     0m0.010s
>
> Most of the improvement is due to the simpler, non-grouping, regex.
> Since I expect most of the common prefixes to arrive consecutively,
> running uniq before sort also improves things a bit. I'm not removing
> leading double quotes anymore (this isn't being done by the current
> version, anyway) but this doesn't seem to hurt.
>
> Despite the dependence on sed this is ten times faster than the
> original, maybe an option to enable fast index completion or something
> like that might be desirable.
>
> Best regards

It's fine to depend on sed, these shell-scripts are POSIX compatible,
and so is sed, we use sed in a lot of the built-in shellscripts.

I think you should submit this as a patch, see Documentation/SubmittingPatches.


Re: Index files autocompletion too slow in big repositories (w / suggestion for improvement)

2017-04-14 Thread Carlos Pita
This is much faster (below 0.1s):

__git_index_files ()
{
    local dir="$(__gitdir)" root="${2-.}" file;
    if [ -d "$dir" ]; then
        __git_ls_files_helper "$root" "$1" | \
            sed -r 's@/.*@@' | uniq | sort | uniq
    fi
}

time __git_index_files

real    0m0.075s
user    0m0.083s
sys     0m0.010s

Most of the improvement is due to the simpler, non-grouping, regex.
Since I expect most of the common prefixes to arrive consecutively,
running uniq before sort also improves things a bit. I'm not removing
leading double quotes anymore (this isn't being done by the current
version, anyway) but this doesn't seem to hurt.

Despite the dependence on sed this is ten times faster than the
original, maybe an option to enable fast index completion or something
like that might be desirable.

Best regards
--
Carlos
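
A small illustration of why the extra uniq in front of sort can help when repeats arrive consecutively. The sample names are made up; `names` is a hypothetical stand-in for the output of the sed step:

```shell
# Illustrative data: top-level names with most repeats adjacent.
names () {
	printf '%s\n' docs docs docs src src src src docs
}

# The first uniq collapses adjacent repeats, so sort sees fewer lines.
names | uniq | wc -l

# The final uniq removes the repeats that were not adjacent.
names | uniq | sort | uniq

# Same final result from a single sort process, without the pre-filter.
names | sort -u
```

Whether the pre-filter pays for its extra process presumably depends on how many adjacent repeats the input contains and on how costly process creation is on the platform.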


Index files autocompletion too slow in big repositories (w / suggestion for improvement)

2017-04-14 Thread Carlos Pita
Hi all,

I'm currently using git annex to manage my entire file collection
(including tons of music and books) and I noticed how slow
autocompletion has become for files in the index (say for git add).
The main offender is a while-read-case-echo bash loop in
__git_index_files that can be readily substituted with a much faster
sed invocation, although I guess you didn't want the sed dependency in
the first place. Anyway, here is my benchmark:

__git_index_files ()
{
    local dir="$(__gitdir)" root="${2-.}" file;
    if [ -d "$dir" ]; then
        __git_ls_files_helper "$root" "$1" | while read -r file; do
            case "$file" in
            ?*/*)
                echo "${file%%/*}"
                ;;
            *)
                echo "$file"
                ;;
            esac;
        done | sort | uniq;
    fi
}

time __git_index_files > /dev/null


__git_index_files ()
{
    local dir="$(__gitdir)" root="${2-.}" file;
    if [ -d "$dir" ]; then
        __git_ls_files_helper "$root" "$1" | \
            sed -r 's@^"?([^/]+)/.*$@\1@' | sort | uniq
    fi
}

time __git_index_files > /dev/null

real    0m0.830s
user    0m0.597s
sys     0m0.310s

real    0m0.345s
user    0m0.357s
sys     0m0.000s

Notice I'm also excluding the beginning double quote that appears in
escaped path names.

Best regards
--
Carlos
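
A short sketch of what the grouping expression does to plain and to quoted path names. The sample paths are made up, `strip_to_top_level` is a hypothetical name, and `-E` is used here in place of `-r` (GNU sed accepts both for extended regular expressions):

```shell
# Keep only the first path component, dropping a leading double quote
# if the path was emitted in quoted (escaped) form.
strip_to_top_level () {
	sed -E 's@^"?([^/]+)/.*$@\1@'
}

# Sample input: a plain path, a quoted path with octal escapes, and a
# top-level file that contains no slash at all.
printf '%s\n' 'src/main.c' '"caf\303\251/menu.txt"' 'README' |
strip_to_top_level
```

A line with no slash does not match the expression, so a quoted top-level file name would pass through with its quotes intact.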