Re: [PATCH] completion: improve ls-files filter performance

2018-04-04 Thread Johannes Schindelin
Hi drizzd,

On Wed, 4 Apr 2018, Clemens Buchacher wrote:

> From the output of ls-files, we remove all but the leftmost path
> component and then we eliminate duplicates. We do this in a while loop,
> which is a performance bottleneck when the number of iterations is large
> (e.g. for 6 files in linux.git).
> 
> $ COMP_WORDS=(git status -- ar) COMP_CWORD=3; time _git
> 
> real0m11.876s
> user0m4.685s
> sys 0m6.808s
> 
> Replacing the loop with the cut command improves performance
> significantly:
> 
> $ COMP_WORDS=(git status -- ar) COMP_CWORD=3; time _git
> 
> real0m1.372s
> user0m0.263s
> sys 0m0.167s
> 
> The measurements were done with Msys2 bash, which is used by Git for
> Windows.

Those are nice numbers right there, so I am eager to get this into Git for
Windows as quickly as it stabilizes (i.e. when it hits `next` or so).

I was wondering about one thing, though:

> diff --git a/contrib/completion/git-completion.bash 
> b/contrib/completion/git-completion.bash
> index 6da95b8..69a2d41 100644
> --- a/contrib/completion/git-completion.bash
> +++ b/contrib/completion/git-completion.bash
> @@ -384,12 +384,7 @@ __git_index_files ()
>   local root="${2-.}" file
>  
>   __git_ls_files_helper "$root" "$1" |
> - while read -r file; do
> - case "$file" in
> - ?*/*) echo "${file%%/*}" ;;

This is a bit different from the `cut -f1 -d/` logic, as it does *not
necessarily* strip a leading slash: for `/abc` the existing code would
return the string unmodified, for `/abc/def` it would return an empty
string!

Now, I think that this peculiar behavior is most likely bogus as `git
ls-files` outputs only relative paths (that I know of). In any case,
reducing paths to an empty string seems fishy.

I looked through the history of that code and tracked it all the way back
to
https://public-inbox.org/git/1357930123-26310-1-git-send-email-manlio.peri...@gmail.com/
(that is the reason why you are Cc:ed, Manlio). Manlio, do you remember
why you put the `?` in front of `?*/*` here? I know, it's been more than
five years...

Out of curiosity, would the numbers change a lot if you replaced the `cut
-f1 -d/` call by a `sed -e 's/^\//' -e 's/\/.*//'` one?

I am not proposing to change the patch, though, because we really do not
need to expect `ls-files` to print lines with leading slashes.

> - *) echo "$file" ;;
> - esac
> - done | sort | uniq
> + cut -f1 -d/ | sort | uniq
>  }
>  
>  # Lists branches from the local repository.

Ciao,
Dscho


[PATCH] completion: improve ls-files filter performance

2018-04-04 Thread Clemens Buchacher
>From the output of ls-files, we remove all but the leftmost path
component and then we eliminate duplicates. We do this in a while loop,
which is a performance bottleneck when the number of iterations is large
(e.g. for 6 files in linux.git).

$ COMP_WORDS=(git status -- ar) COMP_CWORD=3; time _git

real0m11.876s
user0m4.685s
sys 0m6.808s

Replacing the loop with the cut command improves performance
significantly:

$ COMP_WORDS=(git status -- ar) COMP_CWORD=3; time _git

real0m1.372s
user0m0.263s
sys 0m0.167s

The measurements were done with Msys2 bash, which is used by Git for
Windows.

When filtering the ls-files output we take care not to touch absolute
paths. This is redundant, because ls-files will never output absolute
paths. Remove the unnecessary operations.

The issue was reported here:
https://github.com/git-for-windows/git/issues/1533

Signed-off-by: Clemens Buchacher 
---

On Sun, Mar 18, 2018 at 02:26:18AM +0100, SZEDER Gábor wrote:
> 
> You didn't run the test suite, did you? ;)

My bad. I put the sort back in. Test t9902 is now pass. I did not run
the other tests. I think the completion script is not used there.

I also considered Junio's and Johannes' comments.

> I have a short patch series collecting dust somewhere for a long
> while, [...]
> Will try to dig up those patches.

Cool. Bash completion can certainly use more performance improvements.

 contrib/completion/git-completion.bash | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/contrib/completion/git-completion.bash 
b/contrib/completion/git-completion.bash
index 6da95b8..69a2d41 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -384,12 +384,7 @@ __git_index_files ()
local root="${2-.}" file
 
__git_ls_files_helper "$root" "$1" |
-   while read -r file; do
-   case "$file" in
-   ?*/*) echo "${file%%/*}" ;;
-   *) echo "$file" ;;
-   esac
-   done | sort | uniq
+   cut -f1 -d/ | sort | uniq
 }
 
 # Lists branches from the local repository.
-- 
2.7.4