Re: Index files autocompletion too slow in big repositories (w / suggestion for improvement)
On 17.04.2017 at 06:05, Junio C Hamano wrote:
> Johannes Sixt writes:
>> This is about command line completion. We go a long way to avoid
>> forking processes there. What is 10x faster on Linux despite
>> forking a process may not be so on Windows.
>
> Doesn't this depend on how many paths there are? If there are only a
> few paths, the loop in shell would beat a pipe into sed even on Linux,
> I suspect, and if there are tons of paths, at some number, loop in
> shell would become slower than a single spawning of sed on platforms
> with slower fork, no?

Absolutely. I just want to make sure a suggested change takes into
account the situation on Windows, not only the "YE!" and "VERY WELL!"
votes of Linux users ;)

-- Hannes
Re: Index files autocompletion too slow in big repositories (w / suggestion for improvement)
Johannes Sixt writes:

> Cc Gábor.
>
> On 15.04.2017 at 00:33, Ævar Arnfjörð Bjarmason wrote:
>> On Sat, Apr 15, 2017 at 12:08 AM, Carlos Pita wrote:
>>> This is much faster (below 0.1s):
>>>
>>> __git_index_files ()
>>> {
>>>     local dir="$(__gitdir)" root="${2-.}" file;
>>>     if [ -d "$dir" ]; then
>>>         __git_ls_files_helper "$root" "$1" | \
>>>             sed -r 's@/.*@@' | uniq | sort | uniq
>>>     fi
>>> }
>>>
>>> time __git_index_files
>>>
>>> real    0m0.075s
>>> user    0m0.083s
>>> sys     0m0.010s
>>>
>>> Most of the improvement is due to the simpler, non-grouping, regex.
>>> Since I expect most of the common prefixes to arrive consecutively,
>>> running uniq before sort also improves things a bit. I'm not removing
>>> leading double quotes anymore (this isn't being done by the current
>>> version, anyway) but this doesn't seem to hurt.
>>>
>>> Despite the dependence on sed this is ten times faster than the
>>> original; maybe an option to enable fast index completion or something
>>> like that might be desirable.
>>
>> It's fine to depend on sed, these shell scripts are POSIX compatible,
>> and so is sed, we use sed in a lot of the built-in shell scripts.
>
> This is about command line completion. We go a long way to avoid
> forking processes there. What is 10x faster on Linux despite
> forking a process may not be so on Windows.

Doesn't this depend on how many paths there are? If there are only a
few paths, the loop in shell would beat a pipe into sed even on Linux,
I suspect, and if there are tons of paths, at some number, loop in
shell would become slower than a single spawning of sed on platforms
with slower fork, no?
Re: Index files autocompletion too slow in big repositories (w / suggestion for improvement)
On Sat, Apr 15, 2017 at 4:59 AM, Johannes Sixt wrote:
> Cc Gábor.
>
> On 15.04.2017 at 00:33, Ævar Arnfjörð Bjarmason wrote:
>>
>> On Sat, Apr 15, 2017 at 12:08 AM, Carlos Pita wrote:
>>>
>>> This is much faster (below 0.1s):
>>>
>>> __git_index_files ()
>>> {
>>>     local dir="$(__gitdir)" root="${2-.}" file;
>>>     if [ -d "$dir" ]; then
>>>         __git_ls_files_helper "$root" "$1" | \
>>>             sed -r 's@/.*@@' | uniq | sort | uniq
>>>     fi
>>> }
>>>
>>> time __git_index_files
>>>
>>> real    0m0.075s
>>> user    0m0.083s
>>> sys     0m0.010s
>>>
>>> Most of the improvement is due to the simpler, non-grouping, regex.
>>> Since I expect most of the common prefixes to arrive consecutively,
>>> running uniq before sort also improves things a bit. I'm not removing
>>> leading double quotes anymore (this isn't being done by the current
>>> version, anyway) but this doesn't seem to hurt.
>>>
>>> Despite the dependence on sed this is ten times faster than the
>>> original; maybe an option to enable fast index completion or something
>>> like that might be desirable.
>>
>>
>> It's fine to depend on sed, these shell scripts are POSIX compatible,
>> and so is sed, we use sed in a lot of the built-in shell scripts.
>
>
> This is about command line completion. We go a long way to avoid forking
> processes there. What is 10x faster on Linux despite forking a process
> may not be so on Windows.
>
> (I'm not using bash command line completion on Windows, so I can't tell what
> the effect of your suggested change is on Windows. I hope Gábor can comment
> on it.)
>
> -- Hannes
>

In cases like this, might it be worth splitting the implementation so
that Linux can use the fastest approach while Windows keeps using
whatever is best there? The advantage on Linux is pretty significant.

Thanks,
Jake
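One way the split suggested here could look is sketched below. It is only an illustration under assumptions: the function names `__top_level_loop`, `__top_level_sed` and `__top_level` are hypothetical and not part of git's completion script, and the platform test relies on bash's own `OSTYPE` variable (which bash sets to values such as `msys` or `cygwin` on Windows) so that choosing the variant does not itself fork a process. Whether the loop really is the better choice on Windows is exactly the open question in this thread.

```shell
# Hypothetical helper: strip everything after the first slash using only
# shell built-ins (no extra process), as the existing loop does.
__top_level_loop () {
	local file
	while read -r file; do
		case "$file" in
		?*/*) echo "${file%%/*}" ;;
		*) echo "$file" ;;
		esac
	done
}

# Hypothetical helper: the same transformation with one sed process.
__top_level_sed () {
	sed -e 's@/.*@@'
}

# Hypothetical dispatcher: choose a variant based on OSTYPE.
# Assumption: fork is expensive on msys/cygwin, so keep the loop there.
__top_level () {
	case "${OSTYPE-}" in
	msys*|cygwin*) __top_level_loop ;;
	*) __top_level_sed ;;
	esac
}
```

Both variants read paths on standard input, so either could sit between `__git_ls_files_helper` and the final `sort | uniq` in `__git_index_files`.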
Re: Index files autocompletion too slow in big repositories (w / suggestion for improvement)
Cc Gábor, resent with working email (hopefully); please follow up on this mail.

On 15.04.2017 at 00:33, Ævar Arnfjörð Bjarmason wrote:
> On Sat, Apr 15, 2017 at 12:08 AM, Carlos Pita wrote:
>> This is much faster (below 0.1s):
>>
>> __git_index_files ()
>> {
>>     local dir="$(__gitdir)" root="${2-.}" file;
>>     if [ -d "$dir" ]; then
>>         __git_ls_files_helper "$root" "$1" | \
>>             sed -r 's@/.*@@' | uniq | sort | uniq
>>     fi
>> }
>>
>> time __git_index_files
>>
>> real    0m0.075s
>> user    0m0.083s
>> sys     0m0.010s
>>
>> Most of the improvement is due to the simpler, non-grouping, regex.
>> Since I expect most of the common prefixes to arrive consecutively,
>> running uniq before sort also improves things a bit. I'm not removing
>> leading double quotes anymore (this isn't being done by the current
>> version, anyway) but this doesn't seem to hurt.
>>
>> Despite the dependence on sed this is ten times faster than the
>> original; maybe an option to enable fast index completion or something
>> like that might be desirable.
>
> It's fine to depend on sed, these shell scripts are POSIX compatible,
> and so is sed, we use sed in a lot of the built-in shell scripts.

This is about command line completion. We go a long way to avoid forking
processes there. What is 10x faster on Linux despite forking a process
may not be so on Windows.

(I'm not using bash command line completion on Windows, so I can't tell
what the effect of your suggested change is on Windows. I hope Gábor can
comment on it.)

-- Hannes
Re: Index files autocompletion too slow in big repositories (w / suggestion for improvement)
Jacob Keller writes:

> On Fri, Apr 14, 2017 at 3:33 PM, Ævar Arnfjörð Bjarmason wrote:
>> On Sat, Apr 15, 2017 at 12:08 AM, Carlos Pita wrote:
>>> This is much faster (below 0.1s):
>>>
>>> __git_index_files ()
>>> {
>>>     local dir="$(__gitdir)" root="${2-.}" file;
>>>     if [ -d "$dir" ]; then
>>>         __git_ls_files_helper "$root" "$1" | \
>>>             sed -r 's@/.*@@' | uniq | sort | uniq
>>>     fi
>>> }
>>>
>>> time __git_index_files
>>>
>>> real    0m0.075s
>>> user    0m0.083s
>>> sys     0m0.010s
>>>
>>> Most of the improvement is due to the simpler, non-grouping, regex.
>>> Since I expect most of the common prefixes to arrive consecutively,
>>> running uniq before sort also improves things a bit. I'm not removing
>>> leading double quotes anymore (this isn't being done by the current
>>> version, anyway) but this doesn't seem to hurt.
>>>
>>> Despite the dependence on sed this is ten times faster than the
>>> original; maybe an option to enable fast index completion or something
>>> like that might be desirable.
>>>
>>> Best regards
>>
>> It's fine to depend on sed, these shell scripts are POSIX compatible,
>> and so is sed, we use sed in a lot of the built-in shell scripts.
>>
>> I think you should submit this as a patch, see
>> Documentation/SubmittingPatches.
>
> Yeah, it should be fine to use sed.

As long as the use of "sed" is in line with POSIX.1, that is.

I do not think you need the non-portable "-r" merely to strip out
everything that follows the first slash, so perhaps with "s|-r|-e|"
applied to the above (and do not write a backslash after the pipe at
the end of the line; the shell knows you haven't finished talking to it
yet if you end a line with a pipe, so there is no need for the
backslash), you'd be golden.
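Applied to the function quoted above, the two adjustments suggested here ("-e" instead of "-r", and no backslash after the trailing pipe) might look like the following sketch. To keep it runnable outside the completion script, `__gitdir` and `__git_ls_files_helper` are replaced by trivial stand-ins; those stubs are assumptions for illustration only and do not reflect what the real helpers do.

```shell
# Stand-ins (assumptions) so the sketch runs on its own; in the real
# completion script these helpers already exist and query git.
__gitdir () { echo .; }
__git_ls_files_helper () { printf '%s\n' src/a.c src/b.c Makefile; }

__git_index_files ()
{
	local dir="$(__gitdir)" root="${2-.}"
	if [ -d "$dir" ]; then
		# A line ending in a pipe continues on the next line,
		# so no backslash is needed here.
		__git_ls_files_helper "$root" "$1" |
		# 's@/.*@@' is a plain basic regular expression; -e suffices.
		sed -e 's@/.*@@' | uniq | sort | uniq
	fi
}
```

With the stub input, calling `__git_index_files` prints the two top-level names, `Makefile` and `src`.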
Re: Index files autocompletion too slow in big repositories (w / suggestion for improvement)
On Fri, Apr 14, 2017 at 3:33 PM, Ævar Arnfjörð Bjarmason wrote:
> On Sat, Apr 15, 2017 at 12:08 AM, Carlos Pita wrote:
>> This is much faster (below 0.1s):
>>
>> __git_index_files ()
>> {
>>     local dir="$(__gitdir)" root="${2-.}" file;
>>     if [ -d "$dir" ]; then
>>         __git_ls_files_helper "$root" "$1" | \
>>             sed -r 's@/.*@@' | uniq | sort | uniq
>>     fi
>> }
>>
>> time __git_index_files
>>
>> real    0m0.075s
>> user    0m0.083s
>> sys     0m0.010s
>>
>> Most of the improvement is due to the simpler, non-grouping, regex.
>> Since I expect most of the common prefixes to arrive consecutively,
>> running uniq before sort also improves things a bit. I'm not removing
>> leading double quotes anymore (this isn't being done by the current
>> version, anyway) but this doesn't seem to hurt.
>>
>> Despite the dependence on sed this is ten times faster than the
>> original; maybe an option to enable fast index completion or something
>> like that might be desirable.
>>
>> Best regards
>
> It's fine to depend on sed, these shell scripts are POSIX compatible,
> and so is sed, we use sed in a lot of the built-in shell scripts.
>
> I think you should submit this as a patch, see
> Documentation/SubmittingPatches.

Yeah, it should be fine to use sed.

Thanks,
Jake
Re: Index files autocompletion too slow in big repositories (w / suggestion for improvement)
On Sat, Apr 15, 2017 at 12:08 AM, Carlos Pita wrote:
> This is much faster (below 0.1s):
>
> __git_index_files ()
> {
>     local dir="$(__gitdir)" root="${2-.}" file;
>     if [ -d "$dir" ]; then
>         __git_ls_files_helper "$root" "$1" | \
>             sed -r 's@/.*@@' | uniq | sort | uniq
>     fi
> }
>
> time __git_index_files
>
> real    0m0.075s
> user    0m0.083s
> sys     0m0.010s
>
> Most of the improvement is due to the simpler, non-grouping, regex.
> Since I expect most of the common prefixes to arrive consecutively,
> running uniq before sort also improves things a bit. I'm not removing
> leading double quotes anymore (this isn't being done by the current
> version, anyway) but this doesn't seem to hurt.
>
> Despite the dependence on sed this is ten times faster than the
> original; maybe an option to enable fast index completion or something
> like that might be desirable.
>
> Best regards

It's fine to depend on sed, these shell scripts are POSIX compatible,
and so is sed, we use sed in a lot of the built-in shell scripts.

I think you should submit this as a patch, see
Documentation/SubmittingPatches.
Re: Index files autocompletion too slow in big repositories (w / suggestion for improvement)
This is much faster (below 0.1s):

    __git_index_files ()
    {
        local dir="$(__gitdir)" root="${2-.}" file;
        if [ -d "$dir" ]; then
            __git_ls_files_helper "$root" "$1" | \
                sed -r 's@/.*@@' | uniq | sort | uniq
        fi
    }

    time __git_index_files

    real    0m0.075s
    user    0m0.083s
    sys     0m0.010s

Most of the improvement is due to the simpler, non-grouping, regex.
Since I expect most of the common prefixes to arrive consecutively,
running uniq before sort also improves things a bit. I'm not removing
leading double quotes anymore (this isn't being done by the current
version, anyway) but this doesn't seem to hurt.

Despite the dependence on sed this is ten times faster than the
original; maybe an option to enable fast index completion or something
like that might be desirable.

Best regards
--
Carlos
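The effect of running uniq before sort can be seen on a small, made-up path list. The sketch below is illustrative only: `top_level_unique` is a hypothetical name, and it uses plain `sed -e`, which is enough for this pattern because it needs no extended regular expression syntax. The first uniq collapses runs of identical neighbours so that sort receives fewer lines; the second uniq removes duplicates that were not adjacent in the input.

```shell
# Hypothetical helper: keep only the first path component of each line
# read from stdin and print each distinct component once.
top_level_unique () {
	sed -e 's@/.*@@' |
	uniq |   # cheap pass: collapse adjacent duplicates (src, src -> src)
	sort |   # bring the remaining, non-adjacent duplicates together
	uniq     # final pass: drop them
}

# Made-up input: "src" appears in two separate runs, so the first uniq
# alone would still leave two "src" lines.
printf '%s\n' src/a.c src/b.c doc/x.txt src/c.c Makefile | top_level_unique
```

The result is the same set of names that `sort | uniq` alone would give; the only difference is how many lines sort has to process.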
Index files autocompletion too slow in big repositories (w / suggestion for improvement)
Hi all,

I'm currently using git annex to manage my entire file collection
(including tons of music and books) and I noticed how slow
autocompletion has become for files in the index (say for git add). The
main offender is a while-read-case-echo bash loop in __git_index_files
that can be readily substituted with a much faster sed invocation,
although I guess you didn't want the sed dependency in the first place.
Anyway, here is my benchmark:

    __git_index_files ()
    {
        local dir="$(__gitdir)" root="${2-.}" file;
        if [ -d "$dir" ]; then
            __git_ls_files_helper "$root" "$1" | while read -r file; do
                case "$file" in
                ?*/*) echo "${file%%/*}" ;;
                *) echo "$file" ;;
                esac;
            done | sort | uniq;
        fi
    }

    time __git_index_files > /dev/null

    __git_index_files ()
    {
        local dir="$(__gitdir)" root="${2-.}" file;
        if [ -d "$dir" ]; then
            __git_ls_files_helper "$root" "$1" | \
                sed -r 's@^"?([^/]+)/.*$@\1@' | sort | uniq
        fi
    }

    time __git_index_files > /dev/null

    real    0m0.830s
    user    0m0.597s
    sys     0m0.310s

    real    0m0.345s
    user    0m0.357s
    sys     0m0.000s

Notice I'm also excluding the beginning double quote that appears in
escaped path names.

Best regards
--
Carlos
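For anyone who wants to repeat the comparison without a large repository, here is a rough sketch that feeds both variants from a synthetic path list. Everything named here is an assumption made for illustration: `fake_ls_files` merely stands in for `__git_ls_files_helper`, and `top_level_loop` and `top_level_sed` are hypothetical names for the two pipelines above. The sed expression is written as a POSIX basic regular expression so it does not need "-r"; it is meant to match the same lines as the extended one in the benchmark.

```shell
# Stand-in for __git_ls_files_helper (assumption): print $1 synthetic
# paths spread over ten top-level directories, plus one top-level file.
fake_ls_files () {
	local n="${1:-1000}" i=0
	while [ "$i" -lt "$n" ]; do
		echo "dir$((i % 10))/sub/file$i"
		i=$((i + 1))
	done
	echo "README"
}

# Loop variant: only shell built-ins before the final sort | uniq.
top_level_loop () {
	local file
	while read -r file; do
		case "$file" in
		?*/*) echo "${file%%/*}" ;;
		*) echo "$file" ;;
		esac
	done | sort | uniq
}

# sed variant: one extra process; also drops a leading double quote
# from paths that contain a slash.
top_level_sed () {
	sed -e 's@^"\{0,1\}\([^/]\{1,\}\)/.*$@\1@' | sort | uniq
}
```

In bash, something like `time fake_ls_files 100000 | top_level_loop > /dev/null` followed by the same command with `top_level_sed` gives a rough comparison; the numbers will depend on the platform and on how many paths are generated.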