Re: Bash-specific performance by avoiding sed
On Friday 09 of October 2015 17:16:03 Pavel Raiskup wrote: > > Here's a (lightly-tested) idea of what it would look like, where we'd > > have to audit every caller to deal with the result already including > > full quoting: > > > > if test yes = `(x=; printf -v x %q yes; echo $x) 2>/dev/null`; then > > func_quote() > > { > > printf -v func_quote_result %q "$1" > > } > > else > > func_quote() > > { > > portable version, except add: > > func_quote_result="\"$func_quote_result\"" > > } > > fi > > > > Note that with this variant, the portable version converts 'a *"b' into > > '"a *\"b"', while the bash version converts it into 'a\ \ \*\"b'. > > Thanks! I've done a simple testing too .. and it seems to be equivalent > to '$SED $sed_quote_subst' (in usecases like: string -> quote -> eval). > > Well -- I'm not 100% sure I want to hack this in and risk some issues. On > the other hand, I'm able to review the patches if somebody wanted to give > it a try (the small bash slowdown from v2.4.2 to v2.4.6 is not good > motivation for me). Hm, I was thinking thinking about this over the weekend and I wanted to do some testing before I would definitely reject that idea. But it sounds like the printf builtin helps lets say significantly to bash and does not hurt others. I changed the function names so the changes to libtool code should be pretty transparent (even though it is rather large change). I'm posting results of the performance test (on systemd package) and attaching 3 "planned to be pushed" patches. So testing '2.4.2' version (fast), git-version after applying 0001 and 0002 (patch-2) and including 0003 (patch-3). The dirname of test is in format 'PACKAGE-SHELL-LTVERSION'. Pavel - %E %U %S (/home/praiskup/rh/projects/systemd-perf/systemd-bash-2.4.2) 0:28.50 56.84 41.42 0:28.38 56.72 41.41 0:28.26 56.44 41.68 %E %U %S (/home/praiskup/rh/projects/systemd-perf/systemd-bash-2.4.6.13-f729) 1:06.13 232.12 23.33 1:06.52 232.28 23.35 1:06.54 232.16 23.52 %E %U %S (/home/praiskup/rh/projects/systemd-perf/systemd-bash-2.4.6.14-7a09) 0:46.52 151.16 24.48 0:46.64 150.80 24.60 0:46.69 151.12 23.85 %E %U %S (/home/praiskup/rh/projects/systemd-perf/systemd-dash-2.4.2) 0:16.34 22.22 28.62 0:17.13 22.57 29.09 0:17.11 22.37 29.24 %E %U %S (/home/praiskup/rh/projects/systemd-perf/systemd-dash-2.4.6.13-f729) 0:23.38 58.80 21.32 0:23.16 58.81 21.16 0:23.21 58.67 21.15 %E %U %S (/home/praiskup/rh/projects/systemd-perf/systemd-dash-2.4.6.14-7a09) 0:21.36 51.20 21.62 0:21.23 50.59 21.76 0:21.25 50.70 21.55 %E %U %S (/home/praiskup/rh/projects/systemd-perf/systemd-ksh-2.4.2) 0:19.65 30.76 34.12 0:21.09 30.68 34.63 0:19.75 30.83 34.29 %E %U %S (/home/praiskup/rh/projects/systemd-perf/systemd-ksh-2.4.6.13-f729) 0:30.75 85.60 25.41 0:30.70 85.51 25.52 0:30.80 85.65 25.41 %E %U %S (/home/praiskup/rh/projects/systemd-perf/systemd-ksh-2.4.6.14-7a09) 0:30.78 84.25 26.14 0:30.63 83.65 26.17 0:30.62 83.85 26.02 %E %U %S (/home/praiskup/rh/projects/systemd-perf/systemd-zsh-2.4.2) 0:27.94 44.47 48.97 0:28.03 44.29 49.31 0:28.60 44.48 48.93 %E %U %S (/home/praiskup/rh/projects/systemd-perf/systemd-zsh-2.4.6.13-f729) 0:47.60 140.30 41.70 0:47.53 139.84 41.89 0:47.56 139.71 42.12 %E %U %S (/home/praiskup/rh/projects/systemd-perf/systemd-zsh-2.4.6.14-7a09) 0:46.21 131.88 41.72 0:46.62 131.31 42.25 0:46.44 131.35 42.03 >From 32f0df9835ac15ac17e04be57c368172c3ad1d19 Mon Sep 17 00:00:00 2001 From: Pavel Raiskup Date: Sun, 4 Oct 2015 21:55:03 +0200 Subject: [PATCH 1/3] libtool: mitigate the $sed_quote_subst slowdown When it is reasonably possible, use shell implementation for quoting. References: http://lists.gnu.org/archive/html/libtool/2015-03/msg5.html http://lists.gnu.org/archive/html/libtool/2015-02/msg0.html https://debbugs.gnu.org/cgi/bugreport.cgi?bug=20006 * gl/build-aux/funclib.sh (func_quote): New function that can be used as substitution for '$SED $sed_quote_subst' call. * build-aux/ltmain.in (func_emit_wrapper): Use func_quote instead of '$SED $sed_quote_subst'. (func_mode_link): Likewise. * NEWS: Document. * bootstrap: Sync with funclib.sh. --- NEWS| 3 +++ bootstrap | 61 +++-- build-aux/ltmain.in | 10 gl/build-aux/funclib.sh | 61 +++-- 4 files changed, 117 insertions(+), 18 deletions(-) diff --git a/NEWS b/NEWS index a3c5b12..7c23d03 100644 --- a/NEWS +++ b/NEWS @@ -17,6 +17,9 @@ NEWS - list of user-visible changes between releases of GNU Libtool - Fix significant slowdown of libtoolize for certain projects (regression introduced in 2.4.3 release) caused by infinite m4 macro recursion. + - Mitigate the slowdown of libtool script (introduced in v2.4.3) caused by +increased number of calls to '$SED $sed_quote_subst' (bug#20006). + * Noteworthy changes in release 2.4.6 (2015-02-15) [stable] **
Re: Bash-specific performance by avoiding sed
On Wednesday 07 of October 2015 08:03:01 Eric Blake wrote: > On 10/07/2015 06:28 AM, Pavel Raiskup wrote: > > > .. so it is (2.4.6 vs. 2.4.7~dev, user+sys) 7m23.5s vs 8m58.3s. It's not > > completely back yet but it's much better than v2.4.6. > > Thanks again for working on this. Thanks for perfect review, Eric. > > +case $func_quote_result in > > + *'*'*|*'['*) > > Problem: shouldn't this also include *'?'* to avoid globbing a single > character? Oh, sorry for this - and thanks, I'll fix that! > Could probably be written *[\[\*\?]*) > > The string itself contains globs, so... Ok. > > +;; > > +esac > > + > > +func_quote_old_IFS=$IFS > > +for _G_char in '\' '`' '"' '$' > > +do > > + # STATE($1) PREV($2) SEPARATOR($3) > > + set start "" "" > > + func_quote_result=dummy"$_G_char$func_quote_result$_G_char"dummy > > + IFS=$_G_char > > + for _G_part in $func_quote_result > > + do > > +case $1 in > > +quote) > > + func_append func_quote_result "$3$2" > > + set quote "$_G_part" "\\$_G_char" > > + ;; > > +start) > > + set first "" "" > > + func_quote_result= > > + ;; > > +first) > > + set quote "$_G_part" "" > > + ;; > > +esac > > + done > > + IFS=$func_quote_old_IFS > > +done > > +;; > > The string does not contain globs, so do four linear passes that escape > each problem character. To be honest, in shells without '+=' operator, these are four O(N^2) passes.. But still very acceptable parabola IMO. Firstly I tried to do this without this ugly state machine -- using $@ array to store strings. Very roughly: for _G_char in '\' '`' ... do IFS=$_G_char set dummy for _G_part $func_quote_result do set "$@" "\\$_G_char$_G_part" ... done shift # _now join $@ into one string_ done ... but that ended in quadratic complexity _everywhere_. I was curious why and it was because of the 'set "$@"' command being longer and longer for strings like '"""...'. > converts 'a"b' into 'a\"b', then caller adds outer quotes to become > '"a\"b"'. > > > + *) ;; > > +esac > > +} > > Looks correct. However, is it also worth attempting a shell-specific > speedup? After all, func_append uses shell-specific speedups (it is NOT > as fast on shells that don't support +=). Yes. We could do that (at least in future if not in this thread). > That is, the above function appears to be portable to all shells, but if > we detect that a shell supports printf -v %q, can we use that instead > for some additional speed? Of course, printf -v %q in bash prefers > output that does NOT embed inside "" (it quotes ALL shell metacharacters > using backslash), so we'd have to rework the contract of the function. Well, I was thinking about 'printf %q' firstly, but I rejected that idea because I missed the point that bash's printf has '-v' option (thus we can avoid forking!). Thanks for this point. > That is, instead of the current function which returns a ready-escaped > string but no outer "", we would instead be returning a string that > already includes all necessary quoting. Which would mean rewriting all > callers :( For example, we'd have to figure out what to do for > func_quote_for_eval, which returns two values - > func_quote_for_eval_result being fully quoted is easy with printf -v %q, > and func_quote_for_eval_unquoted_result which is not. Yes, rewriting all callers would raise a chance of my mistake. So I'm not really in favour of this dangerous action :) .. it would be just Bash speedup anyway (even zsh does not have 'printf -v'). Also, we would have to be careful where this optimized function would be used -- e.g. 'sed_quote_subst' has been historically used to generate '*.la' files. We should keep that format as is -- so we would have to have two versions of func_quote (one possibly with printf %q, one producing the old output). > Here's a (lightly-tested) idea of what it would look like, where we'd > have to audit every caller to deal with the result already including > full quoting: > > if test yes = `(x=; printf -v x %q yes; echo $x) 2>/dev/null`; then > func_quote() > { > printf -v func_quote_result %q "$1" > } > else > func_quote() > { > portable version, except add: > func_quote_result="\"$func_quote_result\"" > } > fi > > Note that with this variant, the portable version converts 'a *"b' into > '"a *\"b"', while the bash version converts it into 'a\ \ \*\"b'. Thanks! I've done a simple testing too .. and it seems to be equivalent to '$SED $sed_quote_subst' (in usecases like: string -> quote -> eval). Well -- I'm not 100% sure I want to hack this in and risk some issues. On the other hand, I'm able to
Re: Bash-specific performance by avoiding sed
On 10/07/2015 06:28 AM, Pavel Raiskup wrote: > .. so it is (2.4.6 vs. 2.4.7~dev, user+sys) 7m23.5s vs 8m58.3s. It's not > completely back yet but it's much better than v2.4.6. Thanks again for working on this. > > +# func_quote ARG > +# -- > +# Aesthetically quote one ARG, store the result into $func_quote_result. > Note > +# that we keep attention to performance here (so far O(N) complexity as long > as > +# func_append is O(1)). > +func_quote () > +{ > +$debug_cmd > + > +func_quote_result=$1 Common case - nothing needs escaping. Converts 'abc' to 'abc', as well as 'a b' to 'a b'. The caller can still blindly add outer "" for a printed form '"abc"' (unnecessary but harmless ""), or for '"a b"' (necessary in that case). > + > +case $func_quote_result in > + *[\\\`\"\$]*) Something needs escaping before being placed in double quotes. > +case $func_quote_result in > + *'*'*|*'['*) Problem: shouldn't this also include *'?'* to avoid globbing a single character? Could probably be written *[\[\*\?]*) The string itself contains globs, so... > +func_quote_result=`$ECHO "$func_quote_result" | $SED > "$sed_quote_subst"` > +return 0 ...rather than worry about set +f, we just use sed in this rare case. Converts 'a*b' into 'a*b', converts 'a*"b' into 'a*\"b'. The caller then supplies outer "", for '"a*b"' or '"a*\"b"' (outer quotes necessary in both cases). > +;; > +esac > + > +func_quote_old_IFS=$IFS > +for _G_char in '\' '`' '"' '$' > +do > + # STATE($1) PREV($2) SEPARATOR($3) > + set start "" "" > + func_quote_result=dummy"$_G_char$func_quote_result$_G_char"dummy > + IFS=$_G_char > + for _G_part in $func_quote_result > + do > +case $1 in > +quote) > + func_append func_quote_result "$3$2" > + set quote "$_G_part" "\\$_G_char" > + ;; > +start) > + set first "" "" > + func_quote_result= > + ;; > +first) > + set quote "$_G_part" "" > + ;; > +esac > + done > + IFS=$func_quote_old_IFS > +done > +;; The string does not contain globs, so do four linear passes that escape each problem character. converts 'a"b' into 'a\"b', then caller adds outer quotes to become '"a\"b"'. > + *) ;; > +esac > +} Looks correct. However, is it also worth attempting a shell-specific speedup? After all, func_append uses shell-specific speedups (it is NOT as fast on shells that don't support +=). That is, the above function appears to be portable to all shells, but if we detect that a shell supports printf -v %q, can we use that instead for some additional speed? Of course, printf -v %q in bash prefers output that does NOT embed inside "" (it quotes ALL shell metacharacters using backslash), so we'd have to rework the contract of the function. That is, instead of the current function which returns a ready-escaped string but no outer "", we would instead be returning a string that already includes all necessary quoting. Which would mean rewriting all callers :( For example, we'd have to figure out what to do for func_quote_for_eval, which returns two values - func_quote_for_eval_result being fully quoted is easy with printf -v %q, and func_quote_for_eval_unquoted_result which is not. Here's a (lightly-tested) idea of what it would look like, where we'd have to audit every caller to deal with the result already including full quoting: if test yes = `(x=; printf -v x %q yes; echo $x) 2>/dev/null`; then func_quote() { printf -v func_quote_result %q "$1" } else func_quote() { portable version, except add: func_quote_result="\"$func_quote_result\"" } fi Note that with this variant, the portable version converts 'a *"b' into '"a *\"b"', while the bash version converts it into 'a\ \ \*\"b'. > @@ -8596,8 +8597,8 @@ EOF > relink_command="$var=$func_quote_for_eval_result; export $var; > $relink_command" > fi > done > - relink_command="(cd `pwd`; $relink_command)" > - relink_command=`$ECHO "$relink_command" | $SED "$sed_quote_subst"` > + func_quote "(cd `pwd`; $relink_command)" > + relink_command=$func_quote_result Unrelated to your patch, but doesn't this fail if pwd is called in an absolute directory containing spaces? -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature ___ https://lists.gnu.org/mailman/listinfo/libtool
Re: Bash-specific performance by avoiding sed
On Monday 05 of October 2015 15:28:56 Pavel Raiskup wrote: > On Monday 05 of October 2015 09:47:05 Pavel Raiskup wrote: > > On Monday 05 of October 2015 01:25:24 Pavel Raiskup wrote: > > > On Monday 05 of October 2015 00:45:50 Pavel Raiskup wrote: > > > > > should we test the size of the string first ? i've written such raw > > > > > shell > > > > > string parsing functions before, and once you hit a certain size > > > > > (like 1k+ > > > > > iirc), forking out to sed is way faster, especially when running in > > > > > multibyte > > > > > locales (like UTF8) which most people are doing nowadays. > > > > > -mike > > > > > > > > Well, that optimization would require (fast) strlen()-like construct. > > > > Anyway, the vast majority of calls to func_quote () function will have > > > > short ARG, and its complexity is still "just" linear. We could optimize > > > > later if that was a real issue. > > > > > > > > I would like to propose solution based on Eric's one, without using of > > > > '${VAR%.}' and '${VAR#.}' constructs -- sounds like this could be even > > > > more portable while it keeps almost the same speed (if we can use += its > > > > even faster). > > > > > > > > I have yet a another patch trying to minimize option-parser overhead > > > > (that is focused on the POV of Richard, but that needs to be cleaned up > > > > a > > > > bit, I'll post hopefully tomorrow). > > > > > > > > Any comment is welcome! > > > > > > Re-attached (fixes for 'make syntax-check' and fixed one comment). > > > > Hmm, one might-be-a-problem with this (catched by testsuite), when you > > have: > > > > $ cat build-aux/test-quoting > > . `echo "$0" |${SED-sed} 's|[^/]*$||'`/funclib.sh > > # source this for "GNU m4" detection methods > > . `echo "$0" |${SED-sed} 's|[^/]*$||'`/extract-trace > > > > func_quote_for_eval "$@" > > echo "$func_quote_for_eval_result" > > > > Then: > > > > $ ./build-aux/test-quoting '"a b"' # fine > > "\"a b\"" > > > > $ ./build-aux/test-quoting '"*tool"' # broken > > ./build-aux/test-quoting '"*tool"' > > \"libtool\" > > > > We would like to have an output \"*\". I'm not aware of portable way > > how to disable wildcard expansion in shell, and autoconf 'Shellology' > > section haven't helped me. In particular, the problem is here: > > > > x='a"[a-z]*"c' > > IFS='"' > > for i in $x; do # Here we wan't to disable wildcard expansion > > echo $i > > done > > > > Any idea other than fallback to $sed_quote_subst in case of '*' or '[' > > exists in ARG? > > Attaching two (yet to be cleaned) patches doing the optimization. Is > anybody able to test/comment on this particular solution? That would be > really appreciated. The cleaned patches are attached. I would like to push those very soon, probably before weekend. If you see any issues worth holding this change, please let me know soon, thanks! FWIW, some numbers (systemd.git build time, right after 'make clean'): The old libtool v2.4.2: $ time make -j5 real2m3.163s user3m54.849s sys 3m28.684s The latest released libtool v2.4.6: $ time make -j5 LIBTOOL=/usr/bin/libtool real8m24.604s user9m56.977s sys 19m45.620s The patched git libtool: $ time make -j5 LIBTOOL=~/rh/projects/libtool/libtool real2m34.682s user6m37.158s sys 2m21.123s .. so it is (2.4.6 vs. 2.4.7~dev, user+sys) 7m23.5s vs 8m58.3s. It's not completely back yet but it's much better than v2.4.6. Pavel >From 74940e9306df18deddd11621791973df886e313a Mon Sep 17 00:00:00 2001 From: Pavel Raiskup Date: Sun, 4 Oct 2015 21:55:03 +0200 Subject: [PATCH 1/2] libtool: mitigate the $sed_quote_subst slowdown When it is reasonably possible, use shell implementation for quoting. References: http://lists.gnu.org/archive/html/libtool/2015-03/msg5.html http://lists.gnu.org/archive/html/libtool/2015-02/msg0.html https://debbugs.gnu.org/cgi/bugreport.cgi?bug=20006 * gl/build-aux/funclib.sh (func_quote): New function that can be used as substitution for '$SED $sed_quote_subst' call. * build-aux/ltmain.in (func_emit_wrapper): Use func_quote instead of '$SED $sed_quote_subst'. (func_mode_link): Likewise. * NEWS: Document. * bootstrap: Sync with funclib.sh. --- NEWS| 3 +++ bootstrap | 56 +++-- build-aux/ltmain.in | 10 + gl/build-aux/funclib.sh | 56 +++-- 4 files changed, 117 insertions(+), 8 deletions(-) diff --git a/NEWS b/NEWS index a3c5b12..7c23d03 100644 --- a/NEWS +++ b/NEWS @@ -17,6 +17,9 @@ NEWS - list of user-visible changes between releases of GNU Libtool - Fix significant slowdown of libtoolize for certain projects (regression introduced in 2.4.3 release) caused by infinite m4 macro recursion. + - Mitigate the slowdown of libtool script (introduced in v2.4.3) caused by +increased number of calls to '$SED $sed_quote_subst
Re: Bash-specific performance by avoiding sed
On 10/05/2015 01:47 AM, Pavel Raiskup wrote: > > Hmm, one might-be-a-problem with this (catched by testsuite), when you s/catched/caught/ > We would like to have an output \"*\". I'm not aware of portable way > how to disable wildcard expansion in shell, and autoconf 'Shellology' > section haven't helped me. In particular, the problem is here: It is portable to use 'set +f' to disable wildcard globbing, then 'set -f' to turn them back on. (POSIX requires it). > > x='a"[a-z]*"c' > IFS='"' > for i in $x; do # Here we wan't to disable wildcard expansion s/wan't/want/ -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature ___ https://lists.gnu.org/mailman/listinfo/libtool
Re: Bash-specific performance by avoiding sed
On Monday 05 of October 2015 09:47:05 Pavel Raiskup wrote: > On Monday 05 of October 2015 01:25:24 Pavel Raiskup wrote: > > On Monday 05 of October 2015 00:45:50 Pavel Raiskup wrote: > > > > should we test the size of the string first ? i've written such raw > > > > shell > > > > string parsing functions before, and once you hit a certain size (like > > > > 1k+ > > > > iirc), forking out to sed is way faster, especially when running in > > > > multibyte > > > > locales (like UTF8) which most people are doing nowadays. > > > > -mike > > > > > > Well, that optimization would require (fast) strlen()-like construct. > > > Anyway, the vast majority of calls to func_quote () function will have > > > short ARG, and its complexity is still "just" linear. We could optimize > > > later if that was a real issue. > > > > > > I would like to propose solution based on Eric's one, without using of > > > '${VAR%.}' and '${VAR#.}' constructs -- sounds like this could be even > > > more portable while it keeps almost the same speed (if we can use += its > > > even faster). > > > > > > I have yet a another patch trying to minimize option-parser overhead > > > (that is focused on the POV of Richard, but that needs to be cleaned up a > > > bit, I'll post hopefully tomorrow). > > > > > > Any comment is welcome! > > > > Re-attached (fixes for 'make syntax-check' and fixed one comment). > > Hmm, one might-be-a-problem with this (catched by testsuite), when you > have: > > $ cat build-aux/test-quoting > . `echo "$0" |${SED-sed} 's|[^/]*$||'`/funclib.sh > # source this for "GNU m4" detection methods > . `echo "$0" |${SED-sed} 's|[^/]*$||'`/extract-trace > > func_quote_for_eval "$@" > echo "$func_quote_for_eval_result" > > Then: > > $ ./build-aux/test-quoting '"a b"' # fine > "\"a b\"" > > $ ./build-aux/test-quoting '"*tool"' # broken > ./build-aux/test-quoting '"*tool"' > \"libtool\" > > We would like to have an output \"*\". I'm not aware of portable way > how to disable wildcard expansion in shell, and autoconf 'Shellology' > section haven't helped me. In particular, the problem is here: > > x='a"[a-z]*"c' > IFS='"' > for i in $x; do # Here we wan't to disable wildcard expansion > echo $i > done > > Any idea other than fallback to $sed_quote_subst in case of '*' or '[' > exists in ARG? Attaching two (yet to be cleaned) patches doing the optimization. Is anybody able to test/comment on this particular solution? That would be really appreciated. Pavel >From 1b89ae66f88c3822b2afec128f288d409287780f Mon Sep 17 00:00:00 2001 From: Pavel Raiskup Date: Sun, 4 Oct 2015 21:55:03 +0200 Subject: [PATCH 1/2] libtool: mitigate the $sed_quote_subst slowdown When it is reasonably possible, use shell implementation for quoting. References: http://lists.gnu.org/archive/html/libtool/2015-03/msg5.html http://lists.gnu.org/archive/html/libtool/2015-02/msg0.html https://debbugs.gnu.org/cgi/bugreport.cgi?bug=20006 * gl/build-aux/funclib.sh (func_quote): New function that can be used as substitution for '$SED $sed_quote_subst' call. * build-aux/ltmain.in (func_emit_wrapper): Use func_quote instead of '$SED $sed_quote_subst'. (func_mode_link): Likewise. * NEWS: Document. * bootstrap: Sync with funclib.sh. --- NEWS| 3 +++ bootstrap | 56 +++-- build-aux/ltmain.in | 10 + gl/build-aux/funclib.sh | 56 +++-- 4 files changed, 117 insertions(+), 8 deletions(-) diff --git a/NEWS b/NEWS index a3c5b12..7c23d03 100644 --- a/NEWS +++ b/NEWS @@ -17,6 +17,9 @@ NEWS - list of user-visible changes between releases of GNU Libtool - Fix significant slowdown of libtoolize for certain projects (regression introduced in 2.4.3 release) caused by infinite m4 macro recursion. + - Mitigate the slowdown of libtool script (introduced in v2.4.3) caused by +increased number of calls to '$SED $sed_quote_subst' (bug#20006). + * Noteworthy changes in release 2.4.6 (2015-02-15) [stable] ** New features: diff --git a/bootstrap b/bootstrap index c179f51..2649478 100755 --- a/bootstrap +++ b/bootstrap @@ -230,7 +230,7 @@ vc_ignore= # Source required external libraries: # Set a version string for this script. -scriptversion=2015-01-20.17; # UTC +scriptversion=2015-10-04.22; # UTC # General shell script boiler plate, and helper functions. # Written by Gary V. Vaughan, 2004 @@ -1257,6 +1257,57 @@ func_relative_path () } +# func_quote ARG +# -- +# Aesthetically quote one ARG, store the result into $func_quote_result. Note +# that we keep attention to performance here (so far O(N) complexity as long as +# func_append is O(1)). +func_quote () +{ +$debug_cmd + +func_quote_result=$1 + +case $func_quote_result in + *[\\\`\"\$]*) +case $func_quote_result in + *'*'*|*'['*) +func_quote_result=`$ECHO "
Re: Bash-specific performance by avoiding sed
On Monday 05 of October 2015 01:25:24 Pavel Raiskup wrote: > On Monday 05 of October 2015 00:45:50 Pavel Raiskup wrote: > > > should we test the size of the string first ? i've written such raw > > > shell > > > string parsing functions before, and once you hit a certain size (like > > > 1k+ > > > iirc), forking out to sed is way faster, especially when running in > > > multibyte > > > locales (like UTF8) which most people are doing nowadays. > > > -mike > > > > Well, that optimization would require (fast) strlen()-like construct. > > Anyway, the vast majority of calls to func_quote () function will have > > short ARG, and its complexity is still "just" linear. We could optimize > > later if that was a real issue. > > > > I would like to propose solution based on Eric's one, without using of > > '${VAR%.}' and '${VAR#.}' constructs -- sounds like this could be even > > more portable while it keeps almost the same speed (if we can use += its > > even faster). > > > > I have yet a another patch trying to minimize option-parser overhead > > (that is focused on the POV of Richard, but that needs to be cleaned up a > > bit, I'll post hopefully tomorrow). > > > > Any comment is welcome! > > Re-attached (fixes for 'make syntax-check' and fixed one comment). Hmm, one might-be-a-problem with this (catched by testsuite), when you have: $ cat build-aux/test-quoting . `echo "$0" |${SED-sed} 's|[^/]*$||'`/funclib.sh # source this for "GNU m4" detection methods . `echo "$0" |${SED-sed} 's|[^/]*$||'`/extract-trace func_quote_for_eval "$@" echo "$func_quote_for_eval_result" Then: $ ./build-aux/test-quoting '"a b"' # fine "\"a b\"" $ ./build-aux/test-quoting '"*tool"' # broken ./build-aux/test-quoting '"*tool"' \"libtool\" We would like to have an output \"*\". I'm not aware of portable way how to disable wildcard expansion in shell, and autoconf 'Shellology' section haven't helped me. In particular, the problem is here: x='a"[a-z]*"c' IFS='"' for i in $x; do # Here we wan't to disable wildcard expansion echo $i done Any idea other than fallback to $sed_quote_subst in case of '*' or '[' exists in ARG? Pavel ___ https://lists.gnu.org/mailman/listinfo/libtool
Re: Bash-specific performance by avoiding sed
On Monday 05 of October 2015 00:45:50 Pavel Raiskup wrote: > > should we test the size of the string first ? i've written such raw shell > > string parsing functions before, and once you hit a certain size (like 1k+ > > iirc), forking out to sed is way faster, especially when running in > > multibyte > > locales (like UTF8) which most people are doing nowadays. > > -mike > > Well, that optimization would require (fast) strlen()-like construct. > Anyway, the vast majority of calls to func_quote () function will have > short ARG, and its complexity is still "just" linear. We could optimize > later if that was a real issue. > > I would like to propose solution based on Eric's one, without using of > '${VAR%.}' and '${VAR#.}' constructs -- sounds like this could be even > more portable while it keeps almost the same speed (if we can use += its > even faster). > > I have yet a another patch trying to minimize option-parser overhead > (that is focused on the POV of Richard, but that needs to be cleaned up a > bit, I'll post hopefully tomorrow). > > Any comment is welcome! Re-attached (fixes for 'make syntax-check' and fixed one comment). Pavel >From f6935309a07b2a4797ff5612bff00ced4842c3e5 Mon Sep 17 00:00:00 2001 From: Pavel Raiskup Date: Sun, 4 Oct 2015 21:55:03 +0200 Subject: [PATCH] libtool: mitigate the $sed_quote_subst slowdown References: http://lists.gnu.org/archive/html/libtool/2015-03/msg5.html http://lists.gnu.org/archive/html/libtool/2015-02/msg0.html https://debbugs.gnu.org/cgi/bugreport.cgi?bug=20006 * gl/build-aux/funclib.sh (func_quote): New function that can be used as substition for '$SED $sed_quote_subst' call. * build-aux/ltmain.in (func_emit_wrapper): Use func_quote instead of '$SED $sed_quote_subst'. (func_mode_link): Likewise. * NEWS: Document. * bootstrap: Sync with funclib.sh. --- NEWS| 3 +++ bootstrap | 49 +++-- build-aux/ltmain.in | 10 ++ gl/build-aux/funclib.sh | 49 +++-- 4 files changed, 103 insertions(+), 8 deletions(-) diff --git a/NEWS b/NEWS index a3c5b12..7c23d03 100644 --- a/NEWS +++ b/NEWS @@ -17,6 +17,9 @@ NEWS - list of user-visible changes between releases of GNU Libtool - Fix significant slowdown of libtoolize for certain projects (regression introduced in 2.4.3 release) caused by infinite m4 macro recursion. + - Mitigate the slowdown of libtool script (introduced in v2.4.3) caused by +increased number of calls to '$SED $sed_quote_subst' (bug#20006). + * Noteworthy changes in release 2.4.6 (2015-02-15) [stable] ** New features: diff --git a/bootstrap b/bootstrap index c179f51..c2f6545 100755 --- a/bootstrap +++ b/bootstrap @@ -230,7 +230,7 @@ vc_ignore= # Source required external libraries: # Set a version string for this script. -scriptversion=2015-01-20.17; # UTC +scriptversion=2015-10-04.22; # UTC # General shell script boiler plate, and helper functions. # Written by Gary V. Vaughan, 2004 @@ -1257,6 +1257,50 @@ func_relative_path () } +# func_quote ARG +# -- +# Aesthetically quote one ARG, store the result into $func_quote_result. Note +# that we keep attention to performance here (so far O(N) complexity as long as +# func_append is O(1)). +func_quote () +{ +$debug_cmd + +func_quote_result=$1 +func_quote_old_IFS=$IFS + +case $func_quote_result in + *[\\\`\"\$]*) +for _G_char in '\' '`' '"' '$' +do + # STATE($1) PREV($2) SEPARATOR($3) + set start "" "" + func_quote_result=dummy"$_G_char$func_quote_result$_G_char"dummy + IFS=$_G_char + for _G_part in $func_quote_result + do +case $1 in +quote) + func_append func_quote_result "$3$2" + set quote "$_G_part" "\\$_G_char" + ;; +start) + set first "" "" + func_quote_result= + ;; +first) + set quote "$_G_part" "" + ;; +esac + done + IFS=$func_quote_old_IFS +done +;; + *) ;; +esac +} + + # func_quote_for_eval ARG... # -- # Aesthetically quote ARGs to be evaled later. @@ -1275,7 +1319,8 @@ func_quote_for_eval () while test 0 -lt $#; do case $1 in *[\\\`\"\$]*) - _G_unquoted_arg=`printf '%s\n' "$1" |$SED "$sed_quote_subst"` ;; + func_quote "$1" + _G_unquoted_arg=$func_quote_result ;; *) _G_unquoted_arg=$1 ;; esac diff --git a/build-aux/ltmain.in b/build-aux/ltmain.in index 0c40da0..24acefd 100644 --- a/build-aux/ltmain.in +++ b/build-aux/ltmain.in @@ -3346,7 +3346,8 @@ else if test \"\$libtool_execute_magic\" != \"$magic\"; then file=\"\$0\"" -qECHO=`$ECHO "$ECHO" | $SED "$sed_quote_subst"` +func_quote "$ECHO" +qECHO=$
Re: Bash-specific performance by avoiding sed
forcemerge 20006 20005 thanks On Monday 09 of March 2015 18:04:34 Mike Frysinger wrote: > On 09 Mar 2015 14:48, Eric Blake wrote: > > On 03/09/2015 01:50 PM, Bob Friesenhahn wrote: > > > On Mon, 9 Mar 2015, Mike Gran wrote: > > >> I don't know if y'all saw this blogpost where a guy pushed > > >> the sed regular expression handling into bash-specific > > >> regular expressions when bash was available. He claims > > >> there's a significant performance improvement because of > > >> reduced forking. > > >> > > >> http://harald.hoyer.xyz/2015/03/05/libtool-getting-rid-of-18-sed-forks/ > > > > > > There is an issue in the libtool bug tracker regarding this. > > > > > > This solution only works with GNU bash. It would be good if volunteers > > > could research to see if there are similar solutions which can work with > > > other common shells (e.g. dash, ksh, zsh). > > > > For context, we're trying to speed up: > > > > sed_quote_subst='s|\([`"$\\]\)|\\\1|g' > > _G_unquoted_arg=`printf '%s\n' "$1" |$SED "$sed_quote_subst"` > > > > How about this, which should be completely portable to XSI shells (alas, > > it still uses ${a#b} and ${a%b} at the end, so it is not portable to > > ancient Solaris /bin/sh): > > > > # func_quote STRING > > # Escapes all \`"$ in STRING with another \, and stores that in $quoted > > func_quote () { > > case $1 in > > *[\\\`\"\$]*) > > save_IFS=$IFS pre=.$1. > > for char in '\' '`' '"' '$'; do > > post= IFS=$char > > for part in $pre; do > > post=${post:+$post\\$char}$part > > done > > pre=$post > > done > > should we test the size of the string first ? i've written such raw shell > string parsing functions before, and once you hit a certain size (like 1k+ > iirc), forking out to sed is way faster, especially when running in multibyte > locales (like UTF8) which most people are doing nowadays. > -mike Well, that optimization would require (fast) strlen()-like construct. Anyway, the vast majority of calls to func_quote () function will have short ARG, and its complexity is still "just" linear. We could optimize later if that was a real issue. I would like to propose solution based on Eric's one, without using of '${VAR%.}' and '${VAR#.}' constructs -- sounds like this could be even more portable while it keeps almost the same speed (if we can use += its even faster). I have yet a another patch trying to minimize option-parser overhead (that is focused on the POV of Richard, but that needs to be cleaned up a bit, I'll post hopefully tomorrow). Any comment is welcome! Pave >From aa988d0a49f2d2b419519b09fef62fc993a6169f Mon Sep 17 00:00:00 2001 From: Pavel Raiskup Date: Sun, 4 Oct 2015 21:55:03 +0200 Subject: [PATCH] libtool: mitigate the $sed_quote_subst slowdown References: http://lists.gnu.org/archive/html/libtool/2015-03/msg5.html http://lists.gnu.org/archive/html/libtool/2015-02/msg0.html https://debbugs.gnu.org/cgi/bugreport.cgi?bug=20006 * gl/build-aux/funclib.sh (func_quote): New function that can be used as substition for '$SED $sed_quote_subst' call. * build-aux/ltmain.in (func_emit_wrapper): Use func_quote instead of '$SED $sed_quote_subst'. (func_mode_link): Likewise. * NEWS: Document. * bootstrap: Sync with funclib.sh. --- NEWS| 3 +++ bootstrap | 49 +++-- build-aux/ltmain.in | 10 ++ gl/build-aux/funclib.sh | 49 +++-- 4 files changed, 103 insertions(+), 8 deletions(-) diff --git a/NEWS b/NEWS index a3c5b12..7c23d03 100644 --- a/NEWS +++ b/NEWS @@ -17,6 +17,9 @@ NEWS - list of user-visible changes between releases of GNU Libtool - Fix significant slowdown of libtoolize for certain projects (regression introduced in 2.4.3 release) caused by infinite m4 macro recursion. + - Mitigate the slowdown of libtool script (introduced in v2.4.3) caused by +increased number of calls to '$SED $sed_quote_subst' (bug#20006). + * Noteworthy changes in release 2.4.6 (2015-02-15) [stable] ** New features: diff --git a/bootstrap b/bootstrap index c179f51..0c73a49 100755 --- a/bootstrap +++ b/bootstrap @@ -230,7 +230,7 @@ vc_ignore= # Source required external libraries: # Set a version string for this script. -scriptversion=2015-01-20.17; # UTC +scriptversion=2015-10-04.22; # UTC # General shell script boiler plate, and helper functions. # Written by Gary V. Vaughan, 2004 @@ -1257,6 +1257,50 @@ func_relative_path () } +# func_quote ARG +# -- +# Aesthetically quote one ARG, store the result into $func_quote_result. Note +# that we keep attention to performance here (so far O(N) complexity as long as +# func_append is O(N) too). +func_quote () +{ +$debug_cmd + +func_quote_result=$1 +func_quote_old_IFS=$IFS + +case $func_quote_result in + *[\\\`\"\$]*) +for _G_char in '\' '`' '"' '$' +do
Re: Bash-specific performance by avoiding sed
On Mon, 2015-03-09 at 18:28 +, Mike Gran wrote: > I don't know if y'all saw this blogpost where a guy pushed > the sed regular expression handling into bash-specific > regular expressions when bash was available. He claims > there's a significant performance improvement because of > reduced forking. > > http://harald.hoyer.xyz/2015/03/05/libtool-getting-rid-of-18-sed-forks/ Please see my reply about this. The issue is the amount of looping through the options parsing code which I did already report here as an issue. I do have a "fix" for it in the build systems I look after: http://git.yoctoproject.org/cgit.cgi/poky/commit/?id=cc5f804483a9a1a60be24475baee0eed5d44bef5 which basically unrolls some of the internal code looping and I believe that with that patch, the speedup to the specific sed invocation you're looking at will be much less relevant. The issue is that in its current form, it can't really be merged and my shell knowledge and lack of contribution agreement to GNU are likely blocking issues along with a lack of time :( Cheers, Richard ___ https://lists.gnu.org/mailman/listinfo/libtool
Re: Bash-specific performance by avoiding sed
On 09 Mar 2015 14:48, Eric Blake wrote: > On 03/09/2015 01:50 PM, Bob Friesenhahn wrote: > > On Mon, 9 Mar 2015, Mike Gran wrote: > >> I don't know if y'all saw this blogpost where a guy pushed > >> the sed regular expression handling into bash-specific > >> regular expressions when bash was available. He claims > >> there's a significant performance improvement because of > >> reduced forking. > >> > >> http://harald.hoyer.xyz/2015/03/05/libtool-getting-rid-of-18-sed-forks/ > > > > There is an issue in the libtool bug tracker regarding this. > > > > This solution only works with GNU bash. It would be good if volunteers > > could research to see if there are similar solutions which can work with > > other common shells (e.g. dash, ksh, zsh). > > For context, we're trying to speed up: > > sed_quote_subst='s|\([`"$\\]\)|\\\1|g' > _G_unquoted_arg=`printf '%s\n' "$1" |$SED "$sed_quote_subst"` > > How about this, which should be completely portable to XSI shells (alas, > it still uses ${a#b} and ${a%b} at the end, so it is not portable to > ancient Solaris /bin/sh): > > # func_quote STRING > # Escapes all \`"$ in STRING with another \, and stores that in $quoted > func_quote () { > case $1 in > *[\\\`\"\$]*) > save_IFS=$IFS pre=.$1. > for char in '\' '`' '"' '$'; do > post= IFS=$char > for part in $pre; do > post=${post:+$post\\$char}$part > done > pre=$post > done should we test the size of the string first ? i've written such raw shell string parsing functions before, and once you hit a certain size (like 1k+ iirc), forking out to sed is way faster, especially when running in multibyte locales (like UTF8) which most people are doing nowadays. -mike signature.asc Description: Digital signature ___ https://lists.gnu.org/mailman/listinfo/libtool
Re: Bash-specific performance by avoiding sed
On 03/09/2015 01:50 PM, Bob Friesenhahn wrote: > On Mon, 9 Mar 2015, Mike Gran wrote: > >> Hello libtool, >> >> I don't know if y'all saw this blogpost where a guy pushed >> the sed regular expression handling into bash-specific >> regular expressions when bash was available. He claims >> there's a significant performance improvement because of >> reduced forking. >> >> http://harald.hoyer.xyz/2015/03/05/libtool-getting-rid-of-18-sed-forks/ >> > > There is an issue in the libtool bug tracker regarding this. > > This solution only works with GNU bash. It would be good if volunteers > could research to see if there are similar solutions which can work with > other common shells (e.g. dash, ksh, zsh). For context, we're trying to speed up: sed_quote_subst='s|\([`"$\\]\)|\\\1|g' _G_unquoted_arg=`printf '%s\n' "$1" |$SED "$sed_quote_subst"` How about this, which should be completely portable to XSI shells (alas, it still uses ${a#b} and ${a%b} at the end, so it is not portable to ancient Solaris /bin/sh): # func_quote STRING # Escapes all \`"$ in STRING with another \, and stores that in $quoted func_quote () { case $1 in *[\\\`\"\$]*) save_IFS=$IFS pre=.$1. for char in '\' '`' '"' '$'; do post= IFS=$char for part in $pre; do post=${post:+$post\\$char}$part done pre=$post done IFS=$save_IFS post=${post%.} quoted=${post#.} ;; *) quoted=$1 ;; esac } (of course, with proper munging of internal variable names [$save_IFS, $pre, $post, $char, $quoted] so as to be less likely to collide with other variables in use) So, where we were previously doing: _G_unquoted_arg=`printf '%s\n' "$1" | $SED "$sed_quote_subst"` we would now do: func_quote "$1" _G_unquoted_arg=$quoted for the same result, without any forking. -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature ___ https://lists.gnu.org/mailman/listinfo/libtool
Re: Bash-specific performance by avoiding sed
On Mon, 9 Mar 2015, Mike Gran wrote: Hello libtool, I don't know if y'all saw this blogpost where a guy pushed the sed regular expression handling into bash-specific regular expressions when bash was available. He claims there's a significant performance improvement because of reduced forking. http://harald.hoyer.xyz/2015/03/05/libtool-getting-rid-of-18-sed-forks/ There is an issue in the libtool bug tracker regarding this. This solution only works with GNU bash. It would be good if volunteers could research to see if there are similar solutions which can work with other common shells (e.g. dash, ksh, zsh). Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ https://lists.gnu.org/mailman/listinfo/libtool
Bash-specific performance by avoiding sed
Hello libtool, I don't know if y'all saw this blogpost where a guy pushed the sed regular expression handling into bash-specific regular expressions when bash was available. He claims there's a significant performance improvement because of reduced forking. http://harald.hoyer.xyz/2015/03/05/libtool-getting-rid-of-18-sed-forks/ -Mike Gran ___ https://lists.gnu.org/mailman/listinfo/libtool