Re: [PATCH 1/2] git-sh-setup: refactor ident-parsing functions

2012-11-12 Thread Junio C Hamano
Jeff King  writes:

> On Mon, Nov 12, 2012 at 12:08:37PM -0800, Junio C Hamano wrote:
>
>> Jeff King  writes:
>> 
>> > Changing the interface for get_author_ident_from_commit would be a pain,
>> > but if we just wanted to help filter-branch, we could do something like
>> > this:
>> 
>> Yes, that is the direction I was alluding to.
>> 
>> Callers of get_author_ident_from_commit can also do the same and
>> avoid rebuilding the same $pick_author_script over and over again,
>> or get_author_ident_from_commit can do so for its callers.
>
> I don't think get_author_ident_from_commit can do so on demand due to
> the subshell issue I mentioned.

OK, that is what I obviously missed.

> I wonder if we should simply generate these at build time and store them
> in the script.

Yeah, my knee-jerk reaction was that it was wasteful to run script
at runtime to build a constant string, but the engineering cost may
not be worth it.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] git-sh-setup: refactor ident-parsing functions

2012-11-12 Thread Jeff King
On Mon, Nov 12, 2012 at 12:08:37PM -0800, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > Changing the interface for get_author_ident_from_commit would be a pain,
> > but if we just wanted to help filter-branch, we could do something like
> > this:
> 
> Yes, that is the direction I was alluding to.
> 
> Callers of get_author_ident_from_commit can also do the same and
> avoid rebuilding the same $pick_author_script over and over again,
> or get_author_ident_from_commit can do so for its callers.

I don't think get_author_ident_from_commit can do so on demand due to
the subshell issue I mentioned. So you'd have to generate the pick
script on inclusion of git-sh-setup, whether the includer wants to call
the function or not.

I wonder if we should simply generate these at build time and store them
in the script.

I'm also not sure that saving one process invocation is worth spending a
lot of thought cycles on. Maybe somebody on Windows could run a
filter-branch with the patch I just sent and measure whether it even
makes a difference.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] git-sh-setup: refactor ident-parsing functions

2012-11-12 Thread Junio C Hamano
Jeff King  writes:

> Changing the interface for get_author_ident_from_commit would be a pain,
> but if we just wanted to help filter-branch, we could do something like
> this:

Yes, that is the direction I was alluding to.

Callers of get_author_ident_from_commit can also do the same and
avoid rebuilding the same $pick_author_script over and over again,
or get_author_ident_from_commit can do so for its callers.

> diff --git a/git-filter-branch.sh b/git-filter-branch.sh
> index 5314249..7a693ba 100755
> --- a/git-filter-branch.sh
> +++ b/git-filter-branch.sh
> @@ -74,7 +74,7 @@ finish_ident() {
>  }
>  
>  set_ident () {
> - parse_ident_from_commit author AUTHOR committer COMMITTER
> + parse_ident_from_commit_via_script "$ident_script"
>   finish_ident AUTHOR
>   finish_ident COMMITTER
>  }
> @@ -93,6 +93,7 @@ if [ "$(is_bare_repository)" = false ]; then
>   require_clean_work_tree 'rewrite branches'
>  fi
>  
> +ident_script=$(pick_ident_script author AUTHOR committer COMMITTER)
>  tempdir=.git-rewrite
>  filter_env=
>  filter_tree=
> diff --git a/git-sh-setup.sh b/git-sh-setup.sh
> index 22f0aed..1e20e17 100644
> --- a/git-sh-setup.sh
> +++ b/git-sh-setup.sh
> @@ -225,10 +225,17 @@ pick_ident_script () {
>   echo '/^$/q'
>  }
>  
> +# Feed a pick_ident_script return value to sed. Use this instead of
> +# parse_ident_from_commit below if you are going to be parsing commits in a
> +# tight loop and want to save a process.
> +parse_ident_from_commit_via_script() {
> + LANG=C LC_ALL=C sed -ne "$1"
> +}
> +
>  # Create a pick-script as above and feed it to sed. Stdout is suitable for
>  # feeding to eval.
>  parse_ident_from_commit () {
> - LANG=C LC_ALL=C sed -ne "$(pick_ident_script "$@")"
> + parse_ident_from_commit_via_script "$(pick_ident_script "$@")"
>  }
>  
>  # Parse the author from a commit given as an argument. Stdout is suitable for
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] git-sh-setup: refactor ident-parsing functions

2012-11-12 Thread Jeff King
On Mon, Nov 12, 2012 at 09:44:01AM -0800, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > The only ident-parsing function we currently provide is
> > get_author_ident_from_commit. This is not very
> > flexible for two reasons:
> >
> >   1. It takes a commit as an argument, and can't read from
> >  commit headers saved on disk.
> >
> >   2. It will only parse authors, not committers.
> >
> > This patch provides a more flexible interface which will
> > parse multiple idents from a commit provide on stdin. We can
> > easily use it as a building block for the current function
> > to retain compatibility.
> >
> > Signed-off-by: Jeff King 
> > ---
> > Since we are counting processes in this series, I should note that this
> > actually adds a subshell invocation for each call, since it went from:
> >
> >   script='...'
> >   sed $script
> >
> > to:
> >
> >   sed "$(make_script)"
> >
> > For filter-branch, which is really the only high-performance caller we
> > have, this is negated by the fact that it will do author and committer
> > at the same time, saving us an extra subshell (in addition to an extra
> > sed invocation).
> 
> Given that pick-ident-script is a const function, a caller that
> repeatedly call is could call it once and use it in a variable, no?

The problem is that it is a helper called from parse_ident_from_commit.
And that function just passes along its arguments, so it does not know
that it is being called repeatedly with the same arguments. So you'd
have to either change the interface or memoize internally.

I don't think memoization is a good option for two reasons:

  1. Storing the arguments to compare to later is complex. You don't
 want to just store "$*" from the last run and see if we got the
 same arguments. You'd have to quote your delimiter (e.g., you would
 not want to confuse ("foo", "bar") with ("foo bar"). Though in this
 instance, we know that our args do not have spaces, so we could get
 away with that.

  2. If you are in a subshell or even a while loop, your memoized
 variable will not be retained.

So unless somebody has some clever scheme for memoizing shell functions
without any process overhead, it is probably not worth it.

Changing the interface for get_author_ident_from_commit would be a pain,
but if we just wanted to help filter-branch, we could do something like
this:

diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index 5314249..7a693ba 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -74,7 +74,7 @@ finish_ident() {
 }
 
 set_ident () {
-   parse_ident_from_commit author AUTHOR committer COMMITTER
+   parse_ident_from_commit_via_script "$ident_script"
finish_ident AUTHOR
finish_ident COMMITTER
 }
@@ -93,6 +93,7 @@ if [ "$(is_bare_repository)" = false ]; then
require_clean_work_tree 'rewrite branches'
 fi
 
+ident_script=$(pick_ident_script author AUTHOR committer COMMITTER)
 tempdir=.git-rewrite
 filter_env=
 filter_tree=
diff --git a/git-sh-setup.sh b/git-sh-setup.sh
index 22f0aed..1e20e17 100644
--- a/git-sh-setup.sh
+++ b/git-sh-setup.sh
@@ -225,10 +225,17 @@ pick_ident_script () {
echo '/^$/q'
 }
 
+# Feed a pick_ident_script return value to sed. Use this instead of
+# parse_ident_from_commit below if you are going to be parsing commits in a
+# tight loop and want to save a process.
+parse_ident_from_commit_via_script() {
+   LANG=C LC_ALL=C sed -ne "$1"
+}
+
 # Create a pick-script as above and feed it to sed. Stdout is suitable for
 # feeding to eval.
 parse_ident_from_commit () {
-   LANG=C LC_ALL=C sed -ne "$(pick_ident_script "$@")"
+   parse_ident_from_commit_via_script "$(pick_ident_script "$@")"
 }
 
 # Parse the author from a commit given as an argument. Stdout is suitable for
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] git-sh-setup: refactor ident-parsing functions

2012-11-12 Thread Junio C Hamano
Jeff King  writes:

> The only ident-parsing function we currently provide is
> get_author_ident_from_commit. This is not very
> flexible for two reasons:
>
>   1. It takes a commit as an argument, and can't read from
>  commit headers saved on disk.
>
>   2. It will only parse authors, not committers.
>
> This patch provides a more flexible interface which will
> parse multiple idents from a commit provide on stdin. We can
> easily use it as a building block for the current function
> to retain compatibility.
>
> Signed-off-by: Jeff King 
> ---
> Since we are counting processes in this series, I should note that this
> actually adds a subshell invocation for each call, since it went from:
>
>   script='...'
>   sed $script
>
> to:
>
>   sed "$(make_script)"
>
> For filter-branch, which is really the only high-performance caller we
> have, this is negated by the fact that it will do author and committer
> at the same time, saving us an extra subshell (in addition to an extra
> sed invocation).

Given that pick-ident-script is a const function, a caller that
repeatedly call is could call it once and use it in a variable, no?

>
>  git-sh-setup.sh | 62 
> +++--
>  1 file changed, 43 insertions(+), 19 deletions(-)
>
> diff --git a/git-sh-setup.sh b/git-sh-setup.sh
> index ee0e0bc..22f0aed 100644
> --- a/git-sh-setup.sh
> +++ b/git-sh-setup.sh
> @@ -191,28 +191,52 @@ get_author_ident_from_commit () {
>   fi
>  }
>  
> +# Generate a sed script to parse identities from a commit.
> +#
> +# Reads the commit from stdin, which should be in raw format (e.g., from
> +# cat-file or "--pretty=raw").
> +#
> +# The first argument specifies the ident line to parse (e.g., "author"), and
> +# the second specifies the environment variable to put it in (e.g., "AUTHOR"
> +# for "GIT_AUTHOR_*"). Multiple pairs can be given to parse author and
> +# committer.
> +pick_ident_script () {
> + while test $# -gt 0
> + do
> + lid=$1; shift
> + uid=$1; shift
> + printf '%s' "
> + /^$lid /{
> + s/'/'''/g
> + h
> + s/^$lid "'\([^<]*\) <[^>]*> .*$/\1/'"
> + s/.*/GIT_${uid}_NAME='&'/p
> +
> + g
> + s/^$lid "'[^<]* <\([^>]*\)> .*$/\1/'"
> + s/.*/GIT_${uid}_EMAIL='&'/p
> +
> + g
> + s/^$lid "'[^<]* <[^>]*> \(.*\)$/@\1/'"
> + s/.*/GIT_${uid}_DATE='&'/p
> + }
> + "
> + done
> + echo '/^$/q'
> +}
> +
> +# Create a pick-script as above and feed it to sed. Stdout is suitable for
> +# feeding to eval.
> +parse_ident_from_commit () {
> + LANG=C LC_ALL=C sed -ne "$(pick_ident_script "$@")"
> +}
> +
> +# Parse the author from a commit given as an argument. Stdout is suitable for
> +# feeding to eval to set the usual GIT_* ident variables.
>  get_author_ident_from_commit () {
> - pick_author_script='
> - /^author /{
> - s/'\''/'\''\\'\'\''/g
> - h
> - s/^author \([^<]*\) <[^>]*> .*$/\1/
> - s/.*/GIT_AUTHOR_NAME='\''&'\''/p
> -
> - g
> - s/^author [^<]* <\([^>]*\)> .*$/\1/
> - s/.*/GIT_AUTHOR_EMAIL='\''&'\''/p
> -
> - g
> - s/^author [^<]* <[^>]*> \(.*\)$/@\1/
> - s/.*/GIT_AUTHOR_DATE='\''&'\''/p
> -
> - q
> - }
> - '
>   encoding=$(git config i18n.commitencoding || echo UTF-8)
>   git show -s --pretty=raw --encoding="$encoding" "$1" -- |
> - LANG=C LC_ALL=C sed -ne "$pick_author_script"
> + parse_ident_from_commit author AUTHOR
>  }
>  
>  # Clear repo-local GIT_* environment variables. Useful when switching to
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] git-sh-setup: refactor ident-parsing functions

2012-10-18 Thread Jeff King
The only ident-parsing function we currently provide is
get_author_ident_from_commit. This is not very
flexible for two reasons:

  1. It takes a commit as an argument, and can't read from
 commit headers saved on disk.

  2. It will only parse authors, not committers.

This patch provides a more flexible interface which will
parse multiple idents from a commit provide on stdin. We can
easily use it as a building block for the current function
to retain compatibility.

Signed-off-by: Jeff King 
---
Since we are counting processes in this series, I should note that this
actually adds a subshell invocation for each call, since it went from:

  script='...'
  sed $script

to:

  sed "$(make_script)"

For filter-branch, which is really the only high-performance caller we
have, this is negated by the fact that it will do author and committer
at the same time, saving us an extra subshell (in addition to an extra
sed invocation).

 git-sh-setup.sh | 62 +++--
 1 file changed, 43 insertions(+), 19 deletions(-)

diff --git a/git-sh-setup.sh b/git-sh-setup.sh
index ee0e0bc..22f0aed 100644
--- a/git-sh-setup.sh
+++ b/git-sh-setup.sh
@@ -191,28 +191,52 @@ get_author_ident_from_commit () {
fi
 }
 
+# Generate a sed script to parse identities from a commit.
+#
+# Reads the commit from stdin, which should be in raw format (e.g., from
+# cat-file or "--pretty=raw").
+#
+# The first argument specifies the ident line to parse (e.g., "author"), and
+# the second specifies the environment variable to put it in (e.g., "AUTHOR"
+# for "GIT_AUTHOR_*"). Multiple pairs can be given to parse author and
+# committer.
+pick_ident_script () {
+   while test $# -gt 0
+   do
+   lid=$1; shift
+   uid=$1; shift
+   printf '%s' "
+   /^$lid /{
+   s/'/'''/g
+   h
+   s/^$lid "'\([^<]*\) <[^>]*> .*$/\1/'"
+   s/.*/GIT_${uid}_NAME='&'/p
+
+   g
+   s/^$lid "'[^<]* <\([^>]*\)> .*$/\1/'"
+   s/.*/GIT_${uid}_EMAIL='&'/p
+
+   g
+   s/^$lid "'[^<]* <[^>]*> \(.*\)$/@\1/'"
+   s/.*/GIT_${uid}_DATE='&'/p
+   }
+   "
+   done
+   echo '/^$/q'
+}
+
+# Create a pick-script as above and feed it to sed. Stdout is suitable for
+# feeding to eval.
+parse_ident_from_commit () {
+   LANG=C LC_ALL=C sed -ne "$(pick_ident_script "$@")"
+}
+
+# Parse the author from a commit given as an argument. Stdout is suitable for
+# feeding to eval to set the usual GIT_* ident variables.
 get_author_ident_from_commit () {
-   pick_author_script='
-   /^author /{
-   s/'\''/'\''\\'\'\''/g
-   h
-   s/^author \([^<]*\) <[^>]*> .*$/\1/
-   s/.*/GIT_AUTHOR_NAME='\''&'\''/p
-
-   g
-   s/^author [^<]* <\([^>]*\)> .*$/\1/
-   s/.*/GIT_AUTHOR_EMAIL='\''&'\''/p
-
-   g
-   s/^author [^<]* <[^>]*> \(.*\)$/@\1/
-   s/.*/GIT_AUTHOR_DATE='\''&'\''/p
-
-   q
-   }
-   '
encoding=$(git config i18n.commitencoding || echo UTF-8)
git show -s --pretty=raw --encoding="$encoding" "$1" -- |
-   LANG=C LC_ALL=C sed -ne "$pick_author_script"
+   parse_ident_from_commit author AUTHOR
 }
 
 # Clear repo-local GIT_* environment variables. Useful when switching to
-- 
1.8.0.rc3.3.gba630e1

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html