Re: [PATCH v2] log: grep author/committer using mailmap

2012-12-28 Thread Antoine Pelisse
Actually, gprof seems to be unhappy about the number of call to
strbuf_grow() in map_user() (25% of the time spent in map_user() is
spent in strbuf_grow()).

That probably comes from the repeated call to strbuf_addch() when
lowering the email address.
At this point, we are also copying the '\0' for every char we add,
doubling the copy.
This may not be much of a difference, but it seems to be called 15
millions times when running:
$ git log --author='Junio C Hamano' --use-mailmap

Maybe we should come up with another way to lower this email address afterall.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] log: grep author/committer using mailmap

2012-12-28 Thread Junio C Hamano
Antoine Pelisse apeli...@gmail.com writes:

 Actually, gprof seems to be unhappy about the number of call to
 strbuf_grow() in map_user() (25% of the time spent in map_user() is
 spent in strbuf_grow()).

 That probably comes from the repeated call to strbuf_addch() when
 lowering the email address.

This is about your rewritten implementation that hasn't escaped to
the general public but sitting in 'next', right?

Two things that immediately come to mind are:

 - initialization of lowermail can use strbuf_init() instead;
 - downcasing can be done in place, i.e. lowermail.buf[i] = 

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] log: grep author/committer using mailmap

2012-12-28 Thread Antoine Pelisse
 This is about your rewritten implementation that hasn't escaped to
 the general public but sitting in 'next', right?

 Two things that immediately come to mind are:

  - initialization of lowermail can use strbuf_init() instead;
  - downcasing can be done in place, i.e. lowermail.buf[i] = 

Yep,
I don't think it's merged to 'next', I will squash those changes appropriately.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] log: grep author/committer using mailmap

2012-12-27 Thread Junio C Hamano
Antoine Pelisse apeli...@gmail.com writes:

 Currently you can use mailmap to display log authors and committers
 but you can't use the mailmap to find commits with mapped values.

 This commit allows you to run:

 git log --use-mailmap --author mapped_name_or_email
 git log --use-mailmap --committer mapped_name_or_email

 Of course it only works if the --use-mailmap option is used.

 Signed-off-by: Antoine Pelisse apeli...@gmail.com
 ---

Thanks.  I'll queue this on top.

-- 8 --
Subject: [PATCH] log --use-mailmap: optimize for cases without 
--author/--committer search

When we taught the commit_match() mechanism to pay attention to the
new --use-mailmap option, we started to unconditionally copy the
commit object to a temporary buffer, just in case we need the author
and committer lines updated via the mailmap mechanism.

It turns out that this has a rather unpleasant performance
implications.  In the linux kernel repository, running

  $ git log --author='Junio C Hamano' --pretty=short /dev/null

under /usr/bin/time, with and without --use-mailmap (the .mailmap
file is 118 entries long, the particular author does not appear in
it), cost (with warm cache):

  [without --use-mailmap]
  5.34user 0.25system 0:05.60elapsed 100%CPU (0avgtext+0avgdata 
2004832maxresident)k
  0inputs+0outputs (0major+137600minor)pagefaults 0swaps

  [with --use-mailmap]
  6.87user 0.24system 0:07.11elapsed 99%CPU (0avgtext+0avgdata 
2006352maxresident)k
  0inputs+0outputs (0major+137696minor)pagefaults 0swaps

which is with about 29% overhead.  The command is doing extra work,
so the extra cost may be justified.

But it is inexcusable to pay the cost when we do not need
author/committer match.  In the same repository,

  $ git log --grep='fix menuconfig on debian lenny' --pretty=short /dev/null

shows very similar numbers as the above:

  [without --use-mailmap]
  5.30user 0.24system 0:05.55elapsed 99%CPU (0avgtext+0avgdata 
2004896maxresident)k
  0inputs+0outputs (0major+137604minor)pagefaults 0swaps

  [with --use-mailmap]
  6.82user 0.26system 0:07.07elapsed 100%CPU (0avgtext+0avgdata 
2006352maxresident)k
  0inputs+0outputs (0major+137697minor)pagefaults 0swaps

The latter case is an unnecessary performance regression.  We may
want to _show_ the result with mailmap applied, but we do not have
to copy and rewrite the author/committer of all commits we try to
match if we do not query for these fields.

Trivially optimize this performace regression by limiting the
rewrites for only when we are matching with author/committer fields.

Signed-off-by: Junio C Hamano gits...@pobox.com
---
 revision.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/revision.c b/revision.c
index fa16b9d..56b72f7 100644
--- a/revision.c
+++ b/revision.c
@@ -2283,7 +2283,7 @@ static int commit_match(struct commit *commit, struct 
rev_info *opt)
if (buf.len)
strbuf_addstr(buf, commit-buffer);
 
-   if (opt-mailmap) {
+   if (opt-grep_filter.header_list  opt-mailmap) {
if (!buf.len)
strbuf_addstr(buf, commit-buffer);
 
-- 
1.8.1.rc3.221.g8d09d94

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] log: grep author/committer using mailmap

2012-12-27 Thread Junio C Hamano
Junio C Hamano gits...@pobox.com writes:

 Thanks.  I'll queue this on top.

 -- 8 --
 Subject: [PATCH] log --use-mailmap: optimize for cases without 
 --author/--committer search

And this I will *not* queue further on top.

-- 8 --
Subject: [PATCH] [DO NOT USE] log --use-mailmap --author/--committer: further
 optimize identity rewriting

We used to always allocate a temporary buffer to search for
author/committer names even when the author/committer does not
appear in the mailmap.  Update the logic to do the allocation
on demand to reduce needless malloc()/free() calls.

It turns out that this does not affect performance at all; all the
time is spent in map_user() which is a look-up in string_list, so
let's not use this patch in the production, but it illustrates an
interesting point.

In map_identities() function, we already know who the author
recorded in the commit is, either in author strbuf, or in buffer
between [a_at..a_len], so we could grep_buffer() the author
regexp(s) specified from the command line right there, and combine
that result with the main grep_buffer() done for the --grep= option
at the end of the commit_match() function.

That may essentially amount to going in the totally opposite
direction from what 2d10c55 (git log: Unify header_filter and
message_filter into one., 2006-09-20) attempted to do.  We used to
have two grep expressions (one for header, the other one for body)
commit_match() runs grep_buffer() on and combined the results.
2d10c55 merged them into one grep expression by introducing a term
that matches only header elements.  But we would instead split the
header expression into author and committer expressions
(making it three from one) if we go that route.

In any case, I *think* the bottleneck is in map_user() so until that
is solved, such an update (or this patch) is not very useful.

Signed-off-by: Junio C Hamano gits...@pobox.com
---
 revision.c | 95 +-
 1 file changed, 57 insertions(+), 38 deletions(-)

diff --git a/revision.c b/revision.c
index 56b72f7..4b66598 100644
--- a/revision.c
+++ b/revision.c
@@ -2220,49 +2220,73 @@ static int rewrite_parents(struct rev_info *revs, 
struct commit *commit)
return 0;
 }
 
-static int commit_rewrite_person(struct strbuf *buf, const char *what, struct 
string_list *mailmap)
+static void map_person(const char *buf, size_t len, const char *head, int 
headlen,
+  struct strbuf *result, struct string_list *mailmap,
+  int *matchofs, int *matchlen)
 {
-   char *person, *endp;
-   size_t len;
+   struct ident_split ident;
+   const char *endp, *person;
struct strbuf name = STRBUF_INIT;
struct strbuf mail = STRBUF_INIT;
-   struct ident_split ident;
 
-   person = strstr(buf-buf, what);
+   person = memmem(buf, len, head, headlen);
if (!person)
-   goto left_intact;
-
-   person += strlen(what);
+   return;
+   person += headlen;
endp = strchr(person, '\n');
if (!endp)
-   goto left_intact;
-
-   len = endp - person;
-
-   if (split_ident_line(ident, person, len))
-   goto left_intact;
-
+   return;
+   *matchofs = person - buf;
+   *matchlen = endp - person;
+   if (split_ident_line(ident, person, *matchlen))
+   return;
strbuf_add(name, ident.name_begin, ident.name_end - ident.name_begin);
strbuf_add(mail, ident.mail_begin, ident.mail_end - ident.mail_begin);
-
-   if (map_user(mailmap, mail, name)) {
-   strbuf_addf(name,  %s, mail.buf);
-
-   strbuf_splice(buf, ident.name_begin - buf-buf,
- ident.mail_end - ident.name_begin + 1,
- name.buf, name.len);
-
-   strbuf_release(name);
-   strbuf_release(mail);
-
-   return 1;
-   }
-
-left_intact:
+   if (map_user(mailmap, mail, name))
+   strbuf_addf(result, %s %s, name.buf, mail.buf);
strbuf_release(name);
strbuf_release(mail);
+}
 
-   return 0;
+static void map_identities(struct strbuf *buf, const char *buffer, struct 
string_list *mailmap)
+{
+   const char *eob;
+   struct strbuf author = STRBUF_INIT;
+   struct strbuf committer = STRBUF_INIT;
+   int a_at = -1, a_len, c_at = -1, c_len;
+
+   eob = strstr(buffer, \n\n);
+   if (!eob)
+   eob = buffer + strlen(buffer);
+   map_person(buffer, eob - buffer, \nauthor , 8, author, mailmap,
+  a_at, a_len);
+   map_person(buffer, eob - buffer, \ncommitter , 11, committer, 
mailmap,
+  c_at, c_len);
+   if (!author.len  !committer.len)
+   goto done;
+
+   /* Now, we know we need rewriting */
+   if (!buf-len)
+   strbuf_addstr(buf, buffer);
+
+   if (c_at  0 || !committer.len)