Re: [PATCH v2] log: grep author/committer using mailmap
Actually, gprof seems to be unhappy about the number of call to strbuf_grow() in map_user() (25% of the time spent in map_user() is spent in strbuf_grow()). That probably comes from the repeated call to strbuf_addch() when lowering the email address. At this point, we are also copying the '\0' for every char we add, doubling the copy. This may not be much of a difference, but it seems to be called 15 millions times when running: $ git log --author='Junio C Hamano' --use-mailmap Maybe we should come up with another way to lower this email address afterall. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] log: grep author/committer using mailmap
Antoine Pelisse apeli...@gmail.com writes: Actually, gprof seems to be unhappy about the number of call to strbuf_grow() in map_user() (25% of the time spent in map_user() is spent in strbuf_grow()). That probably comes from the repeated call to strbuf_addch() when lowering the email address. This is about your rewritten implementation that hasn't escaped to the general public but sitting in 'next', right? Two things that immediately come to mind are: - initialization of lowermail can use strbuf_init() instead; - downcasing can be done in place, i.e. lowermail.buf[i] = -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] log: grep author/committer using mailmap
This is about your rewritten implementation that hasn't escaped to the general public but sitting in 'next', right? Two things that immediately come to mind are: - initialization of lowermail can use strbuf_init() instead; - downcasing can be done in place, i.e. lowermail.buf[i] = Yep, I don't think it's merged to 'next', I will squash those changes appropriately. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] log: grep author/committer using mailmap
Antoine Pelisse apeli...@gmail.com writes: Currently you can use mailmap to display log authors and committers but you can't use the mailmap to find commits with mapped values. This commit allows you to run: git log --use-mailmap --author mapped_name_or_email git log --use-mailmap --committer mapped_name_or_email Of course it only works if the --use-mailmap option is used. Signed-off-by: Antoine Pelisse apeli...@gmail.com --- Thanks. I'll queue this on top. -- 8 -- Subject: [PATCH] log --use-mailmap: optimize for cases without --author/--committer search When we taught the commit_match() mechanism to pay attention to the new --use-mailmap option, we started to unconditionally copy the commit object to a temporary buffer, just in case we need the author and committer lines updated via the mailmap mechanism. It turns out that this has a rather unpleasant performance implications. In the linux kernel repository, running $ git log --author='Junio C Hamano' --pretty=short /dev/null under /usr/bin/time, with and without --use-mailmap (the .mailmap file is 118 entries long, the particular author does not appear in it), cost (with warm cache): [without --use-mailmap] 5.34user 0.25system 0:05.60elapsed 100%CPU (0avgtext+0avgdata 2004832maxresident)k 0inputs+0outputs (0major+137600minor)pagefaults 0swaps [with --use-mailmap] 6.87user 0.24system 0:07.11elapsed 99%CPU (0avgtext+0avgdata 2006352maxresident)k 0inputs+0outputs (0major+137696minor)pagefaults 0swaps which is with about 29% overhead. The command is doing extra work, so the extra cost may be justified. But it is inexcusable to pay the cost when we do not need author/committer match. In the same repository, $ git log --grep='fix menuconfig on debian lenny' --pretty=short /dev/null shows very similar numbers as the above: [without --use-mailmap] 5.30user 0.24system 0:05.55elapsed 99%CPU (0avgtext+0avgdata 2004896maxresident)k 0inputs+0outputs (0major+137604minor)pagefaults 0swaps [with --use-mailmap] 6.82user 0.26system 0:07.07elapsed 100%CPU (0avgtext+0avgdata 2006352maxresident)k 0inputs+0outputs (0major+137697minor)pagefaults 0swaps The latter case is an unnecessary performance regression. We may want to _show_ the result with mailmap applied, but we do not have to copy and rewrite the author/committer of all commits we try to match if we do not query for these fields. Trivially optimize this performace regression by limiting the rewrites for only when we are matching with author/committer fields. Signed-off-by: Junio C Hamano gits...@pobox.com --- revision.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/revision.c b/revision.c index fa16b9d..56b72f7 100644 --- a/revision.c +++ b/revision.c @@ -2283,7 +2283,7 @@ static int commit_match(struct commit *commit, struct rev_info *opt) if (buf.len) strbuf_addstr(buf, commit-buffer); - if (opt-mailmap) { + if (opt-grep_filter.header_list opt-mailmap) { if (!buf.len) strbuf_addstr(buf, commit-buffer); -- 1.8.1.rc3.221.g8d09d94 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] log: grep author/committer using mailmap
Junio C Hamano gits...@pobox.com writes: Thanks. I'll queue this on top. -- 8 -- Subject: [PATCH] log --use-mailmap: optimize for cases without --author/--committer search And this I will *not* queue further on top. -- 8 -- Subject: [PATCH] [DO NOT USE] log --use-mailmap --author/--committer: further optimize identity rewriting We used to always allocate a temporary buffer to search for author/committer names even when the author/committer does not appear in the mailmap. Update the logic to do the allocation on demand to reduce needless malloc()/free() calls. It turns out that this does not affect performance at all; all the time is spent in map_user() which is a look-up in string_list, so let's not use this patch in the production, but it illustrates an interesting point. In map_identities() function, we already know who the author recorded in the commit is, either in author strbuf, or in buffer between [a_at..a_len], so we could grep_buffer() the author regexp(s) specified from the command line right there, and combine that result with the main grep_buffer() done for the --grep= option at the end of the commit_match() function. That may essentially amount to going in the totally opposite direction from what 2d10c55 (git log: Unify header_filter and message_filter into one., 2006-09-20) attempted to do. We used to have two grep expressions (one for header, the other one for body) commit_match() runs grep_buffer() on and combined the results. 2d10c55 merged them into one grep expression by introducing a term that matches only header elements. But we would instead split the header expression into author and committer expressions (making it three from one) if we go that route. In any case, I *think* the bottleneck is in map_user() so until that is solved, such an update (or this patch) is not very useful. Signed-off-by: Junio C Hamano gits...@pobox.com --- revision.c | 95 +- 1 file changed, 57 insertions(+), 38 deletions(-) diff --git a/revision.c b/revision.c index 56b72f7..4b66598 100644 --- a/revision.c +++ b/revision.c @@ -2220,49 +2220,73 @@ static int rewrite_parents(struct rev_info *revs, struct commit *commit) return 0; } -static int commit_rewrite_person(struct strbuf *buf, const char *what, struct string_list *mailmap) +static void map_person(const char *buf, size_t len, const char *head, int headlen, + struct strbuf *result, struct string_list *mailmap, + int *matchofs, int *matchlen) { - char *person, *endp; - size_t len; + struct ident_split ident; + const char *endp, *person; struct strbuf name = STRBUF_INIT; struct strbuf mail = STRBUF_INIT; - struct ident_split ident; - person = strstr(buf-buf, what); + person = memmem(buf, len, head, headlen); if (!person) - goto left_intact; - - person += strlen(what); + return; + person += headlen; endp = strchr(person, '\n'); if (!endp) - goto left_intact; - - len = endp - person; - - if (split_ident_line(ident, person, len)) - goto left_intact; - + return; + *matchofs = person - buf; + *matchlen = endp - person; + if (split_ident_line(ident, person, *matchlen)) + return; strbuf_add(name, ident.name_begin, ident.name_end - ident.name_begin); strbuf_add(mail, ident.mail_begin, ident.mail_end - ident.mail_begin); - - if (map_user(mailmap, mail, name)) { - strbuf_addf(name, %s, mail.buf); - - strbuf_splice(buf, ident.name_begin - buf-buf, - ident.mail_end - ident.name_begin + 1, - name.buf, name.len); - - strbuf_release(name); - strbuf_release(mail); - - return 1; - } - -left_intact: + if (map_user(mailmap, mail, name)) + strbuf_addf(result, %s %s, name.buf, mail.buf); strbuf_release(name); strbuf_release(mail); +} - return 0; +static void map_identities(struct strbuf *buf, const char *buffer, struct string_list *mailmap) +{ + const char *eob; + struct strbuf author = STRBUF_INIT; + struct strbuf committer = STRBUF_INIT; + int a_at = -1, a_len, c_at = -1, c_len; + + eob = strstr(buffer, \n\n); + if (!eob) + eob = buffer + strlen(buffer); + map_person(buffer, eob - buffer, \nauthor , 8, author, mailmap, + a_at, a_len); + map_person(buffer, eob - buffer, \ncommitter , 11, committer, mailmap, + c_at, c_len); + if (!author.len !committer.len) + goto done; + + /* Now, we know we need rewriting */ + if (!buf-len) + strbuf_addstr(buf, buffer); + + if (c_at 0 || !committer.len)