Re: [PATCH] use strbuf_add_unique_abbrev() for adding short hashes, part 3

2016-10-10 Thread Jeff King
On Mon, Oct 10, 2016 at 10:46:21PM +0200, René Scharfe wrote:

> Good question.  ALLOC_GROW() doesn't double exactly, but indeed the
> number of reallocations depends on the size of the added pieces.  I
> always thought of strbuf_addf() as an expensive function for
> convenience, but never timed it.
> 
> Numbers vary a bit, but here's what I get for the crude test program
> at the end:
> 
> $ time t/helper/test-strbuf strbuf_addf 123 123456789012345678901234567890
> 123123456789012345678901234567890
> 
> real0m0.168s
> user0m0.164s
> sys 0m0.000s
> $ time t/helper/test-strbuf strbuf_addstr 123 123456789012345678901234567890
> 123123456789012345678901234567890
> 
> real0m0.141s
> user0m0.140s
> sys 0m0.000s
> 
> Just a data-point, but it confirms my bias, so I stop here. :)

Heh. I'm surprised it's that big a difference, as processing simple
printf strings should be pretty quick. I guess what happens in your
program is that your strings almost always require a re-allocation
(because you've just released, and we start small), and we literally end
up doing a partial copy via vsnprintf(), realizing we're out of space,
reallocating, and then running it again.

So it's noticeably worse when we _do_ reallocate, but usually that
should be amortized across many calls (and if it isn't, then you are
paying the much bigger price of lots of mallocs, and you should optimize
that first :) ).

That being said, it doesn't seem like it would be _worse_ to move from
addf to multiple addstrs.

-Peff


Re: [PATCH] use strbuf_add_unique_abbrev() for adding short hashes, part 3

2016-10-10 Thread René Scharfe
Am 10.10.2016 um 02:00 schrieb Jeff King:
> On Sat, Oct 08, 2016 at 05:38:47PM +0200, René Scharfe wrote:
> 
>> Call strbuf_add_unique_abbrev() to add abbreviated hashes to strbufs
>> instead of taking detours through find_unique_abbrev() and its static
>> buffer.  This is shorter in most cases and a bit more efficient.
>>
>> The changes here are not easily handled by a semantic patch because
>> they involve removing temporary variables and deconstructing format
>> strings for strbuf_addf().
> 
> Yeah, the thing to look for here is whether the sha1 variable holds the
> same value at both times.
> 
> These all look OK to me. Mild rambling below.
> 
>> diff --git a/merge-recursive.c b/merge-recursive.c
>> index 5200d5c..9041c2f 100644
>> --- a/merge-recursive.c
>> +++ b/merge-recursive.c
>> @@ -202,9 +202,9 @@ static void output_commit_title(struct merge_options *o, 
>> struct commit *commit)
>>  strbuf_addf(>obuf, "virtual %s\n",
>>  merge_remote_util(commit)->name);
>>  else {
>> -strbuf_addf(>obuf, "%s ",
>> -find_unique_abbrev(commit->object.oid.hash,
>> -DEFAULT_ABBREV));
>> +strbuf_add_unique_abbrev(>obuf, commit->object.oid.hash,
>> + DEFAULT_ABBREV);
>> +strbuf_addch(>obuf, ' ');
> 
> I've often wondered whether a big strbuf_addf() is more efficient than
> several strbuf_addstrs. It amortizes the size-checks, certainly, though
> those are probably not very big. It shouldn't matter much for amortizing
> the cost of malloc, as it's very unlikely to have a case where:
> 
>   strbuf_addf("%s%s", foo, bar);
> 
> would require one malloc, but:
> 
>   strbuf_addstr(foo);
>   strbuf_addstr(bar);
> 
> would require two (one of the strings would have to be around the same
> size as the ALLOC_GROW() doubling).
> 
> So it probably doesn't matter much in practice (not that most of these
> cases are very performance sensitive anyway). Mostly just something I've
> pondered while tweaking strbuf invocations.

Good question.  ALLOC_GROW() doesn't double exactly, but indeed the
number of reallocations depends on the size of the added pieces.  I
always thought of strbuf_addf() as an expensive function for
convenience, but never timed it.

Numbers vary a bit, but here's what I get for the crude test program
at the end:

$ time t/helper/test-strbuf strbuf_addf 123 123456789012345678901234567890
123123456789012345678901234567890

real0m0.168s
user0m0.164s
sys 0m0.000s
$ time t/helper/test-strbuf strbuf_addstr 123 123456789012345678901234567890
123123456789012345678901234567890

real0m0.141s
user0m0.140s
sys 0m0.000s

Just a data-point, but it confirms my bias, so I stop here. :)

> Just thinking aloud, I've also wondered if strbuf_addoid() would be
> handy.  We already have oid_to_hex_r(). In fact, you could write it
> already as:
> 
>   strbuf_add_unique_abbrev(sb, oidp->hash, 0);
> 
> but that is probably not helping maintainability. ;)

Yes, a function for adding full hashes without using a static
variable is useful for library functions that may end up being
called often or in parallel.  I'd call it strbuf_add_hash,
though.



diff --git a/Makefile b/Makefile
index 1aad150..ad5e98c 100644
--- a/Makefile
+++ b/Makefile
@@ -628,6 +628,7 @@ TEST_PROGRAMS_NEED_X += test-scrap-cache-tree
 TEST_PROGRAMS_NEED_X += test-sha1
 TEST_PROGRAMS_NEED_X += test-sha1-array
 TEST_PROGRAMS_NEED_X += test-sigchain
+TEST_PROGRAMS_NEED_X += test-strbuf
 TEST_PROGRAMS_NEED_X += test-string-list
 TEST_PROGRAMS_NEED_X += test-submodule-config
 TEST_PROGRAMS_NEED_X += test-subprocess
diff --git a/t/helper/test-strbuf.c b/t/helper/test-strbuf.c
new file mode 100644
index 000..177f3e2
--- /dev/null
+++ b/t/helper/test-strbuf.c
@@ -0,0 +1,25 @@
+#include "cache.h"
+
+int cmd_main(int argc, const char **argv)
+{
+   struct strbuf sb = STRBUF_INIT;
+   int i = 100;
+
+   if (argc == 4) {
+   if (!strcmp(argv[1], "strbuf_addf")) {
+   while (i--) {
+   strbuf_release();
+   strbuf_addf(, "%s%s", argv[2], argv[3]);
+   }
+   }
+   if (!strcmp(argv[1], "strbuf_addstr")) {
+   while (i--) {
+   strbuf_release();
+   strbuf_addstr(, argv[2]);
+   strbuf_addstr(, argv[3]);
+   }
+   }
+   puts(sb.buf);
+   }
+   return 0;
+}



Re: [PATCH] use strbuf_add_unique_abbrev() for adding short hashes, part 3

2016-10-09 Thread Jeff King
On Sat, Oct 08, 2016 at 05:38:47PM +0200, René Scharfe wrote:

> Call strbuf_add_unique_abbrev() to add abbreviated hashes to strbufs
> instead of taking detours through find_unique_abbrev() and its static
> buffer.  This is shorter in most cases and a bit more efficient.
> 
> The changes here are not easily handled by a semantic patch because
> they involve removing temporary variables and deconstructing format
> strings for strbuf_addf().

Yeah, the thing to look for here is whether the sha1 variable holds the
same value at both times.

These all look OK to me. Mild rambling below.

> diff --git a/merge-recursive.c b/merge-recursive.c
> index 5200d5c..9041c2f 100644
> --- a/merge-recursive.c
> +++ b/merge-recursive.c
> @@ -202,9 +202,9 @@ static void output_commit_title(struct merge_options *o, 
> struct commit *commit)
>   strbuf_addf(>obuf, "virtual %s\n",
>   merge_remote_util(commit)->name);
>   else {
> - strbuf_addf(>obuf, "%s ",
> - find_unique_abbrev(commit->object.oid.hash,
> - DEFAULT_ABBREV));
> + strbuf_add_unique_abbrev(>obuf, commit->object.oid.hash,
> +  DEFAULT_ABBREV);
> + strbuf_addch(>obuf, ' ');

I've often wondered whether a big strbuf_addf() is more efficient than
several strbuf_addstrs. It amortizes the size-checks, certainly, though
those are probably not very big. It shouldn't matter much for amortizing
the cost of malloc, as it's very unlikely to have a case where:

  strbuf_addf("%s%s", foo, bar);

would require one malloc, but:

  strbuf_addstr(foo);
  strbuf_addstr(bar);

would require two (one of the strings would have to be around the same
size as the ALLOC_GROW() doubling).

So it probably doesn't matter much in practice (not that most of these
cases are very performance sensitive anyway). Mostly just something I've
pondered while tweaking strbuf invocations.

And anyway, in this case replacing find_unique_abbrev lets us write
directly into the final buffer rather than copying through a static
buffer, so it's almost certainly a win there.

> diff --git a/pretty.c b/pretty.c
> index 25efbca..0c31495 100644
> --- a/pretty.c
> +++ b/pretty.c
> @@ -544,15 +544,13 @@ static void add_merge_info(const struct 
> pretty_print_context *pp,
>   strbuf_addstr(sb, "Merge:");
>  
>   while (parent) {
> - struct commit *p = parent->item;
> - const char *hex = NULL;
> + struct object_id *oidp = >item->object.oid;
> + strbuf_addch(sb, ' ');
>   if (pp->abbrev)
> - hex = find_unique_abbrev(p->object.oid.hash, 
> pp->abbrev);
> - if (!hex)
> - hex = oid_to_hex(>object.oid);

Wow, this existing code was hard to follow. I wondered when
find_unique_abbrev() would return NULL, but it never does. This "if
(!hex)" should have been an "else" all along.

> + strbuf_add_unique_abbrev(sb, oidp->hash, pp->abbrev);
> + else
> + strbuf_addstr(sb, oid_to_hex(oidp));

...which I see you changed here, though perhaps it is not immediately
obvious that it is correct without knowing find_unique_abbrev's return
value assumptions.

Just thinking aloud, I've also wondered if strbuf_addoid() would be
handy.  We already have oid_to_hex_r(). In fact, you could write it
already as:

  strbuf_add_unique_abbrev(sb, oidp->hash, 0);

but that is probably not helping maintainability. ;)

> diff --git a/submodule.c b/submodule.c
> index 2de06a3..476f60b 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -392,10 +392,9 @@ static void show_submodule_header(FILE *f, const char 
> *path,
>   }
>  
>  output_header:
> - strbuf_addf(, "%s%sSubmodule %s %s..", line_prefix, meta, path,
> - find_unique_abbrev(one->hash, DEFAULT_ABBREV));
> - if (!fast_backward && !fast_forward)
> - strbuf_addch(, '.');
> + strbuf_addf(, "%s%sSubmodule %s ", line_prefix, meta, path);
> + strbuf_add_unique_abbrev(, one->hash, DEFAULT_ABBREV);
> + strbuf_addstr(, (fast_backward || fast_forward) ? ".." : "...");

Spelling out ".." and "..." as you do makes this much clearer, IMHO.

-Peff


[PATCH] use strbuf_add_unique_abbrev() for adding short hashes, part 3

2016-10-08 Thread René Scharfe
Call strbuf_add_unique_abbrev() to add abbreviated hashes to strbufs
instead of taking detours through find_unique_abbrev() and its static
buffer.  This is shorter in most cases and a bit more efficient.

The changes here are not easily handled by a semantic patch because
they involve removing temporary variables and deconstructing format
strings for strbuf_addf().

Signed-off-by: Rene Scharfe 
---
 merge-recursive.c |  6 +++---
 pretty.c  | 12 +---
 submodule.c   |  7 +++
 wt-status.c   | 10 --
 4 files changed, 15 insertions(+), 20 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 5200d5c..9041c2f 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -202,9 +202,9 @@ static void output_commit_title(struct merge_options *o, 
struct commit *commit)
strbuf_addf(>obuf, "virtual %s\n",
merge_remote_util(commit)->name);
else {
-   strbuf_addf(>obuf, "%s ",
-   find_unique_abbrev(commit->object.oid.hash,
-   DEFAULT_ABBREV));
+   strbuf_add_unique_abbrev(>obuf, commit->object.oid.hash,
+DEFAULT_ABBREV);
+   strbuf_addch(>obuf, ' ');
if (parse_commit(commit) != 0)
strbuf_addstr(>obuf, _("(bad commit)\n"));
else {
diff --git a/pretty.c b/pretty.c
index 25efbca..0c31495 100644
--- a/pretty.c
+++ b/pretty.c
@@ -544,15 +544,13 @@ static void add_merge_info(const struct 
pretty_print_context *pp,
strbuf_addstr(sb, "Merge:");
 
while (parent) {
-   struct commit *p = parent->item;
-   const char *hex = NULL;
+   struct object_id *oidp = >item->object.oid;
+   strbuf_addch(sb, ' ');
if (pp->abbrev)
-   hex = find_unique_abbrev(p->object.oid.hash, 
pp->abbrev);
-   if (!hex)
-   hex = oid_to_hex(>object.oid);
+   strbuf_add_unique_abbrev(sb, oidp->hash, pp->abbrev);
+   else
+   strbuf_addstr(sb, oid_to_hex(oidp));
parent = parent->next;
-
-   strbuf_addf(sb, " %s", hex);
}
strbuf_addch(sb, '\n');
 }
diff --git a/submodule.c b/submodule.c
index 2de06a3..476f60b 100644
--- a/submodule.c
+++ b/submodule.c
@@ -392,10 +392,9 @@ static void show_submodule_header(FILE *f, const char 
*path,
}
 
 output_header:
-   strbuf_addf(, "%s%sSubmodule %s %s..", line_prefix, meta, path,
-   find_unique_abbrev(one->hash, DEFAULT_ABBREV));
-   if (!fast_backward && !fast_forward)
-   strbuf_addch(, '.');
+   strbuf_addf(, "%s%sSubmodule %s ", line_prefix, meta, path);
+   strbuf_add_unique_abbrev(, one->hash, DEFAULT_ABBREV);
+   strbuf_addstr(, (fast_backward || fast_forward) ? ".." : "...");
strbuf_add_unique_abbrev(, two->hash, DEFAULT_ABBREV);
if (message)
strbuf_addf(, " %s%s\n", message, reset);
diff --git a/wt-status.c b/wt-status.c
index 99d1b0a..ca5c45f 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -1110,7 +1110,6 @@ static void abbrev_sha1_in_line(struct strbuf *line)
split = strbuf_split_max(line, ' ', 3);
if (split[0] && split[1]) {
unsigned char sha1[20];
-   const char *abbrev;
 
/*
 * strbuf_split_max left a space. Trim it and re-add
@@ -1118,9 +1117,10 @@ static void abbrev_sha1_in_line(struct strbuf *line)
 */
strbuf_trim(split[1]);
if (!get_sha1(split[1]->buf, sha1)) {
-   abbrev = find_unique_abbrev(sha1, DEFAULT_ABBREV);
strbuf_reset(split[1]);
-   strbuf_addf(split[1], "%s ", abbrev);
+   strbuf_add_unique_abbrev(split[1], sha1,
+DEFAULT_ABBREV);
+   strbuf_addch(split[1], ' ');
strbuf_reset(line);
for (i = 0; split[i]; i++)
strbuf_addbuf(line, split[i]);
@@ -1343,10 +1343,8 @@ static char *get_branch(const struct worktree *wt, const 
char *path)
else if (starts_with(sb.buf, "refs/"))
;
else if (!get_sha1_hex(sb.buf, sha1)) {
-   const char *abbrev;
-   abbrev = find_unique_abbrev(sha1, DEFAULT_ABBREV);
strbuf_reset();
-   strbuf_addstr(, abbrev);
+   strbuf_add_unique_abbrev(, sha1, DEFAULT_ABBREV);
} else if (!strcmp(sb.buf, "detached HEAD")) /* rebase */
goto got_nothing;
else/* bisect */
-- 
2.10.1



Re: [PATCH] use strbuf_add_unique_abbrev() for adding short hashes

2016-08-07 Thread Jeff King
On Sun, Aug 07, 2016 at 10:53:34AM +0200, Johannes Schindelin wrote:

> On Sat, 6 Aug 2016, René Scharfe wrote:
> 
> > Call strbuf_add_unique_abbrev() to add abbreviated hashes to strbufs
> > instead of taking detours through find_unique_abbrev() and its static
> > buffer.  This is shorter and a bit more efficient.
> 
> And less error-prone.
> 
> It is also a thing I need to change in my upcoming rebase--helper patches:
> I had not known about this function.

Great. I added it several months ago to avoid some hairy allocation
code, so I am glad to see it finding new uses (both this patch and
potential new ones).

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] use strbuf_add_unique_abbrev() for adding short hashes

2016-08-07 Thread Johannes Schindelin
Hi René,

On Sat, 6 Aug 2016, René Scharfe wrote:

> Call strbuf_add_unique_abbrev() to add abbreviated hashes to strbufs
> instead of taking detours through find_unique_abbrev() and its static
> buffer.  This is shorter and a bit more efficient.

And less error-prone.

It is also a thing I need to change in my upcoming rebase--helper patches:
I had not known about this function.

Thank you,
Dscho

[PATCH] use strbuf_add_unique_abbrev() for adding short hashes

2016-08-06 Thread René Scharfe
Call strbuf_add_unique_abbrev() to add abbreviated hashes to strbufs
instead of taking detours through find_unique_abbrev() and its static
buffer.  This is shorter and a bit more efficient.

Signed-off-by: Rene Scharfe 
---
 builtin/checkout.c |  3 +--
 pretty.c   | 13 ++---
 transport.c| 11 ---
 3 files changed, 11 insertions(+), 16 deletions(-)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index 27c1a05..0a7d24c 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -703,8 +703,7 @@ static int add_pending_uninteresting_ref(const char 
*refname,
 static void describe_one_orphan(struct strbuf *sb, struct commit *commit)
 {
strbuf_addstr(sb, "  ");
-   strbuf_addstr(sb,
-   find_unique_abbrev(commit->object.oid.hash, DEFAULT_ABBREV));
+   strbuf_add_unique_abbrev(sb, commit->object.oid.hash, DEFAULT_ABBREV);
strbuf_addch(sb, ' ');
if (!parse_commit(commit))
pp_commit_easy(CMIT_FMT_ONELINE, commit, sb);
diff --git a/pretty.c b/pretty.c
index 9fa42c2..9609afb 100644
--- a/pretty.c
+++ b/pretty.c
@@ -1143,8 +1143,8 @@ static size_t format_commit_one(struct strbuf *sb, /* in 
UTF-8 */
strbuf_addstr(sb, diff_get_color(c->auto_color, 
DIFF_RESET));
return 1;
}
-   strbuf_addstr(sb, find_unique_abbrev(commit->object.oid.hash,
-c->pretty_ctx->abbrev));
+   strbuf_add_unique_abbrev(sb, commit->object.oid.hash,
+c->pretty_ctx->abbrev);
strbuf_addstr(sb, diff_get_color(c->auto_color, DIFF_RESET));
c->abbrev_commit_hash.len = sb->len - c->abbrev_commit_hash.off;
return 1;
@@ -1154,8 +1154,8 @@ static size_t format_commit_one(struct strbuf *sb, /* in 
UTF-8 */
case 't':   /* abbreviated tree hash */
if (add_again(sb, >abbrev_tree_hash))
return 1;
-   strbuf_addstr(sb, 
find_unique_abbrev(commit->tree->object.oid.hash,
-c->pretty_ctx->abbrev));
+   strbuf_add_unique_abbrev(sb, commit->tree->object.oid.hash,
+c->pretty_ctx->abbrev);
c->abbrev_tree_hash.len = sb->len - c->abbrev_tree_hash.off;
return 1;
case 'P':   /* parent hashes */
@@ -1171,9 +1171,8 @@ static size_t format_commit_one(struct strbuf *sb, /* in 
UTF-8 */
for (p = commit->parents; p; p = p->next) {
if (p != commit->parents)
strbuf_addch(sb, ' ');
-   strbuf_addstr(sb, find_unique_abbrev(
-   p->item->object.oid.hash,
-   c->pretty_ctx->abbrev));
+   strbuf_add_unique_abbrev(sb, p->item->object.oid.hash,
+c->pretty_ctx->abbrev);
}
c->abbrev_parent_hashes.len = sb->len -
  c->abbrev_parent_hashes.off;
diff --git a/transport.c b/transport.c
index 4ba48b0..557f399 100644
--- a/transport.c
+++ b/transport.c
@@ -321,11 +321,6 @@ static void print_ref_status(char flag, const char 
*summary, struct ref *to, str
}
 }
 
-static const char *status_abbrev(unsigned char sha1[20])
-{
-   return find_unique_abbrev(sha1, DEFAULT_ABBREV);
-}
-
 static void print_ok_ref_status(struct ref *ref, int porcelain)
 {
if (ref->deletion)
@@ -340,7 +335,8 @@ static void print_ok_ref_status(struct ref *ref, int 
porcelain)
char type;
const char *msg;
 
-   strbuf_addstr(, status_abbrev(ref->old_oid.hash));
+   strbuf_add_unique_abbrev(, ref->old_oid.hash,
+DEFAULT_ABBREV);
if (ref->forced_update) {
strbuf_addstr(, "...");
type = '+';
@@ -350,7 +346,8 @@ static void print_ok_ref_status(struct ref *ref, int 
porcelain)
type = ' ';
msg = NULL;
}
-   strbuf_addstr(, status_abbrev(ref->new_oid.hash));
+   strbuf_add_unique_abbrev(, ref->new_oid.hash,
+DEFAULT_ABBREV);
 
print_ref_status(type, quickref.buf, ref, ref->peer_ref, msg, 
porcelain);
strbuf_release();
-- 
2.9.2

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html