Re: [PATCH v2 2/5] p0008-abbrev.sh: Test find_unique_abbrev() perf

2017-10-05 Thread Jeff King
On Thu, Oct 05, 2017 at 04:55:53AM -0400, Jeff King wrote:

> On Mon, Sep 25, 2017 at 05:54:49AM -0400, Derrick Stolee wrote:
> 
> > Create helper program test-abbrev to compute the minimum length of a
> > disambiguating short-sha for an input list of object ids.
> 
> This seems like something that Git ought to be able to do via real
> commands.
> 
> Using "log --stdin --no-walk --format=%h" doesn't quite work, since it
> only handles commits. We ought to be able to ask "cat-file" for this,
> though. E.g., with the patch below you can do:
> 
>   git cat-file --batch-check='%(objectsize:short)'  
> Or even just dispense with your earlier randomization helper and do:
> 
>   git cat-file --batch-all-objects --batch-check='%(objectsize:short)'
> 
> to compute the abbreviation for every object.

Of course it would help if I bothered to include the patch. Here it is.

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index f5fa4fd75a..a5f911a632 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -225,6 +225,9 @@ static void expand_atom(struct strbuf *sb, const char 
*atom, int len,
if (is_atom("objectname", atom, len)) {
if (!data->mark_query)
strbuf_addstr(sb, oid_to_hex(>oid));
+   } else if (is_atom("objectname:short", atom, len)) {
+   if (!data->mark_query)
+   strbuf_add_unique_abbrev(sb, data->oid.hash, 
MINIMUM_ABBREV);
} else if (is_atom("objecttype", atom, len)) {
if (data->mark_query)
data->info.typep = >type;

-Peff


Re: [PATCH v2 2/5] p0008-abbrev.sh: Test find_unique_abbrev() perf

2017-10-05 Thread Jeff King
On Mon, Sep 25, 2017 at 05:54:49AM -0400, Derrick Stolee wrote:

> Create helper program test-abbrev to compute the minimum length of a
> disambiguating short-sha for an input list of object ids.

This seems like something that Git ought to be able to do via real
commands.

Using "log --stdin --no-walk --format=%h" doesn't quite work, since it
only handles commits. We ought to be able to ask "cat-file" for this,
though. E.g., with the patch below you can do:

  git cat-file --batch-check='%(objectsize:short)'  Perf test p0008-abbrev.sh runs test-abbrev for 100,000 object ids. For
> test 0008.1, these objects exist. For test 0008.2 these objects do not
> exist in the repo (with high probability). For each test, use `sort -R`
> to (deterministically) shuffle the sample of object ids to not check
> abbreviations in lexicographic order.

I know you're not the first to add a test like this, but I'm somewhat
negative on these sorts of micro-benchmark perf tests. They're nice for
showing off an improvement, but there's little indication of how they
impact things that users actually do.

We know that this series makes finding abbreviations cheaper. But how
much does it speed up "git log --oneline" on a real repository
(including absurdly-sized ones)?

-Peff


Re: [PATCH v2 2/5] p0008-abbrev.sh: Test find_unique_abbrev() perf

2017-09-26 Thread Junio C Hamano
Derrick Stolee  writes:

> diff --git a/t/helper/test-abbrev.c b/t/helper/test-abbrev.c
> new file mode 100644
> index 0..6866896eb
> --- /dev/null
> +++ b/t/helper/test-abbrev.c
> @@ -0,0 +1,19 @@
> +#include "cache.h"
> +#include 

Same comment on  as [1/5] applies.

> +
> +int cmd_main(int ac, const char **av)
> +{
> + struct object_id oid;
> + char hex[GIT_MAX_HEXSZ + 2];

Why +2 (as opposed to +1)?

> + const char *end;
> +
> + setup_git_directory();
> +
> + while (fgets(hex, GIT_MAX_HEXSZ + 2, stdin)) {
> + hex[GIT_MAX_HEXSZ] = 0;
> + if (!parse_oid_hex(hex, , ))
> + find_unique_abbrev(oid.hash, MINIMUM_ABBREV);
> + }
> +
> + exit(0);
> +}
> diff --git a/t/perf/p0008-abbrev.sh b/t/perf/p0008-abbrev.sh
> new file mode 100755
> index 0..ba25e7824
> --- /dev/null
> +++ b/t/perf/p0008-abbrev.sh
> @@ -0,0 +1,22 @@
> +#!/bin/bash
> +
> +test_description='Test object disambiguation through abbreviations'
> +. ./perf-lib.sh
> +
> +test_perf_large_repo
> +
> +test-list-objects 10 | sort -R > objs.txt

I thought "sort randomly" was a GNUism.  Does it work across
platforms?  I think not.

> +
> +test_perf 'find_unique_abbrev() for existing objects' '
> + test-abbrev < objs.txt
> +'
> +
> +test-list-objects 10 --missing | sort -R > objs.txt
> +
> +test_perf 'find_unique_abbrev() for missing objects' '
> + test-abbrev < objs.txt
> +'
> +
> +rm objs.txt
> +
> +test_done


[PATCH v2 2/5] p0008-abbrev.sh: Test find_unique_abbrev() perf

2017-09-25 Thread Derrick Stolee
Create helper program test-abbrev to compute the minimum length of a
disambiguating short-sha for an input list of object ids.

Perf test p0008-abbrev.sh runs test-abbrev for 100,000 object ids. For
test 0008.1, these objects exist. For test 0008.2 these objects do not
exist in the repo (with high probability). For each test, use `sort -R`
to (deterministically) shuffle the sample of object ids to not check
abbreviations in lexicographic order.

Signed-off-by: Derrick Stolee 
---
 Makefile   |  1 +
 t/helper/.gitignore|  1 +
 t/helper/test-abbrev.c | 19 +++
 t/perf/p0008-abbrev.sh | 22 ++
 4 files changed, 43 insertions(+)
 create mode 100644 t/helper/test-abbrev.c
 create mode 100755 t/perf/p0008-abbrev.sh

diff --git a/Makefile b/Makefile
index af94b655a..75835c7c0 100644
--- a/Makefile
+++ b/Makefile
@@ -633,6 +633,7 @@ X =
 
 PROGRAMS += $(patsubst %.o,git-%$X,$(PROGRAM_OBJS))
 
+TEST_PROGRAMS_NEED_X += test-abbrev
 TEST_PROGRAMS_NEED_X += test-chmtime
 TEST_PROGRAMS_NEED_X += test-ctype
 TEST_PROGRAMS_NEED_X += test-config
diff --git a/t/helper/.gitignore b/t/helper/.gitignore
index 9dd1c9f3c..ab9df8369 100644
--- a/t/helper/.gitignore
+++ b/t/helper/.gitignore
@@ -1,3 +1,4 @@
+/test-abbrev
 /test-chmtime
 /test-ctype
 /test-config
diff --git a/t/helper/test-abbrev.c b/t/helper/test-abbrev.c
new file mode 100644
index 0..6866896eb
--- /dev/null
+++ b/t/helper/test-abbrev.c
@@ -0,0 +1,19 @@
+#include "cache.h"
+#include 
+
+int cmd_main(int ac, const char **av)
+{
+   struct object_id oid;
+   char hex[GIT_MAX_HEXSZ + 2];
+   const char *end;
+
+   setup_git_directory();
+
+   while (fgets(hex, GIT_MAX_HEXSZ + 2, stdin)) {
+   hex[GIT_MAX_HEXSZ] = 0;
+   if (!parse_oid_hex(hex, , ))
+   find_unique_abbrev(oid.hash, MINIMUM_ABBREV);
+   }
+
+   exit(0);
+}
diff --git a/t/perf/p0008-abbrev.sh b/t/perf/p0008-abbrev.sh
new file mode 100755
index 0..ba25e7824
--- /dev/null
+++ b/t/perf/p0008-abbrev.sh
@@ -0,0 +1,22 @@
+#!/bin/bash
+
+test_description='Test object disambiguation through abbreviations'
+. ./perf-lib.sh
+
+test_perf_large_repo
+
+test-list-objects 10 | sort -R > objs.txt
+
+test_perf 'find_unique_abbrev() for existing objects' '
+   test-abbrev < objs.txt
+'
+
+test-list-objects 10 --missing | sort -R > objs.txt
+
+test_perf 'find_unique_abbrev() for missing objects' '
+   test-abbrev < objs.txt
+'
+
+rm objs.txt
+
+test_done
-- 
2.14.1.538.g56ec8fc98.dirty