Re: [PATCH 0/6] address packed-refs speed regressions

2015-04-05 Thread Jeff King
On Mon, Apr 06, 2015 at 12:39:15AM +0200, René Scharfe wrote:

> >...this. I think that effort might be better spent on a ref storage
> >format that's more efficient, simpler (with respect to subtle races and
> >such), and could provide other features (e.g., transactional atomicity).
> 
> Such as a DBMS? :-)  Leaving storage details to SQLite or whatever sounds
> attractive to me because I'm lazy.

Exactly. Though I think some folks were worried about the extra
dependency (e.g., I think SQLite is hard for JGit, because there's no
pure-java implementation, which makes Eclipse unhappy).

With pluggable backends we can make something like a SQLite backend
optional. I.e., use it if you want the benefits and can accept the
portability downsides. But that also risks fracturing the community, with
people on the "old" format being left behind.

> Forgot to say: I like your changes.  But if strbuf_getline can only be made
> fast enough beyond that by duplicating stdio buffering then I feel it's
> better to take a different way.  E.g. dropping the requirement to handle NUL
> chars and basing it on fgets as Junio suggested in his reply to patch 3
> sounds good.

Yeah, though we probably need to either audit the callers, or provide a
flag for each caller to turn on the speed-over-NULs behavior. I'll look
into that, but it may not be this week, as I'll be traveling starting
tomorrow.

> In any case, the packed refs file seems special enough to receive special
> treatment.  Using mmap would make the most sense if we could also avoid
> copying lines to a strbuf for parsing, though.

I had a similar thought. Below is a hacky patch, on top of your mmap
patch, that does this. It does shave off another 300ms (around 5%).

I think we may be getting into a useless area of micro-optimizing here,
though.  The results are noticeable on this ridiculous repository, but
probably not so much on real ones. The low-hanging fruit (e.g., dropping
time in half by using getc_unlocked) seems to provide the most bang for
the buck.

---
diff --git a/refs.c b/refs.c
index 144255f..708b49b 100644
--- a/refs.c
+++ b/refs.c
@@ -334,27 +334,40 @@ static int refname_is_safe(const char *refname)
return 1;
 }
 
-static struct ref_entry *create_ref_entry(const char *refname,
- const unsigned char *sha1, int flag,
- int check_name)
+static struct ref_entry *create_ref_entry_len(const char *refname, size_t len,
+ const unsigned char *sha1, int flag,
+ int check_name)
 {
-   int len;
struct ref_entry *ref;
 
+   /*
+* allocate before checking, since the check functions require
+* a NUL-terminated refname. And since we die() anyway if
+* the check fails, the overhead of the extra malloc is negligible
+*/
+   ref = xmalloc(sizeof(struct ref_entry) + len + 1);
+   hashcpy(ref->u.value.sha1, sha1);
+   hashclr(ref->u.value.peeled);
+   memcpy(ref->name, refname, len);
+   ref->name[len] = '\0';
+   ref->flag = flag;
+
if (check_name &&
check_refname_format(refname, REFNAME_ALLOW_ONELEVEL))
die("Reference has invalid format: '%s'", refname);
if (!check_name && !refname_is_safe(refname))
die("Reference has invalid name: '%s'", refname);
-   len = strlen(refname) + 1;
-   ref = xmalloc(sizeof(struct ref_entry) + len);
-   hashcpy(ref->u.value.sha1, sha1);
-   hashclr(ref->u.value.peeled);
-   memcpy(ref->name, refname, len);
-   ref->flag = flag;
return ref;
 }
 
+static struct ref_entry *create_ref_entry(const char *refname,
+const unsigned char *sha1, int flag,
+int check_name)
+{
+   return create_ref_entry_len(refname, strlen(refname), sha1, flag,
+   check_name);
+}
+
 static void clear_ref_dir(struct ref_dir *dir);
 
 static void free_ref_entry(struct ref_entry *entry)
@@ -1095,7 +1108,9 @@ static const char PACKED_REFS_HEADER[] =
  * Return a pointer to the refname within the line (null-terminated),
  * or NULL if there was a problem.
  */
-static const char *parse_ref_line(struct strbuf *line, unsigned char *sha1)
+static const char *parse_ref_line(const char *line, int len,
+ unsigned char *sha1,
+ size_t *refname_len)
 {
const char *ref;
 
@@ -1107,22 +1122,22 @@ static const char *parse_ref_line(struct strbuf *line, unsigned char *sha1)
 *  +1 (space in between hex and name)
 *  +1 (newline at the end of the line)
 */
-   if (line->len <= 42)
+   if (len <= 42)
return NULL;
 
-   if (get_sha1_hex(line->buf, sha1) < 0)
+   if (get_sha1_hex(line, sha1) < 0)
return NULL;

Re: [PATCH 0/6] address packed-refs speed regressions

2015-04-05 Thread René Scharfe

On 2015-04-05 20:59, Jeff King wrote:

> Still, the numbers are promising. Here are comparisons
> against for-each-ref on torvalds/linux, which has a 218M
> packed-refs file:
> 
>    $ time git for-each-ref \
>        --format='%(objectname) %(refname)' \
>        refs/remotes/2325298/ |
>        wc -c
>    44139
> 
>    real    0m1.649s
>    user    0m1.332s
>    sys     0m0.304s
> 
>    $ time ~peff/git-quick-list refs/remotes/2325298/ | wc -c
>    44139
> 
>    real    0m0.012s
>    user    0m0.004s
>    sys     0m0.004s


Sweet numbers. :-P

I'm not familiar with refs.c, but its sheer size alone suggests that it 
won't be easy to integrate this prototype code there. :-/


René


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/6] address packed-refs speed regressions

2015-04-05 Thread René Scharfe

On 2015-04-05 20:52, Jeff King wrote:

> On Sun, Apr 05, 2015 at 03:41:39PM +0200, René Scharfe wrote:
> 
> > I wonder if pluggable reference backends could help here.  Storing refs
> > in a database table indexed by refname should simplify things.
> 
> ...this. I think that effort might be better spent on a ref storage
> format that's more efficient, simpler (with respect to subtle races and
> such), and could provide other features (e.g., transactional atomicity).

Such as a DBMS? :-)  Leaving storage details to SQLite or whatever
sounds attractive to me because I'm lazy.

> The big plus side of packed-refs improvements is that they "just work"
> without worrying about compatibility issues. But ref storage is local,
> so I'm not sure how big a deal that is in practice.

Adding a dependency is a big step, admittedly, so native improvements
might be a better fit.  There's a chance that we'd run into issues
already solved by specialized database engines long ago, though.

> > Short-term, can we avoid the getc()/strbuf_grow() dance e.g. by mapping
> > the packed refs file?  What numbers do you get with the following patch?
> 
> It's about 9% faster than my series + the fgets optimization I posted
> (or about 25% faster than using getc).  Which is certainly nice, but I
> was really hoping to just make strbuf_getline faster for all callers,
> rather than introducing special code for one call-site.  Certainly we
> could generalize the technique (i.e., a struct with the mmap data), but
> then I feel we are somewhat reinventing stdio. Which is maybe a good
> thing, because stdio has a lot of rough edges (as seen here), but it
> does feel a bit like NIH syndrome.

Forgot to say: I like your changes.  But if strbuf_getline can only be
made fast enough beyond that by duplicating stdio buffering then I feel
it's better to take a different way.  E.g. dropping the requirement to
handle NUL chars and basing it on fgets as Junio suggested in his reply
to patch 3 sounds good.

In any case, the packed refs file seems special enough to receive
special treatment.  Using mmap would make the most sense if we could
also avoid copying lines to a strbuf for parsing, though.


René


Re: [PATCH 0/6] address packed-refs speed regressions

2015-04-05 Thread Jeff King
On Sun, Apr 05, 2015 at 02:52:59PM -0400, Jeff King wrote:

> Right now we parse all of the packed-refs file into an in-memory cache,
> and then do single lookups from that cache. Doing an mmap() and a binary
> search is way faster (and costs less memory) for doing individual
> lookups. It relies on the list being sorted. This is generally true, but
> not something we currently rely on (however, it would be easy to add a
> "sorted" flag to top of the file and have the readers fall back when the
> flag is missing). I've played with a patch to do this (it's not entirely
> trivial, because you jump into the middle of a line, and then have to
> walk backwards to find the start of the record).
> 
> For traversals, it's more complicated. Obviously if you are traversing
> all refs, you have to read the whole thing anyway. If you are traversing
> a subset of the refs, you can binary-search the start of the subset, and
> then walk forward. But that's where it gets tricky with the current
> code.

In case you are curious, here is my proof-of-concept for the packed-refs
binary search. You'll note that it's a separate program, and not
integrated into refs.c. I wrote this last August, and after trying to
integrate it into refs.c, I found the ref_cache problems I described,
and I haven't touched it since.

I also seem to have saved the patch for stuffing it into refs.c, but I
am not sure if it even compiles (I wrote only "horrible wip" in the
commit message ;) ).

-- >8 --
Subject: [PATCH] add git-quick-list

This is a proof of concept for binary-searching the
packed-refs file in order to traverse an ordered subset of
it. Note that it _only_ reads the packed-refs file
currently. To really compare to for-each-ref, it would need
to also walk the loose ref area for its prefix. On a
mostly-packed repository that shouldn't make a big speed
difference, though.

And of course we don't _really_ want a separate command here
at all. This should be part of refs.c, and everyone who
calls for_each_ref should benefit from it.

Still, the numbers are promising. Here are comparisons
against for-each-ref on torvalds/linux, which has a 218M
packed-refs file:

  $ time git for-each-ref \
      --format='%(objectname) %(refname)' \
      refs/remotes/2325298/ |
      wc -c
  44139

  real    0m1.649s
  user    0m1.332s
  sys     0m0.304s

  $ time ~peff/git-quick-list refs/remotes/2325298/ | wc -c
  44139

  real    0m0.012s
  user    0m0.004s
  sys     0m0.004s
---
 Makefile |   1 +
 quick-list.c | 174 +++
 2 files changed, 175 insertions(+)
 create mode 100644 quick-list.c

diff --git a/Makefile b/Makefile
index 2457065..aa32598 100644
--- a/Makefile
+++ b/Makefile
@@ -541,6 +541,7 @@ PROGRAM_OBJS += shell.o
 PROGRAM_OBJS += show-index.o
 PROGRAM_OBJS += upload-pack.o
 PROGRAM_OBJS += remote-testsvn.o
+PROGRAM_OBJS += quick-list.o
 
 # Binary suffix, set to .exe for Windows builds
 X =
diff --git a/quick-list.c b/quick-list.c
new file mode 100644
index 000..e423f1f
--- /dev/null
+++ b/quick-list.c
@@ -0,0 +1,174 @@
+#include "cache.h"
+#include "refs.h"
+
+struct packed_refs_iterator {
+   const char *start;
+   const char *end;
+
+   const char *cur;
+   const char *ref;
+   const char *eol;
+   const char *next;
+};
+
+static void iterator_init(struct packed_refs_iterator *pos,
+ const char *buf, size_t len)
+{
+   pos->start = buf;
+   pos->end = buf + len;
+
+   /* skip past header line */
+   if (pos->start < pos->end && *pos->start == '#') {
+   while (pos->start < pos->end && *pos->start != '\n')
+   pos->start++;
+   if (pos->start < pos->end)
+   pos->start++;
+   }
+}
+
+static int iterator_cmp(const char *key, struct packed_refs_iterator *pos)
+{
+   const char *ref = pos->ref;
+   for (; *key && ref < pos->eol; key++, ref++)
+   if (*key != *ref)
+   return (unsigned char)*key - (unsigned char)*ref;
+   return ref == pos->eol ? *key ? 1 : 0 : -1;
+}
+
+static const char *find_eol(const char *p, const char *end)
+{
+   p = memchr(p, '\n', end - p);
+   return p ? p : end;
+}
+
+static void parse_line(struct packed_refs_iterator *pos, const char *p)
+{
+   pos->cur = p;
+   if (pos->end - p < 41)
+   die("truncated packed-refs file");
+   p += 41;
+
+   pos->ref = p;
+   pos->eol = p = find_eol(p, pos->end);
+
+   /* skip newline, and then past any peel records */
+   if (p < pos->end)
+   p++;
+   while (p < pos->end && *p == '^') {
+   p = find_eol(p, pos->end);
+   if (p < pos->end)
+   p++;
+   }
+   pos->next = p;
+}
+
+static void iterator_next(struct packed_refs_iterator *pos)
+{
+   if (pos->next < pos->end)
+   parse_line(pos, pos->next);
+   else

Re: [PATCH 0/6] address packed-refs speed regressions

2015-04-05 Thread Jeff King
On Sun, Apr 05, 2015 at 03:41:39PM +0200, René Scharfe wrote:

> > The main culprits seem to be d0f810f (which introduced some extra
> > expensive code for each ref) and my 10c497a, which switched from fgets()
> > to strbuf_getwholeline. It turns out that strbuf_getwholeline is really
> > slow.
> 
> 10c497a changed read_packed_refs(), which reads *all* packed refs.
> Each is checked for validity.  That sounds expensive if the goal is
> just to look up a single (non-existing) ref.
> 
> Would it help to defer any checks until a ref is actually accessed?
> Can a binary search be used instead of reading the whole file?

Yes, but addressing that is much more invasive.

Right now we parse all of the packed-refs file into an in-memory cache,
and then do single lookups from that cache. Doing an mmap() and a binary
search is way faster (and costs less memory) for doing individual
lookups. It relies on the list being sorted. This is generally true, but
not something we currently rely on (however, it would be easy to add a
"sorted" flag to top of the file and have the readers fall back when the
flag is missing). I've played with a patch to do this (it's not entirely
trivial, because you jump into the middle of a line, and then have to
walk backwards to find the start of the record).

For traversals, it's more complicated. Obviously if you are traversing
all refs, you have to read the whole thing anyway. If you are traversing
a subset of the refs, you can binary-search the start of the subset, and
then walk forward. But that's where it gets tricky with the current
code.

The ref_cache code expects to fill in from outer to inner. So if you
have "refs/foo", you should also have filled in all of "refs/" (but not
necessarily "refs/bar"). This matches the way we traverse loose ref
directories; we opendir "refs/", find out that it has "foo" and "bar",
and the descend into "foo", and so forth. But reading a subset of the
packed-ref file is "inside out". You fill in all of "refs/foo", but you
have no idea what else is in "refs/".

So going in that direction would involve some surgery to the ref_cache
code. It might even involve throwing it out entirely (i.e., just mmap
the packed-refs file and look through it directly, without any kind of
in-memory cache; we don't tend to do more than one ref-iteration per
program anyway, so I'm not sure the caching is buying us much).
My big concern there would be that there are a lot of subtle race issues
between packed and loose refs, and the current state is the result of a
lot of tweaking. I'd be worried that a heavy rewrite there would risk
introducing subtle and rare corruptions.

Plus it would be a lot of work, which leads me to...

> I wonder if pluggable reference backends could help here.  Storing refs
> in a database table indexed by refname should simplify things.

...this. I think that effort might be better spent on a ref storage
format that's more efficient, simpler (with respect to subtle races and
such), and could provide other features (e.g., transactional atomicity).

The big plus side of packed-refs improvements is that they "just work"
without worrying about compatibility issues. But ref storage is local,
so I'm not sure how big a deal that is in practice.

> Short-term, can we avoid the getc()/strbuf_grow() dance e.g. by mapping
> the packed refs file?  What numbers do you get with the following patch?

It's about 9% faster than my series + the fgets optimization I posted
(or about 25% faster than using getc).  Which is certainly nice, but I was
really hoping to just make strbuf_getline faster for all callers, rather
than introducing special code for one call-site. Certainly we could
generalize the technique (i.e., a struct with the mmap data), but then I
feel we are somewhat reinventing stdio. Which is maybe a good thing,
because stdio has a lot of rough edges (as seen here), but it does feel
a bit like NIH syndrome.

-Peff


Re: [PATCH 0/6] address packed-refs speed regressions

2015-04-05 Thread René Scharfe
On 2015-04-05 03:06, Jeff King wrote:
> As I've mentioned before, I have some repositories with rather large
> numbers of refs. The worst one has ~13 million refs, for a 1.6GB
> packed-refs file. So I was saddened by this:
> 
>    $ time git.v2.0.0 rev-parse refs/heads/foo >/dev/null 2>&1
>    real    0m6.840s
>    user    0m6.404s
>    sys     0m0.440s
> 
>    $ time git.v2.4.0-rc1 rev-parse refs/heads/foo >/dev/null 2>&1
>    real    0m19.432s
>    user    0m18.996s
>    sys     0m0.456s
> 
> The command isn't important; what I'm really measuring is loading the
> packed-refs file. And yes, of course this repository is absolutely
> ridiculous. But the slowdowns here are linear with the number of refs.
> So _every_ git command got a little bit slower, even in less crazy
> repositories. We just didn't notice it as much.
> 
> Here are the numbers after this series:
> 
>    real    0m8.539s
>    user    0m8.052s
>    sys     0m0.496s
> 
> Much better, but I'm frustrated that they are still 20% slower than the
> original.
> 
> The main culprits seem to be d0f810f (which introduced some extra
> expensive code for each ref) and my 10c497a, which switched from fgets()
> to strbuf_getwholeline. It turns out that strbuf_getwholeline is really
> slow.

10c497a changed read_packed_refs(), which reads *all* packed refs.
Each is checked for validity.  That sounds expensive if the goal is
just to look up a single (non-existing) ref.

Would it help to defer any checks until a ref is actually accessed?
Can a binary search be used instead of reading the whole file?

I wonder if pluggable reference backends could help here.  Storing refs
in a database table indexed by refname should simplify things.

Short-term, can we avoid the getc()/strbuf_grow() dance e.g. by mapping
the packed refs file?  What numbers do you get with the following patch?

---
 refs.c | 36 
 1 file changed, 28 insertions(+), 8 deletions(-)

diff --git a/refs.c b/refs.c
index 47e4e53..144255f 100644
--- a/refs.c
+++ b/refs.c
@@ -1153,16 +1153,35 @@ static const char *parse_ref_line(struct strbuf *line, unsigned char *sha1)
  *  compatibility with older clients, but we do not require it
  *  (i.e., "peeled" is a no-op if "fully-peeled" is set).
  */
-static void read_packed_refs(FILE *f, struct ref_dir *dir)
+static void read_packed_refs(int fd, struct ref_dir *dir)
 {
struct ref_entry *last = NULL;
struct strbuf line = STRBUF_INIT;
enum { PEELED_NONE, PEELED_TAGS, PEELED_FULLY } peeled = PEELED_NONE;
+   struct stat st;
+   void *map;
+   size_t mapsz, len;
+   const char *p;
+
+   fstat(fd, &st);
+   mapsz = xsize_t(st.st_size);
+   if (!mapsz)
+   return;
+   map = xmmap(NULL, mapsz, PROT_READ, MAP_PRIVATE, fd, 0);
 
-   while (strbuf_getwholeline(&line, f, '\n') != EOF) {
+   for (p = map, len = mapsz; len; ) {
unsigned char sha1[20];
const char *refname;
const char *traits;
+   const char *nl;
+   size_t linelen;
+
+   nl = memchr(p, '\n', len);
+   linelen = nl ? nl - p + 1 : len;
+   strbuf_reset(&line);
+   strbuf_add(&line, p, linelen);
+   p += linelen;
+   len -= linelen;
 
if (skip_prefix(line.buf, "# pack-refs with:", &traits)) {
if (strstr(traits, " fully-peeled "))
@@ -1204,6 +1223,7 @@ static void read_packed_refs(FILE *f, struct ref_dir *dir)
}
 
strbuf_release(&line);
+   munmap(map, mapsz);
 }
 
 /*
@@ -1224,16 +1244,16 @@ static struct packed_ref_cache *get_packed_ref_cache(struct ref_cache *refs)
clear_packed_ref_cache(refs);
 
if (!refs->packed) {
-   FILE *f;
+   int fd;
 
refs->packed = xcalloc(1, sizeof(*refs->packed));
acquire_packed_ref_cache(refs->packed);
refs->packed->root = create_dir_entry(refs, "", 0, 0);
-   f = fopen(packed_refs_file, "r");
-   if (f) {
-   stat_validity_update(&refs->packed->validity, fileno(f));
-   read_packed_refs(f, get_ref_dir(refs->packed->root));
-   fclose(f);
+   fd = open(packed_refs_file, O_RDONLY);
+   if (fd >= 0) {
+   stat_validity_update(&refs->packed->validity, fd);
+   read_packed_refs(fd, get_ref_dir(refs->packed->root));
+   close(fd);
}
}
return refs->packed;
-- 
2.3.5
