[PATCH 1/2] sha1_object_info_extended: provide delta base sha1s

2013-12-21 Thread Jeff King
A caller of sha1_object_info_extended technically has enough
information to determine the base sha1 from the results of
the call. It knows the pack, offset, and delta type of the
object, which is sufficient to find the base.

However, the functions to do so are not publicly available,
and the code itself is intimate enough with the pack details
that it should be abstracted away. We could add a public
helper to allow callers to query the delta base separately,
but it is simpler and slightly more efficient to optionally
grab it along with the rest of the object_info data.

For cases where the object is not stored as a delta, we
write the null sha1 into the query field. A careful caller
could check oi.whence == OI_PACKED  oi.u.packed.is_delta
before looking at the base sha1, but using the null sha1
provides a simple alternative (and gives a better sanity
check for a non-careful caller than simply returning random
bytes).

Signed-off-by: Jeff King p...@peff.net
---
 cache.h |  1 +
 sha1_file.c | 53 +
 2 files changed, 54 insertions(+)

diff --git a/cache.h b/cache.h
index ce377e1..67356db 100644
--- a/cache.h
+++ b/cache.h
@@ -1074,6 +1074,7 @@ struct object_info {
enum object_type *typep;
unsigned long *sizep;
unsigned long *disk_sizep;
+   unsigned char *delta_base_sha1;
 
/* Response */
enum {
diff --git a/sha1_file.c b/sha1_file.c
index daacc0c..4e8dd8b 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1667,6 +1667,38 @@ static off_t get_delta_base(struct packed_git *p,
return base_offset;
 }
 
+/*
+ * Like get_delta_base above, but we return the sha1 instead of the pack
+ * offset. This means it is cheaper for REF deltas (we do not have to do
+ * the final object lookup), but more expensive for OFS deltas (we
+ * have to load the revidx to convert the offset back into a sha1).
+ */
+static const unsigned char *get_delta_base_sha1(struct packed_git *p,
+   struct pack_window **w_curs,
+   off_t curpos,
+   enum object_type type,
+   off_t delta_obj_offset)
+{
+   if (type == OBJ_REF_DELTA) {
+   unsigned char *base = use_pack(p, w_curs, curpos, NULL);
+   return base;
+   } else if (type == OBJ_OFS_DELTA) {
+   struct revindex_entry *revidx;
+   off_t base_offset = get_delta_base(p, w_curs, curpos,
+  type, delta_obj_offset);
+
+   if (!base_offset)
+   return NULL;
+
+   revidx = find_pack_revindex(p, base_offset);
+   if (!revidx)
+   return NULL;
+
+   return nth_packed_object_sha1(p, revidx-nr);
+   } else
+   return NULL;
+}
+
 int unpack_object_header(struct packed_git *p,
 struct pack_window **w_curs,
 off_t *curpos,
@@ -1824,6 +1856,22 @@ static int packed_object_info(struct packed_git *p, 
off_t obj_offset,
}
}
 
+   if (oi-delta_base_sha1) {
+   if (type == OBJ_OFS_DELTA || type == OBJ_REF_DELTA) {
+   const unsigned char *base;
+
+   base = get_delta_base_sha1(p, w_curs, curpos,
+  type, obj_offset);
+   if (!base) {
+   type = OBJ_BAD;
+   goto out;
+   }
+
+   hashcpy(oi-delta_base_sha1, base);
+   } else
+   hashclr(oi-delta_base_sha1);
+   }
+
 out:
unuse_pack(w_curs);
return type;
@@ -2407,6 +2455,9 @@ static int sha1_loose_object_info(const unsigned char 
*sha1,
git_zstream stream;
char hdr[32];
 
+   if (oi-delta_base_sha1)
+   hashclr(oi-delta_base_sha1);
+
/*
 * If we don't care about type or size, then we don't
 * need to look inside the object at all. Note that we
@@ -2457,6 +2508,8 @@ int sha1_object_info_extended(const unsigned char *sha1, 
struct object_info *oi)
*(oi-sizep) = co-size;
if (oi-disk_sizep)
*(oi-disk_sizep) = 0;
+   if (oi-delta_base_sha1)
+   hashclr(oi-delta_base_sha1);
oi-whence = OI_CACHED;
return 0;
}
-- 
1.8.5.1.399.g900e7cd

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/2] cat-file --batch-check='%(deltabase)'

2013-12-21 Thread Jeff King
This series lets you query the delta base for a particular object (or
set of objects), like:

  $ git rev-list --objects --all |
git cat-file --batch-check='%(objectname) %(deltabase) %(rest)'

The only other way I know of to get this information is using
verify-pack (or index-pack), which is much slower (and less convenient
to parse).

I needed this recently to write tests for another (not yet published)
series. But I think it stands on its own as a debugging / introspection
tool.

  [1/2]: sha1_object_info_extended: provide delta base sha1s
  [2/2]: cat-file: provide %(deltabase) batch format

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] cat-file: provide %(deltabase) batch format

2013-12-21 Thread Jeff King
It can be useful for debugging or analysis to see which
objects are stored as delta bases on top of others. This
information is available by running `git verify-pack`, but
that is extremely expensive (and is harder than necessary to
parse).

Instead, let's make it available as a cat-file query format,
which makes it fast and simple to get the bases for a subset
of the objects.

Signed-off-by: Jeff King p...@peff.net
---
 Documentation/git-cat-file.txt | 12 +---
 builtin/cat-file.c |  6 ++
 t/t1006-cat-file.sh| 34 ++
 3 files changed, 49 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index 322f5ed..f6a16f4 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -109,6 +109,11 @@ newline. The available atoms are:
The size, in bytes, that the object takes up on disk. See the
note about on-disk sizes in the `CAVEATS` section below.
 
+`deltabase`::
+   If the object is stored as a delta on-disk, this expands to the
+   40-hex sha1 of the delta base object. Otherwise, expands to the
+   null sha1 (40 zeroes). See `CAVEATS` below.
+
 `rest`::
If this atom is used in the output string, input lines are split
at the first whitespace boundary. All characters before that
@@ -152,10 +157,11 @@ should be taken in drawing conclusions about which refs 
or objects are
 responsible for disk usage. The size of a packed non-delta object may be
 much larger than the size of objects which delta against it, but the
 choice of which object is the base and which is the delta is arbitrary
-and is subject to change during a repack. Note also that multiple copies
-of an object may be present in the object database; in this case, it is
-undefined which copy's size will be reported.
+and is subject to change during a repack.
 
+Note also that multiple copies of an object may be present in the object
+database; in this case, it is undefined which copy's size or delta base
+will be reported.
 
 GIT
 ---
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index b2ca775..2e0af2e 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -118,6 +118,7 @@ struct expand_data {
unsigned long size;
unsigned long disk_size;
const char *rest;
+   unsigned char delta_base_sha1[20];
 
/*
 * If mark_query is true, we do not expand anything, but rather
@@ -174,6 +175,11 @@ static void expand_atom(struct strbuf *sb, const char 
*atom, int len,
data-split_on_whitespace = 1;
else if (data-rest)
strbuf_addstr(sb, data-rest);
+   } else if (is_atom(deltabase, atom, len)) {
+   if (data-mark_query)
+   data-info.delta_base_sha1 = data-delta_base_sha1;
+   else
+   strbuf_addstr(sb, sha1_to_hex(data-delta_base_sha1));
} else
die(unknown format element: %.*s, len, atom);
 }
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 8a1bc5c..633dc82 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -240,4 +240,38 @@ test_expect_success --batch-check with multiple sha1s 
gives correct format '
 $(echo_without_newline $batch_check_input | git cat-file --batch-check)
 '
 
+test_expect_success 'setup blobs which are likely to delta' '
+   test-genrandom foo 10240 foo 
+   { cat foo; echo plus; } foo-plus 
+   git add foo foo-plus 
+   git commit -m foo 
+   cat blobs -\EOF
+   HEAD:foo
+   HEAD:foo-plus
+   EOF
+'
+
+test_expect_success 'confirm that neither loose blob is a delta' '
+   cat expect -EOF
+   $_z40
+   $_z40
+   EOF
+   git cat-file --batch-check=%(deltabase) blobs actual 
+   test_cmp expect actual
+'
+
+# To avoid relying too much on the current delta heuristics,
+# we will check only that one of the two objects is a delta
+# against the other, but not the order. We can do so by just
+# asking for the base of both, and checking whether either
+# sha1 appears in the output.
+test_expect_success '%(deltabase) reports packed delta bases' '
+   git repack -ad 
+   git cat-file --batch-check=%(deltabase) blobs actual 
+   {
+   grep $(git rev-parse HEAD:foo) actual ||
+   grep $(git rev-parse HEAD:foo-plus) actual
+   }
+'
+
 test_done
-- 
1.8.5.1.399.g900e7cd
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git-mv-submodule

2013-12-21 Thread Jens Lehmann
Am 21.12.2013 10:48, schrieb fREW Schmidt:
 Hello all,
 
 I was on a plane, moving around some of the many (30ish) submodules in
 my dotfiles and got really annoyed at how much work it is (move the
 dir, remove old from git, add new to git, fix .gitmodules, fix
 .git/config, fix all the parts of the submodule config) so I wrote a
 perl script to work for the most common case.
 
 As far as I know it should work for anyone not doing Something Weird,
 ie manually fiddling with their submodules.  The main case it does not
 support that I'd like to in the future is submodules containing
 submodules, and also at some point I'd like to wrap git mv to invoke
 this script on demand automatically.

Thanks for sharing! Form a cursory look over your perl script it
looks like it does what stock git mv will do since 1.8.5 (except
for changing the name of the submodule, which I would not advise
to do when only moving the submodule location in the work tree).
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git-mv-submodule

2013-12-21 Thread fREW Schmidt
On Sat, Dec 21, 2013 at 05:08:59PM +0100, Jens Lehmann wrote:
 Am 21.12.2013 10:48, schrieb fREW Schmidt:
 Thanks for sharing! Form a cursory look over your perl script it
 looks like it does what stock git mv will do since 1.8.5 (except
 for changing the name of the submodule, which I would not advise
 to do when only moving the submodule location in the work tree).

See, I thought I read that in the changelog; unfortunately I don'g
thing it does the final set of book-keeping (changing the .git file if
you changed the depth of the submodule in the mv and changing the path
of the worktree in the actual git repo in .git/modules)

I'd love to be wrong on that as this script is clearly not perfect.  I
think my second little script in my previous email re git submodule
bugs shows the issue.  I'll include it here for simplicity though:

 mkdir -p test/a test/b
 cd test/a
 git init
 touch a.txt
 git add a.txt
 git ci -m 'initial commit'
 cd ../b
 git init
 mkdir c
 touch c/c.txt
 git submodule add ../a c/a
 git ci -m 'initial commit'
 git mv c d
 git status

-- 
fREW Schmidt
http://blog.afoolishmanifesto.com


pgpZne3BJMPSW.pgp
Description: PGP signature


Hello dear

2013-12-21 Thread annstokes78
Hello dear
I will be very happy to be your friend.
My name is miss. Ann Stokes. Please i will like you to
write me through my email address ( ann...@yahoo.fr ) .
I will send my pictures to you and also tell you more about
myself when i receive your email.
I will be waiting for your mail in my mail box.
Your new friend.
Ann
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 0/22] pack bitmaps

2013-12-21 Thread Thomas Rast
Jeff King p...@peff.net writes:

 Here's the v4 re-roll of the pack bitmap series.

 The changes from v3 are:

  - reworked add_object_entry refactoring (see patch 11, which is new,
and patch 12 which builds on it in a more natural way)

This now looks like this (pasting because it is hard to see in the diffs):

  static int add_object_entry(const unsigned char *sha1, enum object_type type,
  const char *name, int exclude)
  {
  struct packed_git *found_pack;
  off_t found_offset;
  uint32_t index_pos;

  if (have_duplicate_entry(sha1, exclude, index_pos))
  return 0;

  if (!want_object_in_pack(sha1, exclude, found_pack, found_offset))
  return 0;

  create_object_entry(sha1, type, pack_name_hash(name),
  exclude, name  no_try_delta(name),
  index_pos, found_pack, found_offset);

  display_progress(progress_state, to_pack.nr_objects);
  return 1;
  }

  static int add_object_entry_from_bitmap(const unsigned char *sha1,
  enum object_type type,
  int flags, uint32_t name_hash,
  struct packed_git *pack, off_t offset)
  {
  uint32_t index_pos;

  if (have_duplicate_entry(sha1, 0, index_pos))
  return 0;

  create_object_entry(sha1, type, name_hash, 0, 0, index_pos, pack, 
offset);

  display_progress(progress_state, to_pack.nr_objects);
  return 1;
  }


Much nicer.  Thanks for going the extra mile!

-- 
Thomas Rast
t...@thomasrast.ch
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] wt-status.c: disable those distracting -Wformat-zero-length warnings

2013-12-21 Thread Samuel Bronson
On Sat, Dec 21, 2013 at 4:42 AM, Jeff King p...@peff.net wrote:
 On Fri, Dec 20, 2013 at 10:45:01AM -0500, Samuel Bronson wrote:

 These warnings don't really seem to make much sense for this file.

 Agreed, though the advice so far has been to put -Wno-format-zero-length
 in your CFLAGS.

Yes, auto-detecting acceptance of this flag and using it automatically
might be a reasonable approach as well, but I thought this warning
might potentially be useful WRT other printf-like functions.

 +/* We have good reasons for using zero-length format strings, and
 + * there's unfortunately no way to turn this off on a per-function
 + * basis ... */
 +#pragma GCC diagnostic ignored -Wformat-zero-length

 Are other compilers happy to ignore this pragma? I guess we could wrap
 it in an #ifdef, if so.

I assume you meant we could use an #ifdef if *not*?

 It's also really not about this file in particular. The whole concept of
 format-zero-length is questionable, as it ignores the concept that a
 format function might actually do something useful with an empty format
 (e.g., by adding boilerplate, or having a side-effect). It's just that
 this file is the only one that happens to do so.

Hmm, I think I saw one other instance of this warning, actually, but
it didn't seem worth adding the pragma to a file for just one warning.

 Annotating the _function_ to say it's useful to pass an empty format
 into this function would make sense, but as you note, there is no way
 to do that.

I made a note about this at
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47901#c9 (comment #9 on
-Wall should not imply -Wformat-zero-length by default)

 So I dunno. This seems like it does not quite specify what we want to
 say as well as just -Wno-format-zero-length, but it is more convenient
 in practice (because we take care of it in the source code, rather than
 relying on the user's build settings).

Yeah.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] commit.c: make tree a const pointer in commit_tree*()

2013-12-21 Thread Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy pclo...@gmail.com
---
 commit.c | 4 ++--
 commit.h | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/commit.c b/commit.c
index 5df1df7..5ff5538 100644
--- a/commit.c
+++ b/commit.c
@@ -1356,7 +1356,7 @@ void free_commit_extra_headers(struct commit_extra_header 
*extra)
}
 }
 
-int commit_tree(const struct strbuf *msg, unsigned char *tree,
+int commit_tree(const struct strbuf *msg, const unsigned char *tree,
struct commit_list *parents, unsigned char *ret,
const char *author, const char *sign_commit)
 {
@@ -1485,7 +1485,7 @@ static const char commit_utf8_warn[] =
 You may want to amend it after fixing the message, or set the config\n
 variable i18n.commitencoding to the encoding your project uses.\n;
 
-int commit_tree_extended(const struct strbuf *msg, unsigned char *tree,
+int commit_tree_extended(const struct strbuf *msg, const unsigned char *tree,
 struct commit_list *parents, unsigned char *ret,
 const char *author, const char *sign_commit,
 struct commit_extra_header *extra)
diff --git a/commit.h b/commit.h
index 934af88..f8a451d 100644
--- a/commit.h
+++ b/commit.h
@@ -232,11 +232,11 @@ struct commit_extra_header {
 extern void append_merge_tag_headers(struct commit_list *parents,
 struct commit_extra_header ***tail);
 
-extern int commit_tree(const struct strbuf *msg, unsigned char *tree,
+extern int commit_tree(const struct strbuf *msg, const unsigned char *tree,
   struct commit_list *parents, unsigned char *ret,
   const char *author, const char *sign_commit);
 
-extern int commit_tree_extended(const struct strbuf *msg, unsigned char *tree,
+extern int commit_tree_extended(const struct strbuf *msg, const unsigned char 
*tree,
struct commit_list *parents, unsigned char *ret,
const char *author, const char *sign_commit,
struct commit_extra_header *);
-- 
1.8.5.2.221.g812bf82.dirty

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 00/21] Support multiple worktrees

2013-12-21 Thread Junio C Hamano
Duy Nguyen pclo...@gmail.com writes:

 I am not happy with the choice of main/HEAD that would squat on a
 good name for remote-tracking branch (i.e. s/origin/main/), though.
 $GIT_DIR/COMMON_HEAD perhaps?

 It's not just about HEAD. Anything worktree-specific of the main
 checkout can be accessed this way, e.g. main/index,
 main/FETCH_HEAD and it's not exactly common because it's
 worktree info. Maybe 1ST_ as the prefix (e.g. 1ST_HEAD, 1ST_index...)
 ?

Do we even need to expose them as ref-like things as a part of the
external API/UI in the first place?  For end-user scripts that want
to operate in a real or borrowing worktree, there should be no
reason to touch this other repository directly.  Things like if
one of the wortrees tries to check out a branch that is already
checked out elsewhere, error out policy may need to consult the
other worktrees via $GIT_COMMON_DIR mechanism but at that level we
have all the control without contaminating end-user facing ref
namespace in a way main/FETCH_HEAD... do.  You said

This makes it possible for a linked checkout to detach HEAD of
the main one.

but I think that is misguided---it just makes it easier to confuse
users, if done automatically and without any policy knob. It instead
should error out, while saying which worktree has the branch in
question checked out. After all, checking out a branch that is
checked out in another worktree (not the common one) needs the same
caution, so main/HEAD is not the only special one.

And if your updated git checkout 'frotz' gives a clear report of
which worktree has the branch 'frotz' checked out, the user can go
there to detach the HEAD in that worktree to detach with

git -C $the_other_one checkout HEAD^0

if he chooses to.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: questions / suggestions about history simplification

2013-12-21 Thread Junio C Hamano
Adam Spiers g...@adamspiers.org writes:

 I doubt it.  75% of the work for such a person to understand the
 behaviour from such an example is to understand what kind of history
 the example is building.

 Agreed.  And that's precisely why I wanted a real repository
 manifesting the given example: being able to view it in gitk makes
 that a lot easier to understand, for obvious reasons.
 ...
 Well I didn't roll my own; I just copied the example from the man
 page.  So it only tells you that I was spending a fair amount of
 effort trying to understand the man page ;-)

Oh, that part I would agree, but then ...

 Not if the man page says if you wish to experiment with these options
 yourself, you can easily recreate the repository in this example by
 running the script contrib/foo bundled in the source distribution.
 ...
 The goal of sharing the series of commands is not to educate users
 through reading them, but simply to save them the time they would have
 to spend manually recreating the example scenario given in the man
 page.

... this part could be even easier if distro ships a sample repository,
not a recipe to create such a sample repository, no?
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] git-cherry.txt: cross reference git log --cherry

2013-12-21 Thread Junio C Hamano
Samuel Bronson naes...@gmail.com writes:

 I learned of git cherry some days ago, but only learned of --cherry
 and related options to git log today[1] (more-or-less by chance).

 If the git-cherry(1) manpage had mentioned --cherry, I would have
 learned of these options sooner.

This proposed log message is of an unusual style.  It is curious
that somebody learn cherry first before log.

  SEE ALSO
  
 -linkgit:git-patch-id[1]
 +linkgit:git-patch-id[1],
 +the `--cherry` option to linkgit:git-log[1]

The description text talks about changeset (or diff), compares
the changeset rather than the commit id, diffs are compared,
etc., which is a hint that a lone reference to patch-id would be a
page to read about that comparison, which is a good motivation that
may entice the readers to follow that reference.

I am not sure what value the readers of this man page would see to
this addition, though. Unlike the reference to patch-id, it is not
so obvious what gives them motivation to follow this new reference
to log --cherry.  A new sentence in DESCRIPTION section to mention
it (e.g. 'log --cherry' gives similar information) may be needed
at the same time.


--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/5] safe_create_leading_directories(): fix a mkdir/rmdir race

2013-12-21 Thread Michael Haggerty
It could be that some other process is trying to clean up empty
directories at the same time that safe_create_leading_directories() is
attempting to create them.  In this case, it could happen that
directory a/b was present at the end of one iteration of the loop
(either it was already present or we just created it ourselves), but
by the time we try to create directory a/b/c, directory a/b has
been deleted.  In fact, directory a might also have been deleted.

So, if a call to mkdir() fails with ENOENT, then try checking/making
all directories again from the beginning.  Attempt up to three times
before giving up.

Signed-off-by: Michael Haggerty mhag...@alum.mit.edu
---
 sha1_file.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/sha1_file.c b/sha1_file.c
index dcfd35a..abcb56b 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -108,6 +108,7 @@ int mkdir_in_gitdir(const char *path)
 int safe_create_leading_directories(char *path)
 {
char *next_component = path + offset_1st_component(path);
+   int attempts = 3;
int retval = 0;
 
while (!retval  next_component) {
@@ -132,6 +133,16 @@ int safe_create_leading_directories(char *path)
if (errno == EEXIST 
!stat(path, st)  S_ISDIR(st.st_mode)) {
; /* somebody created it since we checked */
+   } else if (errno == ENOENT  --attempts) {
+   /*
+* Either mkdir() failed bacause
+* somebody just pruned the containing
+* directory, or stat() failed because
+* the file that was in our way was
+* just removed.  Either way, try
+* again from the beginning:
+*/
+   next_component = path + 
offset_1st_component(path);
} else {
retval = -1;
}
-- 
1.8.5.1

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/5] safe_create_leading_directories(): reduce scope of local variable

2013-12-21 Thread Michael Haggerty
Signed-off-by: Michael Haggerty mhag...@alum.mit.edu
---
 sha1_file.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/sha1_file.c b/sha1_file.c
index c9245a6..cc9957e 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -108,9 +108,10 @@ int mkdir_in_gitdir(const char *path)
 int safe_create_leading_directories(char *path)
 {
char *pos = path + offset_1st_component(path);
-   struct stat st;
 
while (pos) {
+   struct stat st;
+
pos = strchr(pos, '/');
if (!pos)
break;
-- 
1.8.5.1

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/5] safe_create_leading_directories(): add slash pointer

2013-12-21 Thread Michael Haggerty
Keep track of the position of the slash character separately, and
restore the slash character at a single place, at the end of the while
loop.  This makes the next change easier to implement.

Signed-off-by: Michael Haggerty mhag...@alum.mit.edu
---
 sha1_file.c | 36 ++--
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index cc9957e..dcfd35a 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -107,40 +107,40 @@ int mkdir_in_gitdir(const char *path)
 
 int safe_create_leading_directories(char *path)
 {
-   char *pos = path + offset_1st_component(path);
+   char *next_component = path + offset_1st_component(path);
+   int retval = 0;
 
-   while (pos) {
+   while (!retval  next_component) {
struct stat st;
+   char *slash = strchr(next_component, '/');
 
-   pos = strchr(pos, '/');
-   if (!pos)
-   break;
-   while (*++pos == '/')
-   ;
-   if (!*pos)
-   break;
-   *--pos = '\0';
+   if (!slash)
+   return 0;
+   while (*(slash + 1) == '/')
+   slash++;
+   next_component = slash + 1;
+   if (!*next_component)
+   return 0;
+
+   *slash = '\0';
if (!stat(path, st)) {
/* path exists */
if (!S_ISDIR(st.st_mode)) {
-   *pos = '/';
-   return -3;
+   retval = -3;
}
} else if (mkdir(path, 0777)) {
if (errno == EEXIST 
!stat(path, st)  S_ISDIR(st.st_mode)) {
; /* somebody created it since we checked */
} else {
-   *pos = '/';
-   return -1;
+   retval = -1;
}
} else if (adjust_shared_perm(path)) {
-   *pos = '/';
-   return -2;
+   retval = -2;
}
-   *pos++ = '/';
+   *slash = '/';
}
-   return 0;
+   return retval;
 }
 
 int safe_create_leading_directories_const(const char *path)
-- 
1.8.5.1

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 5/5] rename_ref(): fix a mkdir()/rmdir() race

2013-12-21 Thread Michael Haggerty
When renaming a reflog file, it was possible that an empty directory
that we just created using safe_create_leading_directories() might get
deleted by another process before we have a chance to move the new
file into it.

So if the rename fails with ENOENT, then retry from the beginning.
Make up to three attempts before giving up.

It could theoretically happen that the ENOENT comes from the
disappearance of TMP_RENAMED_LOG.  In that case three pointless
attempts will be made to move the nonexistent file, but no other harm
should come of it.

Signed-off-by: Michael Haggerty mhag...@alum.mit.edu
---
 refs.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/refs.c b/refs.c
index 3926136..3ab1491 100644
--- a/refs.c
+++ b/refs.c
@@ -2516,6 +2516,7 @@ int rename_ref(const char *oldrefname, const char 
*newrefname, const char *logms
struct stat loginfo;
int log = !lstat(git_path(logs/%s, oldrefname), loginfo);
const char *symref = NULL;
+   int attempts = 3;
 
if (log  S_ISLNK(loginfo.st_mode))
return error(reflog for %s is a symlink, oldrefname);
@@ -2555,12 +2556,12 @@ int rename_ref(const char *oldrefname, const char 
*newrefname, const char *logms
}
}
 
+ retry:
if (log  safe_create_leading_directories(git_path(logs/%s, 
newrefname))) {
error(unable to create directory for %s, newrefname);
goto rollback;
}
 
- retry:
if (log  rename(git_path(TMP_RENAMED_LOG), git_path(logs/%s, 
newrefname))) {
if (errno==EISDIR || errno==ENOTDIR) {
/*
@@ -2574,6 +2575,13 @@ int rename_ref(const char *oldrefname, const char 
*newrefname, const char *logms
}
goto retry;
} else {
+   if (errno == ENOENT  --attempts)
+   /*
+* Perhaps somebody just pruned the empty
+* directory into which we wanted to move the
+* file.
+*/
+   goto retry;
error(unable to move logfile TMP_RENAMED_LOG to 
logs/%s: %s,
newrefname, strerror(errno));
goto rollback;
-- 
1.8.5.1

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/5] Fix two mkdir/rmdir races

2013-12-21 Thread Michael Haggerty
There seem to be a lot of mkdir/rmdir races in the Git code.  Fix two
of them.

It is hard to construct test cases for races without a lot of extra
test infrastructure.  So I have tested them as best I can using
instrumented code (not included here).

Michael Haggerty (5):
  safe_create_leading_directories(): modernize format of if chaining
  safe_create_leading_directories(): reduce scope of local variable
  safe_create_leading_directories(): add slash pointer
  safe_create_leading_directories(): fix a mkdir/rmdir race
  rename_ref(): fix a mkdir()/rmdir() race

 refs.c  | 10 +-
 sha1_file.c | 56 +---
 2 files changed, 42 insertions(+), 24 deletions(-)

-- 
1.8.5.1

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/5] safe_create_leading_directories(): modernize format of if chaining

2013-12-21 Thread Michael Haggerty
Signed-off-by: Michael Haggerty mhag...@alum.mit.edu
---
 sha1_file.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index daacc0c..c9245a6 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -125,8 +125,7 @@ int safe_create_leading_directories(char *path)
*pos = '/';
return -3;
}
-   }
-   else if (mkdir(path, 0777)) {
+   } else if (mkdir(path, 0777)) {
if (errno == EEXIST 
!stat(path, st)  S_ISDIR(st.st_mode)) {
; /* somebody created it since we checked */
@@ -134,8 +133,7 @@ int safe_create_leading_directories(char *path)
*pos = '/';
return -1;
}
-   }
-   else if (adjust_shared_perm(path)) {
+   } else if (adjust_shared_perm(path)) {
*pos = '/';
return -2;
}
-- 
1.8.5.1

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 07/10] builtin/replace: teach listing using short, medium or full formats

2013-12-21 Thread Christian Couder
On Thu, Dec 19, 2013 at 7:58 PM, Junio C Hamano gits...@pobox.com wrote:
 Christian Couder christian.cou...@gmail.com writes:

 I think this last one might be useful for people replacing objects
 with objects that have another type.

 ... which IIUC is strongly discouraged---didn't you have to tighten
 it recently?

 And that makes it useful primarily for debugging unusual
 situations.

Ok, so would you prefer the following:

- NAME_ONLY_REPLACE_FMT and --format=name_only instead of
SHORT_REPLACE_FMT and --format=short

- NAME_AND_VALUE_REPLACE_FMT and --format=name_and_value instead of
MEDIUM_REPLACE_FMT and --format=medium

- DEBUG_REPLACE_FMT and --format=debug instead of FULL _REPLACE_FMT
and --format=full

?

Thanks,
Christian.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] wt-status.c: disable those distracting -Wformat-zero-length warnings

2013-12-21 Thread Jeff King
On Fri, Dec 20, 2013 at 10:45:01AM -0500, Samuel Bronson wrote:

 These warnings don't really seem to make much sense for this file.

Agreed, though the advice so far has been to put -Wno-format-zero-length
in your CFLAGS.

 +/* We have good reasons for using zero-length format strings, and
 + * there's unfortunately no way to turn this off on a per-function
 + * basis ... */
 +#pragma GCC diagnostic ignored -Wformat-zero-length

Are other compilers happy to ignore this pragma? I guess we could wrap
it in an #ifdef, if so.

It's also really not about this file in particular. The whole concept of
format-zero-length is questionable, as it ignores the concept that a
format function might actually do something useful with an empty format
(e.g., by adding boilerplate, or having a side-effect). It's just that
this file is the only one that happens to do so.

Annotating the _function_ to say it's useful to pass an empty format
into this function would make sense, but as you note, there is no way
to do that.

So I dunno. This seems like it does not quite specify what we want to
say as well as just -Wno-format-zero-length, but it is more convenient
in practice (because we take care of it in the source code, rather than
relying on the user's build settings).

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


git-mv-submodule

2013-12-21 Thread fREW Schmidt
Hello all,

I was on a plane, moving around some of the many (30ish) submodules in
my dotfiles and got really annoyed at how much work it is (move the
dir, remove old from git, add new to git, fix .gitmodules, fix
.git/config, fix all the parts of the submodule config) so I wrote a
perl script to work for the most common case.

As far as I know it should work for anyone not doing Something Weird,
ie manually fiddling with their submodules.  The main case it does not
support that I'd like to in the future is submodules containing
submodules, and also at some point I'd like to wrap git mv to invoke
this script on demand automatically.

Note that this script requires perl 5.10.1, released in 2009.  If you
are stuck with something inferior to that you can comment out the
version at the top and the autodie usage and it should work further
back, but won't be quite as robust.

Enjoy!
-- 
fREW Schmidt
http://blog.afoolishmanifesto.com
#!/usr/bin/env perl

use 5.10.1;
use strict;
use warnings;

use autodie;
use File::Basename 'dirname';
use File::Path 'make_path';

die you must pass both a from and a to unless @ARGV == 2;

my ($from, $to) = @ARGV;

die you have changes in your working copy!
   unless is_clean();

# move the real dir
make_path(dirname($to));
safe_system('mv', $from, $to);

# move the git dir (not really that important)
make_path(dirname(.git/modules/$to));
safe_system('mv', .git/modules/$from, .git/modules/$to);

# update .gitmodules and .git/config book keeping
spew($_, slurp($_) =~ s/\Q$from\E/$to/gr)
   for qw( .gitmodules .git/config );

my $dir_count = scalar split qr(/), $to =~ s(/$)()r;

my $derp = ('../' x (2 + $dir_count)) . $to;

# update .git/modules/$to/config book keeping
spew(
   .git/modules/$to/config,
   slurp(.git/modules/$to/config) =~ s/worktree.*/worktree = $derp/gr
);

# update $to book keeping
spew(
   $to/.git,
   'gitdir: ' . ('../' x $dir_count) . .git/modules/$to
);

safe_system(qw( git add -A ), $from, $to, '.gitmodules' );

sub safe_system {
   system(@_);

   die @_ exited poorly :(
  if $?  8;
}

sub safe_capture {
   my $ret = qx(@_);

   die @_ exited poorly :(
  if $?  8;

   return $ret;
}

sub slurp {
   open my $fh, '', $_[0];
   local $/ = undef;
   scalar $fh
}

sub spew {
   open my $fh, '', $_[0];
   print {$fh} $_[1]
}

sub is_clean { !safe_capture(qw(git status --porcelain)) }


pgpEyJW_kdh5r.pgp
Description: PGP signature


Re: [PATCH v3 11/21] pack-objects: use bitmaps when packing objects

2013-12-21 Thread Jeff King
On Sat, Dec 07, 2013 at 04:47:20PM +0100, Thomas Rast wrote:

  +static off_t write_reused_pack(struct sha1file *f)
  +{
  +   uint8_t buffer[8192];
 
 We usually just call this 'unsigned char'.  I can see why this would be
 more portable, but git would already fall apart badly on an architecture
 where char is not 8 bits.

I think it's worth switching just for consistency with the rest of git.
Fixed.

 } +   packfile_size = write_reused_pack(f);
 } +   if (!packfile_size)
 } +   die_errno(failed to re-use existing pack);
 
 So if you just died here, when the error happens, you could take the
 chance to tell the user _which_ syscall failed.

Yeah, agreed (and especially the fact that we may get bogus errno
values). Fixed.

 Not your fault, but sha1write() is an odd function -- it purportedly is
 
   int sha1write(struct sha1file *f, const void *buf, unsigned int count);
 
 but it can only return 0.  This goes back all the way to c38138c
 (git-pack-objects: write the pack files with a SHA1 csum, 2005-06-26).

It looks like there's exactly one site that checks its return value, and
it's just to die. We should drop the return value from sha1write
entirely to make it clear that it dies on error. But that's orthogonal
to this series.

  -static int add_object_entry(const unsigned char *sha1, enum object_type 
  type,
  -   const char *name, int exclude)
  +static int add_object_entry_1(const unsigned char *sha1, enum object_type 
  type,
  + int flags, uint32_t name_hash,
  + struct packed_git *found_pack, off_t found_offset)
 [...]
 This function makes my head spin, and you're indenting it yet another
 level.

Yeah. In addition, the use of the flags here is somewhat questionable.
We use them for internal values in the call from add_object_entry to
add_object_entry_1. But we also pass the latter as a traversal callback,
meaning that what it would get in flags is totally different. It's not
actually a bug in the current code, since the bitmap traversal always
passes empty flags, but it's still rather confusing.

 If it's not too much work, can you split it into the three parts that it
 really is?  IIUC it boils down to
 
   do we have this already?
   possibly apply 'exclude', then return
   are we coming from a call path that doesn't tell us which pack to take
   it from?
   find _all_ instances in packs
   check if any of them are local .keep packs
   if so, return
   construct a packlist entry to taste

I did this split. By itself, I was on the fence, as there are actually
some interdependencies between the three parts that make it hairy.

But then I realized that instead of making the weird relationship
between add_object_entry and add_object_entry_1, we can simply make a
new function that composes the functions differently.

So we get:

+static int add_object_entry_from_bitmap(const unsigned char *sha1,
+   enum object_type type,
+   int flags, uint32_t name_hash,
+   struct packed_git *pack, off_t offset)
+{
+   uint32_t index_pos;
+
+   if (have_duplicate_entry(sha1, 0, index_pos))
+   return 0;
+
+   create_object_entry(sha1, type, name_hash, 0, 0, index_pos, pack, 
offset);
+
+   display_progress(progress_state, to_pack.nr_objects);
+   return 1;
+}
[...]
+   traverse_bitmap_commit_list(add_object_entry_from_bitmap);

which makes much more sense.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 13/21] pack-objects: implement bitmap writing

2013-12-21 Thread Jeff King
On Sat, Dec 07, 2013 at 05:32:43PM +0100, Thomas Rast wrote:

  +pack.writebitmaps::
  +   When true, git will write a bitmap index when packing all
  +   objects to disk (e.g., as when `git repack -a` is run).  This
^^
 
 Doesn't sound right in my ears.  Remove the as?

It was meant to be such as when..., but I think it is just as clear to
drop it. Will do.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 19/21] t: add basic bitmap functionality tests

2013-12-21 Thread Jeff King
On Sat, Dec 07, 2013 at 05:43:29PM +0100, Thomas Rast wrote:

 One nit:
 
  +test_expect_success JGIT 'jgit can read our bitmaps' '
  +   git clone . compat-us.git 
  +   (
  +   cd compat-us.git 
 
 The name suggests a bare repo, but it is a full clone.  Not that it
 matters.

It was originally supposed to be a bare repo, but I had trouble
convincing jgit to actually run in a bare repo. The solution was to
switch to a non-bare one. :)

But obviously I forgot to update the name. I agree it's better to use a
more obvious name. Fixed.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 20/21] t/perf: add tests for pack bitmaps

2013-12-21 Thread Jeff King
On Sat, Dec 07, 2013 at 05:51:43PM +0100, Thomas Rast wrote:

 Jeff King p...@peff.net writes:
 
  +test_perf 'simulated fetch' '
  +   have=$(git rev-list HEAD --until=1.week.ago -1) 
 
 This will give you HEAD if your GIT_PERF_LARGE_REPO hasn't seen any
 activity lately.  I'd prefer something that always takes a fixed commit,
 e.g. HEAD~1000, keeping the perf test reproducible over time (not over
 changing GIT_PERF_LARGE_REPO, of course).

Good point. I just did a git pull on the kernel before my tests, but
of course the same test with my 3-week-stale repo would be a lot less
interesting.

I tried to figure out what is the right N for HEAD~N to represent a
week's worth of commits. It turns out the kernel is really bursty
depending on where they are in the cycle. Sometimes N is 50, and
sometimes 200. I'm picking 100. It doesn't have to be a week; the main
goal was to approximate here's how it performs if you don't repack for
a week, so anything in the ballpark is fine.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 0/22] pack bitmaps

2013-12-21 Thread Jeff King
Here's the v4 re-roll of the pack bitmap series.

The changes from v3 are:

 - reworked add_object_entry refactoring (see patch 11, which is new,
   and patch 12 which builds on it in a more natural way)

 - better error/die reporting from write_reused_pack

 - added Ramsay's PRIx64 compat fix

 - fixed a user-after-free in the warning message of open_pack_bitmap_1

 - minor typo/thinko fixes from Thomas in docs and tests

Interdiff is below.

  [01/23]: sha1write: make buffer const-correct
  [02/23]: revindex: Export new APIs
  [03/23]: pack-objects: Refactor the packing list
  [04/23]: pack-objects: factor out name_hash
  [05/23]: revision: allow setting custom limiter function
  [06/23]: sha1_file: export `git_open_noatime`
  [07/23]: compat: add endianness helpers
  [08/23]: ewah: compressed bitmap implementation
  [09/23]: documentation: add documentation for the bitmap format
  [10/23]: pack-bitmap: add support for bitmap indexes
  [11/23]: pack-objects: split add_object_entry
  [12/23]: pack-objects: use bitmaps when packing objects
  [13/23]: rev-list: add bitmap mode to speed up object lists
  [14/23]: pack-objects: implement bitmap writing
  [15/23]: repack: stop using magic number for ARRAY_SIZE(exts)
  [16/23]: repack: turn exts array into array-of-struct
  [17/23]: repack: handle optional files created by pack-objects
  [18/23]: repack: consider bitmaps when performing repacks
  [19/23]: count-objects: recognize .bitmap in garbage-checking
  [20/23]: t: add basic bitmap functionality tests
  [21/23]: t/perf: add tests for pack bitmaps
  [22/23]: pack-bitmap: implement optional name_hash cache
  [23/23]: compat/mingw.h: Fix the MinGW and msvc builds

---
diff --git a/Documentation/config.txt b/Documentation/config.txt
index e6d3922..499a3c4 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1866,7 +1866,7 @@ pack.useBitmaps::
 
 pack.writebitmaps::
When true, git will write a bitmap index when packing all
-   objects to disk (e.g., as when `git repack -a` is run).  This
+   objects to disk (e.g., when `git repack -a` is run).  This
index can speed up the counting objects phase of subsequent
packs created for clones and fetches, at the cost of some disk
space and extra time spent on the initial repack.  Defaults to
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 4504789..fd74197 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -72,11 +72,6 @@ static unsigned long cache_max_small_delta_size = 1000;
 
 static unsigned long window_memory_limit = 0;
 
-enum {
-   OBJECT_ENTRY_EXCLUDE = (1  0),
-   OBJECT_ENTRY_NO_TRY_DELTA = (1  1)
-};
-
 /*
  * stats
  */
@@ -712,21 +707,20 @@ static struct object_entry **compute_write_order(void)
 
 static off_t write_reused_pack(struct sha1file *f)
 {
-   uint8_t buffer[8192];
+   unsigned char buffer[8192];
off_t to_write;
int fd;
 
if (!is_pack_valid(reuse_packfile))
-   return 0;
+   die(packfile is invalid: %s, reuse_packfile-pack_name);
 
fd = git_open_noatime(reuse_packfile-pack_name);
if (fd  0)
-   return 0;
+   die_errno(unable to open packfile for reuse: %s,
+ reuse_packfile-pack_name);
 
-   if (lseek(fd, sizeof(struct pack_header), SEEK_SET) == -1) {
-   close(fd);
-   return 0;
-   }
+   if (lseek(fd, sizeof(struct pack_header), SEEK_SET) == -1)
+   die_errno(unable to seek in reused packfile);
 
if (reuse_packfile_offset  0)
reuse_packfile_offset = reuse_packfile-pack_size - 20;
@@ -736,10 +730,8 @@ static off_t write_reused_pack(struct sha1file *f)
while (to_write) {
int read_pack = xread(fd, buffer, sizeof(buffer));
 
-   if (read_pack = 0) {
-   close(fd);
-   return 0;
-   }
+   if (read_pack = 0)
+   die_errno(unable to read from reused packfile);
 
if (read_pack  to_write)
read_pack = to_write;
@@ -785,9 +777,6 @@ static void write_pack_file(void)
assert(pack_to_stdout);
 
packfile_size = write_reused_pack(f);
-   if (!packfile_size)
-   die_errno(failed to re-use existing pack);
-
offset += packfile_size;
}
 
@@ -909,86 +898,143 @@ static int no_try_delta(const char *path)
return 0;
 }
 
-static int add_object_entry_1(const unsigned char *sha1, enum object_type type,
- int flags, uint32_t name_hash,
- struct packed_git *found_pack, off_t found_offset)
+/*
+ * When adding an object, check whether we have already added it
+ * to our packing list. If so, we can skip. However, if we are
+ * being 

[PATCH v4 01/23] sha1write: make buffer const-correct

2013-12-21 Thread Jeff King
We are passed a void * and write it out without ever
touching it; let's indicate that by using const.

Signed-off-by: Jeff King p...@peff.net
---
 csum-file.c | 6 +++---
 csum-file.h | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/csum-file.c b/csum-file.c
index 53f5375..465971c 100644
--- a/csum-file.c
+++ b/csum-file.c
@@ -11,7 +11,7 @@
 #include progress.h
 #include csum-file.h
 
-static void flush(struct sha1file *f, void *buf, unsigned int count)
+static void flush(struct sha1file *f, const void *buf, unsigned int count)
 {
if (0 = f-check_fd  count)  {
unsigned char check_buffer[8192];
@@ -86,13 +86,13 @@ int sha1close(struct sha1file *f, unsigned char *result, 
unsigned int flags)
return fd;
 }
 
-int sha1write(struct sha1file *f, void *buf, unsigned int count)
+int sha1write(struct sha1file *f, const void *buf, unsigned int count)
 {
while (count) {
unsigned offset = f-offset;
unsigned left = sizeof(f-buffer) - offset;
unsigned nr = count  left ? left : count;
-   void *data;
+   const void *data;
 
if (f-do_crc)
f-crc32 = crc32(f-crc32, buf, nr);
diff --git a/csum-file.h b/csum-file.h
index 3b540bd..9dedb03 100644
--- a/csum-file.h
+++ b/csum-file.h
@@ -34,7 +34,7 @@ extern struct sha1file *sha1fd(int fd, const char *name);
 extern struct sha1file *sha1fd_check(const char *name);
 extern struct sha1file *sha1fd_throughput(int fd, const char *name, struct 
progress *tp);
 extern int sha1close(struct sha1file *, unsigned char *, unsigned int);
-extern int sha1write(struct sha1file *, void *, unsigned int);
+extern int sha1write(struct sha1file *, const void *, unsigned int);
 extern void sha1flush(struct sha1file *f);
 extern void crc32_begin(struct sha1file *);
 extern uint32_t crc32_end(struct sha1file *);
-- 
1.8.5.1.399.g900e7cd

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 06/23] sha1_file: export `git_open_noatime`

2013-12-21 Thread Jeff King
From: Vicent Marti tan...@gmail.com

The `git_open_noatime` helper can be of general interest for other
consumers of git's different on-disk formats.

Signed-off-by: Vicent Marti tan...@gmail.com
Signed-off-by: Jeff King p...@peff.net
---
 cache.h | 1 +
 sha1_file.c | 4 +---
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/cache.h b/cache.h
index 5e3fc72..f2e5aa7 100644
--- a/cache.h
+++ b/cache.h
@@ -780,6 +780,7 @@ extern int hash_sha1_file(const void *buf, unsigned long 
len, const char *type,
 extern int write_sha1_file(const void *buf, unsigned long len, const char 
*type, unsigned char *return_sha1);
 extern int pretend_sha1_file(void *, unsigned long, enum object_type, unsigned 
char *);
 extern int force_object_loose(const unsigned char *sha1, time_t mtime);
+extern int git_open_noatime(const char *name);
 extern void *map_sha1_file(const unsigned char *sha1, unsigned long *size);
 extern int unpack_sha1_header(git_zstream *stream, unsigned char *map, 
unsigned long mapsize, void *buffer, unsigned long bufsiz);
 extern int parse_sha1_header(const char *hdr, unsigned long *sizep);
diff --git a/sha1_file.c b/sha1_file.c
index f80bbe4..4714bd8 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -239,8 +239,6 @@ char *sha1_pack_index_name(const unsigned char *sha1)
 struct alternate_object_database *alt_odb_list;
 static struct alternate_object_database **alt_odb_tail;
 
-static int git_open_noatime(const char *name);
-
 /*
  * Prepare alternate object database registry.
  *
@@ -1357,7 +1355,7 @@ int check_sha1_signature(const unsigned char *sha1, void 
*map,
return hashcmp(sha1, real_sha1) ? -1 : 0;
 }
 
-static int git_open_noatime(const char *name)
+int git_open_noatime(const char *name)
 {
static int sha1_file_open_flag = O_NOATIME;
 
-- 
1.8.5.1.399.g900e7cd

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 03/23] pack-objects: Refactor the packing list

2013-12-21 Thread Jeff King
From: Vicent Marti tan...@gmail.com

The hash table that stores the packing list for a given `pack-objects`
run was tightly coupled to the pack-objects code.

In this commit, we refactor the hash table and the underlying storage
array into a `packing_data` struct. The functionality for accessing and
adding entries to the packing list is hence accessible from other parts
of Git besides the `pack-objects` builtin.

This refactoring is a requirement for further patches in this series
that will require accessing the commit packing list from outside of
`pack-objects`.

The hash table implementation has been minimally altered: we now
use table sizes which are always a power of two, to ensure a uniform
index distribution in the array.

Signed-off-by: Vicent Marti tan...@gmail.com
Signed-off-by: Jeff King p...@peff.net
---
 Makefile   |   2 +
 builtin/pack-objects.c | 175 +++--
 pack-objects.c | 111 +++
 pack-objects.h |  47 +
 4 files changed, 200 insertions(+), 135 deletions(-)
 create mode 100644 pack-objects.c
 create mode 100644 pack-objects.h

diff --git a/Makefile b/Makefile
index af847f8..48ff0bd 100644
--- a/Makefile
+++ b/Makefile
@@ -694,6 +694,7 @@ LIB_H += notes-merge.h
 LIB_H += notes-utils.h
 LIB_H += notes.h
 LIB_H += object.h
+LIB_H += pack-objects.h
 LIB_H += pack-revindex.h
 LIB_H += pack.h
 LIB_H += parse-options.h
@@ -831,6 +832,7 @@ LIB_OBJS += notes-merge.o
 LIB_OBJS += notes-utils.o
 LIB_OBJS += object.o
 LIB_OBJS += pack-check.o
+LIB_OBJS += pack-objects.o
 LIB_OBJS += pack-revindex.o
 LIB_OBJS += pack-write.o
 LIB_OBJS += pager.o
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 36273dd..f3f0cf9 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -14,6 +14,7 @@
 #include diff.h
 #include revision.h
 #include list-objects.h
+#include pack-objects.h
 #include progress.h
 #include refs.h
 #include streaming.h
@@ -25,42 +26,15 @@ static const char *pack_usage[] = {
NULL
 };
 
-struct object_entry {
-   struct pack_idx_entry idx;
-   unsigned long size; /* uncompressed size */
-   struct packed_git *in_pack; /* already in pack */
-   off_t in_pack_offset;
-   struct object_entry *delta; /* delta base object */
-   struct object_entry *delta_child; /* deltified objects who bases me */
-   struct object_entry *delta_sibling; /* other deltified objects who
-* uses the same base as me
-*/
-   void *delta_data;   /* cached delta (uncompressed) */
-   unsigned long delta_size;   /* delta data size (uncompressed) */
-   unsigned long z_delta_size; /* delta data size (compressed) */
-   enum object_type type;
-   enum object_type in_pack_type;  /* could be delta */
-   uint32_t hash;  /* name hint hash */
-   unsigned char in_pack_header_size;
-   unsigned preferred_base:1; /*
-   * we do not pack this, but is available
-   * to be used as the base object to delta
-   * objects against.
-   */
-   unsigned no_try_delta:1;
-   unsigned tagged:1; /* near the very tip of refs */
-   unsigned filled:1; /* assigned write-order */
-};
-
 /*
- * Objects we are going to pack are collected in objects array (dynamically
- * expanded).  nr_objects  nr_alloc controls this array.  They are stored
- * in the order we see -- typically rev-list --objects order that gives us
- * nice minimum seek order.
+ * Objects we are going to pack are collected in the `to_pack` structure.
+ * It contains an array (dynamically expanded) of the object data, and a map
+ * that can resolve SHA1s to their position in the array.
  */
-static struct object_entry *objects;
+static struct packing_data to_pack;
+
 static struct pack_idx_entry **written_list;
-static uint32_t nr_objects, nr_alloc, nr_result, nr_written;
+static uint32_t nr_result, nr_written;
 
 static int non_empty;
 static int reuse_delta = 1, reuse_object = 1;
@@ -90,21 +64,11 @@ static unsigned long cache_max_small_delta_size = 1000;
 static unsigned long window_memory_limit = 0;
 
 /*
- * The object names in objects array are hashed with this hashtable,
- * to help looking up the entry by object name.
- * This hashtable is built after all the objects are seen.
- */
-static int *object_ix;
-static int object_ix_hashsz;
-static struct object_entry *locate_object_entry(const unsigned char *sha1);
-
-/*
  * stats
  */
 static uint32_t written, written_delta;
 static uint32_t reused, reused_delta;
 
-
 static void *get_delta(struct object_entry *entry)
 {
unsigned long size, base_size, delta_size;
@@ -553,12 +517,12 @@ static int mark_tagged(const char *path, const unsigned 
char *sha1, int 

[PATCH v4 05/23] revision: allow setting custom limiter function

2013-12-21 Thread Jeff King
From: Vicent Marti tan...@gmail.com

This commit enables users of `struct rev_info` to peform custom limiting
during a revision walk (i.e. `get_revision`).

If the field `include_check` has been set to a callback, this callback
will be issued once for each commit before it is added to the pending
list of the revwalk. If the include check returns 0, the commit will be
marked as added but won't be pushed to the pending list, effectively
limiting the walk.

Signed-off-by: Vicent Marti tan...@gmail.com
Signed-off-by: Jeff King p...@peff.net
---
 revision.c | 4 
 revision.h | 2 ++
 2 files changed, 6 insertions(+)

diff --git a/revision.c b/revision.c
index 0173e01..cddd605 100644
--- a/revision.c
+++ b/revision.c
@@ -779,6 +779,10 @@ static int add_parents_to_list(struct rev_info *revs, 
struct commit *commit,
return 0;
commit-object.flags |= ADDED;
 
+   if (revs-include_check 
+   !revs-include_check(commit, revs-include_check_data))
+   return 0;
+
/*
 * If the commit is uninteresting, don't try to
 * prune parents - we want the maximal uninteresting
diff --git a/revision.h b/revision.h
index e7f1d21..9957f3c 100644
--- a/revision.h
+++ b/revision.h
@@ -168,6 +168,8 @@ struct rev_info {
unsigned long min_age;
int min_parents;
int max_parents;
+   int (*include_check)(struct commit *, void *);
+   void *include_check_data;
 
/* diff info for patches and for paths limiting */
struct diff_options diffopt;
-- 
1.8.5.1.399.g900e7cd

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 07/23] compat: add endianness helpers

2013-12-21 Thread Jeff King
From: Vicent Marti tan...@gmail.com

The POSIX standard doesn't currently define a `ntohll`/`htonll`
function pair to perform network-to-host and host-to-network
swaps of 64-bit data. These 64-bit swaps are necessary for the on-disk
storage of EWAH bitmaps if they are not in native byte order.

Many thanks to Ramsay Jones ram...@ramsay1.demon.co.uk and
Torsten Bögershausen tbo...@web.de for cygwin/mingw/msvc
portability fixes.

Signed-off-by: Vicent Marti tan...@gmail.com
Signed-off-by: Jeff King p...@peff.net
---
 compat/bswap.h | 76 +-
 1 file changed, 75 insertions(+), 1 deletion(-)

diff --git a/compat/bswap.h b/compat/bswap.h
index 5061214..c18a78e 100644
--- a/compat/bswap.h
+++ b/compat/bswap.h
@@ -17,7 +17,20 @@ static inline uint32_t default_swab32(uint32_t val)
((val  0x00ff)  24));
 }
 
+static inline uint64_t default_bswap64(uint64_t val)
+{
+   return (((val  (uint64_t)0x00ffULL)  56) |
+   ((val  (uint64_t)0xff00ULL)  40) |
+   ((val  (uint64_t)0x00ffULL)  24) |
+   ((val  (uint64_t)0xff00ULL)   8) |
+   ((val  (uint64_t)0x00ffULL)   8) |
+   ((val  (uint64_t)0xff00ULL)  24) |
+   ((val  (uint64_t)0x00ffULL)  40) |
+   ((val  (uint64_t)0xff00ULL)  56));
+}
+
 #undef bswap32
+#undef bswap64
 
 #if defined(__GNUC__)  (defined(__i386__) || defined(__x86_64__))
 
@@ -32,15 +45,42 @@ static inline uint32_t git_bswap32(uint32_t x)
return result;
 }
 
+#define bswap64 git_bswap64
+#if defined(__x86_64__)
+static inline uint64_t git_bswap64(uint64_t x)
+{
+   uint64_t result;
+   if (__builtin_constant_p(x))
+   result = default_bswap64(x);
+   else
+   __asm__(bswap %q0 : =r (result) : 0 (x));
+   return result;
+}
+#else
+static inline uint64_t git_bswap64(uint64_t x)
+{
+   union { uint64_t i64; uint32_t i32[2]; } tmp, result;
+   if (__builtin_constant_p(x))
+   result.i64 = default_bswap64(x);
+   else {
+   tmp.i64 = x;
+   result.i32[0] = git_bswap32(tmp.i32[1]);
+   result.i32[1] = git_bswap32(tmp.i32[0]);
+   }
+   return result.i64;
+}
+#endif
+
 #elif defined(_MSC_VER)  (defined(_M_IX86) || defined(_M_X64))
 
 #include stdlib.h
 
 #define bswap32(x) _byteswap_ulong(x)
+#define bswap64(x) _byteswap_uint64(x)
 
 #endif
 
-#ifdef bswap32
+#if defined(bswap32)
 
 #undef ntohl
 #undef htonl
@@ -48,3 +88,37 @@ static inline uint32_t git_bswap32(uint32_t x)
 #define htonl(x) bswap32(x)
 
 #endif
+
+#if defined(bswap64)
+
+#undef ntohll
+#undef htonll
+#define ntohll(x) bswap64(x)
+#define htonll(x) bswap64(x)
+
+#else
+
+#undef ntohll
+#undef htonll
+
+#if !defined(__BYTE_ORDER)
+# if defined(BYTE_ORDER)  defined(LITTLE_ENDIAN)  defined(BIG_ENDIAN)
+#  define __BYTE_ORDER BYTE_ORDER
+#  define __LITTLE_ENDIAN LITTLE_ENDIAN
+#  define __BIG_ENDIAN BIG_ENDIAN
+# endif
+#endif
+
+#if !defined(__BYTE_ORDER)
+# error Cannot determine endianness
+#endif
+
+#if __BYTE_ORDER == __BIG_ENDIAN
+# define ntohll(n) (n)
+# define htonll(n) (n)
+#else
+# define ntohll(n) default_bswap64(n)
+# define htonll(n) default_bswap64(n)
+#endif
+
+#endif
-- 
1.8.5.1.399.g900e7cd

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 02/23] revindex: Export new APIs

2013-12-21 Thread Jeff King
From: Vicent Marti tan...@gmail.com

Allow users to efficiently lookup consecutive entries that are expected
to be found on the same revindex by exporting `find_revindex_position`:
this function takes a pointer to revindex itself, instead of looking up
the proper revindex for a given packfile on each call.

Signed-off-by: Vicent Marti tan...@gmail.com
Signed-off-by: Jeff King p...@peff.net
---
 pack-revindex.c | 38 +-
 pack-revindex.h |  8 
 2 files changed, 33 insertions(+), 13 deletions(-)

diff --git a/pack-revindex.c b/pack-revindex.c
index b4d2b35..0bb13b1 100644
--- a/pack-revindex.c
+++ b/pack-revindex.c
@@ -16,11 +16,6 @@
  * get the object sha1 from the main index.
  */
 
-struct pack_revindex {
-   struct packed_git *p;
-   struct revindex_entry *revindex;
-};
-
 static struct pack_revindex *pack_revindex;
 static int pack_revindex_hashsz;
 
@@ -201,15 +196,14 @@ static void create_pack_revindex(struct pack_revindex 
*rix)
sort_revindex(rix-revindex, num_ent, p-pack_size);
 }
 
-struct revindex_entry *find_pack_revindex(struct packed_git *p, off_t ofs)
+struct pack_revindex *revindex_for_pack(struct packed_git *p)
 {
int num;
-   unsigned lo, hi;
struct pack_revindex *rix;
-   struct revindex_entry *revindex;
 
if (!pack_revindex_hashsz)
init_pack_revindex();
+
num = pack_revindex_ix(p);
if (num  0)
die(internal error: pack revindex fubar);
@@ -217,21 +211,39 @@ struct revindex_entry *find_pack_revindex(struct 
packed_git *p, off_t ofs)
rix = pack_revindex[num];
if (!rix-revindex)
create_pack_revindex(rix);
-   revindex = rix-revindex;
 
-   lo = 0;
-   hi = p-num_objects + 1;
+   return rix;
+}
+
+int find_revindex_position(struct pack_revindex *pridx, off_t ofs)
+{
+   int lo = 0;
+   int hi = pridx-p-num_objects + 1;
+   struct revindex_entry *revindex = pridx-revindex;
+
do {
unsigned mi = lo + (hi - lo) / 2;
if (revindex[mi].offset == ofs) {
-   return revindex + mi;
+   return mi;
} else if (ofs  revindex[mi].offset)
hi = mi;
else
lo = mi + 1;
} while (lo  hi);
+
error(bad offset for revindex);
-   return NULL;
+   return -1;
+}
+
+struct revindex_entry *find_pack_revindex(struct packed_git *p, off_t ofs)
+{
+   struct pack_revindex *pridx = revindex_for_pack(p);
+   int pos = find_revindex_position(pridx, ofs);
+
+   if (pos  0)
+   return NULL;
+
+   return pridx-revindex + pos;
 }
 
 void discard_revindex(void)
diff --git a/pack-revindex.h b/pack-revindex.h
index 8d5027a..866ca9c 100644
--- a/pack-revindex.h
+++ b/pack-revindex.h
@@ -6,6 +6,14 @@ struct revindex_entry {
unsigned int nr;
 };
 
+struct pack_revindex {
+   struct packed_git *p;
+   struct revindex_entry *revindex;
+};
+
+struct pack_revindex *revindex_for_pack(struct packed_git *p);
+int find_revindex_position(struct pack_revindex *pridx, off_t ofs);
+
 struct revindex_entry *find_pack_revindex(struct packed_git *p, off_t ofs);
 void discard_revindex(void);
 
-- 
1.8.5.1.399.g900e7cd

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 18/23] repack: consider bitmaps when performing repacks

2013-12-21 Thread Jeff King
From: Vicent Marti tan...@gmail.com

Since `pack-objects` will write a `.bitmap` file next to the `.pack` and
`.idx` files, this commit teaches `git-repack` to consider the new
bitmap indexes (if they exist) when performing repack operations.

This implies moving old bitmap indexes out of the way if we are
repacking a repository that already has them, and moving the newly
generated bitmap indexes into the `objects/pack` directory, next to
their corresponding packfiles.

Since `git repack` is now capable of handling these `.bitmap` files,
a normal `git gc` run on a repository that has `pack.writebitmaps` set
to true in its config file will generate bitmap indexes as part of the
garbage collection process.

Alternatively, `git repack` can be called with the `-b` switch to
explicitly generate bitmap indexes if you are experimenting
and don't want them on all the time.

Signed-off-by: Vicent Marti tan...@gmail.com
Signed-off-by: Jeff King p...@peff.net
---
 Documentation/git-repack.txt | 9 -
 builtin/repack.c | 9 -
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index 4c1aff6..dad186c 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -9,7 +9,7 @@ git-repack - Pack unpacked objects in a repository
 SYNOPSIS
 
 [verse]
-'git repack' [-a] [-A] [-d] [-f] [-F] [-l] [-n] [-q] [--window=n] 
[--depth=n]
+'git repack' [-a] [-A] [-d] [-f] [-F] [-l] [-n] [-q] [-b] [--window=n] 
[--depth=n]
 
 DESCRIPTION
 ---
@@ -110,6 +110,13 @@ other objects in that pack they already have locally.
The default is unlimited, unless the config variable
`pack.packSizeLimit` is set.
 
+-b::
+--write-bitmap-index::
+   Write a reachability bitmap index as part of the repack. This
+   only makes sense when used with `-a` or `-A`, as the bitmaps
+   must be able to refer to all reachable objects. This option
+   overrides the setting of `pack.writebitmaps`.
+
 
 Configuration
 -
diff --git a/builtin/repack.c b/builtin/repack.c
index 8b7dfd0..239f278 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -94,7 +94,7 @@ static void get_non_kept_pack_filenames(struct string_list 
*fname_list)
 
 static void remove_redundant_pack(const char *dir_name, const char *base_name)
 {
-   const char *exts[] = {.pack, .idx, .keep};
+   const char *exts[] = {.pack, .idx, .keep, .bitmap};
int i;
struct strbuf buf = STRBUF_INIT;
size_t plen;
@@ -121,6 +121,7 @@ int cmd_repack(int argc, const char **argv, const char 
*prefix)
} exts[] = {
{.pack},
{.idx},
+   {.bitmap, 1},
};
struct child_process cmd;
struct string_list_item *item;
@@ -143,6 +144,7 @@ int cmd_repack(int argc, const char **argv, const char 
*prefix)
int no_update_server_info = 0;
int quiet = 0;
int local = 0;
+   int write_bitmap = -1;
 
struct option builtin_repack_options[] = {
OPT_BIT('a', NULL, pack_everything,
@@ -161,6 +163,8 @@ int cmd_repack(int argc, const char **argv, const char 
*prefix)
OPT__QUIET(quiet, N_(be quiet)),
OPT_BOOL('l', local, local,
N_(pass --local to git-pack-objects)),
+   OPT_BOOL('b', write-bitmap-index, write_bitmap,
+   N_(write bitmap index)),
OPT_STRING(0, unpack-unreachable, unpack_unreachable, 
N_(approxidate),
N_(with -A, do not loosen objects older than 
this)),
OPT_INTEGER(0, window, window,
@@ -202,6 +206,9 @@ int cmd_repack(int argc, const char **argv, const char 
*prefix)
argv_array_pushf(cmd_args, --no-reuse-delta);
if (no_reuse_object)
argv_array_pushf(cmd_args, --no-reuse-object);
+   if (write_bitmap = 0)
+   argv_array_pushf(cmd_args, --%swrite-bitmap-index,
+write_bitmap ?  : no-);
 
if (pack_everything  ALL_INTO_ONE) {
get_non_kept_pack_filenames(existing_packs);
-- 
1.8.5.1.399.g900e7cd

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 13/23] rev-list: add bitmap mode to speed up object lists

2013-12-21 Thread Jeff King
From: Vicent Marti tan...@gmail.com

The bitmap reachability index used to speed up the counting objects
phase during `pack-objects` can also be used to optimize a normal
rev-list if the only thing required are the SHA1s of the objects during
the list (i.e., not the path names at which trees and blobs were found).

Calling `git rev-list --objects --use-bitmap-index [committish]` will
perform an object iteration based on a bitmap result instead of actually
walking the object graph.

These are some example timings for `torvalds/linux` (warm cache,
best-of-five):

$ time git rev-list --objects master  /dev/null

real0m34.191s
user0m33.904s
sys 0m0.268s

$ time git rev-list --objects --use-bitmap-index master  /dev/null

real0m1.041s
user0m0.976s
sys 0m0.064s

Likewise, using `git rev-list --count --use-bitmap-index` will speed up
the counting operation by building the resulting bitmap and performing a
fast popcount (number of bits set on the bitmap) on the result.

Here are some sample timings of different ways to count commits in
`torvalds/linux`:

$ time git rev-list master | wc -l
399882

real0m6.524s
user0m6.060s
sys 0m3.284s

$ time git rev-list --count master
399882

real0m4.318s
user0m4.236s
sys 0m0.076s

$ time git rev-list --use-bitmap-index --count master
399882

real0m0.217s
user0m0.176s
sys 0m0.040s

This also respects negative refs, so you can use it to count
a slice of history:

$ time git rev-list --count v3.0..master
144843

real0m1.971s
user0m1.932s
sys 0m0.036s

$ time git rev-list --use-bitmap-index --count v3.0..master
real0m0.280s
user0m0.220s
sys 0m0.056s

Though note that the closer the endpoints, the less it helps. In the
traversal case, we have fewer commits to cross, so we take less time.
But the bitmap time is dominated by generating the pack revindex, which
is constant with respect to the refs given.

Note that you cannot yet get a fast --left-right count of a symmetric
difference (e.g., --count --left-right master...topic). The slow part
of that walk actually happens during the merge-base determination when
we parse master...topic. Even though a count does not actually need to
know the real merge base (it only needs to take the symmetric difference
of the bitmaps), the revision code would require some refactoring to
handle this case.

Additionally, a `--test-bitmap` flag has been added that will perform
the same rev-list manually (i.e. using a normal revwalk) and using
bitmaps, and verify that the results are the same. This can be used to
exercise the bitmap code, and also to verify that the contents of the
.bitmap file are sane.

Signed-off-by: Vicent Marti tan...@gmail.com
Signed-off-by: Jeff King p...@peff.net
---
 Documentation/git-rev-list.txt |  1 +
 Documentation/rev-list-options.txt |  8 
 builtin/rev-list.c | 39 ++
 3 files changed, 48 insertions(+)

diff --git a/Documentation/git-rev-list.txt b/Documentation/git-rev-list.txt
index 045b37b..7a1585d 100644
--- a/Documentation/git-rev-list.txt
+++ b/Documentation/git-rev-list.txt
@@ -55,6 +55,7 @@ SYNOPSIS
 [ \--reverse ]
 [ \--walk-reflogs ]
 [ \--no-walk ] [ \--do-walk ]
+[ \--use-bitmap-index ]
 commit... [ \-- paths... ]
 
 DESCRIPTION
diff --git a/Documentation/rev-list-options.txt 
b/Documentation/rev-list-options.txt
index 5bdfb42..c236b85 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -274,6 +274,14 @@ See also linkgit:git-reflog[1].
Output excluded boundary commits. Boundary commits are
prefixed with `-`.
 
+ifdef::git-rev-list[]
+--use-bitmap-index::
+
+   Try to speed up the traversal using the pack bitmap index (if
+   one is available). Note that when traversing with `--objects`,
+   trees and blobs will not have their associated path printed.
+endif::git-rev-list[]
+
 --
 
 History Simplification
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 4fc1616..5209255 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -3,6 +3,8 @@
 #include diff.h
 #include revision.h
 #include list-objects.h
+#include pack.h
+#include pack-bitmap.h
 #include builtin.h
 #include log-tree.h
 #include graph.h
@@ -257,6 +259,18 @@ static int show_bisect_vars(struct rev_list_info *info, 
int reaches, int all)
return 0;
 }
 
+static int show_object_fast(
+   const unsigned char *sha1,
+   enum object_type type,
+   int exclude,
+   uint32_t name_hash,
+   struct packed_git *found_pack,
+   off_t found_offset)
+{
+   fprintf(stdout, %s\n, sha1_to_hex(sha1));
+   return 1;
+}
+
 int 

[PATCH v4 16/23] repack: turn exts array into array-of-struct

2013-12-21 Thread Jeff King
This is slightly more verbose, but will let us annotate the
extensions with further options in future commits.

Signed-off-by: Jeff King p...@peff.net
---
 builtin/repack.c | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index 2e88975..a176de2 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -115,7 +115,12 @@ static void remove_redundant_pack(const char *dir_name, 
const char *base_name)
 
 int cmd_repack(int argc, const char **argv, const char *prefix)
 {
-   const char *exts[] = {.pack, .idx};
+   struct {
+   const char *name;
+   } exts[] = {
+   {.pack},
+   {.idx},
+   };
struct child_process cmd;
struct string_list_item *item;
struct argv_array cmd_args = ARGV_ARRAY_INIT;
@@ -261,14 +266,14 @@ int cmd_repack(int argc, const char **argv, const char 
*prefix)
for (ext = 0; ext  ARRAY_SIZE(exts); ext++) {
char *fname, *fname_old;
fname = mkpathdup(%s/%s%s, packdir,
-   item-string, exts[ext]);
+   item-string, exts[ext].name);
if (!file_exists(fname)) {
free(fname);
continue;
}
 
fname_old = mkpath(%s/old-%s%s, packdir,
-   item-string, exts[ext]);
+   item-string, exts[ext].name);
if (file_exists(fname_old))
if (unlink(fname_old))
failed = 1;
@@ -319,9 +324,9 @@ int cmd_repack(int argc, const char **argv, const char 
*prefix)
char *fname, *fname_old;
struct stat statbuffer;
fname = mkpathdup(%s/pack-%s%s,
-   packdir, item-string, exts[ext]);
+   packdir, item-string, exts[ext].name);
fname_old = mkpathdup(%s-%s%s,
-   packtmp, item-string, exts[ext]);
+   packtmp, item-string, exts[ext].name);
if (!stat(fname_old, statbuffer)) {
statbuffer.st_mode = ~(S_IWUSR | S_IWGRP | 
S_IWOTH);
chmod(fname_old, statbuffer.st_mode);
@@ -340,7 +345,7 @@ int cmd_repack(int argc, const char **argv, const char 
*prefix)
fname = mkpath(%s/old-pack-%s%s,
packdir,
item-string,
-   exts[ext]);
+   exts[ext].name);
if (remove_path(fname))
warning(_(removing '%s' failed), fname);
}
-- 
1.8.5.1.399.g900e7cd

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 20/23] t: add basic bitmap functionality tests

2013-12-21 Thread Jeff King
Now that we can read and write bitmaps, we can exercise them
with some basic functionality tests. These tests aren't
particularly useful for seeing the benefit, as the test
repo is too small for it to make a difference. However, we
can at least check that using bitmaps does not break anything.

Signed-off-by: Jeff King p...@peff.net
---
 t/t5310-pack-bitmaps.sh | 138 
 1 file changed, 138 insertions(+)
 create mode 100755 t/t5310-pack-bitmaps.sh

diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
new file mode 100755
index 000..d2b0c45
--- /dev/null
+++ b/t/t5310-pack-bitmaps.sh
@@ -0,0 +1,138 @@
+#!/bin/sh
+
+test_description='exercise basic bitmap functionality'
+. ./test-lib.sh
+
+test_expect_success 'setup repo with moderate-sized history' '
+   for i in $(test_seq 1 10); do
+   test_commit $i
+   done 
+   git checkout -b other HEAD~5 
+   for i in $(test_seq 1 10); do
+   test_commit side-$i
+   done 
+   git checkout master 
+   blob=$(echo tagged-blob | git hash-object -w --stdin) 
+   git tag tagged-blob $blob 
+   git config pack.writebitmaps true
+'
+
+test_expect_success 'full repack creates bitmaps' '
+   git repack -ad 
+   ls .git/objects/pack/ | grep bitmap output 
+   test_line_count = 1 output
+'
+
+test_expect_success 'rev-list --test-bitmap verifies bitmaps' '
+   git rev-list --test-bitmap HEAD
+'
+
+rev_list_tests() {
+   state=$1
+
+   test_expect_success counting commits via bitmap ($state) '
+   git rev-list --count HEAD expect 
+   git rev-list --use-bitmap-index --count HEAD actual 
+   test_cmp expect actual
+   '
+
+   test_expect_success counting partial commits via bitmap ($state) '
+   git rev-list --count HEAD~5..HEAD expect 
+   git rev-list --use-bitmap-index --count HEAD~5..HEAD actual 
+   test_cmp expect actual
+   '
+
+   test_expect_success counting non-linear history ($state) '
+   git rev-list --count other...master expect 
+   git rev-list --use-bitmap-index --count other...master actual 

+   test_cmp expect actual
+   '
+
+   test_expect_success enumerate --objects ($state) '
+   git rev-list --objects --use-bitmap-index HEAD tmp 
+   cut -d  -f1 tmp tmp2 
+   sort tmp2 actual 
+   git rev-list --objects HEAD tmp 
+   cut -d  -f1 tmp tmp2 
+   sort tmp2 expect 
+   test_cmp expect actual
+   '
+
+   test_expect_success bitmap --objects handles non-commit objects 
($state) '
+   git rev-list --objects --use-bitmap-index HEAD tagged-blob 
actual 
+   grep $blob actual
+   '
+}
+
+rev_list_tests 'full bitmap'
+
+test_expect_success 'clone from bitmapped repository' '
+   git clone --no-local --bare . clone.git 
+   git rev-parse HEAD expect 
+   git --git-dir=clone.git rev-parse HEAD actual 
+   test_cmp expect actual
+'
+
+test_expect_success 'setup further non-bitmapped commits' '
+   for i in $(test_seq 1 10); do
+   test_commit further-$i
+   done
+'
+
+rev_list_tests 'partial bitmap'
+
+test_expect_success 'fetch (partial bitmap)' '
+   git --git-dir=clone.git fetch origin master:master 
+   git rev-parse HEAD expect 
+   git --git-dir=clone.git rev-parse HEAD actual 
+   test_cmp expect actual
+'
+
+test_expect_success 'incremental repack cannot create bitmaps' '
+   test_commit more-1 
+   test_must_fail git repack -d
+'
+
+test_expect_success 'incremental repack can disable bitmaps' '
+   test_commit more-2 
+   git repack -d --no-write-bitmap-index
+'
+
+test_expect_success 'full repack, reusing previous bitmaps' '
+   git repack -ad 
+   ls .git/objects/pack/ | grep bitmap output 
+   test_line_count = 1 output
+'
+
+test_expect_success 'fetch (full bitmap)' '
+   git --git-dir=clone.git fetch origin master:master 
+   git rev-parse HEAD expect 
+   git --git-dir=clone.git rev-parse HEAD actual 
+   test_cmp expect actual
+'
+
+test_lazy_prereq JGIT '
+   type jgit
+'
+
+test_expect_success JGIT 'we can read jgit bitmaps' '
+   git clone . compat-jgit 
+   (
+   cd compat-jgit 
+   rm -f .git/objects/pack/*.bitmap 
+   jgit gc 
+   git rev-list --test-bitmap HEAD
+   )
+'
+
+test_expect_success JGIT 'jgit can read our bitmaps' '
+   git clone . compat-us 
+   (
+   cd compat-us 
+   git repack -adb 
+   # jgit gc will barf if it does not like our bitmaps
+   jgit gc
+   )
+'
+
+test_done
-- 
1.8.5.1.399.g900e7cd

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo 

[PATCH v4 15/23] repack: stop using magic number for ARRAY_SIZE(exts)

2013-12-21 Thread Jeff King
We have a static array of extensions, but hardcode the size
of the array in our loops. Let's pull out this magic number,
which will make it easier to change.

Signed-off-by: Jeff King p...@peff.net
---
 builtin/repack.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index a0ff5c7..2e88975 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -115,7 +115,7 @@ static void remove_redundant_pack(const char *dir_name, 
const char *base_name)
 
 int cmd_repack(int argc, const char **argv, const char *prefix)
 {
-   const char *exts[2] = {.pack, .idx};
+   const char *exts[] = {.pack, .idx};
struct child_process cmd;
struct string_list_item *item;
struct argv_array cmd_args = ARGV_ARRAY_INIT;
@@ -258,7 +258,7 @@ int cmd_repack(int argc, const char **argv, const char 
*prefix)
 */
failed = 0;
for_each_string_list_item(item, names) {
-   for (ext = 0; ext  2; ext++) {
+   for (ext = 0; ext  ARRAY_SIZE(exts); ext++) {
char *fname, *fname_old;
fname = mkpathdup(%s/%s%s, packdir,
item-string, exts[ext]);
@@ -315,7 +315,7 @@ int cmd_repack(int argc, const char **argv, const char 
*prefix)
 
/* Now the ones with the same name are out of the way... */
for_each_string_list_item(item, names) {
-   for (ext = 0; ext  2; ext++) {
+   for (ext = 0; ext  ARRAY_SIZE(exts); ext++) {
char *fname, *fname_old;
struct stat statbuffer;
fname = mkpathdup(%s/pack-%s%s,
@@ -335,7 +335,7 @@ int cmd_repack(int argc, const char **argv, const char 
*prefix)
 
/* Remove the old- files */
for_each_string_list_item(item, names) {
-   for (ext = 0; ext  2; ext++) {
+   for (ext = 0; ext  ARRAY_SIZE(exts); ext++) {
char *fname;
fname = mkpath(%s/old-pack-%s%s,
packdir,
-- 
1.8.5.1.399.g900e7cd

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 11/23] pack-objects: split add_object_entry

2013-12-21 Thread Jeff King
This function actually does three things:

  1. Check whether we've already added the object to our
 packing list.

  2. Check whether the object meets our criteria for adding.

  3. Actually add the object to our packing list.

It's a little hard to see these three phases, because they
happen linearly in the rather long function. Instead, this
patch breaks them up into three separate helper functions.

The result is a little easier to follow, though it
unfortunately suffers from some optimization
interdependencies between the stages (e.g., during step 3 we
use the packing list index from step 1 and the packfile
information from step 2).

More importantly, though, the various parts can be
composed differently, as they will be in the next patch.

Signed-off-by: Jeff King p...@peff.net
---
 builtin/pack-objects.c | 98 +++---
 1 file changed, 78 insertions(+), 20 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index faf746b..13b171d 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -800,41 +800,69 @@ static int no_try_delta(const char *path)
return 0;
 }
 
-static int add_object_entry(const unsigned char *sha1, enum object_type type,
-   const char *name, int exclude)
+/*
+ * When adding an object, check whether we have already added it
+ * to our packing list. If so, we can skip. However, if we are
+ * being asked to excludei t, but the previous mention was to include
+ * it, make sure to adjust its flags and tweak our numbers accordingly.
+ *
+ * As an optimization, we pass out the index position where we would have
+ * found the item, since that saves us from having to look it up again a
+ * few lines later when we want to add the new entry.
+ */
+static int have_duplicate_entry(const unsigned char *sha1,
+   int exclude,
+   uint32_t *index_pos)
 {
struct object_entry *entry;
-   struct packed_git *p, *found_pack = NULL;
-   off_t found_offset = 0;
-   uint32_t hash = pack_name_hash(name);
-   uint32_t index_pos;
 
-   entry = packlist_find(to_pack, sha1, index_pos);
-   if (entry) {
-   if (exclude) {
-   if (!entry-preferred_base)
-   nr_result--;
-   entry-preferred_base = 1;
-   }
+   entry = packlist_find(to_pack, sha1, index_pos);
+   if (!entry)
return 0;
+
+   if (exclude) {
+   if (!entry-preferred_base)
+   nr_result--;
+   entry-preferred_base = 1;
}
 
+   return 1;
+}
+
+/*
+ * Check whether we want the object in the pack (e.g., we do not want
+ * objects found in non-local stores if the --local option was used).
+ *
+ * As a side effect of this check, we will find the packed version of this
+ * object, if any. We therefore pass out the pack information to avoid having
+ * to look it up again later.
+ */
+static int want_object_in_pack(const unsigned char *sha1,
+  int exclude,
+  struct packed_git **found_pack,
+  off_t *found_offset)
+{
+   struct packed_git *p;
+
if (!exclude  local  has_loose_object_nonlocal(sha1))
return 0;
 
+   *found_pack = NULL;
+   *found_offset = 0;
+
for (p = packed_git; p; p = p-next) {
off_t offset = find_pack_entry_one(sha1, p);
if (offset) {
-   if (!found_pack) {
+   if (!*found_pack) {
if (!is_pack_valid(p)) {
warning(packfile %s cannot be 
accessed, p-pack_name);
continue;
}
-   found_offset = offset;
-   found_pack = p;
+   *found_offset = offset;
+   *found_pack = p;
}
if (exclude)
-   break;
+   return 1;
if (incremental)
return 0;
if (local  !p-pack_local)
@@ -844,6 +872,20 @@ static int add_object_entry(const unsigned char *sha1, 
enum object_type type,
}
}
 
+   return 1;
+}
+
+static void create_object_entry(const unsigned char *sha1,
+   enum object_type type,
+   uint32_t hash,
+   int exclude,
+   int no_try_delta,
+   uint32_t index_pos,
+   struct packed_git *found_pack,
+   off_t found_offset)
+{
+   struct 

[PATCH v4 08/23] ewah: compressed bitmap implementation

2013-12-21 Thread Jeff King
From: Vicent Marti tan...@gmail.com

EWAH is a word-aligned compressed variant of a bitset (i.e. a data
structure that acts as a 0-indexed boolean array for many entries).

It uses a 64-bit run-length encoding (RLE) compression scheme,
trading some compression for better processing speed.

The goal of this word-aligned implementation is not to achieve
the best compression, but rather to improve query processing time.
As it stands right now, this EWAH implementation will always be more
efficient storage-wise than its uncompressed alternative.

EWAH arrays will be used as the on-disk format to store reachability
bitmaps for all objects in a repository while keeping reasonable sizes,
in the same way that JGit does.

This EWAH implementation is a mostly straightforward port of the
original `javaewah` library that JGit currently uses. The library is
self-contained and has been embedded whole (4 files) inside the `ewah`
folder to ease redistribution.

The library is re-licensed under the GPLv2 with the permission of Daniel
Lemire, the original author. The source code for the C version can
be found on GitHub:

https://github.com/vmg/libewok

The original Java implementation can also be found on GitHub:

https://github.com/lemire/javaewah

Signed-off-by: Vicent Marti tan...@gmail.com
Signed-off-by: Jeff King p...@peff.net
---
 Makefile   |  11 +-
 ewah/bitmap.c  | 221 
 ewah/ewah_bitmap.c | 726 +
 ewah/ewah_io.c | 193 ++
 ewah/ewah_rlw.c| 115 +
 ewah/ewok.h| 235 +
 ewah/ewok_rlw.h| 114 +
 7 files changed, 1613 insertions(+), 2 deletions(-)
 create mode 100644 ewah/bitmap.c
 create mode 100644 ewah/ewah_bitmap.c
 create mode 100644 ewah/ewah_io.c
 create mode 100644 ewah/ewah_rlw.c
 create mode 100644 ewah/ewok.h
 create mode 100644 ewah/ewok_rlw.h

diff --git a/Makefile b/Makefile
index 48ff0bd..64a1ed7 100644
--- a/Makefile
+++ b/Makefile
@@ -667,6 +667,8 @@ LIB_H += diff.h
 LIB_H += diffcore.h
 LIB_H += dir.h
 LIB_H += exec_cmd.h
+LIB_H += ewah/ewok.h
+LIB_H += ewah/ewok_rlw.h
 LIB_H += fetch-pack.h
 LIB_H += fmt-merge-msg.h
 LIB_H += fsck.h
@@ -800,6 +802,10 @@ LIB_OBJS += dir.o
 LIB_OBJS += editor.o
 LIB_OBJS += entry.o
 LIB_OBJS += environment.o
+LIB_OBJS += ewah/bitmap.o
+LIB_OBJS += ewah/ewah_bitmap.o
+LIB_OBJS += ewah/ewah_io.o
+LIB_OBJS += ewah/ewah_rlw.o
 LIB_OBJS += exec_cmd.o
 LIB_OBJS += fetch-pack.o
 LIB_OBJS += fsck.o
@@ -2474,8 +2480,9 @@ profile-clean:
$(RM) $(addsuffix *.gcno,$(addprefix $(PROFILE_DIR)/, $(object_dirs)))
 
 clean: profile-clean coverage-clean
-   $(RM) *.o *.res block-sha1/*.o ppc/*.o compat/*.o compat/*/*.o 
xdiff/*.o vcs-svn/*.o \
-   builtin/*.o $(LIB_FILE) $(XDIFF_LIB) $(VCSSVN_LIB)
+   $(RM) *.o *.res block-sha1/*.o ppc/*.o compat/*.o compat/*/*.o
+   $(RM) xdiff/*.o vcs-svn/*.o ewah/*.o builtin/*.o
+   $(RM) $(LIB_FILE) $(XDIFF_LIB) $(VCSSVN_LIB)
$(RM) $(ALL_PROGRAMS) $(SCRIPT_LIB) $(BUILT_INS) git$X
$(RM) $(TEST_PROGRAMS) $(NO_INSTALL)
$(RM) -r bin-wrappers $(dep_dirs)
diff --git a/ewah/bitmap.c b/ewah/bitmap.c
new file mode 100644
index 000..710e58c
--- /dev/null
+++ b/ewah/bitmap.c
@@ -0,0 +1,221 @@
+/**
+ * Copyright 2013, GitHub, Inc
+ * Copyright 2009-2013, Daniel Lemire, Cliff Moon,
+ * David McIntosh, Robert Becho, Google Inc. and Veronika Zenz
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, 
USA.
+ */
+#include git-compat-util.h
+#include ewok.h
+
+#define MASK(x) ((eword_t)1  (x % BITS_IN_WORD))
+#define BLOCK(x) (x / BITS_IN_WORD)
+
+struct bitmap *bitmap_new(void)
+{
+   struct bitmap *bitmap = ewah_malloc(sizeof(struct bitmap));
+   bitmap-words = ewah_calloc(32, sizeof(eword_t));
+   bitmap-word_alloc = 32;
+   return bitmap;
+}
+
+void bitmap_set(struct bitmap *self, size_t pos)
+{
+   size_t block = BLOCK(pos);
+
+   if (block = self-word_alloc) {
+   size_t old_size = self-word_alloc;
+   self-word_alloc = block * 2;
+   self-words = ewah_realloc(self-words,
+   self-word_alloc * sizeof(eword_t));
+
+   memset(self-words + old_size, 0x0,
+   

[PATCH v4 10/23] pack-bitmap: add support for bitmap indexes

2013-12-21 Thread Jeff King
From: Vicent Marti tan...@gmail.com

A bitmap index is a `.bitmap` file that can be found inside
`$GIT_DIR/objects/pack/`, next to its corresponding packfile, and
contains precalculated reachability information for selected commits.
The full specification of the format for these bitmap indexes can be found
in `Documentation/technical/bitmap-format.txt`.

For a given commit SHA1, if it happens to be available in the bitmap
index, its bitmap will represent every single object that is reachable
from the commit itself. The nth bit in the bitmap is the nth object in
the packfile; if it's set to 1, the object is reachable.

By using the bitmaps available in the index, this commit implements
several new functions:

- `prepare_bitmap_git`
- `prepare_bitmap_walk`
- `traverse_bitmap_commit_list`
- `reuse_partial_packfile_from_bitmap`

The `prepare_bitmap_walk` function tries to build a bitmap of all the
objects that can be reached from the commit roots of a given `rev_info`
struct by using the following algorithm:

- If all the interesting commits for a revision walk are available in
the index, the resulting reachability bitmap is the bitwise OR of all
the individual bitmaps.

- When the full set of WANTs is not available in the index, we perform a
partial revision walk using the commits that don't have bitmaps as
roots, and limiting the revision walk as soon as we reach a commit that
has a corresponding bitmap. The earlier OR'ed bitmap with all the
indexed commits can now be completed as this walk progresses, so the end
result is the full reachability list.

- For revision walks with a HAVEs set (a set of commits that are deemed
uninteresting), first we perform the same method as for the WANTs, but
using our HAVEs as roots, in order to obtain a full reachability bitmap
of all the uninteresting commits. This bitmap then can be used to:

a) limit the subsequent walk when building the WANTs bitmap
b) finding the final set of interesting commits by performing an
   AND-NOT of the WANTs and the HAVEs.

If `prepare_bitmap_walk` runs successfully, the resulting bitmap is
stored and the equivalent of a `traverse_commit_list` call can be
performed by using `traverse_bitmap_commit_list`; the bitmap version
of this call yields the objects straight from the packfile index
(without having to look them up or parse them) and hence is several
orders of magnitude faster.

As an extra optimization, when `prepare_bitmap_walk` succeeds, the
`reuse_partial_packfile_from_bitmap` call can be attempted: it will find
the amount of objects at the beginning of the on-disk packfile that can
be reused as-is, and return an offset into the packfile. The source
packfile can then be loaded and the bytes up to `offset` can be written
directly to the result without having to consider the entires inside the
packfile individually.

If the `prepare_bitmap_walk` call fails (e.g. because no bitmap files
are available), the `rev_info` struct is left untouched, and can be used
to perform a manual rev-walk using `traverse_commit_list`.

Hence, this new set of functions are a generic API that allows to
perform the equivalent of

git rev-list --objects [roots...] [^uninteresting...]

for any set of commits, even if they don't have specific bitmaps
generated for them.

In further patches, we'll use this bitmap traversal optimization to
speed up the `pack-objects` and `rev-list` commands.

Signed-off-by: Vicent Marti tan...@gmail.com
Signed-off-by: Jeff King p...@peff.net
---
 Makefile  |   2 +
 khash.h   | 338 
 pack-bitmap.c | 970 ++
 pack-bitmap.h |  43 +++
 4 files changed, 1353 insertions(+)
 create mode 100644 khash.h
 create mode 100644 pack-bitmap.c
 create mode 100644 pack-bitmap.h

diff --git a/Makefile b/Makefile
index 64a1ed7..b983d78 100644
--- a/Makefile
+++ b/Makefile
@@ -699,6 +699,7 @@ LIB_H += object.h
 LIB_H += pack-objects.h
 LIB_H += pack-revindex.h
 LIB_H += pack.h
+LIB_H += pack-bitmap.h
 LIB_H += parse-options.h
 LIB_H += patch-ids.h
 LIB_H += pathspec.h
@@ -837,6 +838,7 @@ LIB_OBJS += notes-cache.o
 LIB_OBJS += notes-merge.o
 LIB_OBJS += notes-utils.o
 LIB_OBJS += object.o
+LIB_OBJS += pack-bitmap.o
 LIB_OBJS += pack-check.o
 LIB_OBJS += pack-objects.o
 LIB_OBJS += pack-revindex.o
diff --git a/khash.h b/khash.h
new file mode 100644
index 000..57ff603
--- /dev/null
+++ b/khash.h
@@ -0,0 +1,338 @@
+/* The MIT License
+
+   Copyright (c) 2008, 2009, 2011 by Attractive Chaos attrac...@live.co.uk
+
+   Permission is hereby granted, free of charge, to any person obtaining
+   a copy of this software and associated documentation files (the
+   Software), to deal in the Software without restriction, including
+   without limitation the rights to use, copy, modify, merge, publish,
+   distribute, sublicense, and/or sell copies of the Software, and to
+   permit persons to whom the Software is 

[PATCH v4 09/23] documentation: add documentation for the bitmap format

2013-12-21 Thread Jeff King
From: Vicent Marti tan...@gmail.com

This is the technical documentation for the JGit-compatible Bitmap v1
on-disk format.

Signed-off-by: Vicent Marti tan...@gmail.com
Signed-off-by: Jeff King p...@peff.net
---
 Documentation/technical/bitmap-format.txt | 131 ++
 1 file changed, 131 insertions(+)
 create mode 100644 Documentation/technical/bitmap-format.txt

diff --git a/Documentation/technical/bitmap-format.txt 
b/Documentation/technical/bitmap-format.txt
new file mode 100644
index 000..7a86bd7
--- /dev/null
+++ b/Documentation/technical/bitmap-format.txt
@@ -0,0 +1,131 @@
+GIT bitmap v1 format
+
+
+   - A header appears at the beginning:
+
+   4-byte signature: {'B', 'I', 'T', 'M'}
+
+   2-byte version number (network byte order)
+   The current implementation only supports version 1
+   of the bitmap index (the same one as JGit).
+
+   2-byte flags (network byte order)
+
+   The following flags are supported:
+
+   - BITMAP_OPT_FULL_DAG (0x1) REQUIRED
+   This flag must always be present. It implies that the 
bitmap
+   index has been generated for a packfile with full 
closure
+   (i.e. where every single object in the packfile can find
+its parent links inside the same packfile). This is a
+   requirement for the bitmap index format, also present 
in JGit,
+   that greatly reduces the complexity of the 
implementation.
+
+   4-byte entry count (network byte order)
+
+   The total count of entries (bitmapped commits) in this 
bitmap index.
+
+   20-byte checksum
+
+   The SHA1 checksum of the pack this bitmap index belongs 
to.
+
+   - 4 EWAH bitmaps that act as type indexes
+
+   Type indexes are serialized after the hash cache in the shape
+   of four EWAH bitmaps stored consecutively (see Appendix A for
+   the serialization format of an EWAH bitmap).
+
+   There is a bitmap for each Git object type, stored in the 
following
+   order:
+
+   - Commits
+   - Trees
+   - Blobs
+   - Tags
+
+   In each bitmap, the `n`th bit is set to true if the `n`th object
+   in the packfile is of that type.
+
+   The obvious consequence is that the OR of all 4 bitmaps will 
result
+   in a full set (all bits set), and the AND of all 4 bitmaps will
+   result in an empty bitmap (no bits set).
+
+   - N entries with compressed bitmaps, one for each indexed commit
+
+   Where `N` is the total amount of entries in this bitmap index.
+   Each entry contains the following:
+
+   - 4-byte object position (network byte order)
+   The position **in the index for the packfile** where the
+   bitmap for this commit is found.
+
+   - 1-byte XOR-offset
+   The xor offset used to compress this bitmap. For an 
entry
+   in position `x`, a XOR offset of `y` means that the 
actual
+   bitmap representing this commit is composed by XORing 
the
+   bitmap for this entry with the bitmap in entry `x-y` 
(i.e.
+   the bitmap `y` entries before this one).
+
+   Note that this compression can be recursive. In order to
+   XOR this entry with a previous one, the previous entry 
needs
+   to be decompressed first, and so on.
+
+   The hard-limit for this offset is 160 (an entry can 
only be
+   xor'ed against one of the 160 entries preceding it). 
This
+   number is always positive, and hence entries are always 
xor'ed
+   with **previous** bitmaps, not bitmaps that will come 
afterwards
+   in the index.
+
+   - 1-byte flags for this bitmap
+   At the moment the only available flag is `0x1`, which 
hints
+   that this bitmap can be re-used when rebuilding bitmap 
indexes
+   for the repository.
+
+   - The compressed bitmap itself, see Appendix A.
+
+== Appendix A: Serialization format for an EWAH bitmap
+
+Ewah bitmaps are serialized in the same protocol as the JAVAEWAH
+library, making them backwards compatible with the JGit
+implementation:
+
+   - 4-byte number of bits of the resulting UNCOMPRESSED bitmap
+
+   - 4-byte number of words of the COMPRESSED bitmap, when stored
+
+   - N x 8-byte words, as specified by the previous field
+
+  

[PATCH v4 12/23] pack-objects: use bitmaps when packing objects

2013-12-21 Thread Jeff King
From: Vicent Marti tan...@gmail.com

In this patch, we use the bitmap API to perform the `Counting Objects`
phase in pack-objects, rather than a traditional walk through the object
graph. For a reasonably-packed large repo, the time to fetch and clone
is often dominated by the full-object revision walk during the Counting
Objects phase. Using bitmaps can reduce the CPU time required on the
server (and therefore start sending the actual pack data with less
delay).

For bitmaps to be used, the following must be true:

  1. We must be packing to stdout (as a normal `pack-objects` from
 `upload-pack` would do).

  2. There must be a .bitmap index containing at least one of the
 have objects that the client is asking for.

  3. Bitmaps must be enabled (they are enabled by default, but can be
 disabled by setting `pack.usebitmaps` to false, or by using
 `--no-use-bitmap-index` on the command-line).

If any of these is not true, we fall back to doing a normal walk of the
object graph.

Here are some sample timings from a full pack of `torvalds/linux` (i.e.
something very similar to what would be generated for a clone of the
repository) that show the speedup produced by various
methods:

[existing graph traversal]
$ time git pack-objects --all --stdout --no-use-bitmap-index \
/dev/null /dev/null
Counting objects: 3237103, done.
Compressing objects: 100% (508752/508752), done.
Total 3237103 (delta 2699584), reused 3237103 (delta 2699584)

real0m44.111s
user0m42.396s
sys 0m3.544s

[bitmaps only, without partial pack reuse; note that
 pack reuse is automatic, so timing this required a
 patch to disable it]
$ time git pack-objects --all --stdout /dev/null /dev/null
Counting objects: 3237103, done.
Compressing objects: 100% (508752/508752), done.
Total 3237103 (delta 2699584), reused 3237103 (delta 2699584)

real0m5.413s
user0m5.604s
sys 0m1.804s

[bitmaps with pack reuse (what you get with this patch)]
$ time git pack-objects --all --stdout /dev/null /dev/null
Reusing existing pack: 3237103, done.
Total 3237103 (delta 0), reused 0 (delta 0)

real0m1.636s
user0m1.460s
sys 0m0.172s

Signed-off-by: Vicent Marti tan...@gmail.com
Signed-off-by: Jeff King p...@peff.net
---
 Documentation/config.txt |   6 +++
 builtin/pack-objects.c   | 107 +++
 2 files changed, 113 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index ab26963..a981369 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1858,6 +1858,12 @@ pack.packSizeLimit::
Common unit suffixes of 'k', 'm', or 'g' are
supported.
 
+pack.useBitmaps::
+   When true, git will use pack bitmaps (if available) when packing
+   to stdout (e.g., during the server side of a fetch). Defaults to
+   true. You should not generally need to turn this off unless
+   you are debugging pack bitmaps.
+
 pager.cmd::
If the value is boolean, turns on or off pagination of the
output of a particular Git subcommand when writing to a tty.
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 13b171d..030d894 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -19,6 +19,7 @@
 #include refs.h
 #include streaming.h
 #include thread-utils.h
+#include pack-bitmap.h
 
 static const char *pack_usage[] = {
N_(git pack-objects --stdout [options...] [ ref-list |  
object-list]),
@@ -57,6 +58,12 @@ static struct progress *progress_state;
 static int pack_compression_level = Z_DEFAULT_COMPRESSION;
 static int pack_compression_seen;
 
+static struct packed_git *reuse_packfile;
+static uint32_t reuse_packfile_objects;
+static off_t reuse_packfile_offset;
+
+static int use_bitmap_index = 1;
+
 static unsigned long delta_cache_size = 0;
 static unsigned long max_delta_cache_size = 256 * 1024 * 1024;
 static unsigned long cache_max_small_delta_size = 1000;
@@ -678,6 +685,46 @@ static struct object_entry **compute_write_order(void)
return wo;
 }
 
+static off_t write_reused_pack(struct sha1file *f)
+{
+   unsigned char buffer[8192];
+   off_t to_write;
+   int fd;
+
+   if (!is_pack_valid(reuse_packfile))
+   die(packfile is invalid: %s, reuse_packfile-pack_name);
+
+   fd = git_open_noatime(reuse_packfile-pack_name);
+   if (fd  0)
+   die_errno(unable to open packfile for reuse: %s,
+ reuse_packfile-pack_name);
+
+   if (lseek(fd, sizeof(struct pack_header), SEEK_SET) == -1)
+   die_errno(unable to seek in reused packfile);
+
+   if (reuse_packfile_offset  0)
+   reuse_packfile_offset = reuse_packfile-pack_size - 20;
+
+   to_write = reuse_packfile_offset - sizeof(struct pack_header);
+
+   while (to_write) {
+   int read_pack 

[PATCH v4 17/23] repack: handle optional files created by pack-objects

2013-12-21 Thread Jeff King
We ask pack-objects to pack to a set of temporary files, and
then rename them into place. Some files that pack-objects
creates may be optional (like a .bitmap file), in which case
we would not want to call rename(). We already call stat()
and make the chmod optional if the file cannot be accessed.
We could simply skip the rename step in this case, but that
would be a minor regression in noticing problems with
non-optional files (like the .pack and .idx files).

Instead, we can now annotate extensions as optional, and
skip them if they don't exist (and otherwise rely on
rename() to barf).

Signed-off-by: Jeff King p...@peff.net
---
 builtin/repack.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index a176de2..8b7dfd0 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -117,6 +117,7 @@ int cmd_repack(int argc, const char **argv, const char 
*prefix)
 {
struct {
const char *name;
+   unsigned optional:1;
} exts[] = {
{.pack},
{.idx},
@@ -323,6 +324,7 @@ int cmd_repack(int argc, const char **argv, const char 
*prefix)
for (ext = 0; ext  ARRAY_SIZE(exts); ext++) {
char *fname, *fname_old;
struct stat statbuffer;
+   int exists = 0;
fname = mkpathdup(%s/pack-%s%s,
packdir, item-string, exts[ext].name);
fname_old = mkpathdup(%s-%s%s,
@@ -330,9 +332,12 @@ int cmd_repack(int argc, const char **argv, const char 
*prefix)
if (!stat(fname_old, statbuffer)) {
statbuffer.st_mode = ~(S_IWUSR | S_IWGRP | 
S_IWOTH);
chmod(fname_old, statbuffer.st_mode);
+   exists = 1;
+   }
+   if (exists || !exts[ext].optional) {
+   if (rename(fname_old, fname))
+   die_errno(_(renaming '%s' failed), 
fname_old);
}
-   if (rename(fname_old, fname))
-   die_errno(_(renaming '%s' failed), fname_old);
free(fname);
free(fname_old);
}
-- 
1.8.5.1.399.g900e7cd

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 22/23] pack-bitmap: implement optional name_hash cache

2013-12-21 Thread Jeff King
From: Vicent Marti tan...@gmail.com

When we use pack bitmaps rather than walking the object
graph, we end up with the list of objects to include in the
packfile, but we do not know the path at which any tree or
blob objects would be found.

In a recently packed repository, this is fine. A fetch would
use the paths only as a heuristic in the delta compression
phase, and a fully packed repository should not need to do
much delta compression.

As time passes, though, we may acquire more objects on top
of our large bitmapped pack. If clients fetch frequently,
then they never even look at the bitmapped history, and all
works as usual. However, a client who has not fetched since
the last bitmap repack will have have tips in the
bitmapped history, but want newer objects.

The bitmaps themselves degrade gracefully in this
circumstance. We manually walk the more recent bits of
history, and then use bitmaps when we hit them.

But we would also like to perform delta compression between
the newer objects and the bitmapped objects (both to delta
against what we know the user already has, but also between
new and old objects that the user is fetching). The lack
of pathnames makes our delta heuristics much less effective.

This patch adds an optional cache of the 32-bit name_hash
values to the end of the bitmap file. If present, a reader
can use it to match bitmapped and non-bitmapped names during
delta compression.

Here are perf results for p5310:

Test  origin/master   HEAD^  HEAD
-
5310.2: repack to disk36.81(37.82+1.43)   47.70(48.74+1.41) +29.6%   
47.75(48.70+1.51) +29.7%
5310.3: simulated clone   30.78(29.70+2.14)   1.08(0.97+0.10) -96.5% 
1.07(0.94+0.12) -96.5%
5310.4: simulated fetch   3.16(6.10+0.08) 3.54(10.65+0.06) +12.0%
1.70(3.07+0.06) -46.2%
5310.6: partial bitmap36.76(43.19+1.81)   6.71(11.25+0.76) -81.7%
4.08(6.26+0.46) -88.9%

You can see that the time spent on an incremental fetch goes
down, as our delta heuristics are able to do their work.
And we save time on the partial bitmap clone for the same
reason.

Signed-off-by: Vicent Marti tan...@gmail.com
Signed-off-by: Jeff King p...@peff.net
---
 Documentation/config.txt  | 11 +++
 Documentation/technical/bitmap-format.txt | 33 +++
 builtin/pack-objects.c| 10 +-
 pack-bitmap-write.c   | 21 ++--
 pack-bitmap.c | 11 +++
 pack-bitmap.h |  6 --
 t/perf/p5310-pack-bitmaps.sh  |  3 ++-
 t/t5310-pack-bitmaps.sh   |  3 ++-
 8 files changed, 91 insertions(+), 7 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 4b0c368..499a3c4 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1872,6 +1872,17 @@ pack.writebitmaps::
space and extra time spent on the initial repack.  Defaults to
false.
 
+pack.writeBitmapHashCache::
+   When true, git will include a hash cache section in the bitmap
+   index (if one is written). This cache can be used to feed git's
+   delta heuristics, potentially leading to better deltas between
+   bitmapped and non-bitmapped objects (e.g., when serving a fetch
+   between an older, bitmapped pack and objects that have been
+   pushed since the last gc). The downside is that it consumes 4
+   bytes per object of disk space, and that JGit's bitmap
+   implementation does not understand it, causing it to complain if
+   Git and JGit are used on the same repository. Defaults to false.
+
 pager.cmd::
If the value is boolean, turns on or off pagination of the
output of a particular Git subcommand when writing to a tty.
diff --git a/Documentation/technical/bitmap-format.txt 
b/Documentation/technical/bitmap-format.txt
index 7a86bd7..f8c18a0 100644
--- a/Documentation/technical/bitmap-format.txt
+++ b/Documentation/technical/bitmap-format.txt
@@ -21,6 +21,12 @@ GIT bitmap v1 format
requirement for the bitmap index format, also present 
in JGit,
that greatly reduces the complexity of the 
implementation.
 
+   - BITMAP_OPT_HASH_CACHE (0x4)
+   If present, the end of the bitmap file contains
+   `N` 32-bit name-hash values, one per object in the
+   pack. The format and meaning of the name-hash is
+   described below.
+
4-byte entry count (network byte order)
 
The total count of entries (bitmapped commits) in this 
bitmap index.
@@ -129,3 +135,30 @@ The bitstream represented by the above chunk is then:
 The next word after `L_M` (if any) must again be a RLW, for the next
 

[PATCH v4 21/23] t/perf: add tests for pack bitmaps

2013-12-21 Thread Jeff King
This adds a few basic perf tests for the pack bitmap code to
show off its improvements. The tests are:

  1. How long does it take to do a repack (it gets slower
 with bitmaps, since we have to do extra work)?

  2. How long does it take to do a clone (it gets faster
 with bitmaps)?

  3. How does a small fetch perform when we've just
 repacked?

  4. How does a clone perform when we haven't repacked since
 a week of pushes?

Here are results against linux.git:

Test  origin/master   this tree
---
5310.2: repack to disk33.64(32.64+2.04)   67.67(66.75+1.84) +101.2%
5310.3: simulated clone   30.49(29.47+2.05)   1.20(1.10+0.10) -96.1%
5310.4: simulated fetch   3.49(6.79+0.06) 5.57(22.35+0.07) +59.6%
5310.6: partial bitmap36.70(43.87+1.81)   8.18(21.92+0.73) -77.7%

You can see that we do take longer to repack, but we do way
better for further clones. A small fetch performs a bit
worse, as we spend way more time on delta compression (note
the heavy user CPU time, as we have 8 threads) due to the
lack of name hashes for the bitmapped objects.

The final test shows how the bitmaps degrade over time
between packs. There's still a significant speedup over the
non-bitmap case, but we don't do quite as well (we have to
spend time accessing the new objects the old fashioned
way, including delta compression).

Signed-off-by: Jeff King p...@peff.net
---
 t/perf/p5310-pack-bitmaps.sh | 56 
 1 file changed, 56 insertions(+)
 create mode 100755 t/perf/p5310-pack-bitmaps.sh

diff --git a/t/perf/p5310-pack-bitmaps.sh b/t/perf/p5310-pack-bitmaps.sh
new file mode 100755
index 000..8c6ae45
--- /dev/null
+++ b/t/perf/p5310-pack-bitmaps.sh
@@ -0,0 +1,56 @@
+#!/bin/sh
+
+test_description='Tests pack performance using bitmaps'
+. ./perf-lib.sh
+
+test_perf_large_repo
+
+# note that we do everything through config,
+# since we want to be able to compare bitmap-aware
+# git versus non-bitmap git
+test_expect_success 'setup bitmap config' '
+   git config pack.writebitmaps true
+'
+
+test_perf 'repack to disk' '
+   git repack -ad
+'
+
+test_perf 'simulated clone' '
+   git pack-objects --stdout --all /dev/null /dev/null
+'
+
+test_perf 'simulated fetch' '
+   have=$(git rev-list HEAD~100 -1) 
+   {
+   echo HEAD 
+   echo ^$have
+   } | git pack-objects --revs --stdout /dev/null
+'
+
+test_expect_success 'create partial bitmap state' '
+   # pick a commit to represent the repo tip in the past
+   cutoff=$(git rev-list HEAD~100 -1) 
+   orig_tip=$(git rev-parse HEAD) 
+
+   # now kill off all of the refs and pretend we had
+   # just the one tip
+   rm -rf .git/logs .git/refs/* .git/packed-refs
+   git update-ref HEAD $cutoff
+
+   # and then repack, which will leave us with a nice
+   # big bitmap pack of the old history, and all of
+   # the new history will be loose, as if it had been pushed
+   # up incrementally and exploded via unpack-objects
+   git repack -Ad
+
+   # and now restore our original tip, as if the pushes
+   # had happened
+   git update-ref HEAD $orig_tip
+'
+
+test_perf 'partial bitmap' '
+   git pack-objects --stdout --all /dev/null /dev/null
+'
+
+test_done
-- 
1.8.5.1.399.g900e7cd

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 14/23] pack-objects: implement bitmap writing

2013-12-21 Thread Jeff King
From: Vicent Marti tan...@gmail.com

This commit extends more the functionality of `pack-objects` by allowing
it to write out a `.bitmap` index next to any written packs, together
with the `.idx` index that currently gets written.

If bitmap writing is enabled for a given repository (either by calling
`pack-objects` with the `--write-bitmap-index` flag or by having
`pack.writebitmaps` set to `true` in the config) and pack-objects is
writing a packfile that would normally be indexed (i.e. not piping to
stdout), we will attempt to write the corresponding bitmap index for the
packfile.

Bitmap index writing happens after the packfile and its index has been
successfully written to disk (`finish_tmp_packfile`). The process is
performed in several steps:

1. `bitmap_writer_set_checksum`: this call stores the partial
   checksum for the packfile being written; the checksum will be
   written in the resulting bitmap index to verify its integrity

2. `bitmap_writer_build_type_index`: this call uses the array of
   `struct object_entry` that has just been sorted when writing out
   the actual packfile index to disk to generate 4 type-index bitmaps
   (one for each object type).

   These bitmaps have their nth bit set if the given object is of
   the bitmap's type. E.g. the nth bit of the Commits bitmap will be
   1 if the nth object in the packfile index is a commit.

   This is a very cheap operation because the bitmap writing code has
   access to the metadata stored in the `struct object_entry` array,
   and hence the real type for each object in the packfile.

3. `bitmap_writer_reuse_bitmaps`: if there exists an existing bitmap
   index for one of the packfiles we're trying to repack, this call
   will efficiently rebuild the existing bitmaps so they can be
   reused on the new index. All the existing bitmaps will be stored
   in a `reuse` hash table, and the commit selection phase will
   prioritize these when selecting, as they can be written directly
   to the new index without having to perform a revision walk to
   fill the bitmap. This can greatly speed up the repack of a
   repository that already has bitmaps.

4. `bitmap_writer_select_commits`: if bitmap writing is enabled for
   a given `pack-objects` run, the sequence of commits generated
   during the Counting Objects phase will be stored in an array.

   We then use that array to build up the list of selected commits.
   Writing a bitmap in the index for each object in the repository
   would be cost-prohibitive, so we use a simple heuristic to pick
   the commits that will be indexed with bitmaps.

   The current heuristics are a simplified version of JGit's
   original implementation. We select a higher density of commits
   depending on their age: the 100 most recent commits are always
   selected, after that we pick 1 commit of each 100, and the gap
   increases as the commits grow older. On top of that, we make sure
   that every single branch that has not been merged (all the tips
   that would be required from a clone) gets their own bitmap, and
   when selecting commits between a gap, we tend to prioritize the
   commit with the most parents.

   Do note that there is no right/wrong way to perform commit
   selection; different selection algorithms will result in
   different commits being selected, but there's no such thing as
   missing a commit. The bitmap walker algorithm implemented in
   `prepare_bitmap_walk` is able to adapt to missing bitmaps by
   performing manual walks that complete the bitmap: the ideal
   selection algorithm, however, would select the commits that are
   more likely to be used as roots for a walk in the future (e.g.
   the tips of each branch, and so on) to ensure a bitmap for them
   is always available.

5. `bitmap_writer_build`: this is the computationally expensive part
   of bitmap generation. Based on the list of commits that were
   selected in the previous step, we perform several incremental
   walks to generate the bitmap for each commit.

   The walks begin from the oldest commit, and are built up
   incrementally for each branch. E.g. consider this dag where A, B,
   C, D, E, F are the selected commits, and a, b, c, e are a chunk
   of simplified history that will not receive bitmaps.

A---a---B--b--C--c--D
 \
  E--e--F

   We start by building the bitmap for A, using A as the root for a
   revision walk and marking all the objects that are reachable
   until the walk is over. Once this bitmap is stored, we reuse the
   bitmap walker to perform the walk for B, assuming that once we
   reach A again, the walk will be terminated because A has already
   been SEEN on the previous walk.

   This process 

[PATCH v4 23/23] compat/mingw.h: Fix the MinGW and msvc builds

2013-12-21 Thread Jeff King
From: Ramsay Jones ram...@ramsay1.demon.co.uk

Signed-off-by: Ramsay Jones ram...@ramsay1.demon.co.uk
Signed-off-by: Junio C Hamano gits...@pobox.com
Signed-off-by: Jeff King p...@peff.net
---
 compat/mingw.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/compat/mingw.h b/compat/mingw.h
index 92cd728..8828ede 100644
--- a/compat/mingw.h
+++ b/compat/mingw.h
@@ -345,6 +345,7 @@ static inline char *mingw_find_last_dir_sep(const char 
*path)
 #define PATH_SEP ';'
 #define PRIuMAX I64u
 #define PRId64 I64d
+#define PRIx64 I64x
 
 void mingw_open_html(const char *path);
 #define open_html mingw_open_html
-- 
1.8.5.1.399.g900e7cd
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 0/22] pack bitmaps

2013-12-21 Thread Jeff King
On Sat, Dec 21, 2013 at 08:56:51AM -0500, Jeff King wrote:

 Interdiff is below.
 
   [01/23]: sha1write: make buffer const-correct
   [02/23]: revindex: Export new APIs
   [03/23]: pack-objects: Refactor the packing list
   [04/23]: pack-objects: factor out name_hash
   [05/23]: revision: allow setting custom limiter function
   [06/23]: sha1_file: export `git_open_noatime`
   [07/23]: compat: add endianness helpers
   [08/23]: ewah: compressed bitmap implementation
   [09/23]: documentation: add documentation for the bitmap format

By the way, the patches are identical up through 09/23. I think the
first one is already merged into another topic, too, so it may be worth
building on that instead of re-applying.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 0/22] pack bitmaps

2013-12-21 Thread Jeff King
On Sat, Dec 21, 2013 at 08:56:51AM -0500, Jeff King wrote:

 The changes from v3 are:
 
  - reworked add_object_entry refactoring (see patch 11, which is new,
and patch 12 which builds on it in a more natural way)
 
  - better error/die reporting from write_reused_pack
 
  - added Ramsay's PRIx64 compat fix
 
  - fixed a user-after-free in the warning message of open_pack_bitmap_1
 
  - minor typo/thinko fixes from Thomas in docs and tests

One thing explicitly _not_ here is ripping out khash in favor of
Karsten's hash system. That is still on the table, but I'd much rather
do it on top if we are going to.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html