Re: Request: timeout option for remote operations, esp. git fetch

2013-11-14 Thread Jeff King
On Tue, Nov 12, 2013 at 10:33:49AM -0800, H. Peter Anvin wrote:

  Which means that your original wish may not be granted with
  SO_KEEPALIVE at all, no?  I was wondering if you wanted a forced
  timeout based on alarm(2), something similar to what you added to
  git-daemon in 960deccb (git-daemon: timeout, eliminate double DWIM,
  2005-10-19).
  
 
 Yes, something more like that on the client end.  SO_KEEPALIVE is better
 than nothing, but not really good enough.

Would it be enough to just use timeout(1), like:

  timeout 10m git fetch

That will time the _whole_ fetch operation, which means a legitimately
gigantic but fast fetch would still fail. Setting a shorter timeout only
for periods of inactivity on the network socket would catch killed or
very laggy connections. But it would not catch a server that feeds you
data at a constant but ridiculously slow rate.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: can we prevent reflog deletion when branch is deleted?

2013-11-14 Thread Jeff King
On Thu, Nov 14, 2013 at 08:56:07AM +0100, Thomas Rast wrote:

  Whatever it was that happened to a hundred or more repos on the Jenkins
  project seems to be stirring up this debate in some circles.
 
 Making us so curious ... and then you just leave us hanging there ;-)
 
 Any pointers to this debate?

I do not know about any particular debate in git circles, but I assume
Sitaram is referring to this incident:

  https://groups.google.com/d/msg/jenkinsci-dev/-myjRIPcVwU/t4nkXONp8qgJ

in which a Jenkins dev force-pushed and rewound history on 150 different
repos. In this case the reflog made rollback easy, but if he had pushed
a deletion, it would be harder.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: can we prevent reflog deletion when branch is deleted?

2013-11-14 Thread Jeff King
On Thu, Nov 14, 2013 at 05:48:50AM +0530, Sitaram Chamarty wrote:

 Is there *any* way we can preserve a reflog for a deleted branch,
 perhaps under logs/refs/deleted/timestamp/full/ref/name ?

I had patches to do something like this here:

  http://thread.gmane.org/gmane.comp.version-control.git/201715/focus=201752

but there were definitely some buggy corners, as much of the code
assumed you needed to have a ref to have a reflog. I don't even run with
it locally anymore.

At GitHub, we log each change to an audit log in addition to the
regular reflog (we also stuff extra data from the environment into the
reflog message). So even after a branch is deleted, its audit log
entries remain, though you have to pull out the data by hand (git
doesn't know about it at all, except as an append-only sink for
writing). And git doesn't use the audit log for connectivity, either, so
eventually the objects could be pruned.

 Just some basic protection -- don't delete the reflog, and instead,
 rename it to something that preserves the name but in a different
 namespace.

That part is easy. Accessing it seamlessly and handling reflog
expiration are a little harder. Not because they're intractable, but
just because there are some low-level assumptions in the git code. The
patch series I mentioned above mostly works. It probably just needs
somebody to go through and find the corner cases.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Problem while cloning a git repo

2013-11-14 Thread Yann COLLETTE
I am not able to clone via https because gerrit doesn't propose this way 
of cloning.

When I clone via http, I see that git is starting the download of objects:
remote: Counting objects: 256, done
remote: Finding sources: 100% (256/256)
Receiving objects: 46% ... (this part always fails at 46 %).
So, I think that the proxy part is not a problem (there is no proxy set 
here).

If I set GIT_CURL_VERBOSE=1 before cloning, I've got the following error:
* Problem (2) in the Chunked-Encoded data
* Closing connection 1

YC

Le 14/11/2013 00:40, brian m. carlson a écrit :

On Wed, Nov 13, 2013 at 08:43:28AM +0100, Yann COLLETTE wrote:

Hello,

When I perform the git clone via git, it works. The problem is only
happening via http.
I tried to play with http.postBuffer and I set this parameter to
it's maximum (a little bit before a git clone triggered a memory
alloc problem) and I see that something big is trying to be
downloaded. But I don't see such a big file in my history of
commits. The maximum one seems to be around 50 Mo ...

Please keep the list in CC.

http.postBuffer doesn't affect clones, only pushes, so that isn't
relevant here.  You're experiencing something that is dropping the
connection over HTTP.  So you either have a bad proxy, or something else
is causing the connection to be dropped.  Since it's only over HTTP, I
suspect it's the former.

Do you have HTTPS, and if so, does it work if you try to clone over
that?



--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] config: arbitrary number of matches for --unset and --replace-all

2013-11-14 Thread Jeff King
On Thu, Nov 14, 2013 at 08:50:43AM +0100, Thomas Rast wrote:

 git-config used a static match array to hold the matches we want to
 unset/replace when using --unset or --replace-all.  Use a
 variable-sized array instead.
 
 This in particular fixes the symptoms git-svn had when storing large
 numbers of svn-remote.*.added-placeholder entries in the config file.
 
 While the tests are rather more paranoid than just --unset and
 --replace-all, the other operations already worked.  Indeed git-svn's
 usage only breaks the first time *after* creating so many entries,
 when it wants to unset and re-add them all.
 
 Reported-by: Jess Hottenstein jess.hottenst...@gmail.com
 Signed-off-by: Thomas Rast t...@thomasrast.ch

This looks good to me.

I agree with your earlier assessment that adding tons of config keys is
probably a bad idea. It's not just slow when looking up those keys, but
it also slows down every single git operation (which might read config
many times, if it is a script).  Still, we should at least do the right
thing in the face of such config.

 @@ -1260,11 +1259,15 @@ static int store_aux(const char *key, const char 
 *value, void *cb)
* Do not increment matches: this is no match, but we
* just made sure we are in the desired section.
*/
 + ALLOC_GROW(store.offset, store.seen+1,
 +store.offset_alloc);
   store.offset[store.seen] = cf-do_ftell(cf);
   /* fallthru */
   case SECTION_END_SEEN:
   case START:
   if (matches(key, value)) {
 + ALLOC_GROW(store.offset, store.seen+1,
 +store.offset_alloc);
   store.offset[store.seen] = cf-do_ftell(cf);
   store.state = KEY_SEEN;
   store.seen++;

This code is weird to follow because of the fall-throughs. I do not
think you have introduced any bugs with your patch, but it seems weird
to me that we set the offset at the top of the hunk. If we hit the
conditional in the bottom half, we do actually increment storer.seen,
but only _after_ having overwritten the value from above (with the same
value, no less).

But if we do not follow that code path, we may end up here:

 @@ -1272,6 +1275,9 @@ static int store_aux(const char *key, const char 
 *value, void *cb)
   if (strrchr(key, '.') - key == store.baselen 
 !strncmp(key, store.key, store.baselen)) {
   store.state = SECTION_SEEN;
 + ALLOC_GROW(store.offset,
 +store.seen+1,
 +store.offset_alloc);
   store.offset[store.seen] = 
 cf-do_ftell(cf);
   }
   }

where we overwrite it again, but do not update store.seen. Or we may
trigger neither, and leave the function with our offset stored, but
store.seen not incremented.

So it seems odd to me that we would set the offset, but in some cases
never actually increment our counter. AFAICT, we do not ever access
those values. The reader limits itself to i  store.seen, which makes
sense. And the writers always call a fresh ftell before incrementing
store.seen.  But maybe I am missing something.

Anyway, it is neither here nor there for your patch. Just something odd
I noticed while reading the code.

 +setup_many() {
 + setup 
 + # This time we want the newline so that we can tack on more
 + # entries.
 + echo .git/config 
 + # Semi-efficient way of concatenating 5^5 = 3125 lines. Note
 + # that because 'setup' already put one line, this means 3126
 + # entries for section.key in the config file.
 + cat 5to1 EOF 
 +  key = foo
 +  key = foo
 +  key = foo
 +  key = foo
 +  key = foo
 +EOF
 + cat 5to1 5to1 5to1 5to1 5to1 5to2   # 25
 + cat 5to2 5to2 5to2 5to2 5to2 5to3   # 125
 + cat 5to3 5to3 5to3 5to3 5to3 5to4   # 635
 + cat 5to4 5to4 5to4 5to4 5to4 .git/config # 3125
 +}

You could make this even more semi-efficient by just storing the end
result in 5to5, and then copying it into place rather than doing the
whole dance for each test. I doubt it matters that much, though.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: can we prevent reflog deletion when branch is deleted?

2013-11-14 Thread Sitaram Chamarty
On 11/14/2013 01:37 PM, Jeff King wrote:
 On Thu, Nov 14, 2013 at 08:56:07AM +0100, Thomas Rast wrote:
 
 Whatever it was that happened to a hundred or more repos on the Jenkins
 project seems to be stirring up this debate in some circles.

 Making us so curious ... and then you just leave us hanging there ;-)

Oh my apologies; I missed the URL!!  (But Peff supplied it before I saw
this email!)

 Any pointers to this debate?
 
 I do not know about any particular debate in git circles, but I assume
 Sitaram is referring to this incident:
 
   https://groups.google.com/d/msg/jenkinsci-dev/-myjRIPcVwU/t4nkXONp8qgJ
 
 in which a Jenkins dev force-pushed and rewound history on 150 different
 repos. In this case the reflog made rollback easy, but if he had pushed
 a deletion, it would be harder.

I don't know if they had a reflog on the server side; they used
client-side reflogs if I understood correctly.

I'm talking about server side (bare repo), assuming the site has
core.logAllRefUpdates set.

And I'll explain the some circles part as something on LinkedIn.  To
be honest there's been a fair bit of FUDding by CVCS types there so I
stopped looking at the posts, but I get the subject lines by email and I
saw one that said Git History Protection - if we needed proof... or
something like that.

I admit I didn't check to see if a debate actually followed that post
:-)
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: can we prevent reflog deletion when branch is deleted?

2013-11-14 Thread Jeff King
On Thu, Nov 14, 2013 at 04:26:46PM +0530, Sitaram Chamarty wrote:

  I do not know about any particular debate in git circles, but I assume
  Sitaram is referring to this incident:
  
https://groups.google.com/d/msg/jenkinsci-dev/-myjRIPcVwU/t4nkXONp8qgJ
  
  in which a Jenkins dev force-pushed and rewound history on 150 different
  repos. In this case the reflog made rollback easy, but if he had pushed
  a deletion, it would be harder.
 
 I don't know if they had a reflog on the server side; they used
 client-side reflogs if I understood correctly.
 
 I'm talking about server side (bare repo), assuming the site has
 core.logAllRefUpdates set.

Yes, they did have server-side reflogs (the pushes were to GitHub, and
we reflog everything). Client-side reflogs would not be sufficient, as
the client who pushed does not record the history he just rewound (he
_might_ have it at refs/remotes/origin/master@{1}, but if somebody
pushed since his last fetch, then he doesn't).

The simplest way to recover is to just have everyone push again
(without --force). The history will just silently fast-forward to
whoever has the most recent tip. The downside is that you have to wait
for that person to actually push. :)

I think they started with that, and then eventually GitHub support got
wind of it and pulled the last value for each repo out of the
server-side reflog for them.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: can we prevent reflog deletion when branch is deleted?

2013-11-14 Thread Luca Milanesio
Would be really useful anyway to have the ability to create a server-side 
reference based on a SHA-1, using the Git protocol.
Alternatively, just fetching a remote repo based on a SHA-1 (not referenced by 
any ref-spec but still existent) so that you can create a new reference locally 
and push.

Luca.

On 14 Nov 2013, at 11:09, Jeff King p...@peff.net wrote:

 On Thu, Nov 14, 2013 at 04:26:46PM +0530, Sitaram Chamarty wrote:
 
 I do not know about any particular debate in git circles, but I assume
 Sitaram is referring to this incident:
 
  https://groups.google.com/d/msg/jenkinsci-dev/-myjRIPcVwU/t4nkXONp8qgJ
 
 in which a Jenkins dev force-pushed and rewound history on 150 different
 repos. In this case the reflog made rollback easy, but if he had pushed
 a deletion, it would be harder.
 
 I don't know if they had a reflog on the server side; they used
 client-side reflogs if I understood correctly.
 
 I'm talking about server side (bare repo), assuming the site has
 core.logAllRefUpdates set.
 
 Yes, they did have server-side reflogs (the pushes were to GitHub, and
 we reflog everything). Client-side reflogs would not be sufficient, as
 the client who pushed does not record the history he just rewound (he
 _might_ have it at refs/remotes/origin/master@{1}, but if somebody
 pushed since his last fetch, then he doesn't).
 
 The simplest way to recover is to just have everyone push again
 (without --force). The history will just silently fast-forward to
 whoever has the most recent tip. The downside is that you have to wait
 for that person to actually push. :)
 
 I think they started with that, and then eventually GitHub support got
 wind of it and pulled the last value for each repo out of the
 server-side reflog for them.
 
 -Peff
 --
 To unsubscribe from this list: send the line unsubscribe git in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 01/21] sha1write: make buffer const-correct

2013-11-14 Thread Jeff King
We are passed a void * and write it out without ever
touching it; let's indicate that by using const.

Signed-off-by: Jeff King p...@peff.net
---
 csum-file.c | 6 +++---
 csum-file.h | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/csum-file.c b/csum-file.c
index 53f5375..465971c 100644
--- a/csum-file.c
+++ b/csum-file.c
@@ -11,7 +11,7 @@
 #include progress.h
 #include csum-file.h
 
-static void flush(struct sha1file *f, void *buf, unsigned int count)
+static void flush(struct sha1file *f, const void *buf, unsigned int count)
 {
if (0 = f-check_fd  count)  {
unsigned char check_buffer[8192];
@@ -86,13 +86,13 @@ int sha1close(struct sha1file *f, unsigned char *result, 
unsigned int flags)
return fd;
 }
 
-int sha1write(struct sha1file *f, void *buf, unsigned int count)
+int sha1write(struct sha1file *f, const void *buf, unsigned int count)
 {
while (count) {
unsigned offset = f-offset;
unsigned left = sizeof(f-buffer) - offset;
unsigned nr = count  left ? left : count;
-   void *data;
+   const void *data;
 
if (f-do_crc)
f-crc32 = crc32(f-crc32, buf, nr);
diff --git a/csum-file.h b/csum-file.h
index 3b540bd..9dedb03 100644
--- a/csum-file.h
+++ b/csum-file.h
@@ -34,7 +34,7 @@ extern struct sha1file *sha1fd(int fd, const char *name);
 extern struct sha1file *sha1fd_check(const char *name);
 extern struct sha1file *sha1fd_throughput(int fd, const char *name, struct 
progress *tp);
 extern int sha1close(struct sha1file *, unsigned char *, unsigned int);
-extern int sha1write(struct sha1file *, void *, unsigned int);
+extern int sha1write(struct sha1file *, const void *, unsigned int);
 extern void sha1flush(struct sha1file *f);
 extern void crc32_begin(struct sha1file *);
 extern uint32_t crc32_end(struct sha1file *);
-- 
1.8.5.rc0.443.g2df7f3f

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 0/21] pack bitmaps

2013-11-14 Thread Jeff King
Here's another iteration of the pack bitmaps series. Compared to v2, it
changes:

 - misc style/typo fixes

 - portability fixes from Ramsay and Torsten

 - count-objects garbage-reporting patch from Duy

 - disable bitmaps when is_repository_shallow(); this also covers the
   case where the client is shallow, since we feed pack-objects a
   --shallow-file in that case. This used to done by checking
   !internal_rev_list, but that doesn't apply after cdab485.

 - ewah sources now properly use git-compat-util.h and do not include
   system headers

 - the ewah code uses ewah_malloc, ewah_realloc, and so forth to let the
   project use a particular allocator (and we want to use xmalloc and
   friends). And we defined those in pack-bitmap.h, but of course that
   had no effect on the ewah/*.c files that did not include
   pack-bitmap.h.  Since we are hacking up and git-ifying libewok
   anyway, we can just set the hardcoded fallback to xmalloc instead of
   malloc.

  - the ewah code used gcc's __builtin_ctzll, but did not provide a
suitable fallback. We now provide a fallback in C.

  - The bitmap reading code only handles a single bitmapped pack (since
they must be fully closed, there is not much point in having
multiple). It used to silently ignore extra bitmap indices it found,
but will now warn that they are being ignored.

  - The name-hash cache is now optional, controlled by
pack.writeBitmapHashCache.

  - The test script will now do basic interoperability testing with jgit
(if you have jgit in your $PATH).

  - There are now perf tests. Spoiler alert: bitmaps make clones faster.
See patch 20 for details. We can also measure the speedup from the
hash cache (see patch 21).

Not addressed:

  - I did not include the NEEDS_ALIGNED_ACCESS patch. I note that we do
not even have a Makefile knob for this, and the code in read-cache.c
has probably never actually been used. Are there real systems that
have a problem? The read-cache code was in support of the index v4
experiment, which did away with the 8-byte padding. So it could be
that we simply don't see it, because everything is currently
aligned.

  - On a related note, we do some cast-buffer-to-struct magic on the
mmap'd file. I note that the regular packfile reader also does this.
How careful do we want to be?

  - We still assume that reusing a slice from the front of the pack will
never miss delta bases. This is the case currently for packs
generated by both git and JGit, but it would be nice to mark the
property in the bitmap index. Adding a new flag would break JGit
compatibility, though. We can either make it an option, or assume
it's good enough for now and worry about it in v2.

  [01/21]: sha1write: make buffer const-correct
  [02/21]: revindex: Export new APIs
  [03/21]: pack-objects: Refactor the packing list
  [04/21]: pack-objects: factor out name_hash
  [05/21]: revision: allow setting custom limiter function
  [06/21]: sha1_file: export `git_open_noatime`
  [07/21]: compat: add endianness helpers
  [08/21]: ewah: compressed bitmap implementation
  [09/21]: documentation: add documentation for the bitmap format
  [10/21]: pack-bitmap: add support for bitmap indexes
  [11/21]: pack-objects: use bitmaps when packing objects
  [12/21]: rev-list: add bitmap mode to speed up object lists
  [13/21]: pack-objects: implement bitmap writing
  [14/21]: repack: stop using magic number for ARRAY_SIZE(exts)
  [15/21]: repack: turn exts array into array-of-struct
  [16/21]: repack: handle optional files created by pack-objects
  [17/21]: repack: consider bitmaps when performing repacks
  [18/21]: count-objects: recognize .bitmap in garbage-checking
  [19/21]: t: add basic bitmap functionality tests
  [20/21]: t/perf: add tests for pack bitmaps
  [21/21]: pack-bitmap: implement optional name_hash cache

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 05/21] revision: allow setting custom limiter function

2013-11-14 Thread Jeff King
From: Vicent Marti tan...@gmail.com

This commit enables users of `struct rev_info` to peform custom limiting
during a revision walk (i.e. `get_revision`).

If the field `include_check` has been set to a callback, this callback
will be issued once for each commit before it is added to the pending
list of the revwalk. If the include check returns 0, the commit will be
marked as added but won't be pushed to the pending list, effectively
limiting the walk.

Signed-off-by: Vicent Marti tan...@gmail.com
Signed-off-by: Jeff King p...@peff.net
---
 revision.c | 4 
 revision.h | 2 ++
 2 files changed, 6 insertions(+)

diff --git a/revision.c b/revision.c
index 956040c..260f4a1 100644
--- a/revision.c
+++ b/revision.c
@@ -779,6 +779,10 @@ static int add_parents_to_list(struct rev_info *revs, 
struct commit *commit,
return 0;
commit-object.flags |= ADDED;
 
+   if (revs-include_check 
+   !revs-include_check(commit, revs-include_check_data))
+   return 0;
+
/*
 * If the commit is uninteresting, don't try to
 * prune parents - we want the maximal uninteresting
diff --git a/revision.h b/revision.h
index 89132df..5658ddd 100644
--- a/revision.h
+++ b/revision.h
@@ -169,6 +169,8 @@ struct rev_info {
unsigned long min_age;
int min_parents;
int max_parents;
+   int (*include_check)(struct commit *, void *);
+   void *include_check_data;
 
/* diff info for patches and for paths limiting */
struct diff_options diffopt;
-- 
1.8.5.rc0.443.g2df7f3f

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 06/21] sha1_file: export `git_open_noatime`

2013-11-14 Thread Jeff King
From: Vicent Marti tan...@gmail.com

The `git_open_noatime` helper can be of general interest for other
consumers of git's different on-disk formats.

Signed-off-by: Vicent Marti tan...@gmail.com
Signed-off-by: Jeff King p...@peff.net
---
 cache.h | 1 +
 sha1_file.c | 4 +---
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/cache.h b/cache.h
index ce377e1..6d4ef65 100644
--- a/cache.h
+++ b/cache.h
@@ -780,6 +780,7 @@ extern int hash_sha1_file(const void *buf, unsigned long 
len, const char *type,
 extern int write_sha1_file(const void *buf, unsigned long len, const char 
*type, unsigned char *return_sha1);
 extern int pretend_sha1_file(void *, unsigned long, enum object_type, unsigned 
char *);
 extern int force_object_loose(const unsigned char *sha1, time_t mtime);
+extern int git_open_noatime(const char *name);
 extern void *map_sha1_file(const unsigned char *sha1, unsigned long *size);
 extern int unpack_sha1_header(git_zstream *stream, unsigned char *map, 
unsigned long mapsize, void *buffer, unsigned long bufsiz);
 extern int parse_sha1_header(const char *hdr, unsigned long *sizep);
diff --git a/sha1_file.c b/sha1_file.c
index 7dadd04..5557bd9 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -239,8 +239,6 @@ char *sha1_pack_index_name(const unsigned char *sha1)
 struct alternate_object_database *alt_odb_list;
 static struct alternate_object_database **alt_odb_tail;
 
-static int git_open_noatime(const char *name);
-
 /*
  * Prepare alternate object database registry.
  *
@@ -1357,7 +1355,7 @@ int check_sha1_signature(const unsigned char *sha1, void 
*map,
return hashcmp(sha1, real_sha1) ? -1 : 0;
 }
 
-static int git_open_noatime(const char *name)
+int git_open_noatime(const char *name)
 {
static int sha1_file_open_flag = O_NOATIME;
 
-- 
1.8.5.rc0.443.g2df7f3f

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 07/21] compat: add endianness helpers

2013-11-14 Thread Jeff King
From: Vicent Marti tan...@gmail.com

The POSIX standard doesn't currently define a `ntohll`/`htonll`
function pair to perform network-to-host and host-to-network
swaps of 64-bit data. These 64-bit swaps are necessary for the on-disk
storage of EWAH bitmaps if they are not in native byte order.

Many thanks to Ramsay Jones ram...@ramsay1.demon.co.uk and
Torsten Bögershausen tbo...@web.de for cygwin/mingw/msvc
portability fixes.

Signed-off-by: Vicent Marti tan...@gmail.com
Signed-off-by: Jeff King p...@peff.net
---
 compat/bswap.h | 76 +-
 1 file changed, 75 insertions(+), 1 deletion(-)

diff --git a/compat/bswap.h b/compat/bswap.h
index 5061214..c18a78e 100644
--- a/compat/bswap.h
+++ b/compat/bswap.h
@@ -17,7 +17,20 @@ static inline uint32_t default_swab32(uint32_t val)
((val  0x00ff)  24));
 }
 
+static inline uint64_t default_bswap64(uint64_t val)
+{
+   return (((val  (uint64_t)0x00ffULL)  56) |
+   ((val  (uint64_t)0xff00ULL)  40) |
+   ((val  (uint64_t)0x00ffULL)  24) |
+   ((val  (uint64_t)0xff00ULL)   8) |
+   ((val  (uint64_t)0x00ffULL)   8) |
+   ((val  (uint64_t)0xff00ULL)  24) |
+   ((val  (uint64_t)0x00ffULL)  40) |
+   ((val  (uint64_t)0xff00ULL)  56));
+}
+
 #undef bswap32
+#undef bswap64
 
 #if defined(__GNUC__)  (defined(__i386__) || defined(__x86_64__))
 
@@ -32,15 +45,42 @@ static inline uint32_t git_bswap32(uint32_t x)
return result;
 }
 
+#define bswap64 git_bswap64
+#if defined(__x86_64__)
+static inline uint64_t git_bswap64(uint64_t x)
+{
+   uint64_t result;
+   if (__builtin_constant_p(x))
+   result = default_bswap64(x);
+   else
+   __asm__(bswap %q0 : =r (result) : 0 (x));
+   return result;
+}
+#else
+static inline uint64_t git_bswap64(uint64_t x)
+{
+   union { uint64_t i64; uint32_t i32[2]; } tmp, result;
+   if (__builtin_constant_p(x))
+   result.i64 = default_bswap64(x);
+   else {
+   tmp.i64 = x;
+   result.i32[0] = git_bswap32(tmp.i32[1]);
+   result.i32[1] = git_bswap32(tmp.i32[0]);
+   }
+   return result.i64;
+}
+#endif
+
 #elif defined(_MSC_VER)  (defined(_M_IX86) || defined(_M_X64))
 
 #include stdlib.h
 
 #define bswap32(x) _byteswap_ulong(x)
+#define bswap64(x) _byteswap_uint64(x)
 
 #endif
 
-#ifdef bswap32
+#if defined(bswap32)
 
 #undef ntohl
 #undef htonl
@@ -48,3 +88,37 @@ static inline uint32_t git_bswap32(uint32_t x)
 #define htonl(x) bswap32(x)
 
 #endif
+
+#if defined(bswap64)
+
+#undef ntohll
+#undef htonll
+#define ntohll(x) bswap64(x)
+#define htonll(x) bswap64(x)
+
+#else
+
+#undef ntohll
+#undef htonll
+
+#if !defined(__BYTE_ORDER)
+# if defined(BYTE_ORDER)  defined(LITTLE_ENDIAN)  defined(BIG_ENDIAN)
+#  define __BYTE_ORDER BYTE_ORDER
+#  define __LITTLE_ENDIAN LITTLE_ENDIAN
+#  define __BIG_ENDIAN BIG_ENDIAN
+# endif
+#endif
+
+#if !defined(__BYTE_ORDER)
+# error Cannot determine endianness
+#endif
+
+#if __BYTE_ORDER == __BIG_ENDIAN
+# define ntohll(n) (n)
+# define htonll(n) (n)
+#else
+# define ntohll(n) default_bswap64(n)
+# define htonll(n) default_bswap64(n)
+#endif
+
+#endif
-- 
1.8.5.rc0.443.g2df7f3f

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 04/21] pack-objects: factor out name_hash

2013-11-14 Thread Jeff King
From: Vicent Marti tan...@gmail.com

As the pack-objects system grows beyond the single
pack-objects.c file, more parts (like the soon-to-exist
bitmap code) will need to compute hashes for matching
deltas. Factor out name_hash to make it available to other
files.

Signed-off-by: Vicent Marti tan...@gmail.com
Signed-off-by: Jeff King p...@peff.net
---
 builtin/pack-objects.c | 24 ++--
 pack-objects.h | 20 
 2 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index f3f0cf9..faf746b 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -778,26 +778,6 @@ static void write_pack_file(void)
written, nr_result);
 }
 
-static uint32_t name_hash(const char *name)
-{
-   uint32_t c, hash = 0;
-
-   if (!name)
-   return 0;
-
-   /*
-* This effectively just creates a sortable number from the
-* last sixteen non-whitespace characters. Last characters
-* count most, so things that end in .c sort together.
-*/
-   while ((c = *name++) != 0) {
-   if (isspace(c))
-   continue;
-   hash = (hash  2) + (c  24);
-   }
-   return hash;
-}
-
 static void setup_delta_attr_check(struct git_attr_check *check)
 {
static struct git_attr *attr_delta;
@@ -826,7 +806,7 @@ static int add_object_entry(const unsigned char *sha1, enum 
object_type type,
struct object_entry *entry;
struct packed_git *p, *found_pack = NULL;
off_t found_offset = 0;
-   uint32_t hash = name_hash(name);
+   uint32_t hash = pack_name_hash(name);
uint32_t index_pos;
 
entry = packlist_find(to_pack, sha1, index_pos);
@@ -1082,7 +1062,7 @@ static void add_preferred_base_object(const char *name)
 {
struct pbase_tree *it;
int cmplen;
-   unsigned hash = name_hash(name);
+   unsigned hash = pack_name_hash(name);
 
if (!num_preferred_base || check_pbase_path(hash))
return;
diff --git a/pack-objects.h b/pack-objects.h
index f528215..90ad0a8 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -44,4 +44,24 @@ struct object_entry *packlist_find(struct packing_data 
*pdata,
   const unsigned char *sha1,
   uint32_t *index_pos);
 
+static inline uint32_t pack_name_hash(const char *name)
+{
+   uint32_t c, hash = 0;
+
+   if (!name)
+   return 0;
+
+   /*
+* This effectively just creates a sortable number from the
+* last sixteen non-whitespace characters. Last characters
+* count most, so things that end in .c sort together.
+*/
+   while ((c = *name++) != 0) {
+   if (isspace(c))
+   continue;
+   hash = (hash  2) + (c  24);
+   }
+   return hash;
+}
+
 #endif
-- 
1.8.5.rc0.443.g2df7f3f

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 03/21] pack-objects: Refactor the packing list

2013-11-14 Thread Jeff King
From: Vicent Marti tan...@gmail.com

The hash table that stores the packing list for a given `pack-objects`
run was tightly coupled to the pack-objects code.

In this commit, we refactor the hash table and the underlying storage
array into a `packing_data` struct. The functionality for accessing and
adding entries to the packing list is hence accessible from other parts
of Git besides the `pack-objects` builtin.

This refactoring is a requirement for further patches in this series
that will require accessing the commit packing list from outside of
`pack-objects`.

The hash table implementation has been minimally altered: we now
use table sizes which are always a power of two, to ensure a uniform
index distribution in the array.

Signed-off-by: Vicent Marti tan...@gmail.com
Signed-off-by: Jeff King p...@peff.net
---
 Makefile   |   2 +
 builtin/pack-objects.c | 175 +++--
 pack-objects.c | 111 +++
 pack-objects.h |  47 +
 4 files changed, 200 insertions(+), 135 deletions(-)
 create mode 100644 pack-objects.c
 create mode 100644 pack-objects.h

diff --git a/Makefile b/Makefile
index af847f8..48ff0bd 100644
--- a/Makefile
+++ b/Makefile
@@ -694,6 +694,7 @@ LIB_H += notes-merge.h
 LIB_H += notes-utils.h
 LIB_H += notes.h
 LIB_H += object.h
+LIB_H += pack-objects.h
 LIB_H += pack-revindex.h
 LIB_H += pack.h
 LIB_H += parse-options.h
@@ -831,6 +832,7 @@ LIB_OBJS += notes-merge.o
 LIB_OBJS += notes-utils.o
 LIB_OBJS += object.o
 LIB_OBJS += pack-check.o
+LIB_OBJS += pack-objects.o
 LIB_OBJS += pack-revindex.o
 LIB_OBJS += pack-write.o
 LIB_OBJS += pager.o
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 36273dd..f3f0cf9 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -14,6 +14,7 @@
 #include diff.h
 #include revision.h
 #include list-objects.h
+#include pack-objects.h
 #include progress.h
 #include refs.h
 #include streaming.h
@@ -25,42 +26,15 @@ static const char *pack_usage[] = {
NULL
 };
 
-struct object_entry {
-   struct pack_idx_entry idx;
-   unsigned long size; /* uncompressed size */
-   struct packed_git *in_pack; /* already in pack */
-   off_t in_pack_offset;
-   struct object_entry *delta; /* delta base object */
-   struct object_entry *delta_child; /* deltified objects who bases me */
-   struct object_entry *delta_sibling; /* other deltified objects who
-* uses the same base as me
-*/
-   void *delta_data;   /* cached delta (uncompressed) */
-   unsigned long delta_size;   /* delta data size (uncompressed) */
-   unsigned long z_delta_size; /* delta data size (compressed) */
-   enum object_type type;
-   enum object_type in_pack_type;  /* could be delta */
-   uint32_t hash;  /* name hint hash */
-   unsigned char in_pack_header_size;
-   unsigned preferred_base:1; /*
-   * we do not pack this, but is available
-   * to be used as the base object to delta
-   * objects against.
-   */
-   unsigned no_try_delta:1;
-   unsigned tagged:1; /* near the very tip of refs */
-   unsigned filled:1; /* assigned write-order */
-};
-
 /*
- * Objects we are going to pack are collected in objects array (dynamically
- * expanded).  nr_objects  nr_alloc controls this array.  They are stored
- * in the order we see -- typically rev-list --objects order that gives us
- * nice minimum seek order.
+ * Objects we are going to pack are collected in the `to_pack` structure.
+ * It contains an array (dynamically expanded) of the object data, and a map
+ * that can resolve SHA1s to their position in the array.
  */
-static struct object_entry *objects;
+static struct packing_data to_pack;
+
 static struct pack_idx_entry **written_list;
-static uint32_t nr_objects, nr_alloc, nr_result, nr_written;
+static uint32_t nr_result, nr_written;
 
 static int non_empty;
 static int reuse_delta = 1, reuse_object = 1;
@@ -90,21 +64,11 @@ static unsigned long cache_max_small_delta_size = 1000;
 static unsigned long window_memory_limit = 0;
 
 /*
- * The object names in objects array are hashed with this hashtable,
- * to help looking up the entry by object name.
- * This hashtable is built after all the objects are seen.
- */
-static int *object_ix;
-static int object_ix_hashsz;
-static struct object_entry *locate_object_entry(const unsigned char *sha1);
-
-/*
  * stats
  */
 static uint32_t written, written_delta;
 static uint32_t reused, reused_delta;
 
-
 static void *get_delta(struct object_entry *entry)
 {
unsigned long size, base_size, delta_size;
@@ -553,12 +517,12 @@ static int mark_tagged(const char *path, const unsigned 
char *sha1, int 

[PATCH v3 02/21] revindex: Export new APIs

2013-11-14 Thread Jeff King
From: Vicent Marti tan...@gmail.com

Allow users to efficiently lookup consecutive entries that are expected
to be found on the same revindex by exporting `find_revindex_position`:
this function takes a pointer to revindex itself, instead of looking up
the proper revindex for a given packfile on each call.

Signed-off-by: Vicent Marti tan...@gmail.com
Signed-off-by: Jeff King p...@peff.net
---
 pack-revindex.c | 38 +-
 pack-revindex.h |  8 
 2 files changed, 33 insertions(+), 13 deletions(-)

diff --git a/pack-revindex.c b/pack-revindex.c
index b4d2b35..0bb13b1 100644
--- a/pack-revindex.c
+++ b/pack-revindex.c
@@ -16,11 +16,6 @@
  * get the object sha1 from the main index.
  */
 
-struct pack_revindex {
-   struct packed_git *p;
-   struct revindex_entry *revindex;
-};
-
 static struct pack_revindex *pack_revindex;
 static int pack_revindex_hashsz;
 
@@ -201,15 +196,14 @@ static void create_pack_revindex(struct pack_revindex 
*rix)
sort_revindex(rix-revindex, num_ent, p-pack_size);
 }
 
-struct revindex_entry *find_pack_revindex(struct packed_git *p, off_t ofs)
+struct pack_revindex *revindex_for_pack(struct packed_git *p)
 {
int num;
-   unsigned lo, hi;
struct pack_revindex *rix;
-   struct revindex_entry *revindex;
 
if (!pack_revindex_hashsz)
init_pack_revindex();
+
num = pack_revindex_ix(p);
if (num  0)
die(internal error: pack revindex fubar);
@@ -217,21 +211,39 @@ struct revindex_entry *find_pack_revindex(struct 
packed_git *p, off_t ofs)
rix = pack_revindex[num];
if (!rix-revindex)
create_pack_revindex(rix);
-   revindex = rix-revindex;
 
-   lo = 0;
-   hi = p-num_objects + 1;
+   return rix;
+}
+
+int find_revindex_position(struct pack_revindex *pridx, off_t ofs)
+{
+   int lo = 0;
+   int hi = pridx-p-num_objects + 1;
+   struct revindex_entry *revindex = pridx-revindex;
+
do {
unsigned mi = lo + (hi - lo) / 2;
if (revindex[mi].offset == ofs) {
-   return revindex + mi;
+   return mi;
} else if (ofs  revindex[mi].offset)
hi = mi;
else
lo = mi + 1;
} while (lo  hi);
+
error(bad offset for revindex);
-   return NULL;
+   return -1;
+}
+
+struct revindex_entry *find_pack_revindex(struct packed_git *p, off_t ofs)
+{
+   struct pack_revindex *pridx = revindex_for_pack(p);
+   int pos = find_revindex_position(pridx, ofs);
+
+   if (pos  0)
+   return NULL;
+
+   return pridx-revindex + pos;
 }
 
 void discard_revindex(void)
diff --git a/pack-revindex.h b/pack-revindex.h
index 8d5027a..866ca9c 100644
--- a/pack-revindex.h
+++ b/pack-revindex.h
@@ -6,6 +6,14 @@ struct revindex_entry {
unsigned int nr;
 };
 
+struct pack_revindex {
+   struct packed_git *p;
+   struct revindex_entry *revindex;
+};
+
+struct pack_revindex *revindex_for_pack(struct packed_git *p);
+int find_revindex_position(struct pack_revindex *pridx, off_t ofs);
+
 struct revindex_entry *find_pack_revindex(struct packed_git *p, off_t ofs);
 void discard_revindex(void);
 
-- 
1.8.5.rc0.443.g2df7f3f

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 09/21] documentation: add documentation for the bitmap format

2013-11-14 Thread Jeff King
From: Vicent Marti tan...@gmail.com

This is the technical documentation for the JGit-compatible Bitmap v1
on-disk format.

Signed-off-by: Vicent Marti tan...@gmail.com
Signed-off-by: Jeff King p...@peff.net
---
 Documentation/technical/bitmap-format.txt | 131 ++
 1 file changed, 131 insertions(+)
 create mode 100644 Documentation/technical/bitmap-format.txt

diff --git a/Documentation/technical/bitmap-format.txt 
b/Documentation/technical/bitmap-format.txt
new file mode 100644
index 000..7a86bd7
--- /dev/null
+++ b/Documentation/technical/bitmap-format.txt
@@ -0,0 +1,131 @@
+GIT bitmap v1 format
+
+
+   - A header appears at the beginning:
+
+   4-byte signature: {'B', 'I', 'T', 'M'}
+
+   2-byte version number (network byte order)
+   The current implementation only supports version 1
+   of the bitmap index (the same one as JGit).
+
+   2-byte flags (network byte order)
+
+   The following flags are supported:
+
+   - BITMAP_OPT_FULL_DAG (0x1) REQUIRED
+   This flag must always be present. It implies that the 
bitmap
+   index has been generated for a packfile with full 
closure
+   (i.e. where every single object in the packfile can find
+its parent links inside the same packfile). This is a
+   requirement for the bitmap index format, also present 
in JGit,
+   that greatly reduces the complexity of the 
implementation.
+
+   4-byte entry count (network byte order)
+
+   The total count of entries (bitmapped commits) in this 
bitmap index.
+
+   20-byte checksum
+
+   The SHA1 checksum of the pack this bitmap index belongs 
to.
+
+   - 4 EWAH bitmaps that act as type indexes
+
+   Type indexes are serialized after the hash cache in the shape
+   of four EWAH bitmaps stored consecutively (see Appendix A for
+   the serialization format of an EWAH bitmap).
+
+   There is a bitmap for each Git object type, stored in the 
following
+   order:
+
+   - Commits
+   - Trees
+   - Blobs
+   - Tags
+
+   In each bitmap, the `n`th bit is set to true if the `n`th object
+   in the packfile is of that type.
+
+   The obvious consequence is that the OR of all 4 bitmaps will 
result
+   in a full set (all bits set), and the AND of all 4 bitmaps will
+   result in an empty bitmap (no bits set).
+
+   - N entries with compressed bitmaps, one for each indexed commit
+
+   Where `N` is the total amount of entries in this bitmap index.
+   Each entry contains the following:
+
+   - 4-byte object position (network byte order)
+   The position **in the index for the packfile** where the
+   bitmap for this commit is found.
+
+   - 1-byte XOR-offset
+   The xor offset used to compress this bitmap. For an 
entry
+   in position `x`, a XOR offset of `y` means that the 
actual
+   bitmap representing this commit is composed by XORing 
the
+   bitmap for this entry with the bitmap in entry `x-y` 
(i.e.
+   the bitmap `y` entries before this one).
+
+   Note that this compression can be recursive. In order to
+   XOR this entry with a previous one, the previous entry 
needs
+   to be decompressed first, and so on.
+
+   The hard-limit for this offset is 160 (an entry can 
only be
+   xor'ed against one of the 160 entries preceding it). 
This
+   number is always positive, and hence entries are always 
xor'ed
+   with **previous** bitmaps, not bitmaps that will come 
afterwards
+   in the index.
+
+   - 1-byte flags for this bitmap
+   At the moment the only available flag is `0x1`, which 
hints
+   that this bitmap can be re-used when rebuilding bitmap 
indexes
+   for the repository.
+
+   - The compressed bitmap itself, see Appendix A.
+
+== Appendix A: Serialization format for an EWAH bitmap
+
+Ewah bitmaps are serialized in the same protocol as the JAVAEWAH
+library, making them backwards compatible with the JGit
+implementation:
+
+   - 4-byte number of bits of the resulting UNCOMPRESSED bitmap
+
+   - 4-byte number of words of the COMPRESSED bitmap, when stored
+
+   - N x 8-byte words, as specified by the previous field
+
+  

[PATCH v3 10/21] pack-bitmap: add support for bitmap indexes

2013-11-14 Thread Jeff King
From: Vicent Marti tan...@gmail.com

A bitmap index is a `.bitmap` file that can be found inside
`$GIT_DIR/objects/pack/`, next to its corresponding packfile, and
contains precalculated reachability information for selected commits.
The full specification of the format for these bitmap indexes can be found
in `Documentation/technical/bitmap-format.txt`.

For a given commit SHA1, if it happens to be available in the bitmap
index, its bitmap will represent every single object that is reachable
from the commit itself. The nth bit in the bitmap is the nth object in
the packfile; if it's set to 1, the object is reachable.

By using the bitmaps available in the index, this commit implements
several new functions:

- `prepare_bitmap_git`
- `prepare_bitmap_walk`
- `traverse_bitmap_commit_list`
- `reuse_partial_packfile_from_bitmap`

The `prepare_bitmap_walk` function tries to build a bitmap of all the
objects that can be reached from the commit roots of a given `rev_info`
struct by using the following algorithm:

- If all the interesting commits for a revision walk are available in
the index, the resulting reachability bitmap is the bitwise OR of all
the individual bitmaps.

- When the full set of WANTs is not available in the index, we perform a
partial revision walk using the commits that don't have bitmaps as
roots, and limiting the revision walk as soon as we reach a commit that
has a corresponding bitmap. The earlier OR'ed bitmap with all the
indexed commits can now be completed as this walk progresses, so the end
result is the full reachability list.

- For revision walks with a HAVEs set (a set of commits that are deemed
uninteresting), first we perform the same method as for the WANTs, but
using our HAVEs as roots, in order to obtain a full reachability bitmap
of all the uninteresting commits. This bitmap then can be used to:

a) limit the subsequent walk when building the WANTs bitmap
b) finding the final set of interesting commits by performing an
   AND-NOT of the WANTs and the HAVEs.

If `prepare_bitmap_walk` runs successfully, the resulting bitmap is
stored and the equivalent of a `traverse_commit_list` call can be
performed by using `traverse_bitmap_commit_list`; the bitmap version
of this call yields the objects straight from the packfile index
(without having to look them up or parse them) and hence is several
orders of magnitude faster.

As an extra optimization, when `prepare_bitmap_walk` succeeds, the
`reuse_partial_packfile_from_bitmap` call can be attempted: it will find
the amount of objects at the beginning of the on-disk packfile that can
be reused as-is, and return an offset into the packfile. The source
packfile can then be loaded and the bytes up to `offset` can be written
directly to the result without having to consider the entires inside the
packfile individually.

If the `prepare_bitmap_walk` call fails (e.g. because no bitmap files
are available), the `rev_info` struct is left untouched, and can be used
to perform a manual rev-walk using `traverse_commit_list`.

Hence, this new set of functions are a generic API that allows to
perform the equivalent of

git rev-list --objects [roots...] [^uninteresting...]

for any set of commits, even if they don't have specific bitmaps
generated for them.

In further patches, we'll use this bitmap traversal optimization to
speed up the `pack-objects` and `rev-list` commands.

Signed-off-by: Vicent Marti tan...@gmail.com
Signed-off-by: Jeff King p...@peff.net
---
 Makefile  |   2 +
 khash.h   | 338 
 pack-bitmap.c | 970 ++
 pack-bitmap.h |  43 +++
 4 files changed, 1353 insertions(+)
 create mode 100644 khash.h
 create mode 100644 pack-bitmap.c
 create mode 100644 pack-bitmap.h

diff --git a/Makefile b/Makefile
index 64a1ed7..b983d78 100644
--- a/Makefile
+++ b/Makefile
@@ -699,6 +699,7 @@ LIB_H += object.h
 LIB_H += pack-objects.h
 LIB_H += pack-revindex.h
 LIB_H += pack.h
+LIB_H += pack-bitmap.h
 LIB_H += parse-options.h
 LIB_H += patch-ids.h
 LIB_H += pathspec.h
@@ -837,6 +838,7 @@ LIB_OBJS += notes-cache.o
 LIB_OBJS += notes-merge.o
 LIB_OBJS += notes-utils.o
 LIB_OBJS += object.o
+LIB_OBJS += pack-bitmap.o
 LIB_OBJS += pack-check.o
 LIB_OBJS += pack-objects.o
 LIB_OBJS += pack-revindex.o
diff --git a/khash.h b/khash.h
new file mode 100644
index 000..57ff603
--- /dev/null
+++ b/khash.h
@@ -0,0 +1,338 @@
+/* The MIT License
+
+   Copyright (c) 2008, 2009, 2011 by Attractive Chaos attrac...@live.co.uk
+
+   Permission is hereby granted, free of charge, to any person obtaining
+   a copy of this software and associated documentation files (the
+   Software), to deal in the Software without restriction, including
+   without limitation the rights to use, copy, modify, merge, publish,
+   distribute, sublicense, and/or sell copies of the Software, and to
+   permit persons to whom the Software is 

[PATCH v3 08/21] ewah: compressed bitmap implementation

2013-11-14 Thread Jeff King
From: Vicent Marti tan...@gmail.com

EWAH is a word-aligned compressed variant of a bitset (i.e. a data
structure that acts as a 0-indexed boolean array for many entries).

It uses a 64-bit run-length encoding (RLE) compression scheme,
trading some compression for better processing speed.

The goal of this word-aligned implementation is not to achieve
the best compression, but rather to improve query processing time.
As it stands right now, this EWAH implementation will always be more
efficient storage-wise than its uncompressed alternative.

EWAH arrays will be used as the on-disk format to store reachability
bitmaps for all objects in a repository while keeping reasonable sizes,
in the same way that JGit does.

This EWAH implementation is a mostly straightforward port of the
original `javaewah` library that JGit currently uses. The library is
self-contained and has been embedded whole (4 files) inside the `ewah`
folder to ease redistribution.

The library is re-licensed under the GPLv2 with the permission of Daniel
Lemire, the original author. The source code for the C version can
be found on GitHub:

https://github.com/vmg/libewok

The original Java implementation can also be found on GitHub:

https://github.com/lemire/javaewah

Signed-off-by: Vicent Marti tan...@gmail.com
Signed-off-by: Jeff King p...@peff.net
---
 Makefile   |  11 +-
 ewah/bitmap.c  | 221 
 ewah/ewah_bitmap.c | 726 +
 ewah/ewah_io.c | 193 ++
 ewah/ewah_rlw.c| 115 +
 ewah/ewok.h| 235 +
 ewah/ewok_rlw.h| 114 +
 7 files changed, 1613 insertions(+), 2 deletions(-)
 create mode 100644 ewah/bitmap.c
 create mode 100644 ewah/ewah_bitmap.c
 create mode 100644 ewah/ewah_io.c
 create mode 100644 ewah/ewah_rlw.c
 create mode 100644 ewah/ewok.h
 create mode 100644 ewah/ewok_rlw.h

diff --git a/Makefile b/Makefile
index 48ff0bd..64a1ed7 100644
--- a/Makefile
+++ b/Makefile
@@ -667,6 +667,8 @@ LIB_H += diff.h
 LIB_H += diffcore.h
 LIB_H += dir.h
 LIB_H += exec_cmd.h
+LIB_H += ewah/ewok.h
+LIB_H += ewah/ewok_rlw.h
 LIB_H += fetch-pack.h
 LIB_H += fmt-merge-msg.h
 LIB_H += fsck.h
@@ -800,6 +802,10 @@ LIB_OBJS += dir.o
 LIB_OBJS += editor.o
 LIB_OBJS += entry.o
 LIB_OBJS += environment.o
+LIB_OBJS += ewah/bitmap.o
+LIB_OBJS += ewah/ewah_bitmap.o
+LIB_OBJS += ewah/ewah_io.o
+LIB_OBJS += ewah/ewah_rlw.o
 LIB_OBJS += exec_cmd.o
 LIB_OBJS += fetch-pack.o
 LIB_OBJS += fsck.o
@@ -2474,8 +2480,9 @@ profile-clean:
$(RM) $(addsuffix *.gcno,$(addprefix $(PROFILE_DIR)/, $(object_dirs)))
 
 clean: profile-clean coverage-clean
-   $(RM) *.o *.res block-sha1/*.o ppc/*.o compat/*.o compat/*/*.o 
xdiff/*.o vcs-svn/*.o \
-   builtin/*.o $(LIB_FILE) $(XDIFF_LIB) $(VCSSVN_LIB)
+   $(RM) *.o *.res block-sha1/*.o ppc/*.o compat/*.o compat/*/*.o
+   $(RM) xdiff/*.o vcs-svn/*.o ewah/*.o builtin/*.o
+   $(RM) $(LIB_FILE) $(XDIFF_LIB) $(VCSSVN_LIB)
$(RM) $(ALL_PROGRAMS) $(SCRIPT_LIB) $(BUILT_INS) git$X
$(RM) $(TEST_PROGRAMS) $(NO_INSTALL)
$(RM) -r bin-wrappers $(dep_dirs)
diff --git a/ewah/bitmap.c b/ewah/bitmap.c
new file mode 100644
index 000..710e58c
--- /dev/null
+++ b/ewah/bitmap.c
@@ -0,0 +1,221 @@
+/**
+ * Copyright 2013, GitHub, Inc
+ * Copyright 2009-2013, Daniel Lemire, Cliff Moon,
+ * David McIntosh, Robert Becho, Google Inc. and Veronika Zenz
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, 
USA.
+ */
+#include git-compat-util.h
+#include ewok.h
+
+#define MASK(x) ((eword_t)1  (x % BITS_IN_WORD))
+#define BLOCK(x) (x / BITS_IN_WORD)
+
+struct bitmap *bitmap_new(void)
+{
+   struct bitmap *bitmap = ewah_malloc(sizeof(struct bitmap));
+   bitmap-words = ewah_calloc(32, sizeof(eword_t));
+   bitmap-word_alloc = 32;
+   return bitmap;
+}
+
+void bitmap_set(struct bitmap *self, size_t pos)
+{
+   size_t block = BLOCK(pos);
+
+   if (block = self-word_alloc) {
+   size_t old_size = self-word_alloc;
+   self-word_alloc = block * 2;
+   self-words = ewah_realloc(self-words,
+   self-word_alloc * sizeof(eword_t));
+
+   memset(self-words + old_size, 0x0,
+   

[PATCH v3 12/21] rev-list: add bitmap mode to speed up object lists

2013-11-14 Thread Jeff King
From: Vicent Marti tan...@gmail.com

The bitmap reachability index used to speed up the counting objects
phase during `pack-objects` can also be used to optimize a normal
rev-list if the only thing required are the SHA1s of the objects during
the list (i.e., not the path names at which trees and blobs were found).

Calling `git rev-list --objects --use-bitmap-index [committish]` will
perform an object iteration based on a bitmap result instead of actually
walking the object graph.

These are some example timings for `torvalds/linux` (warm cache,
best-of-five):

$ time git rev-list --objects master  /dev/null

real0m34.191s
user0m33.904s
sys 0m0.268s

$ time git rev-list --objects --use-bitmap-index master  /dev/null

real0m1.041s
user0m0.976s
sys 0m0.064s

Likewise, using `git rev-list --count --use-bitmap-index` will speed up
the counting operation by building the resulting bitmap and performing a
fast popcount (number of bits set on the bitmap) on the result.

Here are some sample timings of different ways to count commits in
`torvalds/linux`:

$ time git rev-list master | wc -l
399882

real0m6.524s
user0m6.060s
sys 0m3.284s

$ time git rev-list --count master
399882

real0m4.318s
user0m4.236s
sys 0m0.076s

$ time git rev-list --use-bitmap-index --count master
399882

real0m0.217s
user0m0.176s
sys 0m0.040s

This also respects negative refs, so you can use it to count
a slice of history:

$ time git rev-list --count v3.0..master
144843

real0m1.971s
user0m1.932s
sys 0m0.036s

$ time git rev-list --use-bitmap-index --count v3.0..master
real0m0.280s
user0m0.220s
sys 0m0.056s

Though note that the closer the endpoints, the less it helps. In the
traversal case, we have fewer commits to cross, so we take less time.
But the bitmap time is dominated by generating the pack revindex, which
is constant with respect to the refs given.

Note that you cannot yet get a fast --left-right count of a symmetric
difference (e.g., --count --left-right master...topic). The slow part
of that walk actually happens during the merge-base determination when
we parse master...topic. Even though a count does not actually need to
know the real merge base (it only needs to take the symmetric difference
of the bitmaps), the revision code would require some refactoring to
handle this case.

Additionally, a `--test-bitmap` flag has been added that will perform
the same rev-list manually (i.e. using a normal revwalk) and using
bitmaps, and verify that the results are the same. This can be used to
exercise the bitmap code, and also to verify that the contents of the
.bitmap file are sane.

Signed-off-by: Vicent Marti tan...@gmail.com
Signed-off-by: Jeff King p...@peff.net
---
 Documentation/git-rev-list.txt |  1 +
 Documentation/rev-list-options.txt |  8 
 builtin/rev-list.c | 39 ++
 3 files changed, 48 insertions(+)

diff --git a/Documentation/git-rev-list.txt b/Documentation/git-rev-list.txt
index 045b37b..7a1585d 100644
--- a/Documentation/git-rev-list.txt
+++ b/Documentation/git-rev-list.txt
@@ -55,6 +55,7 @@ SYNOPSIS
 [ \--reverse ]
 [ \--walk-reflogs ]
 [ \--no-walk ] [ \--do-walk ]
+[ \--use-bitmap-index ]
 commit... [ \-- paths... ]
 
 DESCRIPTION
diff --git a/Documentation/rev-list-options.txt 
b/Documentation/rev-list-options.txt
index ec86d09..f7c8a4d 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -274,6 +274,14 @@ See also linkgit:git-reflog[1].
Output excluded boundary commits. Boundary commits are
prefixed with `-`.
 
+ifdef::git-rev-list[]
+--use-bitmap-index::
+
+   Try to speed up the traversal using the pack bitmap index (if
+   one is available). Note that when traversing with `--objects`,
+   trees and blobs will not have their associated path printed.
+endif::git-rev-list[]
+
 --
 
 History Simplification
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 0745e2d..9f92905 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -3,6 +3,8 @@
 #include diff.h
 #include revision.h
 #include list-objects.h
+#include pack.h
+#include pack-bitmap.h
 #include builtin.h
 #include log-tree.h
 #include graph.h
@@ -257,6 +259,18 @@ static int show_bisect_vars(struct rev_list_info *info, 
int reaches, int all)
return 0;
 }
 
+static int show_object_fast(
+   const unsigned char *sha1,
+   enum object_type type,
+   int exclude,
+   uint32_t name_hash,
+   struct packed_git *found_pack,
+   off_t found_offset)
+{
+   fprintf(stdout, %s\n, sha1_to_hex(sha1));
+   return 1;
+}
+
 int 

[PATCH v3 11/21] pack-objects: use bitmaps when packing objects

2013-11-14 Thread Jeff King
From: Vicent Marti tan...@gmail.com

In this patch, we use the bitmap API to perform the `Counting Objects`
phase in pack-objects, rather than a traditional walk through the object
graph. For a reasonably-packed large repo, the time to fetch and clone
is often dominated by the full-object revision walk during the Counting
Objects phase. Using bitmaps can reduce the CPU time required on the
server (and therefore start sending the actual pack data with less
delay).

For bitmaps to be used, the following must be true:

  1. We must be packing to stdout (as a normal `pack-objects` from
 `upload-pack` would do).

  2. There must be a .bitmap index containing at least one of the
 have objects that the client is asking for.

  3. Bitmaps must be enabled (they are enabled by default, but can be
 disabled by setting `pack.usebitmaps` to false, or by using
 `--no-use-bitmap-index` on the command-line).

If any of these is not true, we fall back to doing a normal walk of the
object graph.

Here are some sample timings from a full pack of `torvalds/linux` (i.e.
something very similar to what would be generated for a clone of the
repository) that show the speedup produced by various
methods:

[existing graph traversal]
$ time git pack-objects --all --stdout --no-use-bitmap-index \
/dev/null /dev/null
Counting objects: 3237103, done.
Compressing objects: 100% (508752/508752), done.
Total 3237103 (delta 2699584), reused 3237103 (delta 2699584)

real0m44.111s
user0m42.396s
sys 0m3.544s

[bitmaps only, without partial pack reuse; note that
 pack reuse is automatic, so timing this required a
 patch to disable it]
$ time git pack-objects --all --stdout /dev/null /dev/null
Counting objects: 3237103, done.
Compressing objects: 100% (508752/508752), done.
Total 3237103 (delta 2699584), reused 3237103 (delta 2699584)

real0m5.413s
user0m5.604s
sys 0m1.804s

[bitmaps with pack reuse (what you get with this patch)]
$ time git pack-objects --all --stdout /dev/null /dev/null
Reusing existing pack: 3237103, done.
Total 3237103 (delta 0), reused 0 (delta 0)

real0m1.636s
user0m1.460s
sys 0m0.172s

Signed-off-by: Vicent Marti tan...@gmail.com
Signed-off-by: Jeff King p...@peff.net
---
 Documentation/config.txt |   6 ++
 builtin/pack-objects.c   | 169 ---
 2 files changed, 150 insertions(+), 25 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index ab26963..a981369 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1858,6 +1858,12 @@ pack.packSizeLimit::
Common unit suffixes of 'k', 'm', or 'g' are
supported.
 
+pack.useBitmaps::
+   When true, git will use pack bitmaps (if available) when packing
+   to stdout (e.g., during the server side of a fetch). Defaults to
+   true. You should not generally need to turn this off unless
+   you are debugging pack bitmaps.
+
 pager.cmd::
If the value is boolean, turns on or off pagination of the
output of a particular Git subcommand when writing to a tty.
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index faf746b..f04cce5 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -19,6 +19,7 @@
 #include refs.h
 #include streaming.h
 #include thread-utils.h
+#include pack-bitmap.h
 
 static const char *pack_usage[] = {
N_(git pack-objects --stdout [options...] [ ref-list |  
object-list]),
@@ -57,12 +58,23 @@ static struct progress *progress_state;
 static int pack_compression_level = Z_DEFAULT_COMPRESSION;
 static int pack_compression_seen;
 
+static struct packed_git *reuse_packfile;
+static uint32_t reuse_packfile_objects;
+static off_t reuse_packfile_offset;
+
+static int use_bitmap_index = 1;
+
 static unsigned long delta_cache_size = 0;
 static unsigned long max_delta_cache_size = 256 * 1024 * 1024;
 static unsigned long cache_max_small_delta_size = 1000;
 
 static unsigned long window_memory_limit = 0;
 
+enum {
+   OBJECT_ENTRY_EXCLUDE = (1  0),
+   OBJECT_ENTRY_NO_TRY_DELTA = (1  1)
+};
+
 /*
  * stats
  */
@@ -678,6 +690,49 @@ static struct object_entry **compute_write_order(void)
return wo;
 }
 
+static off_t write_reused_pack(struct sha1file *f)
+{
+   uint8_t buffer[8192];
+   off_t to_write;
+   int fd;
+
+   if (!is_pack_valid(reuse_packfile))
+   return 0;
+
+   fd = git_open_noatime(reuse_packfile-pack_name);
+   if (fd  0)
+   return 0;
+
+   if (lseek(fd, sizeof(struct pack_header), SEEK_SET) == -1) {
+   close(fd);
+   return 0;
+   }
+
+   if (reuse_packfile_offset  0)
+   reuse_packfile_offset = reuse_packfile-pack_size - 20;
+
+   to_write = reuse_packfile_offset - sizeof(struct pack_header);
+
+   while 

[PATCH v3 15/21] repack: turn exts array into array-of-struct

2013-11-14 Thread Jeff King
This is slightly more verbose, but will let us annotate the
extensions with further options in future commits.

Signed-off-by: Jeff King p...@peff.net
---
 builtin/repack.c | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index 2e88975..a176de2 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -115,7 +115,12 @@ static void remove_redundant_pack(const char *dir_name, 
const char *base_name)
 
 int cmd_repack(int argc, const char **argv, const char *prefix)
 {
-   const char *exts[] = {.pack, .idx};
+   struct {
+   const char *name;
+   } exts[] = {
+   {.pack},
+   {.idx},
+   };
struct child_process cmd;
struct string_list_item *item;
struct argv_array cmd_args = ARGV_ARRAY_INIT;
@@ -261,14 +266,14 @@ int cmd_repack(int argc, const char **argv, const char 
*prefix)
for (ext = 0; ext  ARRAY_SIZE(exts); ext++) {
char *fname, *fname_old;
fname = mkpathdup(%s/%s%s, packdir,
-   item-string, exts[ext]);
+   item-string, exts[ext].name);
if (!file_exists(fname)) {
free(fname);
continue;
}
 
fname_old = mkpath(%s/old-%s%s, packdir,
-   item-string, exts[ext]);
+   item-string, exts[ext].name);
if (file_exists(fname_old))
if (unlink(fname_old))
failed = 1;
@@ -319,9 +324,9 @@ int cmd_repack(int argc, const char **argv, const char 
*prefix)
char *fname, *fname_old;
struct stat statbuffer;
fname = mkpathdup(%s/pack-%s%s,
-   packdir, item-string, exts[ext]);
+   packdir, item-string, exts[ext].name);
fname_old = mkpathdup(%s-%s%s,
-   packtmp, item-string, exts[ext]);
+   packtmp, item-string, exts[ext].name);
if (!stat(fname_old, statbuffer)) {
statbuffer.st_mode = ~(S_IWUSR | S_IWGRP | 
S_IWOTH);
chmod(fname_old, statbuffer.st_mode);
@@ -340,7 +345,7 @@ int cmd_repack(int argc, const char **argv, const char 
*prefix)
fname = mkpath(%s/old-pack-%s%s,
packdir,
item-string,
-   exts[ext]);
+   exts[ext].name);
if (remove_path(fname))
warning(_(removing '%s' failed), fname);
}
-- 
1.8.5.rc0.443.g2df7f3f

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 14/21] repack: stop using magic number for ARRAY_SIZE(exts)

2013-11-14 Thread Jeff King
We have a static array of extensions, but hardcode the size
of the array in our loops. Let's pull out this magic number,
which will make it easier to change.

Signed-off-by: Jeff King p...@peff.net
---
 builtin/repack.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index a0ff5c7..2e88975 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -115,7 +115,7 @@ static void remove_redundant_pack(const char *dir_name, 
const char *base_name)
 
 int cmd_repack(int argc, const char **argv, const char *prefix)
 {
-   const char *exts[2] = {.pack, .idx};
+   const char *exts[] = {.pack, .idx};
struct child_process cmd;
struct string_list_item *item;
struct argv_array cmd_args = ARGV_ARRAY_INIT;
@@ -258,7 +258,7 @@ int cmd_repack(int argc, const char **argv, const char 
*prefix)
 */
failed = 0;
for_each_string_list_item(item, names) {
-   for (ext = 0; ext  2; ext++) {
+   for (ext = 0; ext  ARRAY_SIZE(exts); ext++) {
char *fname, *fname_old;
fname = mkpathdup(%s/%s%s, packdir,
item-string, exts[ext]);
@@ -315,7 +315,7 @@ int cmd_repack(int argc, const char **argv, const char 
*prefix)
 
/* Now the ones with the same name are out of the way... */
for_each_string_list_item(item, names) {
-   for (ext = 0; ext  2; ext++) {
+   for (ext = 0; ext  ARRAY_SIZE(exts); ext++) {
char *fname, *fname_old;
struct stat statbuffer;
fname = mkpathdup(%s/pack-%s%s,
@@ -335,7 +335,7 @@ int cmd_repack(int argc, const char **argv, const char 
*prefix)
 
/* Remove the old- files */
for_each_string_list_item(item, names) {
-   for (ext = 0; ext  2; ext++) {
+   for (ext = 0; ext  ARRAY_SIZE(exts); ext++) {
char *fname;
fname = mkpath(%s/old-pack-%s%s,
packdir,
-- 
1.8.5.rc0.443.g2df7f3f

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 16/21] repack: handle optional files created by pack-objects

2013-11-14 Thread Jeff King
We ask pack-objects to pack to a set of temporary files, and
then rename them into place. Some files that pack-objects
creates may be optional (like a .bitmap file), in which case
we would not want to call rename(). We already call stat()
and make the chmod optional if the file cannot be accessed.
We could simply skip the rename step in this case, but that
would be a minor regression in noticing problems with
non-optional files (like the .pack and .idx files).

Instead, we can now annotate extensions as optional, and
skip them if they don't exist (and otherwise rely on
rename() to barf).

Signed-off-by: Jeff King p...@peff.net
---
 builtin/repack.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index a176de2..8b7dfd0 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -117,6 +117,7 @@ int cmd_repack(int argc, const char **argv, const char 
*prefix)
 {
struct {
const char *name;
+   unsigned optional:1;
} exts[] = {
{.pack},
{.idx},
@@ -323,6 +324,7 @@ int cmd_repack(int argc, const char **argv, const char 
*prefix)
for (ext = 0; ext  ARRAY_SIZE(exts); ext++) {
char *fname, *fname_old;
struct stat statbuffer;
+   int exists = 0;
fname = mkpathdup(%s/pack-%s%s,
packdir, item-string, exts[ext].name);
fname_old = mkpathdup(%s-%s%s,
@@ -330,9 +332,12 @@ int cmd_repack(int argc, const char **argv, const char 
*prefix)
if (!stat(fname_old, statbuffer)) {
statbuffer.st_mode = ~(S_IWUSR | S_IWGRP | 
S_IWOTH);
chmod(fname_old, statbuffer.st_mode);
+   exists = 1;
+   }
+   if (exists || !exts[ext].optional) {
+   if (rename(fname_old, fname))
+   die_errno(_(renaming '%s' failed), 
fname_old);
}
-   if (rename(fname_old, fname))
-   die_errno(_(renaming '%s' failed), fname_old);
free(fname);
free(fname_old);
}
-- 
1.8.5.rc0.443.g2df7f3f

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 18/21] count-objects: recognize .bitmap in garbage-checking

2013-11-14 Thread Jeff King
From: Nguyễn Thái Ngọc Duy pclo...@gmail.com

Count-objects will report any garbage files in the packs
directory, including files whose extensions it does not
know (case 1), and files whose matching .pack file is
missing (case 2).  Without having learned about .bitmap
files, the current code reports all such files as garbage
(case 1), even if their pack exists. Instead, they should be
treated as case 2.

Signed-off-by: Nguyễn Thái Ngọc Duy pclo...@gmail.com
Signed-off-by: Jeff King p...@peff.net
---
 sha1_file.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/sha1_file.c b/sha1_file.c
index 5557bd9..d810b58 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1194,6 +1194,7 @@ static void prepare_packed_git_one(char *objdir, int 
local)
 
if (has_extension(de-d_name, .idx) ||
has_extension(de-d_name, .pack) ||
+   has_extension(de-d_name, .bitmap) ||
has_extension(de-d_name, .keep))
string_list_append(garbage, path);
else
-- 
1.8.5.rc0.443.g2df7f3f

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 17/21] repack: consider bitmaps when performing repacks

2013-11-14 Thread Jeff King
From: Vicent Marti tan...@gmail.com

Since `pack-objects` will write a `.bitmap` file next to the `.pack` and
`.idx` files, this commit teaches `git-repack` to consider the new
bitmap indexes (if they exist) when performing repack operations.

This implies moving old bitmap indexes out of the way if we are
repacking a repository that already has them, and moving the newly
generated bitmap indexes into the `objects/pack` directory, next to
their corresponding packfiles.

Since `git repack` is now capable of handling these `.bitmap` files,
a normal `git gc` run on a repository that has `pack.writebitmaps` set
to true in its config file will generate bitmap indexes as part of the
garbage collection process.

Alternatively, `git repack` can be called with the `-b` switch to
explicitly generate bitmap indexes if you are experimenting
and don't want them on all the time.

Signed-off-by: Vicent Marti tan...@gmail.com
Signed-off-by: Jeff King p...@peff.net
---
 Documentation/git-repack.txt | 9 -
 builtin/repack.c | 9 -
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index 509cf73..002cfd5 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -9,7 +9,7 @@ git-repack - Pack unpacked objects in a repository
 SYNOPSIS
 
 [verse]
-'git repack' [-a] [-A] [-d] [-f] [-F] [-l] [-n] [-q] [--window=n] 
[--depth=n]
+'git repack' [-a] [-A] [-d] [-f] [-F] [-l] [-n] [-q] [-b] [--window=n] 
[--depth=n]
 
 DESCRIPTION
 ---
@@ -110,6 +110,13 @@ other objects in that pack they already have locally.
The default is unlimited, unless the config variable
`pack.packSizeLimit` is set.
 
+-b::
+--write-bitmap-index::
+   Write a reachability bitmap index as part of the repack. This
+   only makes sense when used with `-a` or `-A`, as the bitmaps
+   must be able to refer to all reachable objects. This option
+   overrides the setting of `pack.writebitmaps`.
+
 
 Configuration
 -
diff --git a/builtin/repack.c b/builtin/repack.c
index 8b7dfd0..239f278 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -94,7 +94,7 @@ static void get_non_kept_pack_filenames(struct string_list 
*fname_list)
 
 static void remove_redundant_pack(const char *dir_name, const char *base_name)
 {
-   const char *exts[] = {.pack, .idx, .keep};
+   const char *exts[] = {.pack, .idx, .keep, .bitmap};
int i;
struct strbuf buf = STRBUF_INIT;
size_t plen;
@@ -121,6 +121,7 @@ int cmd_repack(int argc, const char **argv, const char 
*prefix)
} exts[] = {
{.pack},
{.idx},
+   {.bitmap, 1},
};
struct child_process cmd;
struct string_list_item *item;
@@ -143,6 +144,7 @@ int cmd_repack(int argc, const char **argv, const char 
*prefix)
int no_update_server_info = 0;
int quiet = 0;
int local = 0;
+   int write_bitmap = -1;
 
struct option builtin_repack_options[] = {
OPT_BIT('a', NULL, pack_everything,
@@ -161,6 +163,8 @@ int cmd_repack(int argc, const char **argv, const char 
*prefix)
OPT__QUIET(quiet, N_(be quiet)),
OPT_BOOL('l', local, local,
N_(pass --local to git-pack-objects)),
+   OPT_BOOL('b', write-bitmap-index, write_bitmap,
+   N_(write bitmap index)),
OPT_STRING(0, unpack-unreachable, unpack_unreachable, 
N_(approxidate),
N_(with -A, do not loosen objects older than 
this)),
OPT_INTEGER(0, window, window,
@@ -202,6 +206,9 @@ int cmd_repack(int argc, const char **argv, const char 
*prefix)
argv_array_pushf(cmd_args, --no-reuse-delta);
if (no_reuse_object)
argv_array_pushf(cmd_args, --no-reuse-object);
+   if (write_bitmap = 0)
+   argv_array_pushf(cmd_args, --%swrite-bitmap-index,
+write_bitmap ?  : no-);
 
if (pack_everything  ALL_INTO_ONE) {
get_non_kept_pack_filenames(existing_packs);
-- 
1.8.5.rc0.443.g2df7f3f

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 13/21] pack-objects: implement bitmap writing

2013-11-14 Thread Jeff King
From: Vicent Marti tan...@gmail.com

This commit extends more the functionality of `pack-objects` by allowing
it to write out a `.bitmap` index next to any written packs, together
with the `.idx` index that currently gets written.

If bitmap writing is enabled for a given repository (either by calling
`pack-objects` with the `--write-bitmap-index` flag or by having
`pack.writebitmaps` set to `true` in the config) and pack-objects is
writing a packfile that would normally be indexed (i.e. not piping to
stdout), we will attempt to write the corresponding bitmap index for the
packfile.

Bitmap index writing happens after the packfile and its index has been
successfully written to disk (`finish_tmp_packfile`). The process is
performed in several steps:

1. `bitmap_writer_set_checksum`: this call stores the partial
   checksum for the packfile being written; the checksum will be
   written in the resulting bitmap index to verify its integrity

2. `bitmap_writer_build_type_index`: this call uses the array of
   `struct object_entry` that has just been sorted when writing out
   the actual packfile index to disk to generate 4 type-index bitmaps
   (one for each object type).

   These bitmaps have their nth bit set if the given object is of
   the bitmap's type. E.g. the nth bit of the Commits bitmap will be
   1 if the nth object in the packfile index is a commit.

   This is a very cheap operation because the bitmap writing code has
   access to the metadata stored in the `struct object_entry` array,
   and hence the real type for each object in the packfile.

3. `bitmap_writer_reuse_bitmaps`: if there exists an existing bitmap
   index for one of the packfiles we're trying to repack, this call
   will efficiently rebuild the existing bitmaps so they can be
   reused on the new index. All the existing bitmaps will be stored
   in a `reuse` hash table, and the commit selection phase will
   prioritize these when selecting, as they can be written directly
   to the new index without having to perform a revision walk to
   fill the bitmap. This can greatly speed up the repack of a
   repository that already has bitmaps.

4. `bitmap_writer_select_commits`: if bitmap writing is enabled for
   a given `pack-objects` run, the sequence of commits generated
   during the Counting Objects phase will be stored in an array.

   We then use that array to build up the list of selected commits.
   Writing a bitmap in the index for each object in the repository
   would be cost-prohibitive, so we use a simple heuristic to pick
   the commits that will be indexed with bitmaps.

   The current heuristics are a simplified version of JGit's
   original implementation. We select a higher density of commits
   depending on their age: the 100 most recent commits are always
   selected, after that we pick 1 commit of each 100, and the gap
   increases as the commits grow older. On top of that, we make sure
   that every single branch that has not been merged (all the tips
   that would be required from a clone) gets their own bitmap, and
   when selecting commits between a gap, we tend to prioritize the
   commit with the most parents.

   Do note that there is no right/wrong way to perform commit
   selection; different selection algorithms will result in
   different commits being selected, but there's no such thing as
   missing a commit. The bitmap walker algorithm implemented in
   `prepare_bitmap_walk` is able to adapt to missing bitmaps by
   performing manual walks that complete the bitmap: the ideal
   selection algorithm, however, would select the commits that are
   more likely to be used as roots for a walk in the future (e.g.
   the tips of each branch, and so on) to ensure a bitmap for them
   is always available.

5. `bitmap_writer_build`: this is the computationally expensive part
   of bitmap generation. Based on the list of commits that were
   selected in the previous step, we perform several incremental
   walks to generate the bitmap for each commit.

   The walks begin from the oldest commit, and are built up
   incrementally for each branch. E.g. consider this dag where A, B,
   C, D, E, F are the selected commits, and a, b, c, e are a chunk
   of simplified history that will not receive bitmaps.

A---a---B--b--C--c--D
 \
  E--e--F

   We start by building the bitmap for A, using A as the root for a
   revision walk and marking all the objects that are reachable
   until the walk is over. Once this bitmap is stored, we reuse the
   bitmap walker to perform the walk for B, assuming that once we
   reach A again, the walk will be terminated because A has already
   been SEEN on the previous walk.

   This process 

[PATCH v3 19/21] t: add basic bitmap functionality tests

2013-11-14 Thread Jeff King
Now that we can read and write bitmaps, we can exercise them
with some basic functionality tests. These tests aren't
particularly useful for seeing the benefit, as the test
repo is too small for it to make a difference. However, we
can at least check that using bitmaps does not break anything.

Signed-off-by: Jeff King p...@peff.net
---
 t/t5310-pack-bitmaps.sh | 138 
 1 file changed, 138 insertions(+)
 create mode 100755 t/t5310-pack-bitmaps.sh

diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
new file mode 100755
index 000..bd743f4
--- /dev/null
+++ b/t/t5310-pack-bitmaps.sh
@@ -0,0 +1,138 @@
+#!/bin/sh
+
+test_description='exercise basic bitmap functionality'
+. ./test-lib.sh
+
+test_expect_success 'setup repo with moderate-sized history' '
+   for i in $(test_seq 1 10); do
+   test_commit $i
+   done 
+   git checkout -b other HEAD~5 
+   for i in $(test_seq 1 10); do
+   test_commit side-$i
+   done 
+   git checkout master 
+   blob=$(echo tagged-blob | git hash-object -w --stdin) 
+   git tag tagged-blob $blob 
+   git config pack.writebitmaps true
+'
+
+test_expect_success 'full repack creates bitmaps' '
+   git repack -ad 
+   ls .git/objects/pack/ | grep bitmap output 
+   test_line_count = 1 output
+'
+
+test_expect_success 'rev-list --test-bitmap verifies bitmaps' '
+   git rev-list --test-bitmap HEAD
+'
+
+rev_list_tests() {
+   state=$1
+
+   test_expect_success counting commits via bitmap ($state) '
+   git rev-list --count HEAD expect 
+   git rev-list --use-bitmap-index --count HEAD actual 
+   test_cmp expect actual
+   '
+
+   test_expect_success counting partial commits via bitmap ($state) '
+   git rev-list --count HEAD~5..HEAD expect 
+   git rev-list --use-bitmap-index --count HEAD~5..HEAD actual 
+   test_cmp expect actual
+   '
+
+   test_expect_success counting non-linear history ($state) '
+   git rev-list --count other...master expect 
+   git rev-list --use-bitmap-index --count other...master actual 

+   test_cmp expect actual
+   '
+
+   test_expect_success enumerate --objects ($state) '
+   git rev-list --objects --use-bitmap-index HEAD tmp 
+   cut -d  -f1 tmp tmp2 
+   sort tmp2 actual 
+   git rev-list --objects HEAD tmp 
+   cut -d  -f1 tmp tmp2 
+   sort tmp2 expect 
+   test_cmp expect actual
+   '
+
+   test_expect_success bitmap --objects handles non-commit objects 
($state) '
+   git rev-list --objects --use-bitmap-index HEAD tagged-blob 
actual 
+   grep $blob actual
+   '
+}
+
+rev_list_tests 'full bitmap'
+
+test_expect_success 'clone from bitmapped repository' '
+   git clone --no-local --bare . clone.git 
+   git rev-parse HEAD expect 
+   git --git-dir=clone.git rev-parse HEAD actual 
+   test_cmp expect actual
+'
+
+test_expect_success 'setup further non-bitmapped commits' '
+   for i in $(test_seq 1 10); do
+   test_commit further-$i
+   done
+'
+
+rev_list_tests 'partial bitmap'
+
+test_expect_success 'fetch (partial bitmap)' '
+   git --git-dir=clone.git fetch origin master:master 
+   git rev-parse HEAD expect 
+   git --git-dir=clone.git rev-parse HEAD actual 
+   test_cmp expect actual
+'
+
+test_expect_success 'incremental repack cannot create bitmaps' '
+   test_commit more-1 
+   test_must_fail git repack -d
+'
+
+test_expect_success 'incremental repack can disable bitmaps' '
+   test_commit more-2 
+   git repack -d --no-write-bitmap-index
+'
+
+test_expect_success 'full repack, reusing previous bitmaps' '
+   git repack -ad 
+   ls .git/objects/pack/ | grep bitmap output 
+   test_line_count = 1 output
+'
+
+test_expect_success 'fetch (full bitmap)' '
+   git --git-dir=clone.git fetch origin master:master 
+   git rev-parse HEAD expect 
+   git --git-dir=clone.git rev-parse HEAD actual 
+   test_cmp expect actual
+'
+
+test_lazy_prereq JGIT '
+   type jgit
+'
+
+test_expect_success JGIT 'we can read jgit bitmaps' '
+   git clone . compat-jgit 
+   (
+   cd compat-jgit 
+   rm -f .git/objects/pack/*.bitmap 
+   jgit gc 
+   git rev-list --test-bitmap HEAD
+   )
+'
+
+test_expect_success JGIT 'jgit can read our bitmaps' '
+   git clone . compat-us.git 
+   (
+   cd compat-us.git 
+   git repack -adb 
+   # jgit gc will barf if it does not like our bitmaps
+   jgit gc
+   )
+'
+
+test_done
-- 
1.8.5.rc0.443.g2df7f3f

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More 

[PATCH v3 20/21] t/perf: add tests for pack bitmaps

2013-11-14 Thread Jeff King
This adds a few basic perf tests for the pack bitmap code to
show off its improvements. The tests are:

  1. How long does it take to do a repack (it gets slower
 with bitmaps, since we have to do extra work)?

  2. How long does it take to do a clone (it gets faster
 with bitmaps)?

  3. How does a small fetch perform when we've just
 repacked?

  4. How does a clone perform when we haven't repacked since
 a week of pushes?

Here are results against linux.git:

Test  origin/master   this tree
---
5310.2: repack to disk33.64(32.64+2.04)   67.67(66.75+1.84) +101.2%
5310.3: simulated clone   30.49(29.47+2.05)   1.20(1.10+0.10) -96.1%
5310.4: simulated fetch   3.49(6.79+0.06) 5.57(22.35+0.07) +59.6%
5310.6: partial bitmap36.70(43.87+1.81)   8.18(21.92+0.73) -77.7%

You can see that we do take longer to repack, but we do way
better for further clones. A small fetch performs a bit
worse, as we spend way more time on delta compression (note
the heavy user CPU time, as we have 8 threads) due to the
lack of name hashes for the bitmapped objects.

The final test shows how the bitmaps degrade over time
between packs. There's still a significant speedup over the
non-bitmap case, but we don't do quite as well (we have to
spend time accessing the new objects the old fashioned
way, including delta compression).

Signed-off-by: Jeff King p...@peff.net
---
 t/perf/p5310-pack-bitmaps.sh | 56 
 1 file changed, 56 insertions(+)
 create mode 100755 t/perf/p5310-pack-bitmaps.sh

diff --git a/t/perf/p5310-pack-bitmaps.sh b/t/perf/p5310-pack-bitmaps.sh
new file mode 100755
index 000..ac036d0
--- /dev/null
+++ b/t/perf/p5310-pack-bitmaps.sh
@@ -0,0 +1,56 @@
+#!/bin/sh
+
+test_description='Tests pack performance using bitmaps'
+. ./perf-lib.sh
+
+test_perf_large_repo
+
+# note that we do everything through config,
+# since we want to be able to compare bitmap-aware
+# git versus non-bitmap git
+test_expect_success 'setup bitmap config' '
+   git config pack.writebitmaps true
+'
+
+test_perf 'repack to disk' '
+   git repack -ad
+'
+
+test_perf 'simulated clone' '
+   git pack-objects --stdout --all /dev/null /dev/null
+'
+
+test_perf 'simulated fetch' '
+   have=$(git rev-list HEAD --until=1.week.ago -1) 
+   {
+   echo HEAD 
+   echo ^$have
+   } | git pack-objects --revs --stdout /dev/null
+'
+
+test_expect_success 'create partial bitmap state' '
+   # pick a commit to represent the repo tip in the past
+   cutoff=$(git rev-list HEAD --until=1.week.ago -1) 
+   orig_tip=$(git rev-parse HEAD) 
+
+   # now kill off all of the refs and pretend we had
+   # just the one tip
+   rm -rf .git/logs .git/refs/* .git/packed-refs
+   git update-ref HEAD $cutoff
+
+   # and then repack, which will leave us with a nice
+   # big bitmap pack of the old history, and all of
+   # the new history will be loose, as if it had been pushed
+   # up incrementally and exploded via unpack-objects
+   git repack -Ad
+
+   # and now restore our original tip, as if the pushes
+   # had happened
+   git update-ref HEAD $orig_tip
+'
+
+test_perf 'partial bitmap' '
+   git pack-objects --stdout --all /dev/null /dev/null
+'
+
+test_done
-- 
1.8.5.rc0.443.g2df7f3f

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 64-bit support.

2013-11-14 Thread Konstantin Khomoutov
On Thu, 14 Nov 2013 16:58:31 +0400
Лежанкин Иван  abys...@gmail.com wrote:

 Do you plan to implement the 64-bit support in git? - Right now I have
 a problems sometimes with a huge repo and renaming detection. If I
 merge more than 32768 files at once, then the renaming detection
 fails, because of limitation inside git. The limitation is put by max
 32-bit value.
 
 I tweaked sources locally, to use 64-bit value as a number of merging
 files - generally, it works. But I'm not so brave to try to replace it
 everywhere in git.

$ apt-cache policy git
git:
  Installed: 1:1.7.10.4-1+wheezy1
  Candidate: 1:1.7.10.4-1+wheezy1
  Version table:
 *** 1:1.7.10.4-1+wheezy1 0
500 http://cdn.debian.net/debian/ wheezy/main amd64 Packages
100 /var/lib/dpkg/status

Notice the amd64 arch.  So what the question really is about?

If you have found a place where Git explicitly uses a 32-bit integer
where it would better be using a 64-bit one, please propose a patch to
discuss.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: can we prevent reflog deletion when branch is deleted?

2013-11-14 Thread Sitaram Chamarty
On 11/14/2013 04:39 PM, Jeff King wrote:
 On Thu, Nov 14, 2013 at 04:26:46PM +0530, Sitaram Chamarty wrote:
 
 I do not know about any particular debate in git circles, but I assume
 Sitaram is referring to this incident:

   https://groups.google.com/d/msg/jenkinsci-dev/-myjRIPcVwU/t4nkXONp8qgJ

 in which a Jenkins dev force-pushed and rewound history on 150 different
 repos. In this case the reflog made rollback easy, but if he had pushed
 a deletion, it would be harder.

 I don't know if they had a reflog on the server side; they used
 client-side reflogs if I understood correctly.

 I'm talking about server side (bare repo), assuming the site has
 core.logAllRefUpdates set.
 
 Yes, they did have server-side reflogs (the pushes were to GitHub, and
 we reflog everything). Client-side reflogs would not be sufficient, as
 the client who pushed does not record the history he just rewound (he
 _might_ have it at refs/remotes/origin/master@{1}, but if somebody
 pushed since his last fetch, then he doesn't).
 
 The simplest way to recover is to just have everyone push again
 (without --force). The history will just silently fast-forward to
 whoever has the most recent tip. The downside is that you have to wait
 for that person to actually push. :)
 
 I think they started with that, and then eventually GitHub support got
 wind of it and pulled the last value for each repo out of the
 server-side reflog for them.

Great.  But what does github do if the branches were *deleted* by
mistake (say someone does a git push --mirror; most likely in a
script, for added fun and laughs!)

Github may be able to help people recover from that also, but plain Git
won't.

And that's what I would like to see a change in.

 
 -Peff
 

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: can we prevent reflog deletion when branch is deleted?

2013-11-14 Thread Sitaram Chamarty
On 11/14/2013 04:47 PM, Luca Milanesio wrote:
 Would be really useful anyway to have the ability to create a
 server-side reference based on a SHA-1, using the Git protocol.
 Alternatively, just fetching a remote repo based on a SHA-1 (not
 referenced by any ref-spec but still existent) so that you can create
 a new reference locally and push.

That's a security issue.

Just to clarify, what I am asking for is the ability to recover on the
server, where you have access to the actual files that comprise the
repo.

sitaram

 
 Luca.
 
 On 14 Nov 2013, at 11:09, Jeff King p...@peff.net wrote:
 
 On Thu, Nov 14, 2013 at 04:26:46PM +0530, Sitaram Chamarty wrote:

 I do not know about any particular debate in git circles, but I assume
 Sitaram is referring to this incident:

  https://groups.google.com/d/msg/jenkinsci-dev/-myjRIPcVwU/t4nkXONp8qgJ

 in which a Jenkins dev force-pushed and rewound history on 150 different
 repos. In this case the reflog made rollback easy, but if he had pushed
 a deletion, it would be harder.

 I don't know if they had a reflog on the server side; they used
 client-side reflogs if I understood correctly.

 I'm talking about server side (bare repo), assuming the site has
 core.logAllRefUpdates set.

 Yes, they did have server-side reflogs (the pushes were to GitHub, and
 we reflog everything). Client-side reflogs would not be sufficient, as
 the client who pushed does not record the history he just rewound (he
 _might_ have it at refs/remotes/origin/master@{1}, but if somebody
 pushed since his last fetch, then he doesn't).

 The simplest way to recover is to just have everyone push again
 (without --force). The history will just silently fast-forward to
 whoever has the most recent tip. The downside is that you have to wait
 for that person to actually push. :)

 I think they started with that, and then eventually GitHub support got
 wind of it and pulled the last value for each repo out of the
 server-side reflog for them.

 -Peff
 --
 To unsubscribe from this list: send the line unsubscribe git in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: can we prevent reflog deletion when branch is deleted?

2013-11-14 Thread Stephen Bash
- Original Message -
 From: Jeff King p...@peff.net
 Sent: Thursday, November 14, 2013 3:14:56 AM
 Subject: Re: can we prevent reflog deletion when branch is deleted?
 
 On Thu, Nov 14, 2013 at 05:48:50AM +0530, Sitaram Chamarty wrote:
 
  Is there *any* way we can preserve a reflog for a deleted branch,
  perhaps under logs/refs/deleted/timestamp/full/ref/name ?
 
 At GitHub, we log each change to an audit log in addition to the
 regular reflog (we also stuff extra data from the environment into the
 reflog message). So even after a branch is deleted, its audit log
 entries remain, though you have to pull out the data by hand (git
 doesn't know about it at all, except as an append-only sink for
 writing). 

We recently ran into a similar situation at my $dayjob, so I made our
server side update hook log all pushes (including deletes) and added the
new log file to logrotate(8) -- note: make sure if logrotate recreates
the file that it allows everyone to write to it.  I'm sure it's not as
comprehensive as Peff's solution, but it's pretty simple for smaller
shops that want a little more protection.  Here are the relevant
excerpts from the script:

#!/usr/bin/env python

import os, sys, pwd, stat
from datetime import datetime

def log_push(too_many_changes):
log_file = 'push-log.txt'
try:
f = open(log_file, 'a')

try:
# In case we just created the file, attempt to chmod it
os.chmod(log_file, 0666)
except OSError:
# chmod will fail if the current user isn't the owner, but
# if we've gotten this far we already have write permissions,
# so just continue quietly
pass

# Linux/Mac okay, bad for Windows
username = pwd.getpwuid(os.getuid())[0]
f.write('%s: %s push by %s of %s from %s to %s\n'% \
(datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
'Failed' if too_many_changes else 'Successful', username,
refname, oldsha, newsha))
f.close()
except IOError:
try:
log_stats = os.stat(log_file)
# Figure out owner and permissions
log_owner = pwd.getpwuid(log_stats.st_uid).pw_name
log_perm = oct(stat.S_IMODE(log_stats.st_mode))
print_flush('Unable to open %s for appending. Current owner ' + \
'is %s and permissions are %s.'%(log_file,
log_owner, log_perm))
except:
exception,desc,stack = sys.exc_info()
print_flush('Unable to open log file.  While generating error' + \
' message encountered error: %s'%(desc))

if len(sys.argv) != 4:
print_flush('Usage: %s refname oldsha newsha'%sys.argv[0])
sys.exit(1)

refname = sys.argv[1]
oldsha = sys.argv[2]
newsha = sys.argv[3]

if newsha == '0'*40:
# Deleted ref, nothing to do
log_push(False)
sys.exit(0)

# ... checking for various rule/style violations ...

log_push(too_many_changes)
if too_many_changes:
sys.exit(1)
else:
sys.exit(0)
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 64-bit support.

2013-11-14 Thread Kent R. Spillner
Can you be more specific about the limitation you suspect you are hitting?  
32,768 is not the max 32-bit value.


 On Nov 14, 2013, at 6:58, Лежанкин Иван  abys...@gmail.com wrote:
 
 Hi!
 
 Do you plan to implement the 64-bit support in git? - Right now I have
 a problems sometimes with a huge repo and renaming detection. If I
 merge more than 32768 files at once, then the renaming detection
 fails, because of limitation inside git. The limitation is put by max
 32-bit value.
 
 I tweaked sources locally, to use 64-bit value as a number of merging
 files - generally, it works. But I'm not so brave to try to replace it
 everywhere in git.
 --
 To unsubscribe from this list: send the line unsubscribe git in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 64-bit support.

2013-11-14 Thread Лежанкин Иван
I hit this limit in file 'diffcore-rename.c':

if (rename_limit = 0 || rename_limit  32767)
rename_limit = 32767;

I just guess, that this limit comes from the O(N^2) complexity of the
comparison algorithm. Since the max 32-bit signed value is 2^31, then
the 2^15 = 32768 is somehow correlated with its square root, maybe,
like 2^(32/2 - 1) - to prevent overflow.
I'm trying to prepare the patch right now, that changes the `int
rename_limit` = `long rename_limit` and all intermediate variable
types. Is it a correct way to do?

On 14 November 2013 18:40, Kent R. Spillner kspill...@acm.org wrote:
 Can you be more specific about the limitation you suspect you are hitting?  
 32,768 is not the max 32-bit value.


 On Nov 14, 2013, at 6:58, Лежанкин Иван  abys...@gmail.com wrote:

 Hi!

 Do you plan to implement the 64-bit support in git? - Right now I have
 a problems sometimes with a huge repo and renaming detection. If I
 merge more than 32768 files at once, then the renaming detection
 fails, because of limitation inside git. The limitation is put by max
 32-bit value.

 I tweaked sources locally, to use 64-bit value as a number of merging
 files - generally, it works. But I'm not so brave to try to replace it
 everywhere in git.
 --
 To unsubscribe from this list: send the line unsubscribe git in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 64-bit support.

2013-11-14 Thread Konstantin Khomoutov
On Thu, 14 Nov 2013 18:55:52 +0400
Лежанкин Иван  abys...@gmail.com wrote:

 I hit this limit in file 'diffcore-rename.c':
 
 if (rename_limit = 0 || rename_limit  32767)
 rename_limit = 32767;
 
 I just guess, that this limit comes from the O(N^2) complexity of the
 comparison algorithm. Since the max 32-bit signed value is 2^31, then
 the 2^15 = 32768 is somehow correlated with its square root, maybe,
 like 2^(32/2 - 1) - to prevent overflow.
 I'm trying to prepare the patch right now, that changes the `int
 rename_limit` = `long rename_limit` and all intermediate variable
 types. Is it a correct way to do?

I beleive rename_limit comes from reading the diff.renameLimit
configuration variable.  The gitconfig(1) manual page hints to look at
the -l command-line option of `git diff` which is described this way:

  -lnum

The -M and -C options require O(n^2) processing time where n is the
  number of potential rename/copy targets. This option prevents
  rename/copy detection from running if the number of rename/copy
  targets exceeds the specified number.

This description is not too clear, I admit.

Looks like you're on the right track but the patch appears to require a
more wide impact:
* 32767 should be a default limit, applied in the case the user did not
  specify neither diff.renameLimit nor -l.
* If whatever value read from those sources is less than 0, an error
  should be thrown--it looks strange to just revert it to the default
  value in this case.
* If the user-supplied value is = 0, then just use it, assume the user
  knows what they are doing.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Fwd: Error with git-svn pushing a rename

2013-11-14 Thread Benjamin Pabst
Hi guys,

I'm using git as a subversion client, which works great so far. But
today I tried to push back a rename to the subversion server, which
resulted in the following error:

perl: subversion/libsvn_subr/dirent_uri.c:2489:
svn_fspath__skip_ancestor: Assertion
`svn_fspath__is_canonical(child_fspath)' failed.
error: git-svn died of signal 6

After searching the web I found a similar problem at stackoverflow:
http://stackoverflow.com/questions/17693255/git-svn-dcommit-fails-because-of-assertion-error-svn-fspath-is-canonicalchildj

So I tried to downgrade subversion (1.7.x) which didn't help. Then I
also tried to downgrade git (1.7.x) which resulted in a different
error:
'tempfile' can't be called as a method at
/usr/share/perl5/vendor_perl/Git.pm line 1042.

So I updated both packages to the current version:
$ git --version
git version 1.8.4.2

$ svn --version
svn, version 1.8.3 (r1516576)
   compiled Sep 13 2013, 00:34:15 on x86_64-unknown-linux-gnu

Copyright (C) 2013 The Apache Software Foundation.
This software consists of contributions made by many people;
see the NOTICE file for more information.
Subversion is open source software, see http://subversion.apache.org/

The following repository access (RA) modules are available:

* ra_svn : Module for accessing a repository using the svn network protocol.
  - with Cyrus SASL authentication
  - handles 'svn' scheme
* ra_local : Module for accessing a repository on local disk.
  - handles 'file' scheme
* ra_serf : Module for accessing a repository via WebDAV protocol using serf.
  - using serf 1.3.1
  - handles 'http' scheme
  - handles 'https' scheme

But now I'm back at the first error. Just for completeness, the error
occurs on the following operation:
$ git svn dcommit
Committing to https://x ...
R //SomeFile = //SomeOtherFile
perl: subversion/libsvn_subr/dirent_uri.c:2489:
svn_fspath__skip_ancestor: Assertion
`svn_fspath__is_canonical(child_fspath)' failed.
error: git-svn died of signal 6

Any ideas on handling this error?

Sorry for the (wrongly sent) first mail (incomplete).

Would be happy to hear from you.

Greetings
Ben
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: can we prevent reflog deletion when branch is deleted?

2013-11-14 Thread Sitaram Chamarty
On 11/14/2013 01:44 PM, Jeff King wrote:
 On Thu, Nov 14, 2013 at 05:48:50AM +0530, Sitaram Chamarty wrote:
 
 Is there *any* way we can preserve a reflog for a deleted branch,
 perhaps under logs/refs/deleted/timestamp/full/ref/name ?
 
 I had patches to do something like this here:
 
   http://thread.gmane.org/gmane.comp.version-control.git/201715/focus=201752
 
 but there were definitely some buggy corners, as much of the code
 assumed you needed to have a ref to have a reflog. I don't even run with
 it locally anymore.
 
 At GitHub, we log each change to an audit log in addition to the
 regular reflog (we also stuff extra data from the environment into the
 reflog message). So even after a branch is deleted, its audit log
 entries remain, though you have to pull out the data by hand (git
 doesn't know about it at all, except as an append-only sink for
 writing). And git doesn't use the audit log for connectivity, either, so
 eventually the objects could be pruned.
 
 Just some basic protection -- don't delete the reflog, and instead,
 rename it to something that preserves the name but in a different
 namespace.
 
 That part is easy. Accessing it seamlessly and handling reflog
 expiration are a little harder. Not because they're intractable, but
 just because there are some low-level assumptions in the git code. The
 patch series I mentioned above mostly works. It probably just needs
 somebody to go through and find the corner cases.

The use cases I am talking about are those where someone deleted
something and it was noticed well within Git's the earliest of Git's
expire timeouts.

So, no need to worry about expiry times and connecting it with object
pruning.  Really, just the eqvt of a cp or mv of one file is all
that most people need.

Gitolite's log is the same.  So no one who uses Gitolite needs this
feature.  But people shouldn't have to install Gitolite or anything else
just to get this either!
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: can we prevent reflog deletion when branch is deleted?

2013-11-14 Thread Sitaram Chamarty
I can't resist...

On 11/14/2013 08:12 PM, Stephen Bash wrote:

[snipped some stuff from Peff]

[snipped 60 lines of python]

In honor of your last name, here's what I would do if I needed to log
ref updates (and wasn't using Gitolite):

#!/bin/bash
# -- use this as a post-receive hook

while read old new ref
do
echo $(date +%F %T): Successful push by $USER of $ref from $old to $new
done  push-log.txt
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ANNOUNCE] Git v1.8.5-rc2

2013-11-14 Thread Marc Branchaud
On 13-11-13 06:07 PM, Junio C Hamano wrote:
 A release candidate Git v1.8.5-rc2 is now available for testing
 at the usual places.

 [ snip ]

 Updates since v1.8.4
 
 
 Foreign interfaces, subsystems and ports.
 
  * git-svn used with SVN 1.8.0 when talking over https:// connection
dumped core due to a bug in the serf library that SVN uses.  Work
it around on our side, even though the SVN side is being fixed.
 
  * On MacOS X, we detected if the filesystem needs the pre-composed
unicode strings workaround, but did not automatically enable it.
Now we do.
 
  * remote-hg remote helper misbehaved when interacting with a local Hg
repository relative to the home directory, e.g. clone hg::~/there.
 
  * imap-send ported to OS X uses Apple's security framework instead of
OpenSSL one.
 
  * Subversion 1.8.0 that was recently released breaks older subversion
clients coming over http/https in various ways.

Isn't this the same as the serf fixes ([1],[2])?  If not, what git change is
it referring to?

BTW,

$ git log --no-merges --oneline v1.8.4..v1.8.5-rc2 -- git-svn.perl perl/Git/SVN/

shows:

f849bb6 git-svn: Warn about changing default for --prefix in Git v2.0
60786bd git-svn: fix signed commit parsing
73ffac3 git-svn: fix termination issues for remote svn connections
8ac251b git-svn: allow git-svn fetching to work using serf

The release notes don't seem to mention 60786bd.

M.

[1] http://thread.gmane.org/gmane.comp.version-control.git/229720
[2] http://thread.gmane.org/gmane.comp.version-control.git/233701

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] RelNotes: Spelling grammar fixes.

2013-11-14 Thread Marc Branchaud
---

Mostly just tweaks, although I did change the foo^{tag} description a lot.

M.

 Documentation/RelNotes/1.8.5.txt | 158 ---
 1 file changed, 80 insertions(+), 78 deletions(-)

diff --git a/Documentation/RelNotes/1.8.5.txt b/Documentation/RelNotes/1.8.5.txt
index 13b4336..352dbbb 100644
--- a/Documentation/RelNotes/1.8.5.txt
+++ b/Documentation/RelNotes/1.8.5.txt
@@ -8,7 +8,7 @@ When git push [$there] does not say what to push, we have 
used the
 traditional matching semantics so far (all your branches were sent
 to the remote as long as there already are branches of the same name
 over there).  In Git 2.0, the default will change to the simple
-semantics that pushes:
+semantics, which pushes:
 
  - only the current branch to the branch with the same name, and only
when the current branch is set to integrate with that remote
@@ -55,7 +55,7 @@ Foreign interfaces, subsystems and ports.
 
  * git-svn used with SVN 1.8.0 when talking over https:// connection
dumped core due to a bug in the serf library that SVN uses.  Work
-   it around on our side, even though the SVN side is being fixed.
+   around it on our side, even though the SVN side is being fixed.
 
  * On MacOS X, we detected if the filesystem needs the pre-composed
unicode strings workaround, but did not automatically enable it.
@@ -65,7 +65,7 @@ Foreign interfaces, subsystems and ports.
repository relative to the home directory, e.g. clone hg::~/there.
 
  * imap-send ported to OS X uses Apple's security framework instead of
-   OpenSSL one.
+   OpenSSL's.
 
  * Subversion 1.8.0 that was recently released breaks older subversion
clients coming over http/https in various ways.
@@ -79,22 +79,22 @@ UI, Workflows  Features
  * xdg-open can be used as a browser backend for git web-browse
(hence to show git help -w output), when available.
 
- * git grep and git show pays attention to --textconv option
+ * git grep and git show pay attention to the --textconv option
when these commands are told to operate on blob objects (e.g. git
-   grep -e pattern HEAD:Makefile).
+   grep -e pattern --textconv HEAD:Makefile).
 
  * git replace helper no longer allows an object to be replaced with
another object of a different type to avoid confusion (you can
-   still manually craft such replacement using git update-ref, as an
+   still manually craft such a replacement using git update-ref, as an
escape hatch).
 
- * git status no longer prints dirty status information for
+ * git status no longer prints the dirty status information of
submodules for which submodule.$name.ignore is set to all.
 
  * git rebase -i honours core.abbrev when preparing the insn sheet
for editing.
 
- * git status during a cherry-pick shows what original commit is
+ * git status during a cherry-pick shows which original commit is
being picked.
 
  * Instead of typing four capital letters HEAD, you can say @ now,
@@ -102,21 +102,21 @@ UI, Workflows  Features
 
  * git check-ignore follows the same rule as git add and git
status in that the ignore/exclude mechanism does not take effect
-   on paths that are already tracked.  With --no-index option, it
+   on paths that are already tracked.  With the --no-index option, it
can be used to diagnose which paths that should have been ignored
have been mistakenly added to the index.
 
  * Some irrelevant advice messages that are shared with git status
output have been removed from the commit log template.
 
- * update-refs learnt a --stdin option to read multiple update
+ * update-refs learned a --stdin option to read multiple update
requests and perform them in an all-or-none fashion.
 
  * Just like make -C directory, git -C directory ... tells Git
to go there before doing anything else.
 
- * Just like git checkout - knows to check out and git merge -
-   knows to merge the branch you were previously on, git cherry-pick
+ * Just like git checkout - knows to check out, and git merge -
+   knows to merge, the branch you were previously on, git cherry-pick
now understands git cherry-pick - to pick from the previous
branch.
 
@@ -126,56 +126,58 @@ UI, Workflows  Features
git status --porcelain instead, as its format is stable and easier
to parse.
 
- * Make foo^{tag} to peel a tag to itself, i.e. no-op., and fail if
-   foo is not a tag.  git rev-parse --verify v1.0^{tag} would be
-   a more convenient way to say test $(git cat-file -t v1.0) = tag.
+ * The ref syntax foo^{tag} (with the literal string {tag}) peels a
+   tag ref to itself, i.e. it's a no-op., and fails if
+   foo is not a tag.  git rev-parse --verify v1.0^{tag} is
+   a more convenient way than test $(git cat-file -t v1.0) = tag to
+   check if v1.0 is a tag.
 
  * git branch -v -v (and git status) did not distinguish among a
-   branch that does not build on any other branch, a branch that is in
-   sync with the branch it builds on, and a 

[PATCH] branch: fix --verbose output column alignment

2013-11-14 Thread Torstein Hegge
Commit f2e0873 (branch: report invalid tracking branch as gone) removed
an early return from fill_tracking_info() in the path taken when 'git
branch -v' lists a branch in sync with its upstream. This resulted in an
unconditionally added space in front of the subject line:

$ git branch -v
* master f5eb3da  commit pushed to upstream
  topic  f935eb6 unpublished topic

Instead, only add the trailing space if a decoration have been added.

To catch this kind of whitespace breakage in the tests, be a bit less
smart when filtering the output through sed.

Signed-off-by: Torstein Hegge he...@resisty.net
---
 builtin/branch.c |  8 +++-
 t/t6040-tracking-info.sh | 24 +---
 2 files changed, 20 insertions(+), 12 deletions(-)

diff --git a/builtin/branch.c b/builtin/branch.c
index 0bb0e93..636a16e 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -424,6 +424,7 @@ static void fill_tracking_info(struct strbuf *stat, const 
char *branch_name,
struct branch *branch = branch_get(branch_name);
struct strbuf fancy = STRBUF_INIT;
int upstream_is_gone = 0;
+   int added_decoration = 1;
 
switch (stat_tracking_info(branch, ours, theirs)) {
case 0:
@@ -451,9 +452,13 @@ static void fill_tracking_info(struct strbuf *stat, const 
char *branch_name,
if (upstream_is_gone) {
if (show_upstream_ref)
strbuf_addf(stat, _([%s: gone]), fancy.buf);
+   else
+   added_decoration = 0;
} else if (!ours  !theirs) {
if (show_upstream_ref)
strbuf_addf(stat, _([%s]), fancy.buf);
+   else
+   added_decoration = 0;
} else if (!ours) {
if (show_upstream_ref)
strbuf_addf(stat, _([%s: behind %d]), fancy.buf, 
theirs);
@@ -474,7 +479,8 @@ static void fill_tracking_info(struct strbuf *stat, const 
char *branch_name,
ours, theirs);
}
strbuf_release(fancy);
-   strbuf_addch(stat, ' ');
+   if (added_decoration)
+   strbuf_addch(stat, ' ');
free(ref);
 }
 
diff --git a/t/t6040-tracking-info.sh b/t/t6040-tracking-info.sh
index ba26cfe..7ac8fd0 100755
--- a/t/t6040-tracking-info.sh
+++ b/t/t6040-tracking-info.sh
@@ -39,12 +39,14 @@ test_expect_success setup '
advance h
 '
 
-script='s/^..\(b.\)[0-9a-f]*\[\([^]]*\)\].*/\1 \2/p'
+script='s/^..\(b.\) *[0-9a-f]* \(.*\)$/\1 \2/p'
 cat expect \EOF
-b1 ahead 1, behind 1
-b2 ahead 1, behind 1
-b3 behind 1
-b4 ahead 2
+b1 [ahead 1, behind 1] d
+b2 [ahead 1, behind 1] d
+b3 [behind 1] b
+b4 [ahead 2] f
+b5 g
+b6 c
 EOF
 
 test_expect_success 'branch -v' '
@@ -57,12 +59,12 @@ test_expect_success 'branch -v' '
 '
 
 cat expect \EOF
-b1 origin/master: ahead 1, behind 1
-b2 origin/master: ahead 1, behind 1
-b3 origin/master: behind 1
-b4 origin/master: ahead 2
-b5 brokenbase: gone
-b6 origin/master
+b1 [origin/master: ahead 1, behind 1] d
+b2 [origin/master: ahead 1, behind 1] d
+b3 [origin/master: behind 1] b
+b4 [origin/master: ahead 2] f
+b5 [brokenbase: gone] g
+b6 [origin/master] c
 EOF
 
 test_expect_success 'branch -vv' '
-- 
1.8.5.rc0.216.ge00de29

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 00/14] New hash table implementation

2013-11-14 Thread Karsten Blees
Also here:
https://github.com/kblees/git/commits/kb/hashmap-v5

Changes since v3:
- changed table resizing and grow / shrink thresholds (#02)
- hashmap_free: changed free_function to boolean (#02, #06, #07, #09)
- swapped xcalloc args (num elements, element size) (#02)
- fixed pointer cast formatting (type*) - (type *) (#02)
- removed function pointer casts (*fn)(...) - fn(...) (#02)
- api-hashmap.txt: clarified removal / replacement of entries (#02)
- update-index.c: fixed const-casts (#13, #14)

Karsten


Jens Lehmann (1):
  submodule: don't access the .gitmodules cache entry after removing it

Karsten Blees (13):
  add a hashtable implementation that supports O(1) removal
  buitin/describe.c: use new hash map implementation
  diffcore-rename.c: move code around to prepare for the next patch
  diffcore-rename.c: simplify finding exact renames
  diffcore-rename.c: use new hash map implementation
  name-hash.c: use new hash map implementation for directories
  name-hash.c: remove unreferenced directory entries
  name-hash.c: use new hash map implementation for cache entries
  name-hash.c: remove cache entries instead of marking them CE_UNHASHED
  remove old hash.[ch] implementation
  fix 'git update-index --verbose --again' output
  builtin/update-index.c: cleanup update_one
  read-cache.c: fix memory leaks caused by removed cache entries

 Documentation/technical/api-hash.txt|  52 ---
 Documentation/technical/api-hashmap.txt | 235 +
 Makefile|   5 +-
 builtin/describe.c  |  53 +++
 builtin/rm.c|   2 +-
 builtin/update-index.c  |  39 +++--
 cache.h |  18 +--
 diffcore-rename.c   | 185 ---
 hash.c  | 110 --
 hash.h  |  50 ---
 hashmap.c   | 228 
 hashmap.h   |  71 +
 name-hash.c | 156 +++
 read-cache.c|  10 +-
 resolve-undo.c  |   7 +-
 submodule.c |  25 +---
 t/t0011-hashmap.sh  | 240 ++
 test-hashmap.c  | 256 
 unpack-trees.c  |   3 +-
 19 files changed, 1216 insertions(+), 529 deletions(-)
 delete mode 100644 Documentation/technical/api-hash.txt
 create mode 100644 Documentation/technical/api-hashmap.txt
 delete mode 100644 hash.c
 delete mode 100644 hash.h
 create mode 100644 hashmap.c
 create mode 100644 hashmap.h
 create mode 100755 t/t0011-hashmap.sh
 create mode 100644 test-hashmap.c

--

git diff kb/hashmap-v4:

diff --git a/Documentation/technical/api-hashmap.txt 
b/Documentation/technical/api-hashmap.txt
index fc46e56..b2280f1 100644
--- a/Documentation/technical/api-hashmap.txt
+++ b/Documentation/technical/api-hashmap.txt
@@ -42,11 +42,6 @@ parameters that have the same hash code. When looking up an 
entry, the `key`
 and `keydata` parameters to hashmap_get and hashmap_remove are always passed
 as second and third argument, respectively. Otherwise, `keydata` is NULL.
 
-`void (*hashmap_free_fn)(void *entry)`::
-
-   User-supplied function to free a hashmap_entry. If entries are simply
-   malloc'ed memory, use stdlib's free() function.
-
 Functions
 -
 
@@ -76,14 +71,14 @@ If the total number of entries is known in advance, the 
`initial_size`
 parameter may be used to preallocate a sufficiently large table and thus
 prevent expensive resizing. If 0, the table is dynamically resized.
 
-`void hashmap_free(struct hashmap *map, hashmap_free_fn free_function)`::
+`void hashmap_free(struct hashmap *map, int free_entries)`::
 
Frees a hashmap structure and allocated memory.
 +
 `map` is the hashmap to free.
 +
-If `free_function` is specified (not NULL), it is called to free each
-hashmap_entry in the map. If entries are simply malloc'ed, use stdlib's free().
+If `free_entries` is true, each hashmap_entry in the map is freed as well
+(using stdlib's free()).
 
 `void hashmap_entry_init(void *entry, int hash)`::
 
@@ -127,17 +122,20 @@ call to `hashmap_get` or `hashmap_get_next`.
 
 `void *hashmap_put(struct hashmap *map, void *entry)`::
 
-   Adds or replaces a hashmap entry.
+   Adds or replaces a hashmap entry. If the hashmap contains duplicate
+   entries equal to the specified entry, only one of them will be replaced.
 +
 `map` is the hashmap structure.
 +
-`entry` is the entry to add or update.
+`entry` is the entry to add or replace.
 +
-Returns the previous entry, or NULL if not found (i.e. the entry was added).
+Returns the replaced entry, or NULL if not found (i.e. the entry was added).
 
 `void *hashmap_remove(struct hashmap *map, 

[PATCH v5 01/14] submodule: don't access the .gitmodules cache entry after removing it

2013-11-14 Thread Karsten Blees
From: Jens Lehmann jens.lehm...@web.de

Commit 5fee995244e introduced the stage_updated_gitmodules() function to
add submodule configuration updates to the index. It assumed that even
after calling remove_cache_entry_at() the same cache entry would still be
valid. This was true in the old days, as cache entries could never be
freed, but that is not so sure in the present as there is ongoing work to
free removed cache entries, which makes this code segfault.

Fix that by calling add_file_to_cache() instead of open coding it. Also
remove the could not find .gitmodules in index warning, as that won't
happen in regular use cases (and by then just silently adding it to the
index we do the right thing).

Thanks-to: Karsten Blees karsten.bl...@gmail.com
Signed-off-by: Jens Lehmann jens.lehm...@web.de
Signed-off-by: Junio C Hamano gits...@pobox.com
---
 submodule.c | 25 +
 1 file changed, 1 insertion(+), 24 deletions(-)

diff --git a/submodule.c b/submodule.c
index 1905d75..e388487 100644
--- a/submodule.c
+++ b/submodule.c
@@ -116,30 +116,7 @@ int remove_path_from_gitmodules(const char *path)
 
 void stage_updated_gitmodules(void)
 {
-   struct strbuf buf = STRBUF_INIT;
-   struct stat st;
-   int pos;
-   struct cache_entry *ce;
-   int namelen = strlen(.gitmodules);
-
-   pos = cache_name_pos(.gitmodules, namelen);
-   if (pos  0) {
-   warning(_(could not find .gitmodules in index));
-   return;
-   }
-   ce = active_cache[pos];
-   ce-ce_flags = namelen;
-   if (strbuf_read_file(buf, .gitmodules, 0)  0)
-   die(_(reading updated .gitmodules failed));
-   if (lstat(.gitmodules, st)  0)
-   die_errno(_(unable to stat updated .gitmodules));
-   fill_stat_cache_info(ce, st);
-   ce-ce_mode = ce_mode_from_stat(ce, st.st_mode);
-   if (remove_cache_entry_at(pos)  0)
-   die(_(unable to remove .gitmodules from index));
-   if (write_sha1_file(buf.buf, buf.len, blob_type, ce-sha1))
-   die(_(adding updated .gitmodules failed));
-   if (add_cache_entry(ce, ADD_CACHE_OK_TO_ADD|ADD_CACHE_OK_TO_REPLACE))
+   if (add_file_to_cache(.gitmodules, 0))
die(_(staging updated .gitmodules failed));
 }
 
-- 
1.8.5.rc0.333.g5394214

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 03/14] buitin/describe.c: use new hash map implementation

2013-11-14 Thread Karsten Blees
Signed-off-by: Karsten Blees bl...@dcon.de
Signed-off-by: Junio C Hamano gits...@pobox.com
---
 builtin/describe.c | 53 -
 1 file changed, 24 insertions(+), 29 deletions(-)

diff --git a/builtin/describe.c b/builtin/describe.c
index 6f62109..0b2cef4 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -6,7 +6,7 @@
 #include exec_cmd.h
 #include parse-options.h
 #include diff.h
-#include hash.h
+#include hashmap.h
 #include argv-array.h
 
 #define SEEN   (1u  0)
@@ -25,7 +25,7 @@ static int longformat;
 static int first_parent;
 static int abbrev = -1; /* unspecified */
 static int max_candidates = 10;
-static struct hash_table names;
+static struct hashmap names;
 static int have_util;
 static const char *pattern;
 static int always;
@@ -37,7 +37,7 @@ static const char *diff_index_args[] = {
 };
 
 struct commit_name {
-   struct commit_name *next;
+   struct hashmap_entry entry;
unsigned char peeled[20];
struct tag *tag;
unsigned prio:2; /* annotated tag = 2, tag = 1, head = 0 */
@@ -50,6 +50,12 @@ static const char *prio_names[] = {
head, lightweight, annotated,
 };
 
+static int commit_name_cmp(const struct commit_name *cn1,
+   const struct commit_name *cn2, const void *peeled)
+{
+   return hashcmp(cn1-peeled, peeled ? peeled : cn2-peeled);
+}
+
 static inline unsigned int hash_sha1(const unsigned char *sha1)
 {
unsigned int hash;
@@ -59,21 +65,9 @@ static inline unsigned int hash_sha1(const unsigned char 
*sha1)
 
 static inline struct commit_name *find_commit_name(const unsigned char *peeled)
 {
-   struct commit_name *n = lookup_hash(hash_sha1(peeled), names);
-   while (n  !!hashcmp(peeled, n-peeled))
-   n = n-next;
-   return n;
-}
-
-static int set_util(void *chain, void *data)
-{
-   struct commit_name *n;
-   for (n = chain; n; n = n-next) {
-   struct commit *c = lookup_commit_reference_gently(n-peeled, 1);
-   if (c)
-   c-util = n;
-   }
-   return 0;
+   struct commit_name key;
+   hashmap_entry_init(key, hash_sha1(peeled));
+   return hashmap_get(names, key, peeled);
 }
 
 static int replace_name(struct commit_name *e,
@@ -118,16 +112,10 @@ static void add_to_known_names(const char *path,
struct tag *tag = NULL;
if (replace_name(e, prio, sha1, tag)) {
if (!e) {
-   void **pos;
e = xmalloc(sizeof(struct commit_name));
hashcpy(e-peeled, peeled);
-   pos = insert_hash(hash_sha1(peeled), e, names);
-   if (pos) {
-   e-next = *pos;
-   *pos = e;
-   } else {
-   e-next = NULL;
-   }
+   hashmap_entry_init(e, hash_sha1(peeled));
+   hashmap_add(names, e);
e-path = NULL;
}
e-tag = tag;
@@ -292,7 +280,14 @@ static void describe(const char *arg, int last_one)
fprintf(stderr, _(searching to describe %s\n), arg);
 
if (!have_util) {
-   for_each_hash(names, set_util, NULL);
+   struct hashmap_iter iter;
+   struct commit *c;
+   struct commit_name *n = hashmap_iter_first(names, iter);
+   for (; n; n = hashmap_iter_next(iter)) {
+   c = lookup_commit_reference_gently(n-peeled, 1);
+   if (c)
+   c-util = n;
+   }
have_util = 1;
}
 
@@ -463,9 +458,9 @@ int cmd_describe(int argc, const char **argv, const char 
*prefix)
return cmd_name_rev(args.argc, args.argv, prefix);
}
 
-   init_hash(names);
+   hashmap_init(names, (hashmap_cmp_fn) commit_name_cmp, 0);
for_each_rawref(get_name, NULL);
-   if (!names.nr  !always)
+   if (!names.size  !always)
die(_(No names found, cannot describe anything.));
 
if (argc == 0) {
-- 
1.8.5.rc0.333.g5394214

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 04/14] diffcore-rename.c: move code around to prepare for the next patch

2013-11-14 Thread Karsten Blees
No actual code changes, just move hash_filespec up and outdent part of
find_identical_files.

Signed-off-by: Karsten Blees bl...@dcon.de
Signed-off-by: Junio C Hamano gits...@pobox.com
---
 diffcore-rename.c | 98 +++
 1 file changed, 49 insertions(+), 49 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 6c7a72f..008a60c 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -248,6 +248,18 @@ struct file_similarity {
struct file_similarity *next;
 };
 
+static unsigned int hash_filespec(struct diff_filespec *filespec)
+{
+   unsigned int hash;
+   if (!filespec-sha1_valid) {
+   if (diff_populate_filespec(filespec, 0))
+   return 0;
+   hash_sha1_file(filespec-data, filespec-size, blob, 
filespec-sha1);
+   }
+   memcpy(hash, filespec-sha1, sizeof(hash));
+   return hash;
+}
+
 static int find_identical_files(struct file_similarity *src,
struct file_similarity *dst,
struct diff_options *options)
@@ -258,46 +270,46 @@ static int find_identical_files(struct file_similarity 
*src,
 * Walk over all the destinations ...
 */
do {
-   struct diff_filespec *target = dst-filespec;
-   struct file_similarity *p, *best;
-   int i = 100, best_score = -1;
-
-   /*
-* .. to find the best source match
-*/
-   best = NULL;
-   for (p = src; p; p = p-next) {
-   int score;
-   struct diff_filespec *source = p-filespec;
-
-   /* False hash collision? */
-   if (hashcmp(source-sha1, target-sha1))
-   continue;
-   /* Non-regular files? If so, the modes must match! */
-   if (!S_ISREG(source-mode) || !S_ISREG(target-mode)) {
-   if (source-mode != target-mode)
-   continue;
-   }
-   /* Give higher scores to sources that haven't been used 
already */
-   score = !source-rename_used;
-   if (source-rename_used  options-detect_rename != 
DIFF_DETECT_COPY)
-   continue;
-   score += basename_same(source, target);
-   if (score  best_score) {
-   best = p;
-   best_score = score;
-   if (score == 2)
-   break;
-   }
+   struct diff_filespec *target = dst-filespec;
+   struct file_similarity *p, *best;
+   int i = 100, best_score = -1;
 
-   /* Too many identical alternatives? Pick one */
-   if (!--i)
-   break;
+   /*
+* .. to find the best source match
+*/
+   best = NULL;
+   for (p = src; p; p = p-next) {
+   int score;
+   struct diff_filespec *source = p-filespec;
+
+   /* False hash collision? */
+   if (hashcmp(source-sha1, target-sha1))
+   continue;
+   /* Non-regular files? If so, the modes must match! */
+   if (!S_ISREG(source-mode) || !S_ISREG(target-mode)) {
+   if (source-mode != target-mode)
+   continue;
}
-   if (best) {
-   record_rename_pair(dst-index, best-index, MAX_SCORE);
-   renames++;
+   /* Give higher scores to sources that haven't been used already 
*/
+   score = !source-rename_used;
+   if (source-rename_used  options-detect_rename != 
DIFF_DETECT_COPY)
+   continue;
+   score += basename_same(source, target);
+   if (score  best_score) {
+   best = p;
+   best_score = score;
+   if (score == 2)
+   break;
}
+
+   /* Too many identical alternatives? Pick one */
+   if (!--i)
+   break;
+   }
+   if (best) {
+   record_rename_pair(dst-index, best-index, MAX_SCORE);
+   renames++;
+   }
} while ((dst = dst-next) != NULL);
return renames;
 }
@@ -343,18 +355,6 @@ static int find_same_files(void *ptr, void *data)
return ret;
 }
 
-static unsigned int hash_filespec(struct diff_filespec *filespec)
-{
-   unsigned int hash;
-   if (!filespec-sha1_valid) {
-   if (diff_populate_filespec(filespec, 0))
-   return 0;
-   

Re: [PATCH v3 0/21] pack bitmaps

2013-11-14 Thread Ramsay Jones
On 14/11/13 12:41, Jeff King wrote:
 Here's another iteration of the pack bitmaps series. Compared to v2, it
 changes:
 
  - misc style/typo fixes
 
  - portability fixes from Ramsay and Torsten

Unfortunately, I didn't find time this weekend to finish the msvc build
fixes. However, after a quick squint at these patches, I think you have
almost done it for me! :-D

I must have misunderstood the previous discussion, because my patch was
written on the assumption that the ewah directory wouldn't be git-ified
(e.g. #include git-compat-util.h).

So, most of my patch is no longer necessary, given the use of the git
compat header (and removal of system headers). I suspect that you only
need to add an '#define PRIx64 I64x' definition (Hmm, probably to the
compat/mingw.h header).

I won't know for sure until I actually try them out, of course. I will
wait until these patches land in pu.

[Note: the msvc build is still broken, but the failure is not caused by
these patches. Unfortunately, the tests in t5310-*.sh fail. However, if
I include some debug code, the tests pass ... :-P ]

The part of the patch I was still working on was ...

 
  - count-objects garbage-reporting patch from Duy
 
  - disable bitmaps when is_repository_shallow(); this also covers the
case where the client is shallow, since we feed pack-objects a
--shallow-file in that case. This used to done by checking
!internal_rev_list, but that doesn't apply after cdab485.
 
  - ewah sources now properly use git-compat-util.h and do not include
system headers
 
  - the ewah code uses ewah_malloc, ewah_realloc, and so forth to let the
project use a particular allocator (and we want to use xmalloc and
friends). And we defined those in pack-bitmap.h, but of course that
had no effect on the ewah/*.c files that did not include
pack-bitmap.h.  Since we are hacking up and git-ifying libewok
anyway, we can just set the hardcoded fallback to xmalloc instead of
malloc.
 
   - the ewah code used gcc's __builtin_ctzll, but did not provide a
 suitable fallback. We now provide a fallback in C.

... here.

I was messing around with several implementations (including the use of
msvc compiler intrinsics) with the intention of doing some timing tests
etc. [I suspected my C fallback function (a different implementation to
yours) would be slightly faster.]

ATB,
Ramsay Jones


--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 05/14] diffcore-rename.c: simplify finding exact renames

2013-11-14 Thread Karsten Blees
The find_exact_renames function currently only uses the hash table for
grouping, i.e.:

1. add sources
2. add destinations
3. iterate all buckets, per bucket:
4. split sources from destinations
5. iterate destinations, per destination:
6. iterate sources to find best match

This can be simplified by utilizing the lookup functionality of the hash
table, i.e.:

1. add sources
2. iterate destinations, per destination:
3. lookup sources matching the current destination
4. iterate sources to find best match

This saves several iterations and file_similarity allocations for the
destinations.

Signed-off-by: Karsten Blees bl...@dcon.de
Signed-off-by: Junio C Hamano gits...@pobox.com
---
 diffcore-rename.c | 75 +++
 1 file changed, 20 insertions(+), 55 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 008a60c..cfeb408 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -243,7 +243,7 @@ static int score_compare(const void *a_, const void *b_)
 }
 
 struct file_similarity {
-   int src_dst, index;
+   int index;
struct diff_filespec *filespec;
struct file_similarity *next;
 };
@@ -260,25 +260,21 @@ static unsigned int hash_filespec(struct diff_filespec 
*filespec)
return hash;
 }
 
-static int find_identical_files(struct file_similarity *src,
-   struct file_similarity *dst,
+static int find_identical_files(struct hash_table *srcs,
+   int dst_index,
struct diff_options *options)
 {
int renames = 0;
 
-   /*
-* Walk over all the destinations ...
-*/
-   do {
-   struct diff_filespec *target = dst-filespec;
+   struct diff_filespec *target = rename_dst[dst_index].two;
struct file_similarity *p, *best;
int i = 100, best_score = -1;
 
/*
-* .. to find the best source match
+* Find the best source match for specified destination.
 */
best = NULL;
-   for (p = src; p; p = p-next) {
+   for (p = lookup_hash(hash_filespec(target), srcs); p; p = p-next) {
int score;
struct diff_filespec *source = p-filespec;
 
@@ -307,61 +303,28 @@ static int find_identical_files(struct file_similarity 
*src,
break;
}
if (best) {
-   record_rename_pair(dst-index, best-index, MAX_SCORE);
+   record_rename_pair(dst_index, best-index, MAX_SCORE);
renames++;
}
-   } while ((dst = dst-next) != NULL);
return renames;
 }
 
-static void free_similarity_list(struct file_similarity *p)
+static int free_similarity_list(void *p, void *unused)
 {
while (p) {
struct file_similarity *entry = p;
-   p = p-next;
+   p = entry-next;
free(entry);
}
+   return 0;
 }
 
-static int find_same_files(void *ptr, void *data)
-{
-   int ret;
-   struct file_similarity *p = ptr;
-   struct file_similarity *src = NULL, *dst = NULL;
-   struct diff_options *options = data;
-
-   /* Split the hash list up into sources and destinations */
-   do {
-   struct file_similarity *entry = p;
-   p = p-next;
-   if (entry-src_dst  0) {
-   entry-next = src;
-   src = entry;
-   } else {
-   entry-next = dst;
-   dst = entry;
-   }
-   } while (p);
-
-   /*
-* If we have both sources *and* destinations, see if
-* we can match them up
-*/
-   ret = (src  dst) ? find_identical_files(src, dst, options) : 0;
-
-   /* Free the hashes and return the number of renames found */
-   free_similarity_list(src);
-   free_similarity_list(dst);
-   return ret;
-}
-
-static void insert_file_table(struct hash_table *table, int src_dst, int 
index, struct diff_filespec *filespec)
+static void insert_file_table(struct hash_table *table, int index, struct 
diff_filespec *filespec)
 {
void **pos;
unsigned int hash;
struct file_similarity *entry = xmalloc(sizeof(*entry));
 
-   entry-src_dst = src_dst;
entry-index = index;
entry-filespec = filespec;
entry-next = NULL;
@@ -385,24 +348,26 @@ static void insert_file_table(struct hash_table *table, 
int src_dst, int index,
  */
 static int find_exact_renames(struct diff_options *options)
 {
-   int i;
+   int i, renames = 0;
struct hash_table file_table;
 
+   /* Add all sources to the hash table */
init_hash(file_table);
-   preallocate_hash(file_table, rename_src_nr + rename_dst_nr);
+   preallocate_hash(file_table, rename_src_nr);
for (i = 0; i  rename_src_nr; i++)
-   insert_file_table(file_table, -1, i, rename_src[i].p-one);
+ 

[PATCH v5 06/14] diffcore-rename.c: use new hash map implementation

2013-11-14 Thread Karsten Blees
Signed-off-by: Karsten Blees bl...@dcon.de
Signed-off-by: Junio C Hamano gits...@pobox.com
---
 diffcore-rename.c | 48 +---
 1 file changed, 13 insertions(+), 35 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index cfeb408..9b4f068 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -4,7 +4,7 @@
 #include cache.h
 #include diff.h
 #include diffcore.h
-#include hash.h
+#include hashmap.h
 #include progress.h
 
 /* Table of rename/copy destinations */
@@ -243,9 +243,9 @@ static int score_compare(const void *a_, const void *b_)
 }
 
 struct file_similarity {
+   struct hashmap_entry entry;
int index;
struct diff_filespec *filespec;
-   struct file_similarity *next;
 };
 
 static unsigned int hash_filespec(struct diff_filespec *filespec)
@@ -260,21 +260,22 @@ static unsigned int hash_filespec(struct diff_filespec 
*filespec)
return hash;
 }
 
-static int find_identical_files(struct hash_table *srcs,
+static int find_identical_files(struct hashmap *srcs,
int dst_index,
struct diff_options *options)
 {
int renames = 0;
 
struct diff_filespec *target = rename_dst[dst_index].two;
-   struct file_similarity *p, *best;
+   struct file_similarity *p, *best, dst;
int i = 100, best_score = -1;
 
/*
 * Find the best source match for specified destination.
 */
best = NULL;
-   for (p = lookup_hash(hash_filespec(target), srcs); p; p = p-next) {
+   hashmap_entry_init(dst, hash_filespec(target));
+   for (p = hashmap_get(srcs, dst, NULL); p; p = hashmap_get_next(srcs, 
p)) {
int score;
struct diff_filespec *source = p-filespec;
 
@@ -309,34 +310,15 @@ static int find_identical_files(struct hash_table *srcs,
return renames;
 }
 
-static int free_similarity_list(void *p, void *unused)
+static void insert_file_table(struct hashmap *table, int index, struct 
diff_filespec *filespec)
 {
-   while (p) {
-   struct file_similarity *entry = p;
-   p = entry-next;
-   free(entry);
-   }
-   return 0;
-}
-
-static void insert_file_table(struct hash_table *table, int index, struct 
diff_filespec *filespec)
-{
-   void **pos;
-   unsigned int hash;
struct file_similarity *entry = xmalloc(sizeof(*entry));
 
entry-index = index;
entry-filespec = filespec;
-   entry-next = NULL;
-
-   hash = hash_filespec(filespec);
-   pos = insert_hash(hash, entry, table);
 
-   /* We already had an entry there? */
-   if (pos) {
-   entry-next = *pos;
-   *pos = entry;
-   }
+   hashmap_entry_init(entry, hash_filespec(filespec));
+   hashmap_add(table, entry);
 }
 
 /*
@@ -349,11 +331,10 @@ static void insert_file_table(struct hash_table *table, 
int index, struct diff_f
 static int find_exact_renames(struct diff_options *options)
 {
int i, renames = 0;
-   struct hash_table file_table;
+   struct hashmap file_table;
 
/* Add all sources to the hash table */
-   init_hash(file_table);
-   preallocate_hash(file_table, rename_src_nr);
+   hashmap_init(file_table, NULL, rename_src_nr);
for (i = 0; i  rename_src_nr; i++)
insert_file_table(file_table, i, rename_src[i].p-one);
 
@@ -361,11 +342,8 @@ static int find_exact_renames(struct diff_options *options)
for (i = 0; i  rename_dst_nr; i++)
renames += find_identical_files(file_table, i, options);
 
-   /* Free source file_similarity chains */
-   for_each_hash(file_table, free_similarity_list, options);
-
-   /* .. and free the hash data structure */
-   free_hash(file_table);
+   /* Free the hash data structure and entries */
+   hashmap_free(file_table, 1);
 
return renames;
 }
-- 
1.8.5.rc0.333.g5394214

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 07/14] name-hash.c: use new hash map implementation for directories

2013-11-14 Thread Karsten Blees
Signed-off-by: Karsten Blees bl...@dcon.de
Signed-off-by: Junio C Hamano gits...@pobox.com
---
 cache.h |  3 ++-
 name-hash.c | 77 +++--
 2 files changed, 20 insertions(+), 60 deletions(-)

diff --git a/cache.h b/cache.h
index d1f3c71..84e9ad6 100644
--- a/cache.h
+++ b/cache.h
@@ -4,6 +4,7 @@
 #include git-compat-util.h
 #include strbuf.h
 #include hash.h
+#include hashmap.h
 #include advice.h
 #include gettext.h
 #include convert.h
@@ -278,7 +279,7 @@ struct index_state {
unsigned name_hash_initialized : 1,
 initialized : 1;
struct hash_table name_hash;
-   struct hash_table dir_hash;
+   struct hashmap dir_hash;
 };
 
 extern struct index_state the_index;
diff --git a/name-hash.c b/name-hash.c
index e5b6e1a..c75fadf 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -8,49 +8,28 @@
 #define NO_THE_INDEX_COMPATIBILITY_MACROS
 #include cache.h
 
-/*
- * This removes bit 5 if bit 6 is set.
- *
- * That will make US-ASCII characters hash to their upper-case
- * equivalent. We could easily do this one whole word at a time,
- * but that's for future worries.
- */
-static inline unsigned char icase_hash(unsigned char c)
-{
-   return c  ~((c  0x40)  1);
-}
-
-static unsigned int hash_name(const char *name, int namelen)
-{
-   unsigned int hash = 0x123;
-
-   while (namelen--) {
-   unsigned char c = *name++;
-   c = icase_hash(c);
-   hash = hash*101 + c;
-   }
-   return hash;
-}
-
 struct dir_entry {
-   struct dir_entry *next;
+   struct hashmap_entry ent;
struct dir_entry *parent;
struct cache_entry *ce;
int nr;
unsigned int namelen;
 };
 
+static int dir_entry_cmp(const struct dir_entry *e1,
+   const struct dir_entry *e2, const char *name)
+{
+   return e1-namelen != e2-namelen || strncasecmp(e1-ce-name,
+   name ? name : e2-ce-name, e1-namelen);
+}
+
 static struct dir_entry *find_dir_entry(struct index_state *istate,
const char *name, unsigned int namelen)
 {
-   unsigned int hash = hash_name(name, namelen);
-   struct dir_entry *dir;
-
-   for (dir = lookup_hash(hash, istate-dir_hash); dir; dir = dir-next)
-   if (dir-namelen == namelen 
-   !strncasecmp(dir-ce-name, name, namelen))
-   return dir;
-   return NULL;
+   struct dir_entry key;
+   hashmap_entry_init(key, memihash(name, namelen));
+   key.namelen = namelen;
+   return hashmap_get(istate-dir_hash, key, name);
 }
 
 static struct dir_entry *hash_dir_entry(struct index_state *istate,
@@ -84,18 +63,11 @@ static struct dir_entry *hash_dir_entry(struct index_state 
*istate,
dir = find_dir_entry(istate, ce-name, namelen);
if (!dir) {
/* not found, create it and add to hash table */
-   void **pdir;
-   unsigned int hash = hash_name(ce-name, namelen);
-
dir = xcalloc(1, sizeof(struct dir_entry));
+   hashmap_entry_init(dir, memihash(ce-name, namelen));
dir-namelen = namelen;
dir-ce = ce;
-
-   pdir = insert_hash(hash, dir, istate-dir_hash);
-   if (pdir) {
-   dir-next = *pdir;
-   *pdir = dir;
-   }
+   hashmap_add(istate-dir_hash, dir);
 
/* recursively add missing parent directories */
dir-parent = hash_dir_entry(istate, ce, namelen);
@@ -134,7 +106,7 @@ static void hash_index_entry(struct index_state *istate, 
struct cache_entry *ce)
return;
ce-ce_flags |= CE_HASHED;
ce-next = NULL;
-   hash = hash_name(ce-name, ce_namelen(ce));
+   hash = memihash(ce-name, ce_namelen(ce));
pos = insert_hash(hash, ce, istate-name_hash);
if (pos) {
ce-next = *pos;
@@ -153,6 +125,7 @@ static void lazy_init_name_hash(struct index_state *istate)
return;
if (istate-cache_nr)
preallocate_hash(istate-name_hash, istate-cache_nr);
+   hashmap_init(istate-dir_hash, (hashmap_cmp_fn) dir_entry_cmp, 0);
for (nr = 0; nr  istate-cache_nr; nr++)
hash_index_entry(istate, istate-cache[nr]);
istate-name_hash_initialized = 1;
@@ -247,7 +220,7 @@ struct cache_entry *index_dir_exists(struct index_state 
*istate, const char *nam
 
 struct cache_entry *index_file_exists(struct index_state *istate, const char 
*name, int namelen, int icase)
 {
-   unsigned int hash = hash_name(name, namelen);
+   unsigned int hash = memihash(name, namelen);
struct cache_entry *ce;
 
lazy_init_name_hash(istate);
@@ -270,26 +243,12 @@ struct cache_entry *index_name_exists(struct index_state 
*istate, const char *na
return index_file_exists(istate, name, namelen, icase);
 }
 

[PATCH v5 08/14] name-hash.c: remove unreferenced directory entries

2013-11-14 Thread Karsten Blees
The new hashmap implementation supports remove, so remove and free
directory entries that are no longer referenced by active cache entries.

Signed-off-by: Karsten Blees bl...@dcon.de
Signed-off-by: Junio C Hamano gits...@pobox.com
---
 name-hash.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/name-hash.c b/name-hash.c
index c75fadf..effe96d 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -86,15 +86,16 @@ static void add_dir_entry(struct index_state *istate, 
struct cache_entry *ce)
 static void remove_dir_entry(struct index_state *istate, struct cache_entry 
*ce)
 {
/*
-* Release reference to the directory entry (and parents if 0).
-*
-* Note: we do not remove / free the entry because there's no
-* hash.[ch]::remove_hash and dir-next may point to other entries
-* that are still valid, so we must not free the memory.
+* Release reference to the directory entry. If 0, remove and continue
+* with parent directory.
 */
struct dir_entry *dir = hash_dir_entry(istate, ce, ce_namelen(ce));
-   while (dir  dir-nr  !(--dir-nr))
-   dir = dir-parent;
+   while (dir  !(--dir-nr)) {
+   struct dir_entry *parent = dir-parent;
+   hashmap_remove(istate-dir_hash, dir, NULL);
+   free(dir);
+   dir = parent;
+   }
 }
 
 static void hash_index_entry(struct index_state *istate, struct cache_entry 
*ce)
-- 
1.8.5.rc0.333.g5394214

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 09/14] name-hash.c: use new hash map implementation for cache entries

2013-11-14 Thread Karsten Blees
Note: the ce-next = NULL; in unpack-trees.c::do_add_entry can safely be
removed, as ce-next (now ce-ent.next) is always properly initialized in
name-hash.c::hash_index_entry.

Signed-off-by: Karsten Blees bl...@dcon.de
Signed-off-by: Junio C Hamano gits...@pobox.com
---
 cache.h|  8 +---
 name-hash.c| 24 
 unpack-trees.c |  1 -
 3 files changed, 13 insertions(+), 20 deletions(-)

diff --git a/cache.h b/cache.h
index 84e9ad6..cedc7ae 100644
--- a/cache.h
+++ b/cache.h
@@ -131,12 +131,12 @@ struct stat_data {
 };
 
 struct cache_entry {
+   struct hashmap_entry ent;
struct stat_data ce_stat_data;
unsigned int ce_mode;
unsigned int ce_flags;
unsigned int ce_namelen;
unsigned char sha1[20];
-   struct cache_entry *next;
char name[FLEX_ARRAY]; /* more */
 };
 
@@ -203,7 +203,9 @@ static inline void copy_cache_entry(struct cache_entry *dst,
unsigned int state = dst-ce_flags  CE_STATE_MASK;
 
/* Don't copy hash chain and name */
-   memcpy(dst, src, offsetof(struct cache_entry, next));
+   memcpy(dst-ce_stat_data, src-ce_stat_data,
+   offsetof(struct cache_entry, name) -
+   offsetof(struct cache_entry, ce_stat_data));
 
/* Restore the hash state */
dst-ce_flags = (dst-ce_flags  ~CE_STATE_MASK) | state;
@@ -278,7 +280,7 @@ struct index_state {
struct cache_time timestamp;
unsigned name_hash_initialized : 1,
 initialized : 1;
-   struct hash_table name_hash;
+   struct hashmap name_hash;
struct hashmap dir_hash;
 };
 
diff --git a/name-hash.c b/name-hash.c
index effe96d..488eccf 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -100,19 +100,11 @@ static void remove_dir_entry(struct index_state *istate, 
struct cache_entry *ce)
 
 static void hash_index_entry(struct index_state *istate, struct cache_entry 
*ce)
 {
-   void **pos;
-   unsigned int hash;
-
if (ce-ce_flags  CE_HASHED)
return;
ce-ce_flags |= CE_HASHED;
-   ce-next = NULL;
-   hash = memihash(ce-name, ce_namelen(ce));
-   pos = insert_hash(hash, ce, istate-name_hash);
-   if (pos) {
-   ce-next = *pos;
-   *pos = ce;
-   }
+   hashmap_entry_init(ce, memihash(ce-name, ce_namelen(ce)));
+   hashmap_add(istate-name_hash, ce);
 
if (ignore_case  !(ce-ce_flags  CE_UNHASHED))
add_dir_entry(istate, ce);
@@ -124,8 +116,7 @@ static void lazy_init_name_hash(struct index_state *istate)
 
if (istate-name_hash_initialized)
return;
-   if (istate-cache_nr)
-   preallocate_hash(istate-name_hash, istate-cache_nr);
+   hashmap_init(istate-name_hash, NULL, istate-cache_nr);
hashmap_init(istate-dir_hash, (hashmap_cmp_fn) dir_entry_cmp, 0);
for (nr = 0; nr  istate-cache_nr; nr++)
hash_index_entry(istate, istate-cache[nr]);
@@ -221,18 +212,19 @@ struct cache_entry *index_dir_exists(struct index_state 
*istate, const char *nam
 
 struct cache_entry *index_file_exists(struct index_state *istate, const char 
*name, int namelen, int icase)
 {
-   unsigned int hash = memihash(name, namelen);
struct cache_entry *ce;
+   struct hashmap_entry key;
 
lazy_init_name_hash(istate);
-   ce = lookup_hash(hash, istate-name_hash);
 
+   hashmap_entry_init(key, memihash(name, namelen));
+   ce = hashmap_get(istate-name_hash, key, NULL);
while (ce) {
if (!(ce-ce_flags  CE_UNHASHED)) {
if (same_name(ce, name, namelen, icase))
return ce;
}
-   ce = ce-next;
+   ce = hashmap_get_next(istate-name_hash, ce);
}
return NULL;
 }
@@ -250,6 +242,6 @@ void free_name_hash(struct index_state *istate)
return;
istate-name_hash_initialized = 0;
 
-   free_hash(istate-name_hash);
+   hashmap_free(istate-name_hash, 0);
hashmap_free(istate-dir_hash, 1);
 }
diff --git a/unpack-trees.c b/unpack-trees.c
index ad3e9a0..f8985d4 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -110,7 +110,6 @@ static void do_add_entry(struct unpack_trees_options *o, 
struct cache_entry *ce,
if (set  CE_REMOVE)
set |= CE_WT_REMOVE;
 
-   ce-next = NULL;
ce-ce_flags = (ce-ce_flags  ~clear) | set;
add_index_entry(o-result, ce,
ADD_CACHE_OK_TO_ADD | ADD_CACHE_OK_TO_REPLACE);
-- 
1.8.5.rc0.333.g5394214

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 10/14] name-hash.c: remove cache entries instead of marking them CE_UNHASHED

2013-11-14 Thread Karsten Blees
The new hashmap implementation supports remove, so really remove unused
cache entries from the name hashmap instead of just marking them.

The CE_UNHASHED flag and CE_STATE_MASK are no longer needed.

Keep the CE_HASHED flag to prevent adding entries twice.

Signed-off-by: Karsten Blees bl...@dcon.de
Signed-off-by: Junio C Hamano gits...@pobox.com
---
 cache.h|  6 ++
 name-hash.c| 46 ++
 read-cache.c   |  2 +-
 unpack-trees.c |  2 +-
 4 files changed, 26 insertions(+), 30 deletions(-)

diff --git a/cache.h b/cache.h
index cedc7ae..05b8011 100644
--- a/cache.h
+++ b/cache.h
@@ -160,7 +160,6 @@ struct cache_entry {
 #define CE_ADDED (1  19)
 
 #define CE_HASHED(1  20)
-#define CE_UNHASHED  (1  21)
 #define CE_WT_REMOVE (1  22) /* remove in work directory */
 #define CE_CONFLICTED(1  23)
 
@@ -196,11 +195,10 @@ struct pathspec;
  * Copy the sha1 and stat state of a cache entry from one to
  * another. But we never change the name, or the hash state!
  */
-#define CE_STATE_MASK (CE_HASHED | CE_UNHASHED)
 static inline void copy_cache_entry(struct cache_entry *dst,
const struct cache_entry *src)
 {
-   unsigned int state = dst-ce_flags  CE_STATE_MASK;
+   unsigned int state = dst-ce_flags  CE_HASHED;
 
/* Don't copy hash chain and name */
memcpy(dst-ce_stat_data, src-ce_stat_data,
@@ -208,7 +206,7 @@ static inline void copy_cache_entry(struct cache_entry *dst,
offsetof(struct cache_entry, ce_stat_data));
 
/* Restore the hash state */
-   dst-ce_flags = (dst-ce_flags  ~CE_STATE_MASK) | state;
+   dst-ce_flags = (dst-ce_flags  ~CE_HASHED) | state;
 }
 
 static inline unsigned create_ce_flags(unsigned stage)
diff --git a/name-hash.c b/name-hash.c
index 488eccf..9a3bd3f 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -106,17 +106,29 @@ static void hash_index_entry(struct index_state *istate, 
struct cache_entry *ce)
hashmap_entry_init(ce, memihash(ce-name, ce_namelen(ce)));
hashmap_add(istate-name_hash, ce);
 
-   if (ignore_case  !(ce-ce_flags  CE_UNHASHED))
+   if (ignore_case)
add_dir_entry(istate, ce);
 }
 
+static int cache_entry_cmp(const struct cache_entry *ce1,
+   const struct cache_entry *ce2, const void *remove)
+{
+   /*
+* For remove_name_hash, find the exact entry (pointer equality); for
+* index_name_exists, find all entries with matching hash code and
+* decide whether the entry matches in same_name.
+*/
+   return remove ? !(ce1 == ce2) : 0;
+}
+
 static void lazy_init_name_hash(struct index_state *istate)
 {
int nr;
 
if (istate-name_hash_initialized)
return;
-   hashmap_init(istate-name_hash, NULL, istate-cache_nr);
+   hashmap_init(istate-name_hash, (hashmap_cmp_fn) cache_entry_cmp,
+   istate-cache_nr);
hashmap_init(istate-dir_hash, (hashmap_cmp_fn) dir_entry_cmp, 0);
for (nr = 0; nr  istate-cache_nr; nr++)
hash_index_entry(istate, istate-cache[nr]);
@@ -125,31 +137,19 @@ static void lazy_init_name_hash(struct index_state 
*istate)
 
 void add_name_hash(struct index_state *istate, struct cache_entry *ce)
 {
-   /* if already hashed, add reference to directory entries */
-   if (ignore_case  (ce-ce_flags  CE_STATE_MASK) == CE_STATE_MASK)
-   add_dir_entry(istate, ce);
-
-   ce-ce_flags = ~CE_UNHASHED;
if (istate-name_hash_initialized)
hash_index_entry(istate, ce);
 }
 
-/*
- * We don't actually *remove* it, we can just mark it invalid so that
- * we won't find it in lookups.
- *
- * Not only would we have to search the lists (simple enough), but
- * we'd also have to rehash other hash buckets in case this makes the
- * hash bucket empty (common). So it's much better to just mark
- * it.
- */
 void remove_name_hash(struct index_state *istate, struct cache_entry *ce)
 {
-   /* if already hashed, release reference to directory entries */
-   if (ignore_case  (ce-ce_flags  CE_STATE_MASK) == CE_HASHED)
-   remove_dir_entry(istate, ce);
+   if (!istate-name_hash_initialized || !(ce-ce_flags  CE_HASHED))
+   return;
+   ce-ce_flags = ~CE_HASHED;
+   hashmap_remove(istate-name_hash, ce, ce);
 
-   ce-ce_flags |= CE_UNHASHED;
+   if (ignore_case)
+   remove_dir_entry(istate, ce);
 }
 
 static int slow_same_name(const char *name1, int len1, const char *name2, int 
len2)
@@ -220,10 +220,8 @@ struct cache_entry *index_file_exists(struct index_state 
*istate, const char *na
hashmap_entry_init(key, memihash(name, namelen));
ce = hashmap_get(istate-name_hash, key, NULL);
while (ce) {
-   if (!(ce-ce_flags  CE_UNHASHED)) {
-   if (same_name(ce, name, namelen, 

[PATCH v5 11/14] remove old hash.[ch] implementation

2013-11-14 Thread Karsten Blees
Signed-off-by: Karsten Blees bl...@dcon.de
Signed-off-by: Junio C Hamano gits...@pobox.com
---
 Documentation/technical/api-hash.txt |  52 -
 Makefile |   2 -
 cache.h  |   1 -
 hash.c   | 110 ---
 hash.h   |  50 
 test-hashmap.c   |  84 --
 6 files changed, 299 deletions(-)
 delete mode 100644 Documentation/technical/api-hash.txt
 delete mode 100644 hash.c
 delete mode 100644 hash.h

diff --git a/Documentation/technical/api-hash.txt 
b/Documentation/technical/api-hash.txt
deleted file mode 100644
index e5061e0..000
--- a/Documentation/technical/api-hash.txt
+++ /dev/null
@@ -1,52 +0,0 @@
-hash API
-
-
-The hash API is a collection of simple hash table functions. Users are expected
-to implement their own hashing.
-
-Data Structures

-
-`struct hash_table`::
-
-   The hash table structure. The `array` member points to the hash table
-   entries. The `size` member counts the total number of valid and invalid
-   entries in the table. The `nr` member keeps track of the number of
-   valid entries.
-
-`struct hash_table_entry`::
-
-   An opaque structure representing an entry in the hash table. The `hash`
-   member is the entry's hash key and the `ptr` member is the entry's
-   value.
-
-Functions
--
-
-`init_hash`::
-
-   Initialize the hash table.
-
-`free_hash`::
-
-   Release memory associated with the hash table.
-
-`insert_hash`::
-
-   Insert a pointer into the hash table. If an entry with that hash
-   already exists, a pointer to the existing entry's value is returned.
-   Otherwise NULL is returned.  This allows callers to implement
-   chaining, etc.
-
-`lookup_hash`::
-
-   Lookup an entry in the hash table. If an entry with that hash exists
-   the entry's value is returned. Otherwise NULL is returned.
-
-`for_each_hash`::
-
-   Call a function for each entry in the hash table. The function is
-   expected to take the entry's value as its only argument and return an
-   int. If the function returns a negative int the loop is aborted
-   immediately.  Otherwise, the return value is accumulated and the sum
-   returned upon completion of the loop.
diff --git a/Makefile b/Makefile
index 75bb1a6..d566768 100644
--- a/Makefile
+++ b/Makefile
@@ -673,7 +673,6 @@ LIB_H += git-compat-util.h
 LIB_H += gpg-interface.h
 LIB_H += graph.h
 LIB_H += grep.h
-LIB_H += hash.h
 LIB_H += hashmap.h
 LIB_H += help.h
 LIB_H += http.h
@@ -805,7 +804,6 @@ LIB_OBJS += gettext.o
 LIB_OBJS += gpg-interface.o
 LIB_OBJS += graph.o
 LIB_OBJS += grep.o
-LIB_OBJS += hash.o
 LIB_OBJS += hashmap.o
 LIB_OBJS += help.o
 LIB_OBJS += hex.o
diff --git a/cache.h b/cache.h
index 05b8011..27067b8 100644
--- a/cache.h
+++ b/cache.h
@@ -3,7 +3,6 @@
 
 #include git-compat-util.h
 #include strbuf.h
-#include hash.h
 #include hashmap.h
 #include advice.h
 #include gettext.h
diff --git a/hash.c b/hash.c
deleted file mode 100644
index 749ecfe..000
--- a/hash.c
+++ /dev/null
@@ -1,110 +0,0 @@
-/*
- * Some generic hashing helpers.
- */
-#include cache.h
-#include hash.h
-
-/*
- * Look up a hash entry in the hash table. Return the pointer to
- * the existing entry, or the empty slot if none existed. The caller
- * can then look at the (*ptr) to see whether it existed or not.
- */
-static struct hash_table_entry *lookup_hash_entry(unsigned int hash, const 
struct hash_table *table)
-{
-   unsigned int size = table-size, nr = hash % size;
-   struct hash_table_entry *array = table-array;
-
-   while (array[nr].ptr) {
-   if (array[nr].hash == hash)
-   break;
-   nr++;
-   if (nr = size)
-   nr = 0;
-   }
-   return array + nr;
-}
-
-
-/*
- * Insert a new hash entry pointer into the table.
- *
- * If that hash entry already existed, return the pointer to
- * the existing entry (and the caller can create a list of the
- * pointers or do anything else). If it didn't exist, return
- * NULL (and the caller knows the pointer has been inserted).
- */
-static void **insert_hash_entry(unsigned int hash, void *ptr, struct 
hash_table *table)
-{
-   struct hash_table_entry *entry = lookup_hash_entry(hash, table);
-
-   if (!entry-ptr) {
-   entry-ptr = ptr;
-   entry-hash = hash;
-   table-nr++;
-   return NULL;
-   }
-   return entry-ptr;
-}
-
-static void grow_hash_table(struct hash_table *table)
-{
-   unsigned int i;
-   unsigned int old_size = table-size, new_size;
-   struct hash_table_entry *old_array = table-array, *new_array;
-
-   new_size = alloc_nr(old_size);
-   new_array = xcalloc(sizeof(struct hash_table_entry), new_size);
-   

[PATCH v5 12/14] fix 'git update-index --verbose --again' output

2013-11-14 Thread Karsten Blees
'git update-index --verbose' consistently reports paths relative to the
work-tree root. The only exception is the '--again' option, which reports
paths relative to the current working directory.

Change do_reupdate to use non-prefixed paths.

Signed-off-by: Karsten Blees bl...@dcon.de
---
 builtin/update-index.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/update-index.c b/builtin/update-index.c
index e3a10d7..d180d80 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -579,7 +579,7 @@ static int do_reupdate(int ac, const char **av,
 * or worse yet 'allow_replace', active_nr may decrease.
 */
save_nr = active_nr;
-   update_one(ce-name + prefix_length, prefix, prefix_length);
+   update_one(ce-name, NULL, 0);
if (save_nr != active_nr)
goto redo;
}
-- 
1.8.5.rc0.333.g5394214

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 13/14] builtin/update-index.c: cleanup update_one

2013-11-14 Thread Karsten Blees
do_reupdate calls update_one with a cache_entry.name, there's no need for
the extra sanitation / normalization that happens in prefix_path.
cmd_update_index calls update_one with an already prefixed path, no need to
prefix_path twice.

Remove the extra prefix_path from update_one. Also remove the now unused
'prefix' and 'prefix_length' parameters.

As of d089eba setup: sanitize absolute and funny paths in get_pathspec(),
prefix_path uncoditionally returns a copy, even if the passed in path isn't
changed. Lets unconditionally free() the result.

Signed-off-by: Karsten Blees bl...@dcon.de
---
 builtin/update-index.c | 36 +++-
 1 file changed, 15 insertions(+), 21 deletions(-)

diff --git a/builtin/update-index.c b/builtin/update-index.c
index d180d80..c8f0d5f 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -274,36 +274,32 @@ static void chmod_path(int flip, const char *path)
die(git update-index: cannot chmod %cx '%s', flip, path);
 }
 
-static void update_one(const char *path, const char *prefix, int prefix_length)
+static void update_one(const char *path)
 {
-   const char *p = prefix_path(prefix, prefix_length, path);
-   if (!verify_path(p)) {
+   if (!verify_path(path)) {
fprintf(stderr, Ignoring path %s\n, path);
-   goto free_return;
+   return;
}
if (mark_valid_only) {
-   if (mark_ce_flags(p, CE_VALID, mark_valid_only == MARK_FLAG))
+   if (mark_ce_flags(path, CE_VALID, mark_valid_only == MARK_FLAG))
die(Unable to mark file %s, path);
-   goto free_return;
+   return;
}
if (mark_skip_worktree_only) {
-   if (mark_ce_flags(p, CE_SKIP_WORKTREE, mark_skip_worktree_only 
== MARK_FLAG))
+   if (mark_ce_flags(path, CE_SKIP_WORKTREE, 
mark_skip_worktree_only == MARK_FLAG))
die(Unable to mark file %s, path);
-   goto free_return;
+   return;
}
 
if (force_remove) {
-   if (remove_file_from_cache(p))
+   if (remove_file_from_cache(path))
die(git update-index: unable to remove %s, path);
report(remove '%s', path);
-   goto free_return;
+   return;
}
-   if (process_path(p))
+   if (process_path(path))
die(Unable to process path %s, path);
report(add '%s', path);
- free_return:
-   if (p  path || p  path + strlen(path))
-   free((char *)p);
 }
 
 static void read_index_info(int line_termination)
@@ -579,7 +575,7 @@ static int do_reupdate(int ac, const char **av,
 * or worse yet 'allow_replace', active_nr may decrease.
 */
save_nr = active_nr;
-   update_one(ce-name, NULL, 0);
+   update_one(ce-name);
if (save_nr != active_nr)
goto redo;
}
@@ -836,11 +832,10 @@ int cmd_update_index(int argc, const char **argv, const 
char *prefix)
 
setup_work_tree();
p = prefix_path(prefix, prefix_length, path);
-   update_one(p, NULL, 0);
+   update_one(p);
if (set_executable_bit)
chmod_path(set_executable_bit, p);
-   if (p  path || p  path + strlen(path))
-   free((char *)p);
+   free((char *)p);
ctx.argc--;
ctx.argv++;
break;
@@ -879,11 +874,10 @@ int cmd_update_index(int argc, const char **argv, const 
char *prefix)
strbuf_swap(buf, nbuf);
}
p = prefix_path(prefix, prefix_length, buf.buf);
-   update_one(p, NULL, 0);
+   update_one(p);
if (set_executable_bit)
chmod_path(set_executable_bit, p);
-   if (p  buf.buf || p  buf.buf + buf.len)
-   free((char *)p);
+   free((char *)p);
}
strbuf_release(nbuf);
strbuf_release(buf);
-- 
1.8.5.rc0.333.g5394214

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 14/14] read-cache.c: fix memory leaks caused by removed cache entries

2013-11-14 Thread Karsten Blees
When cache_entry structs are removed from index_state.cache, they are not
properly freed. Freeing those entries wasn't possible before because we
couldn't remove them from index_state.name_hash.

Now that we _do_ remove the entries from name_hash, we can also free them.
Add 'free(cache_entry)' to all call sites of name-hash.c::remove_name_hash
in read-cache.c (we could free() directly in remove_name_hash(), but
name-hash.c isn't concerned with cache_entry allocation at all).

Accessing a cache_entry after removing it from the index is now no longer
allowed, as the memory has been freed. The following functions need minor
fixes (typically by copying ce-name before use):
 - builtin/rm.c::cmd_rm
 - builtin/update-index.c::do_reupdate
 - read-cache.c::read_index_unmerged
 - resolve-undo.c::unmerge_index_entry_at

Signed-off-by: Karsten Blees bl...@dcon.de
---
 builtin/rm.c   | 2 +-
 builtin/update-index.c | 5 -
 read-cache.c   | 8 ++--
 resolve-undo.c | 7 +--
 4 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/builtin/rm.c b/builtin/rm.c
index 3a0e0ea..171f37c 100644
--- a/builtin/rm.c
+++ b/builtin/rm.c
@@ -311,7 +311,7 @@ int cmd_rm(int argc, const char **argv, const char *prefix)
if (!match_pathspec_depth(pathspec, ce-name, ce_namelen(ce), 
0, seen))
continue;
ALLOC_GROW(list.entry, list.nr + 1, list.alloc);
-   list.entry[list.nr].name = ce-name;
+   list.entry[list.nr].name = xstrdup(ce-name);
list.entry[list.nr].is_submodule = S_ISGITLINK(ce-ce_mode);
if (list.entry[list.nr++].is_submodule 
!is_staging_gitmodules_ok())
diff --git a/builtin/update-index.c b/builtin/update-index.c
index c8f0d5f..00313f3 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -559,6 +559,7 @@ static int do_reupdate(int ac, const char **av,
const struct cache_entry *ce = active_cache[pos];
struct cache_entry *old = NULL;
int save_nr;
+   char *path;
 
if (ce_stage(ce) || !ce_path_match(ce, pathspec))
continue;
@@ -575,7 +576,9 @@ static int do_reupdate(int ac, const char **av,
 * or worse yet 'allow_replace', active_nr may decrease.
 */
save_nr = active_nr;
-   update_one(ce-name);
+   path = xstrdup(ce-name);
+   update_one(path);
+   free(path);
if (save_nr != active_nr)
goto redo;
}
diff --git a/read-cache.c b/read-cache.c
index 00af9ad..3f735f3 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -47,6 +47,7 @@ static void replace_index_entry(struct index_state *istate, 
int nr, struct cache
struct cache_entry *old = istate-cache[nr];
 
remove_name_hash(istate, old);
+   free(old);
set_index_entry(istate, nr, ce);
istate-cache_changed = 1;
 }
@@ -478,6 +479,7 @@ int remove_index_entry_at(struct index_state *istate, int 
pos)
 
record_resolve_undo(istate, ce);
remove_name_hash(istate, ce);
+   free(ce);
istate-cache_changed = 1;
istate-cache_nr--;
if (pos = istate-cache_nr)
@@ -499,8 +501,10 @@ void remove_marked_cache_entries(struct index_state 
*istate)
unsigned int i, j;
 
for (i = j = 0; i  istate-cache_nr; i++) {
-   if (ce_array[i]-ce_flags  CE_REMOVE)
+   if (ce_array[i]-ce_flags  CE_REMOVE) {
remove_name_hash(istate, ce_array[i]);
+   free(ce_array[i]);
+   }
else
ce_array[j++] = ce_array[i];
}
@@ -1894,7 +1898,7 @@ int read_index_unmerged(struct index_state *istate)
new_ce-ce_mode = ce-ce_mode;
if (add_index_entry(istate, new_ce, 0))
return error(%s: cannot drop to stage #0,
-ce-name);
+new_ce-name);
i = index_name_pos(istate, new_ce-name, len);
}
return unmerged;
diff --git a/resolve-undo.c b/resolve-undo.c
index c09b006..49ebaaf 100644
--- a/resolve-undo.c
+++ b/resolve-undo.c
@@ -119,6 +119,7 @@ int unmerge_index_entry_at(struct index_state *istate, int 
pos)
struct string_list_item *item;
struct resolve_undo_info *ru;
int i, err = 0, matched;
+   char *name;
 
if (!istate-resolve_undo)
return pos;
@@ -138,20 +139,22 @@ int unmerge_index_entry_at(struct index_state *istate, 
int pos)
if (!ru)
return pos;
matched = ce-ce_flags  CE_MATCHED;
+   name = xstrdup(ce-name);
remove_index_entry_at(istate, pos);
for (i = 0; i  3; i++) {
struct cache_entry *nce;
if (!ru-mode[i])

[PATCH v5 02/14] add a hashtable implementation that supports O(1) removal

2013-11-14 Thread Karsten Blees
The existing hashtable implementation (in hash.[ch]) uses open addressing
(i.e. resolve hash collisions by distributing entries across the table).
Thus, removal is difficult to implement with less than O(n) complexity.
Resolving collisions of entries with identical hashes (e.g. via chaining)
is left to the client code.

Add a hashtable implementation that supports O(1) removal and is slightly
easier to use due to builtin entry chaining.

Supports all basic operations init, free, get, add, remove and iteration.

Also includes ready-to-use hash functions based on the public domain FNV-1
algorithm (http://www.isthe.com/chongo/tech/comp/fnv).

The per-entry data structure (hashmap_entry) is piggybacked in front of
the client's data structure to save memory. See test-hashmap.c for usage
examples.

The hashtable is resized by a factor of four when 80% full. With these
settings, average memory consumption is about 2/3 of hash.[ch], and
insertion is about twice as fast due to less frequent resizing.

Lookups are also slightly faster, because entries are strictly confined to
their bucket (i.e. no data of other buckets needs to be traversed).

Signed-off-by: Karsten Blees bl...@dcon.de
Signed-off-by: Junio C Hamano gits...@pobox.com
---
 Documentation/technical/api-hashmap.txt | 235 ++
 Makefile|   3 +
 hashmap.c   | 228 +
 hashmap.h   |  71 +++
 t/t0011-hashmap.sh  | 240 ++
 test-hashmap.c  | 340 
 6 files changed, 1117 insertions(+)
 create mode 100644 Documentation/technical/api-hashmap.txt
 create mode 100644 hashmap.c
 create mode 100644 hashmap.h
 create mode 100755 t/t0011-hashmap.sh
 create mode 100644 test-hashmap.c

diff --git a/Documentation/technical/api-hashmap.txt 
b/Documentation/technical/api-hashmap.txt
new file mode 100644
index 000..b2280f1
--- /dev/null
+++ b/Documentation/technical/api-hashmap.txt
@@ -0,0 +1,235 @@
+hashmap API
+===
+
+The hashmap API is a generic implementation of hash-based key-value mappings.
+
+Data Structures
+---
+
+`struct hashmap`::
+
+   The hash table structure.
++
+The `size` member keeps track of the total number of entries. The `cmpfn`
+member is a function used to compare two entries for equality. The `table` and
+`tablesize` members store the hash table and its size, respectively.
+
+`struct hashmap_entry`::
+
+   An opaque structure representing an entry in the hash table, which must
+   be used as first member of user data structures. Ideally it should be
+   followed by an int-sized member to prevent unused memory on 64-bit
+   systems due to alignment.
++
+The `hash` member is the entry's hash code and the `next` member points to the
+next entry in case of collisions (i.e. if multiple entries map to the same
+bucket).
+
+`struct hashmap_iter`::
+
+   An iterator structure, to be used with hashmap_iter_* functions.
+
+Types
+-
+
+`int (*hashmap_cmp_fn)(const void *entry, const void *entry_or_key, const void 
*keydata)`::
+
+   User-supplied function to test two hashmap entries for equality. Shall
+   return 0 if the entries are equal.
++
+This function is always called with non-NULL `entry` / `entry_or_key`
+parameters that have the same hash code. When looking up an entry, the `key`
+and `keydata` parameters to hashmap_get and hashmap_remove are always passed
+as second and third argument, respectively. Otherwise, `keydata` is NULL.
+
+Functions
+-
+
+`unsigned int strhash(const char *buf)`::
+`unsigned int strihash(const char *buf)`::
+`unsigned int memhash(const void *buf, size_t len)`::
+`unsigned int memihash(const void *buf, size_t len)`::
+
+   Ready-to-use hash functions for strings, using the FNV-1 algorithm (see
+   http://www.isthe.com/chongo/tech/comp/fnv).
++
+`strhash` and `strihash` take 0-terminated strings, while `memhash` and
+`memihash` operate on arbitrary-length memory.
++
+`strihash` and `memihash` are case insensitive versions.
+
+`void hashmap_init(struct hashmap *map, hashmap_cmp_fn equals_function, size_t 
initial_size)`::
+
+   Initializes a hashmap structure.
++
+`map` is the hashmap to initialize.
++
+The `equals_function` can be specified to compare two entries for equality.
+If NULL, entries are considered equal if their hash codes are equal.
++
+If the total number of entries is known in advance, the `initial_size`
+parameter may be used to preallocate a sufficiently large table and thus
+prevent expensive resizing. If 0, the table is dynamically resized.
+
+`void hashmap_free(struct hashmap *map, int free_entries)`::
+
+   Frees a hashmap structure and allocated memory.
++
+`map` is the hashmap to free.
++
+If `free_entries` is true, each hashmap_entry in the map is freed as well
+(using stdlib's free()).
+
+`void 

Re: What's cooking in git.git (Oct 2013, #06; Fri, 25)

2013-11-14 Thread Karsten Blees
Am 29.10.2013 10:09, schrieb Karsten Blees:
 On Mon, Oct 28, 2013 at 10:04 PM, Vicent Martí tan...@gmail.com wrote:

 On Mon, Oct 28, 2013 at 8:45 PM, Karsten Blees karsten.bl...@gmail.com 
 wrote:

 Regarding performance, khash uses open addressing, which requires more key 
 comparisons (O(1/(1-load_factor))) than chaining (O(1+load_factor)). 
 However, any measurable differences will most likely be dwarfed by IO costs 
 in this particular use case.

 I don't think this is true. If you actually run a couple insertion and
 lookup benchmarks comparing the two implementations, you'll find khash
 to be around ~30% faster for most workloads (venturing a guess from
 past experience). I am obviously not the author of khash, but I've
...

Just out of curiosity, I added performance test code for khash to the test in 
my current hashmap patch series [1]. It turns out that khash is by far the 
slowest of the bunch, especially with many collisions.

Again, I don't think that performance matters all that much (or in other words: 
_any_ hash table implementation will probably be fast enough compared to the 
rest that's going on). Its more a question of whether we really need two 
different hash table implementations (and a queasy feeling about the macro 
kludge in khash.h...).


Khash doesn't store the hash codes along with the entries (as both hash.[ch] 
and hashmap.[ch] do), so it needs to re-calculate hash codes on every resize. 
For a fair comparison, the khash test uses keys with pre-calculated hash 
codes in the key structure. This should be similar to a hash function that just 
copies 4 bytes from a sha1 key. Khash maps with more complex hash functions 
will be slower (see khstr).

The khstr test uses khash's predefined string map and khash's X31 hash 
function for strings (therefore no separate values for different hash functions 
here).

The table is similar to what I posted for hashmap-v2 [2] (i.e. real time in 
seconds for 1,000 rounds á 100,000 entries). I just turned it around a bit to 
make room for khash columns.

test | hash_fn | hashmap |  hash   |  khash  | khstr  |
-+-+-+-+-++
 | FNV |   2.429 |  14.366 |  11.780 | 18.677 |
 | FNV  x2 |   2.946 |  14.558 |  10.922 ||
 add | i   |   1.708 |   7.419 |   4.132 ||
 | ix2 |   1.791 |   8.565 |   4.502 ||
 | i/10|   1.555 |   1.805 | 344.691 ||
 | i/10 x2 |   1.543 |   1.808 | 319.559 ||
-+-+-+-+-++
 | FNV |   1.822 |   3.452 |   4.922 |  8.309 |
get  | FNV  x2 |   2.298 |   3.194 |   4.473 ||
100% | i   |   1.252 |   1.344 |   0.944 ||
hits | ix2 |   1.286 |   1.434 |   1.220 ||
 | i/10|   6.720 |   5.138 | 281.815 ||
 | i/10 x2 |   6.297 |   5.188 | 257.021 ||
-+-+-+-+-++
 | FNV |   1.023 |   3.949 |   4.115 |  4.878 |
get  | FNV  x2 |   1.538 |   3.915 |   4.571 ||
10%  | i   |   0.654 | 397.457 |  38.125 ||
hits | ix2 |   0.718 |   0.722 |   9.111 ||
 | i/10|   1.128 |  30.235 |  60.376 ||
 | i/10 x2 |   1.260 |   1.082 |  43.354 ||
-+-+-+-+-++

[1] https://github.com/kblees/git/commits/kb/hashmap-v5-khash
[2] http://article.gmane.org/gmane.comp.version-control.git/235290


--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] config: arbitrary number of matches for --unset and --replace-all

2013-11-14 Thread Thomas Rast
Jeff King p...@peff.net writes:

 This code is weird to follow because of the fall-throughs. I do not
 think you have introduced any bugs with your patch, but it seems weird
 to me that we set the offset at the top of the hunk. If we hit the
 conditional in the bottom half, we do actually increment storer.seen,
 but only _after_ having overwritten the value from above (with the same
 value, no less).

 But if we do not follow that code path, we may end up here:

 @@ -1272,6 +1275,9 @@ static int store_aux(const char *key, const char 
 *value, void *cb)
  if (strrchr(key, '.') - key == store.baselen 
!strncmp(key, store.key, store.baselen)) {
  store.state = SECTION_SEEN;
 +ALLOC_GROW(store.offset,
 +   store.seen+1,
 +   store.offset_alloc);
  store.offset[store.seen] = 
 cf-do_ftell(cf);
  }
  }

 where we overwrite it again, but do not update store.seen. Or we may
 trigger neither, and leave the function with our offset stored, but
 store.seen not incremented.

It's doubly strange that we write in this hunk without any protection
against overflow.  I was too lazy to think about it long enough to come
up with a possible example that triggers this, and instead just put in
the defensive ALLOC_GROW().  But if you can trigger it, it will probably
cause the algorithm to go off the rails because it overwrote store.state
and possibly even store.seen.

-- 
Thomas Rast
t...@thomasrast.ch
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 0/21] pack bitmaps

2013-11-14 Thread Jeff King
On Thu, Nov 14, 2013 at 07:19:38PM +, Ramsay Jones wrote:

 Unfortunately, I didn't find time this weekend to finish the msvc build
 fixes. However, after a quick squint at these patches, I think you have
 almost done it for me! :-D
 
 I must have misunderstood the previous discussion, because my patch was
 written on the assumption that the ewah directory wouldn't be git-ified
 (e.g. #include git-compat-util.h).

I think it was up for debate at some point, but we did decide to go
ahead and git-ify. Please feel free to submit further fixups if you need
them.

- the ewah code used gcc's __builtin_ctzll, but did not provide a
  suitable fallback. We now provide a fallback in C.
 
 ... here.
 
 I was messing around with several implementations (including the use of
 msvc compiler intrinsics) with the intention of doing some timing tests
 etc. [I suspected my C fallback function (a different implementation to
 yours) would be slightly faster.]

Yeah, I looked around for several implementations, and ultimately wrote
one that was the most readable to me. The one I found shortest and most
inscrutable was:

  return popcount((x  -x) - 1);

the details of which I still haven't worked through in my head. ;)

I do think on most platforms that intrinsics or inline assembler are the
way to go. My main goal was to get something correct that would let it
compile everywhere, and then people can use that as a base for
optimizing. Patches welcome. :)

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 64-bit support.

2013-11-14 Thread Jeff King
On Thu, Nov 14, 2013 at 07:11:31PM +0400, Konstantin Khomoutov wrote:

  I just guess, that this limit comes from the O(N^2) complexity of the
  comparison algorithm. Since the max 32-bit signed value is 2^31, then
  the 2^15 = 32768 is somehow correlated with its square root, maybe,
  like 2^(32/2 - 1) - to prevent overflow.
  I'm trying to prepare the patch right now, that changes the `int
  rename_limit` = `long rename_limit` and all intermediate variable
  types. Is it a correct way to do?

Yes, the 32768 limit is there to prevent 32-bit overflow of the
squared value. We can use a uint64_t and bump the overflow limit to 2^32
if we are actually bumping up against this.

 I beleive rename_limit comes from reading the diff.renameLimit
 configuration variable.  The gitconfig(1) manual page hints to look at
 the -l command-line option of `git diff` which is described this way:

Yes, though there are also builtin defaults, depending on the operation
(400 for diffs, 1000 for merges).

 Looks like you're on the right track but the patch appears to require a
 more wide impact:
 * 32767 should be a default limit, applied in the case the user did not
   specify neither diff.renameLimit nor -l.

That is probably too high for a default limit. The point is that we are
making an O(n^2) matrix, and consuming CPU and memory in line with that.
The defaults were determined empirically on reasonable times (and are
now based on old hardware, so it may be time to bump them up).

The idea was that 32767 was unreasonably high, and you would not want to
spend that much CPU time anyway. On my shiny new machine, doing a 2048^2
matrix takes 5s. A 32767^2 matrix would be 21 minutes.

And it was also expected that you would simply not have that big a
matrix anyway (you might 32767 candidates on one side, but you will not
have a square bigger than rename_limit^2, which is what this is
checking).

 * If whatever value read from those sources is less than 0, an error
   should be thrown--it looks strange to just revert it to the default
   value in this case.

It's not the default, though. The default is something like 400. So
saying -1 (or 0) is a way of saying turn off the limit without
having to guess at an arbitrarily high number.

 * If the user-supplied value is = 0, then just use it, assume the user
   knows what they are doing.

At some point we have to deal with integer overflow. It was assumed that
a 32-bit max was big enough and that nobody would have a real world
case that hit that. I'd be curious to hear if that has actually been
hit, or if there is simply confusion over how the rename-limit config
works (the limit is the square root of the max matrix size).

Here's what the 64-bit patch could look like (I notice that we are not
using unsigned 32-bit integers, either, and we probably should be):

---
diff --git a/diffcore-rename.c b/diffcore-rename.c
index 6c7a72f..8356a62 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -427,11 +427,16 @@ static void record_if_better(struct diff_score m[], 
struct diff_score *o)
  * 1 if we need to disable inexact rename detection;
  * 2 if we would be under the limit if we were given -C instead of -C -C.
  */
-static int too_many_rename_candidates(int num_create,
+static int too_many_rename_candidates(uint64_t num_create,
  struct diff_options *options)
 {
-   int rename_limit = options-rename_limit;
-   int num_src = rename_src_nr;
+   /*
+* signed to unsigned conversion here is intentional, and
+* converts limits  0 into large (effectively infinite)
+* limits
+*/
+   uint64_t rename_limit = options-rename_limit;
+   uint64_t num_src = rename_src_nr;
int i;
 
options-needed_rename_limit = 0;
@@ -442,11 +447,10 @@ static int too_many_rename_candidates(int num_create,
 *
 *num_create * num_src  rename_limit * rename_limit
 *
-* but handles the potential overflow case specially (and we
-* assume at least 32-bit integers)
+* but handles the potential overflow case specially.
 */
-   if (rename_limit = 0 || rename_limit  32767)
-   rename_limit = 32767;
+   if (!rename_limit || rename_limit  4294967295)
+   rename_limit = 4294967295;
if ((num_create = rename_limit || num_src = rename_limit) 
(num_create * num_src = rename_limit * rename_limit))
return 0;
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] State correct usage of literal examples in man pages in the coding standards

2013-11-14 Thread Jason St. John
The man pages contain inconsistent usage of backticks vs. single quotes
around options, commands, etc. that are in paragraphs. This commit states
that backticks should always be used around literal examples.

This commit states that -- and friends should not be escaped
(e.g. use `--pretty=oneline` instead of `\--pretty=oneline`).

This commit also states correct usage for typesetting command usage
examples with inline substitutions.

Thanks-to: Ramkumar Ramachandra artag...@gmail.com
Thanks-to: Stuart Rackham srack...@gmail.com
Thanks-to: Junio C Hamano gits...@pobox.com
Signed-off-by: Jason St. John jstj...@purdue.edu
---
This is a resubmit of a previous patch:
http://marc.info/?l=gitm=138431653412495w=2

I added a Thanks-to for Stuart Rackham because he's listed as the author of the
AsciiDoc User Guide, which part of this commit relies heavily on.


 Documentation/CodingGuidelines | 34 +++---
 1 file changed, 31 insertions(+), 3 deletions(-)

diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
index a600e35..ef67b53 100644
--- a/Documentation/CodingGuidelines
+++ b/Documentation/CodingGuidelines
@@ -260,9 +260,11 @@ Writing Documentation:
 
  Every user-visible change should be reflected in the documentation.
  The same general rule as for code applies -- imitate the existing
- conventions.  A few commented examples follow to provide reference
- when writing or modifying command usage strings and synopsis sections
- in the manual pages:
+ conventions.
+
+ A few commented examples follow to provide reference when writing or
+ modifying command usage strings and synopsis sections in the manual
+ pages:
 
  Placeholders are spelled in lowercase and enclosed in angle brackets:
file
@@ -312,3 +314,29 @@ Writing Documentation:
Use 'git' (all lowercase) when talking about commands i.e. something
the user would type into a shell and use 'Git' (uppercase first letter)
when talking about the version control system and its properties.
+
+ A few commented examples follow to provide reference when writing or
+ modifying paragraphs or option/command explanations that contain options
+ or commands:
+
+ Literal examples (e.g. use of command-line options, command names, and
+ configuration variables) are typeset in monospace, and if you can use
+ `backticks around word phrases`, do so.
+   `--pretty=oneline`
+   `git rev-list`
+   `remote.pushdefault`
+
+ Word phrases enclosed in `backtick characters` are rendered literally
+ and will not be further expanded. The use of `backticks` to achieve the
+ previous rule means that literal examples should not use AsciiDoc
+ escapes.
+   Correct:
+  `--pretty=oneline`
+   Incorrect:
+  `\--pretty=oneline`
+
+ If some place in the documentation needs to typeset a command usage
+ example with inline substitutions, it is fine to use +monospaced and
+ inline substituted text+ instead of `monospaced literal text`, and with
+ the former, the part that should not get substituted must be
+ quoted/escaped.
-- 
1.8.4.2

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 0/21] pack bitmaps

2013-11-14 Thread Ramsay Jones
On 14/11/13 21:33, Jeff King wrote:
 On Thu, Nov 14, 2013 at 07:19:38PM +, Ramsay Jones wrote:
 
 Unfortunately, I didn't find time this weekend to finish the msvc build
 fixes. However, after a quick squint at these patches, I think you have
 almost done it for me! :-D

 I must have misunderstood the previous discussion, because my patch was
 written on the assumption that the ewah directory wouldn't be git-ified
 (e.g. #include git-compat-util.h).
 
 I think it was up for debate at some point, but we did decide to go
 ahead and git-ify. Please feel free to submit further fixups if you need
 them.

Yep, will do; at present it looks like that one-liner.

 
   - the ewah code used gcc's __builtin_ctzll, but did not provide a
 suitable fallback. We now provide a fallback in C.

 ... here.

 I was messing around with several implementations (including the use of
 msvc compiler intrinsics) with the intention of doing some timing tests
 etc. [I suspected my C fallback function (a different implementation to
 yours) would be slightly faster.]
 
 Yeah, I looked around for several implementations, and ultimately wrote
 one that was the most readable to me. The one I found shortest and most
 inscrutable was:
 
   return popcount((x  -x) - 1);
 
 the details of which I still haven't worked through in my head. ;)

Yeah, I stumbled over that one too!

 I do think on most platforms that intrinsics or inline assembler are the
 way to go. My main goal was to get something correct that would let it
 compile everywhere, and then people can use that as a base for
 optimizing. Patches welcome. :)

Indeed, I can happily leave that to another day (or to someone else
more motivated ;-)

ATB,
Ramsay Jones



--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCHv3 1/2] Fix single quotes, AsciiDoc escaping, and other formatting issues

2013-11-14 Thread Jason St. John
rev-list-options.txt:
-- Remove blank lines after some options subheadings to fix syntax
 highlighting in Vim
-- Typeset literal options, commands, and path names in monospace
-- Remove AsciiDoc escapes with literal
-- Replace some double quotes with proper AsciiDoc quotes (e.g. ``foo'')

Signed-off-by: Jason St. John jstj...@purdue.edu
---
This is a resubmit of a previous patch:
http://marc.info/?l=gitm=138431995913311w=2

I decided to remove the blank lines after option subheadings because the syntax
highlighting in Vim actually looks better with them removed. I'm not sure how 
this
affects line-rewrapping of the body text in Emacs.

I split the previous patch into two. This patch deals strictly with formatting
issues and backticks/literals. The second patch contains the grammatical and 
typo
fixes.


 Documentation/rev-list-options.txt | 226 +
 1 file changed, 78 insertions(+), 148 deletions(-)

diff --git a/Documentation/rev-list-options.txt 
b/Documentation/rev-list-options.txt
index ec86d09..eb4b6bf 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -18,33 +18,27 @@ ordering and formatting options, such as `--reverse`.
 -number::
 -n number::
 --max-count=number::
-
Limit the number of commits to output.
 
 --skip=number::
-
Skip 'number' commits before starting to show the commit output.
 
 --since=date::
 --after=date::
-
Show commits more recent than a specific date.
 
 --until=date::
 --before=date::
-
Show commits older than a specific date.
 
 ifdef::git-rev-list[]
 --max-age=timestamp::
 --min-age=timestamp::
-
Limit the commits output to specified time range.
 endif::git-rev-list[]
 
 --author=pattern::
 --committer=pattern::
-
Limit the commits output to ones with author/committer
header lines that match the specified pattern (regular
expression).  With more than one `--author=pattern`,
@@ -52,7 +46,6 @@ endif::git-rev-list[]
chosen (similarly for multiple `--committer=pattern`).
 
 --grep-reflog=pattern::
-
Limit the commits output to ones with reflog entries that
match the specified pattern (regular expression). With
more than one `--grep-reflog`, commits whose reflog message
@@ -60,7 +53,6 @@ endif::git-rev-list[]
error to use this option unless `--walk-reflogs` is in use.
 
 --grep=pattern::
-
Limit the commits output to ones with log message that
matches the specified pattern (regular expression).  With
more than one `--grep=pattern`, commits whose message
@@ -71,46 +63,38 @@ When `--show-notes` is in effect, the message from the 
notes as
 if it is part of the log message.
 
 --all-match::
-   Limit the commits output to ones that match all given --grep,
+   Limit the commits output to ones that match all given `--grep`,
instead of ones that match at least one.
 
 -i::
 --regexp-ignore-case::
-
Match the regexp limiting patterns without regard to letters case.
 
 --basic-regexp::
-
Consider the limiting patterns to be basic regular expressions;
this is the default.
 
 -E::
 --extended-regexp::
-
Consider the limiting patterns to be extended regular expressions
instead of the default basic regular expressions.
 
 -F::
 --fixed-strings::
-
Consider the limiting patterns to be fixed strings (don't interpret
pattern as a regular expression).
 
 --perl-regexp::
-
Consider the limiting patterns to be Perl-compatible regexp.
Requires libpcre to be compiled in.
 
 --remove-empty::
-
Stop when a given path disappears from the tree.
 
 --merges::
-
Print only merge commits. This is exactly the same as `--min-parents=2`.
 
 --no-merges::
-
Do not print commits with more than one parent. This is
exactly the same as `--max-parents=1`.
 
@@ -118,7 +102,6 @@ if it is part of the log message.
 --max-parents=number::
 --no-min-parents::
 --no-max-parents::
-
Show only commits which have at least (or at most) that many parent
commits. In particular, `--max-parents=1` is the same as `--no-merges`,
`--min-parents=2` is the same as `--merges`.  `--max-parents=0`
@@ -138,31 +121,26 @@ parents) and `--max-parents=-1` (negative numbers denote 
no upper limit).
brought in to your history by such a merge.
 
 --not::
-
Reverses the meaning of the '{caret}' prefix (or lack thereof)
-   for all following revision specifiers, up to the next '--not'.
+   for all following revision specifiers, up to the next `--not`.
 
 --all::
-
Pretend as if all the refs in `refs/` are listed on the
command line as 'commit'.
 
 --branches[=pattern]::
-
Pretend as if all the refs in `refs/heads` are listed
on the command line as 'commit'. If 'pattern' is given, limit
branches to ones matching given shell glob. If pattern 

[PATCHv3 2/2] Documentation/rev-list-options.txt: fix some grammatical issues and typos

2013-11-14 Thread Jason St. John
Various fixes:
-- fix typos (e.g. show - shown)
-- use regular expression(s) instead of regexp where appropriate
-- reword some sentences for easier reading
-- fix/improve some grammatical issues (e.g. comma usage)
-- add missing articles (e.g. the)
-- change E-mail to email

Signed-off-by: Jason St. John jstj...@purdue.edu
---
I changed E-mail to email because that is clearly the dominant form:
http://www.google.com/trends/explore?q=#q=%22email%22%2C%20%22E-mail%22cmpt=q


 Documentation/rev-list-options.txt | 27 ++-
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/Documentation/rev-list-options.txt 
b/Documentation/rev-list-options.txt
index eb4b6bf..2991d70 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -68,7 +68,8 @@ if it is part of the log message.
 
 -i::
 --regexp-ignore-case::
-   Match the regexp limiting patterns without regard to letters case.
+   Match the regular expression limiting patterns without regard to letter
+   case.
 
 --basic-regexp::
Consider the limiting patterns to be basic regular expressions;
@@ -85,7 +86,7 @@ if it is part of the log message.
pattern as a regular expression).
 
 --perl-regexp::
-   Consider the limiting patterns to be Perl-compatible regexp.
+   Consider the limiting patterns to be Perl-compatible regular 
expressions.
Requires libpcre to be compiled in.
 
 --remove-empty::
@@ -191,9 +192,9 @@ endif::git-rev-list[]
 For example, if you have two branches, `A` and `B`, a usual way
 to list all commits on only one side of them is with
 `--left-right` (see the example below in the description of
-the `--left-right` option).  It however shows the commits that were 
cherry-picked
-from the other branch (for example, ``3rd on b'' may be cherry-picked
-from branch A).  With this option, such pairs of commits are
+the `--left-right` option). However, it shows the commits that were
+cherry-picked from the other branch (for example, ``3rd on b'' may be
+cherry-picked from branch A). With this option, such pairs of commits are
 excluded from the output.
 
 --left-only::
@@ -447,7 +448,7 @@ The effect of this is best shown by way of comparing to
  `-'
 ---
 +
-Note the major differences in `N`, `P` and `Q` over `--full-history`:
+Note the major differences in `N`, `P`, and `Q` over `--full-history`:
 +
 --
 * `N`'s parent list had `I` removed, because it is an ancestor of the
@@ -467,7 +468,7 @@ Finally, there is a fifth simplification mode available:
Limit the displayed commits to those directly on the ancestry
chain between the ``from'' and ``to'' commits in the given commit
range. I.e. only display commits that are ancestor of the ``to''
-   commit, and descendants of the ``from'' commit.
+   commit and descendants of the ``from'' commit.
 +
 As an example use case, consider the following commit history:
 +
@@ -631,9 +632,9 @@ These options are mostly targeted for packing of Git 
repositories.
 --no-walk[=(sorted|unsorted)]::
Only show the given commits, but do not traverse their ancestors.
This has no effect if a range is specified. If the argument
-   `unsorted` is given, the commits are show in the order they were
+   `unsorted` is given, the commits are shown in the order they were
given on the command line. Otherwise (if `sorted` or no argument
-   was given), the commits are show in reverse chronological order
+   was given), the commits are shown in reverse chronological order
by commit time.
 
 --do-walk::
@@ -656,7 +657,7 @@ include::pretty-options.txt[]
 --date=(relative|local|default|iso|rfc|short|raw)::
Only takes effect for dates shown in human-readable format, such
as when using `--pretty`. `log.date` config variable sets a default
-   value for log command's `--date` option.
+   value for the log command's `--date` option.
 +
 `--date=relative` shows dates relative to the current time,
 e.g. ``2 hours ago''.
@@ -666,9 +667,9 @@ e.g. ``2 hours ago''.
 `--date=iso` (or `--date=iso8601`) shows timestamps in ISO 8601 format.
 +
 `--date=rfc` (or `--date=rfc2822`) shows timestamps in RFC 2822
-format, often found in E-mail messages.
+format, often found in email messages.
 +
-`--date=short` shows only date but not time, in `-MM-DD` format.
+`--date=short` shows only the date, but not the time, in `-MM-DD` format.
 +
 `--date=raw` shows the date in the internal raw Git format `%s %z` format.
 +
@@ -749,7 +750,7 @@ ifndef::git-rev-list[]
 Diff Formatting
 ~~~
 
-Below are listed options that control the formatting of diff output.
+Listed below are options that control the formatting of diff output.
 Some of them are specific to linkgit:git-rev-list[1], however other diff
 options may be given. See linkgit:git-diff-files[1] for 

Re: [PATCH 2/2] Fix minor grammatical and other formatting issues in the git log man page

2013-11-14 Thread Jason St. John
On Wed, Nov 13, 2013 at 4:56 PM, Junio C Hamano gits...@pobox.com wrote:
 Jason St. John jstj...@purdue.edu writes:

 Documentation/git-log.txt:
 -- replace single quotes around options/commands with backticks
 -- use single quotes around references to sections
 -- replaced some double quotes with proper AsciiDoc quotes (e.g.
  ``foo'')
 -- use backticks around files and file paths
 -- use title case when referring to section headings
 -- use backticks around option arguments/defaults

 Signed-off-by: Jason St. John jstj...@purdue.edu
 ---
 When working on this commit, I noticed a difference in how options and
 option descriptions are separated (e.g. with a blank line or not). At least
 with Vim's syntax highlighting, if there is a blank line between the option
 and its description, the text block is all colored the same; however, if
 there isn't a blank line, then the text block is not specially colored.

 Is there an existing convention for how this should be done?

 I do not think we have a written rule or convention (and I do not
 know if we want one).  While reading the text in the source form
 (and the point of choosing AsciiDoc was to be able to read the docs
 without formatting), I personally have a slight preference to
 immediately follow the body text to the label in the labelled list,
 and a blank line after the item, i.e.

 item label::
 This describes the item.

 next item label::
 This describes the next item.

 as it makes it clear that the body belongs to the heading that
 precedes it.

 But it does help to have a blank between the label and the beginning
 of the body when reflowing the body with fill-paragraph, i.e.

 item label::

 This describes the item.

 You say that it is also easier on Vim to have the blank line there,
 so perhaps we may want to aim for updating the documentation over
 time to consistently do so.  I dunno.
 --
 To unsubscribe from this list: send the line unsubscribe git in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

As I stated in my recent resubmit[1], I decided to remove the blank
lines after option subheadings because the syntax highlighting in Vim
actually looks better with the blank lines removed. As such, I would
prefer that we go with the option of removing these blank lines going
forward.

If we are in agreement on this, should I send in a patch for
CodingGuidelines to state this?

[1] http://marc.info/?l=gitm=138447927208462w=2
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] branch: fix --verbose output column alignment

2013-11-14 Thread Jiang Xin
2013/11/15 Torstein Hegge he...@resisty.net:
 Commit f2e0873 (branch: report invalid tracking branch as gone) removed
 an early return from fill_tracking_info() in the path taken when 'git
 branch -v' lists a branch in sync with its upstream. This resulted in an
 unconditionally added space in front of the subject line:

 $ git branch -v
 * master f5eb3da  commit pushed to upstream
   topic  f935eb6 unpublished topic

Thank you for catching this. Confirmed that the output of git branch -v
is not aligned well. This example may be more clear ;-)

$ git branch -v
 branch1f0ec0da [ahead 1, behind 2] divert from upstream
 branch2f5eb3da  commit pushed to upstream
 branch3f935eb6 unpublished topic


 Instead, only add the trailing space if a decoration have been added.

 To catch this kind of whitespace breakage in the tests, be a bit less
 smart when filtering the output through sed.

 Signed-off-by: Torstein Hegge he...@resisty.net
 ---
  builtin/branch.c |  8 +++-
  t/t6040-tracking-info.sh | 24 +---
  2 files changed, 20 insertions(+), 12 deletions(-)

 diff --git a/builtin/branch.c b/builtin/branch.c
 index 0bb0e93..636a16e 100644
 --- a/builtin/branch.c
 +++ b/builtin/branch.c
 @@ -424,6 +424,7 @@ static void fill_tracking_info(struct strbuf *stat, const 
 char *branch_name,
 struct branch *branch = branch_get(branch_name);
 struct strbuf fancy = STRBUF_INIT;
 int upstream_is_gone = 0;
 +   int added_decoration = 1;

 switch (stat_tracking_info(branch, ours, theirs)) {
 case 0:
 @@ -451,9 +452,13 @@ static void fill_tracking_info(struct strbuf *stat, 
 const char *branch_name,
 if (upstream_is_gone) {
 if (show_upstream_ref)
 strbuf_addf(stat, _([%s: gone]), fancy.buf);
 +   else
 +   added_decoration = 0;
 } else if (!ours  !theirs) {
 if (show_upstream_ref)
 strbuf_addf(stat, _([%s]), fancy.buf);
 +   else
 +   added_decoration = 0;
 } else if (!ours) {
 if (show_upstream_ref)
 strbuf_addf(stat, _([%s: behind %d]), fancy.buf, 
 theirs);
 @@ -474,7 +479,8 @@ static void fill_tracking_info(struct strbuf *stat, const 
 char *branch_name,
 ours, theirs);
 }
 strbuf_release(fancy);
 -   strbuf_addch(stat, ' ');
 +   if (added_decoration)
 +   strbuf_addch(stat, ' ');
 free(ref);
  }


How about add strbuf_addch(stat, ' '); to each condition directly,  like this:


if (upstream_is_gone) {
if (show_upstream_ref) {
strbuf_addf(stat, _([%s: gone]), fancy.buf);
strbuf_addch(stat, ' ');
}
} else if (!ours  !theirs) {
if (show_upstream_ref) {
strbuf_addf(stat, _([%s]), fancy.buf);
strbuf_addch(stat, ' ');
}
} else if (!ours) {
if (show_upstream_ref)
strbuf_addf(stat, _([%s: behind %d]),
fancy.buf, theirs);
else
strbuf_addf(stat, _([behind %d]), theirs);
strbuf_addch(stat, ' ');
} else if (!theirs) {
if (show_upstream_ref)
strbuf_addf(stat, _([%s: ahead %d]), fancy.buf, ours);
else
strbuf_addf(stat, _([ahead %d]), ours);
strbuf_addch(stat, ' ');
} else {
if (show_upstream_ref)
strbuf_addf(stat, _([%s: ahead %d, behind %d]),
fancy.buf, ours, theirs);
else
strbuf_addf(stat, _([ahead %d, behind %d]),
ours, theirs);
strbuf_addch(stat, ' ');
}
strbuf_release(fancy);
free(ref);


-- 
Jiang Xin
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] Fix minor grammatical and other formatting issues in the git log man page

2013-11-14 Thread Jason St. John
On Thu, Nov 14, 2013 at 8:44 PM, Jason St. John jstj...@purdue.edu wrote:
 On Wed, Nov 13, 2013 at 4:56 PM, Junio C Hamano gits...@pobox.com wrote:
 Jason St. John jstj...@purdue.edu writes:

 Documentation/git-log.txt:
 -- replace single quotes around options/commands with backticks
 -- use single quotes around references to sections
 -- replaced some double quotes with proper AsciiDoc quotes (e.g.
  ``foo'')
 -- use backticks around files and file paths
 -- use title case when referring to section headings
 -- use backticks around option arguments/defaults

 Signed-off-by: Jason St. John jstj...@purdue.edu
 ---
 When working on this commit, I noticed a difference in how options and
 option descriptions are separated (e.g. with a blank line or not). At least
 with Vim's syntax highlighting, if there is a blank line between the option
 and its description, the text block is all colored the same; however, if
 there isn't a blank line, then the text block is not specially colored.

 Is there an existing convention for how this should be done?

 I do not think we have a written rule or convention (and I do not
 know if we want one).  While reading the text in the source form
 (and the point of choosing AsciiDoc was to be able to read the docs
 without formatting), I personally have a slight preference to
 immediately follow the body text to the label in the labelled list,
 and a blank line after the item, i.e.

 item label::
 This describes the item.

 next item label::
 This describes the next item.

 as it makes it clear that the body belongs to the heading that
 precedes it.

 But it does help to have a blank between the label and the beginning
 of the body when reflowing the body with fill-paragraph, i.e.

 item label::

 This describes the item.

 You say that it is also easier on Vim to have the blank line there,
 so perhaps we may want to aim for updating the documentation over
 time to consistently do so.  I dunno.
 --
 To unsubscribe from this list: send the line unsubscribe git in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

 As I stated in my recent resubmit[1], I decided to remove the blank
 lines after option subheadings because the syntax highlighting in Vim
 actually looks better with the blank lines removed. As such, I would
 prefer that we go with the option of removing these blank lines going
 forward.

 If we are in agreement on this, should I send in a patch for
 CodingGuidelines to state this?

 [1] http://marc.info/?l=gitm=138447927208462w=2

I forgot to mention that if we do go with this, then I will need to
resubmit this patch.

Sorry for the extra email.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: What's cooking in git.git (Nov 2013, #04; Wed, 13)

2013-11-14 Thread Jeff King
On Wed, Nov 13, 2013 at 03:07:54PM -0800, Junio C Hamano wrote:

 * nd/liteal-pathspecs (2013-10-28) 1 commit
   (merged to 'next' on 2013-11-01 at 1a91775)
  + pathspec: stop --*-pathspecs impact on internal parse_pathspec() uses
 
  Will cook in 'next'.

I think we want this to be part of v1.8.5. It is a fix for a regression
that appeared in master post-1.8.4:

  $ git.v1.8.4 --literal-pathspecs blame Makefile | wc -l
  2596

  $ git.v1.8.5-rc2 --literal-pathspecs blame Makefile | wc -l
  fatal: Makefile: pathspec magic not supported by this command: 'literal'
  0

Sorry to mention it so late into the -rc cycle, but I just noticed that
the patch hadn't graduated.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Add a bugzilla website

2013-11-14 Thread ycollette . nospam
Hello,

I just wanted to ask a question: why is there no bugzilla website for git ?
It's better to put bugs into such a tool to follow there evolution than to 
manage bugs via a developpment mailing list ...

Best regards,

YC
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html