Re: Request: timeout option for remote operations, esp. git fetch
On Tue, Nov 12, 2013 at 10:33:49AM -0800, H. Peter Anvin wrote: Which means that your original wish may not be granted with SO_KEEPALIVE at all, no? I was wondering if you wanted a forced timeout based on alarm(2), something similar to what you added to git-daemon in 960deccb (git-daemon: timeout, eliminate double DWIM, 2005-10-19). Yes, something more like that on the client end. SO_KEEPALIVE is better than nothing, but not really good enough. Would it be enough to just use timeout(1), like: timeout 10m git fetch That will time the _whole_ fetch operation, which means a legitimately gigantic but fast fetch would still fail. Setting a shorter timeout only for periods of inactivity on the network socket would catch killed or very laggy connections. But it would not catch a server that feeds you data at a constant but ridiculously slow rate. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: can we prevent reflog deletion when branch is deleted?
On Thu, Nov 14, 2013 at 08:56:07AM +0100, Thomas Rast wrote: Whatever it was that happened to a hundred or more repos on the Jenkins project seems to be stirring up this debate in some circles. Making us so curious ... and then you just leave us hanging there ;-) Any pointers to this debate? I do not know about any particular debate in git circles, but I assume Sitaram is referring to this incident: https://groups.google.com/d/msg/jenkinsci-dev/-myjRIPcVwU/t4nkXONp8qgJ in which a Jenkins dev force-pushed and rewound history on 150 different repos. In this case the reflog made rollback easy, but if he had pushed a deletion, it would be harder. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: can we prevent reflog deletion when branch is deleted?
On Thu, Nov 14, 2013 at 05:48:50AM +0530, Sitaram Chamarty wrote: Is there *any* way we can preserve a reflog for a deleted branch, perhaps under logs/refs/deleted/timestamp/full/ref/name ? I had patches to do something like this here: http://thread.gmane.org/gmane.comp.version-control.git/201715/focus=201752 but there were definitely some buggy corners, as much of the code assumed you needed to have a ref to have a reflog. I don't even run with it locally anymore. At GitHub, we log each change to an audit log in addition to the regular reflog (we also stuff extra data from the environment into the reflog message). So even after a branch is deleted, its audit log entries remain, though you have to pull out the data by hand (git doesn't know about it at all, except as an append-only sink for writing). And git doesn't use the audit log for connectivity, either, so eventually the objects could be pruned. Just some basic protection -- don't delete the reflog, and instead, rename it to something that preserves the name but in a different namespace. That part is easy. Accessing it seamlessly and handling reflog expiration are a little harder. Not because they're intractable, but just because there are some low-level assumptions in the git code. The patch series I mentioned above mostly works. It probably just needs somebody to go through and find the corner cases. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Problem while cloning a git repo
I am not able to clone via https because gerrit doesn't propose this way of cloning. When I clone via http, I see that git is starting the download of objects: remote: Counting objects: 256, done remote: Finding sources: 100% (256/256) Receiving objects: 46% ... (this part always fails at 46 %). So, I think that the proxy part is not a problem (there is no proxy set here). If I set GIT_CURL_VERBOSE=1 before cloning, I've got the following error: * Problem (2) in the Chunked-Encoded data * Closing connection 1 YC Le 14/11/2013 00:40, brian m. carlson a écrit : On Wed, Nov 13, 2013 at 08:43:28AM +0100, Yann COLLETTE wrote: Hello, When I perform the git clone via git, it works. The problem is only happening via http. I tried to play with http.postBuffer and I set this parameter to it's maximum (a little bit before a git clone triggered a memory alloc problem) and I see that something big is trying to be downloaded. But I don't see such a big file in my history of commits. The maximum one seems to be around 50 Mo ... Please keep the list in CC. http.postBuffer doesn't affect clones, only pushes, so that isn't relevant here. You're experiencing something that is dropping the connection over HTTP. So you either have a bad proxy, or something else is causing the connection to be dropped. Since it's only over HTTP, I suspect it's the former. Do you have HTTPS, and if so, does it work if you try to clone over that? -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] config: arbitrary number of matches for --unset and --replace-all
On Thu, Nov 14, 2013 at 08:50:43AM +0100, Thomas Rast wrote: git-config used a static match array to hold the matches we want to unset/replace when using --unset or --replace-all. Use a variable-sized array instead. This in particular fixes the symptoms git-svn had when storing large numbers of svn-remote.*.added-placeholder entries in the config file. While the tests are rather more paranoid than just --unset and --replace-all, the other operations already worked. Indeed git-svn's usage only breaks the first time *after* creating so many entries, when it wants to unset and re-add them all. Reported-by: Jess Hottenstein jess.hottenst...@gmail.com Signed-off-by: Thomas Rast t...@thomasrast.ch This looks good to me. I agree with your earlier assessment that adding tons of config keys is probably a bad idea. It's not just slow when looking up those keys, but it also slows down every single git operation (which might read config many times, if it is a script). Still, we should at least do the right thing in the face of such config. @@ -1260,11 +1259,15 @@ static int store_aux(const char *key, const char *value, void *cb) * Do not increment matches: this is no match, but we * just made sure we are in the desired section. */ + ALLOC_GROW(store.offset, store.seen+1, +store.offset_alloc); store.offset[store.seen] = cf-do_ftell(cf); /* fallthru */ case SECTION_END_SEEN: case START: if (matches(key, value)) { + ALLOC_GROW(store.offset, store.seen+1, +store.offset_alloc); store.offset[store.seen] = cf-do_ftell(cf); store.state = KEY_SEEN; store.seen++; This code is weird to follow because of the fall-throughs. I do not think you have introduced any bugs with your patch, but it seems weird to me that we set the offset at the top of the hunk. If we hit the conditional in the bottom half, we do actually increment storer.seen, but only _after_ having overwritten the value from above (with the same value, no less). But if we do not follow that code path, we may end up here: @@ -1272,6 +1275,9 @@ static int store_aux(const char *key, const char *value, void *cb) if (strrchr(key, '.') - key == store.baselen !strncmp(key, store.key, store.baselen)) { store.state = SECTION_SEEN; + ALLOC_GROW(store.offset, +store.seen+1, +store.offset_alloc); store.offset[store.seen] = cf-do_ftell(cf); } } where we overwrite it again, but do not update store.seen. Or we may trigger neither, and leave the function with our offset stored, but store.seen not incremented. So it seems odd to me that we would set the offset, but in some cases never actually increment our counter. AFAICT, we do not ever access those values. The reader limits itself to i store.seen, which makes sense. And the writers always call a fresh ftell before incrementing store.seen. But maybe I am missing something. Anyway, it is neither here nor there for your patch. Just something odd I noticed while reading the code. +setup_many() { + setup + # This time we want the newline so that we can tack on more + # entries. + echo .git/config + # Semi-efficient way of concatenating 5^5 = 3125 lines. Note + # that because 'setup' already put one line, this means 3126 + # entries for section.key in the config file. + cat 5to1 EOF + key = foo + key = foo + key = foo + key = foo + key = foo +EOF + cat 5to1 5to1 5to1 5to1 5to1 5to2 # 25 + cat 5to2 5to2 5to2 5to2 5to2 5to3 # 125 + cat 5to3 5to3 5to3 5to3 5to3 5to4 # 635 + cat 5to4 5to4 5to4 5to4 5to4 .git/config # 3125 +} You could make this even more semi-efficient by just storing the end result in 5to5, and then copying it into place rather than doing the whole dance for each test. I doubt it matters that much, though. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: can we prevent reflog deletion when branch is deleted?
On 11/14/2013 01:37 PM, Jeff King wrote: On Thu, Nov 14, 2013 at 08:56:07AM +0100, Thomas Rast wrote: Whatever it was that happened to a hundred or more repos on the Jenkins project seems to be stirring up this debate in some circles. Making us so curious ... and then you just leave us hanging there ;-) Oh my apologies; I missed the URL!! (But Peff supplied it before I saw this email!) Any pointers to this debate? I do not know about any particular debate in git circles, but I assume Sitaram is referring to this incident: https://groups.google.com/d/msg/jenkinsci-dev/-myjRIPcVwU/t4nkXONp8qgJ in which a Jenkins dev force-pushed and rewound history on 150 different repos. In this case the reflog made rollback easy, but if he had pushed a deletion, it would be harder. I don't know if they had a reflog on the server side; they used client-side reflogs if I understood correctly. I'm talking about server side (bare repo), assuming the site has core.logAllRefUpdates set. And I'll explain the some circles part as something on LinkedIn. To be honest there's been a fair bit of FUDding by CVCS types there so I stopped looking at the posts, but I get the subject lines by email and I saw one that said Git History Protection - if we needed proof... or something like that. I admit I didn't check to see if a debate actually followed that post :-) -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: can we prevent reflog deletion when branch is deleted?
On Thu, Nov 14, 2013 at 04:26:46PM +0530, Sitaram Chamarty wrote: I do not know about any particular debate in git circles, but I assume Sitaram is referring to this incident: https://groups.google.com/d/msg/jenkinsci-dev/-myjRIPcVwU/t4nkXONp8qgJ in which a Jenkins dev force-pushed and rewound history on 150 different repos. In this case the reflog made rollback easy, but if he had pushed a deletion, it would be harder. I don't know if they had a reflog on the server side; they used client-side reflogs if I understood correctly. I'm talking about server side (bare repo), assuming the site has core.logAllRefUpdates set. Yes, they did have server-side reflogs (the pushes were to GitHub, and we reflog everything). Client-side reflogs would not be sufficient, as the client who pushed does not record the history he just rewound (he _might_ have it at refs/remotes/origin/master@{1}, but if somebody pushed since his last fetch, then he doesn't). The simplest way to recover is to just have everyone push again (without --force). The history will just silently fast-forward to whoever has the most recent tip. The downside is that you have to wait for that person to actually push. :) I think they started with that, and then eventually GitHub support got wind of it and pulled the last value for each repo out of the server-side reflog for them. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: can we prevent reflog deletion when branch is deleted?
Would be really useful anyway to have the ability to create a server-side reference based on a SHA-1, using the Git protocol. Alternatively, just fetching a remote repo based on a SHA-1 (not referenced by any ref-spec but still existent) so that you can create a new reference locally and push. Luca. On 14 Nov 2013, at 11:09, Jeff King p...@peff.net wrote: On Thu, Nov 14, 2013 at 04:26:46PM +0530, Sitaram Chamarty wrote: I do not know about any particular debate in git circles, but I assume Sitaram is referring to this incident: https://groups.google.com/d/msg/jenkinsci-dev/-myjRIPcVwU/t4nkXONp8qgJ in which a Jenkins dev force-pushed and rewound history on 150 different repos. In this case the reflog made rollback easy, but if he had pushed a deletion, it would be harder. I don't know if they had a reflog on the server side; they used client-side reflogs if I understood correctly. I'm talking about server side (bare repo), assuming the site has core.logAllRefUpdates set. Yes, they did have server-side reflogs (the pushes were to GitHub, and we reflog everything). Client-side reflogs would not be sufficient, as the client who pushed does not record the history he just rewound (he _might_ have it at refs/remotes/origin/master@{1}, but if somebody pushed since his last fetch, then he doesn't). The simplest way to recover is to just have everyone push again (without --force). The history will just silently fast-forward to whoever has the most recent tip. The downside is that you have to wait for that person to actually push. :) I think they started with that, and then eventually GitHub support got wind of it and pulled the last value for each repo out of the server-side reflog for them. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 01/21] sha1write: make buffer const-correct
We are passed a void * and write it out without ever touching it; let's indicate that by using const. Signed-off-by: Jeff King p...@peff.net --- csum-file.c | 6 +++--- csum-file.h | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/csum-file.c b/csum-file.c index 53f5375..465971c 100644 --- a/csum-file.c +++ b/csum-file.c @@ -11,7 +11,7 @@ #include progress.h #include csum-file.h -static void flush(struct sha1file *f, void *buf, unsigned int count) +static void flush(struct sha1file *f, const void *buf, unsigned int count) { if (0 = f-check_fd count) { unsigned char check_buffer[8192]; @@ -86,13 +86,13 @@ int sha1close(struct sha1file *f, unsigned char *result, unsigned int flags) return fd; } -int sha1write(struct sha1file *f, void *buf, unsigned int count) +int sha1write(struct sha1file *f, const void *buf, unsigned int count) { while (count) { unsigned offset = f-offset; unsigned left = sizeof(f-buffer) - offset; unsigned nr = count left ? left : count; - void *data; + const void *data; if (f-do_crc) f-crc32 = crc32(f-crc32, buf, nr); diff --git a/csum-file.h b/csum-file.h index 3b540bd..9dedb03 100644 --- a/csum-file.h +++ b/csum-file.h @@ -34,7 +34,7 @@ extern struct sha1file *sha1fd(int fd, const char *name); extern struct sha1file *sha1fd_check(const char *name); extern struct sha1file *sha1fd_throughput(int fd, const char *name, struct progress *tp); extern int sha1close(struct sha1file *, unsigned char *, unsigned int); -extern int sha1write(struct sha1file *, void *, unsigned int); +extern int sha1write(struct sha1file *, const void *, unsigned int); extern void sha1flush(struct sha1file *f); extern void crc32_begin(struct sha1file *); extern uint32_t crc32_end(struct sha1file *); -- 1.8.5.rc0.443.g2df7f3f -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 0/21] pack bitmaps
Here's another iteration of the pack bitmaps series. Compared to v2, it changes: - misc style/typo fixes - portability fixes from Ramsay and Torsten - count-objects garbage-reporting patch from Duy - disable bitmaps when is_repository_shallow(); this also covers the case where the client is shallow, since we feed pack-objects a --shallow-file in that case. This used to done by checking !internal_rev_list, but that doesn't apply after cdab485. - ewah sources now properly use git-compat-util.h and do not include system headers - the ewah code uses ewah_malloc, ewah_realloc, and so forth to let the project use a particular allocator (and we want to use xmalloc and friends). And we defined those in pack-bitmap.h, but of course that had no effect on the ewah/*.c files that did not include pack-bitmap.h. Since we are hacking up and git-ifying libewok anyway, we can just set the hardcoded fallback to xmalloc instead of malloc. - the ewah code used gcc's __builtin_ctzll, but did not provide a suitable fallback. We now provide a fallback in C. - The bitmap reading code only handles a single bitmapped pack (since they must be fully closed, there is not much point in having multiple). It used to silently ignore extra bitmap indices it found, but will now warn that they are being ignored. - The name-hash cache is now optional, controlled by pack.writeBitmapHashCache. - The test script will now do basic interoperability testing with jgit (if you have jgit in your $PATH). - There are now perf tests. Spoiler alert: bitmaps make clones faster. See patch 20 for details. We can also measure the speedup from the hash cache (see patch 21). Not addressed: - I did not include the NEEDS_ALIGNED_ACCESS patch. I note that we do not even have a Makefile knob for this, and the code in read-cache.c has probably never actually been used. Are there real systems that have a problem? The read-cache code was in support of the index v4 experiment, which did away with the 8-byte padding. So it could be that we simply don't see it, because everything is currently aligned. - On a related note, we do some cast-buffer-to-struct magic on the mmap'd file. I note that the regular packfile reader also does this. How careful do we want to be? - We still assume that reusing a slice from the front of the pack will never miss delta bases. This is the case currently for packs generated by both git and JGit, but it would be nice to mark the property in the bitmap index. Adding a new flag would break JGit compatibility, though. We can either make it an option, or assume it's good enough for now and worry about it in v2. [01/21]: sha1write: make buffer const-correct [02/21]: revindex: Export new APIs [03/21]: pack-objects: Refactor the packing list [04/21]: pack-objects: factor out name_hash [05/21]: revision: allow setting custom limiter function [06/21]: sha1_file: export `git_open_noatime` [07/21]: compat: add endianness helpers [08/21]: ewah: compressed bitmap implementation [09/21]: documentation: add documentation for the bitmap format [10/21]: pack-bitmap: add support for bitmap indexes [11/21]: pack-objects: use bitmaps when packing objects [12/21]: rev-list: add bitmap mode to speed up object lists [13/21]: pack-objects: implement bitmap writing [14/21]: repack: stop using magic number for ARRAY_SIZE(exts) [15/21]: repack: turn exts array into array-of-struct [16/21]: repack: handle optional files created by pack-objects [17/21]: repack: consider bitmaps when performing repacks [18/21]: count-objects: recognize .bitmap in garbage-checking [19/21]: t: add basic bitmap functionality tests [20/21]: t/perf: add tests for pack bitmaps [21/21]: pack-bitmap: implement optional name_hash cache -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 05/21] revision: allow setting custom limiter function
From: Vicent Marti tan...@gmail.com This commit enables users of `struct rev_info` to peform custom limiting during a revision walk (i.e. `get_revision`). If the field `include_check` has been set to a callback, this callback will be issued once for each commit before it is added to the pending list of the revwalk. If the include check returns 0, the commit will be marked as added but won't be pushed to the pending list, effectively limiting the walk. Signed-off-by: Vicent Marti tan...@gmail.com Signed-off-by: Jeff King p...@peff.net --- revision.c | 4 revision.h | 2 ++ 2 files changed, 6 insertions(+) diff --git a/revision.c b/revision.c index 956040c..260f4a1 100644 --- a/revision.c +++ b/revision.c @@ -779,6 +779,10 @@ static int add_parents_to_list(struct rev_info *revs, struct commit *commit, return 0; commit-object.flags |= ADDED; + if (revs-include_check + !revs-include_check(commit, revs-include_check_data)) + return 0; + /* * If the commit is uninteresting, don't try to * prune parents - we want the maximal uninteresting diff --git a/revision.h b/revision.h index 89132df..5658ddd 100644 --- a/revision.h +++ b/revision.h @@ -169,6 +169,8 @@ struct rev_info { unsigned long min_age; int min_parents; int max_parents; + int (*include_check)(struct commit *, void *); + void *include_check_data; /* diff info for patches and for paths limiting */ struct diff_options diffopt; -- 1.8.5.rc0.443.g2df7f3f -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 06/21] sha1_file: export `git_open_noatime`
From: Vicent Marti tan...@gmail.com The `git_open_noatime` helper can be of general interest for other consumers of git's different on-disk formats. Signed-off-by: Vicent Marti tan...@gmail.com Signed-off-by: Jeff King p...@peff.net --- cache.h | 1 + sha1_file.c | 4 +--- 2 files changed, 2 insertions(+), 3 deletions(-) diff --git a/cache.h b/cache.h index ce377e1..6d4ef65 100644 --- a/cache.h +++ b/cache.h @@ -780,6 +780,7 @@ extern int hash_sha1_file(const void *buf, unsigned long len, const char *type, extern int write_sha1_file(const void *buf, unsigned long len, const char *type, unsigned char *return_sha1); extern int pretend_sha1_file(void *, unsigned long, enum object_type, unsigned char *); extern int force_object_loose(const unsigned char *sha1, time_t mtime); +extern int git_open_noatime(const char *name); extern void *map_sha1_file(const unsigned char *sha1, unsigned long *size); extern int unpack_sha1_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz); extern int parse_sha1_header(const char *hdr, unsigned long *sizep); diff --git a/sha1_file.c b/sha1_file.c index 7dadd04..5557bd9 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -239,8 +239,6 @@ char *sha1_pack_index_name(const unsigned char *sha1) struct alternate_object_database *alt_odb_list; static struct alternate_object_database **alt_odb_tail; -static int git_open_noatime(const char *name); - /* * Prepare alternate object database registry. * @@ -1357,7 +1355,7 @@ int check_sha1_signature(const unsigned char *sha1, void *map, return hashcmp(sha1, real_sha1) ? -1 : 0; } -static int git_open_noatime(const char *name) +int git_open_noatime(const char *name) { static int sha1_file_open_flag = O_NOATIME; -- 1.8.5.rc0.443.g2df7f3f -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 07/21] compat: add endianness helpers
From: Vicent Marti tan...@gmail.com The POSIX standard doesn't currently define a `ntohll`/`htonll` function pair to perform network-to-host and host-to-network swaps of 64-bit data. These 64-bit swaps are necessary for the on-disk storage of EWAH bitmaps if they are not in native byte order. Many thanks to Ramsay Jones ram...@ramsay1.demon.co.uk and Torsten Bögershausen tbo...@web.de for cygwin/mingw/msvc portability fixes. Signed-off-by: Vicent Marti tan...@gmail.com Signed-off-by: Jeff King p...@peff.net --- compat/bswap.h | 76 +- 1 file changed, 75 insertions(+), 1 deletion(-) diff --git a/compat/bswap.h b/compat/bswap.h index 5061214..c18a78e 100644 --- a/compat/bswap.h +++ b/compat/bswap.h @@ -17,7 +17,20 @@ static inline uint32_t default_swab32(uint32_t val) ((val 0x00ff) 24)); } +static inline uint64_t default_bswap64(uint64_t val) +{ + return (((val (uint64_t)0x00ffULL) 56) | + ((val (uint64_t)0xff00ULL) 40) | + ((val (uint64_t)0x00ffULL) 24) | + ((val (uint64_t)0xff00ULL) 8) | + ((val (uint64_t)0x00ffULL) 8) | + ((val (uint64_t)0xff00ULL) 24) | + ((val (uint64_t)0x00ffULL) 40) | + ((val (uint64_t)0xff00ULL) 56)); +} + #undef bswap32 +#undef bswap64 #if defined(__GNUC__) (defined(__i386__) || defined(__x86_64__)) @@ -32,15 +45,42 @@ static inline uint32_t git_bswap32(uint32_t x) return result; } +#define bswap64 git_bswap64 +#if defined(__x86_64__) +static inline uint64_t git_bswap64(uint64_t x) +{ + uint64_t result; + if (__builtin_constant_p(x)) + result = default_bswap64(x); + else + __asm__(bswap %q0 : =r (result) : 0 (x)); + return result; +} +#else +static inline uint64_t git_bswap64(uint64_t x) +{ + union { uint64_t i64; uint32_t i32[2]; } tmp, result; + if (__builtin_constant_p(x)) + result.i64 = default_bswap64(x); + else { + tmp.i64 = x; + result.i32[0] = git_bswap32(tmp.i32[1]); + result.i32[1] = git_bswap32(tmp.i32[0]); + } + return result.i64; +} +#endif + #elif defined(_MSC_VER) (defined(_M_IX86) || defined(_M_X64)) #include stdlib.h #define bswap32(x) _byteswap_ulong(x) +#define bswap64(x) _byteswap_uint64(x) #endif -#ifdef bswap32 +#if defined(bswap32) #undef ntohl #undef htonl @@ -48,3 +88,37 @@ static inline uint32_t git_bswap32(uint32_t x) #define htonl(x) bswap32(x) #endif + +#if defined(bswap64) + +#undef ntohll +#undef htonll +#define ntohll(x) bswap64(x) +#define htonll(x) bswap64(x) + +#else + +#undef ntohll +#undef htonll + +#if !defined(__BYTE_ORDER) +# if defined(BYTE_ORDER) defined(LITTLE_ENDIAN) defined(BIG_ENDIAN) +# define __BYTE_ORDER BYTE_ORDER +# define __LITTLE_ENDIAN LITTLE_ENDIAN +# define __BIG_ENDIAN BIG_ENDIAN +# endif +#endif + +#if !defined(__BYTE_ORDER) +# error Cannot determine endianness +#endif + +#if __BYTE_ORDER == __BIG_ENDIAN +# define ntohll(n) (n) +# define htonll(n) (n) +#else +# define ntohll(n) default_bswap64(n) +# define htonll(n) default_bswap64(n) +#endif + +#endif -- 1.8.5.rc0.443.g2df7f3f -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 04/21] pack-objects: factor out name_hash
From: Vicent Marti tan...@gmail.com As the pack-objects system grows beyond the single pack-objects.c file, more parts (like the soon-to-exist bitmap code) will need to compute hashes for matching deltas. Factor out name_hash to make it available to other files. Signed-off-by: Vicent Marti tan...@gmail.com Signed-off-by: Jeff King p...@peff.net --- builtin/pack-objects.c | 24 ++-- pack-objects.h | 20 2 files changed, 22 insertions(+), 22 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index f3f0cf9..faf746b 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -778,26 +778,6 @@ static void write_pack_file(void) written, nr_result); } -static uint32_t name_hash(const char *name) -{ - uint32_t c, hash = 0; - - if (!name) - return 0; - - /* -* This effectively just creates a sortable number from the -* last sixteen non-whitespace characters. Last characters -* count most, so things that end in .c sort together. -*/ - while ((c = *name++) != 0) { - if (isspace(c)) - continue; - hash = (hash 2) + (c 24); - } - return hash; -} - static void setup_delta_attr_check(struct git_attr_check *check) { static struct git_attr *attr_delta; @@ -826,7 +806,7 @@ static int add_object_entry(const unsigned char *sha1, enum object_type type, struct object_entry *entry; struct packed_git *p, *found_pack = NULL; off_t found_offset = 0; - uint32_t hash = name_hash(name); + uint32_t hash = pack_name_hash(name); uint32_t index_pos; entry = packlist_find(to_pack, sha1, index_pos); @@ -1082,7 +1062,7 @@ static void add_preferred_base_object(const char *name) { struct pbase_tree *it; int cmplen; - unsigned hash = name_hash(name); + unsigned hash = pack_name_hash(name); if (!num_preferred_base || check_pbase_path(hash)) return; diff --git a/pack-objects.h b/pack-objects.h index f528215..90ad0a8 100644 --- a/pack-objects.h +++ b/pack-objects.h @@ -44,4 +44,24 @@ struct object_entry *packlist_find(struct packing_data *pdata, const unsigned char *sha1, uint32_t *index_pos); +static inline uint32_t pack_name_hash(const char *name) +{ + uint32_t c, hash = 0; + + if (!name) + return 0; + + /* +* This effectively just creates a sortable number from the +* last sixteen non-whitespace characters. Last characters +* count most, so things that end in .c sort together. +*/ + while ((c = *name++) != 0) { + if (isspace(c)) + continue; + hash = (hash 2) + (c 24); + } + return hash; +} + #endif -- 1.8.5.rc0.443.g2df7f3f -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 03/21] pack-objects: Refactor the packing list
From: Vicent Marti tan...@gmail.com The hash table that stores the packing list for a given `pack-objects` run was tightly coupled to the pack-objects code. In this commit, we refactor the hash table and the underlying storage array into a `packing_data` struct. The functionality for accessing and adding entries to the packing list is hence accessible from other parts of Git besides the `pack-objects` builtin. This refactoring is a requirement for further patches in this series that will require accessing the commit packing list from outside of `pack-objects`. The hash table implementation has been minimally altered: we now use table sizes which are always a power of two, to ensure a uniform index distribution in the array. Signed-off-by: Vicent Marti tan...@gmail.com Signed-off-by: Jeff King p...@peff.net --- Makefile | 2 + builtin/pack-objects.c | 175 +++-- pack-objects.c | 111 +++ pack-objects.h | 47 + 4 files changed, 200 insertions(+), 135 deletions(-) create mode 100644 pack-objects.c create mode 100644 pack-objects.h diff --git a/Makefile b/Makefile index af847f8..48ff0bd 100644 --- a/Makefile +++ b/Makefile @@ -694,6 +694,7 @@ LIB_H += notes-merge.h LIB_H += notes-utils.h LIB_H += notes.h LIB_H += object.h +LIB_H += pack-objects.h LIB_H += pack-revindex.h LIB_H += pack.h LIB_H += parse-options.h @@ -831,6 +832,7 @@ LIB_OBJS += notes-merge.o LIB_OBJS += notes-utils.o LIB_OBJS += object.o LIB_OBJS += pack-check.o +LIB_OBJS += pack-objects.o LIB_OBJS += pack-revindex.o LIB_OBJS += pack-write.o LIB_OBJS += pager.o diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 36273dd..f3f0cf9 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -14,6 +14,7 @@ #include diff.h #include revision.h #include list-objects.h +#include pack-objects.h #include progress.h #include refs.h #include streaming.h @@ -25,42 +26,15 @@ static const char *pack_usage[] = { NULL }; -struct object_entry { - struct pack_idx_entry idx; - unsigned long size; /* uncompressed size */ - struct packed_git *in_pack; /* already in pack */ - off_t in_pack_offset; - struct object_entry *delta; /* delta base object */ - struct object_entry *delta_child; /* deltified objects who bases me */ - struct object_entry *delta_sibling; /* other deltified objects who -* uses the same base as me -*/ - void *delta_data; /* cached delta (uncompressed) */ - unsigned long delta_size; /* delta data size (uncompressed) */ - unsigned long z_delta_size; /* delta data size (compressed) */ - enum object_type type; - enum object_type in_pack_type; /* could be delta */ - uint32_t hash; /* name hint hash */ - unsigned char in_pack_header_size; - unsigned preferred_base:1; /* - * we do not pack this, but is available - * to be used as the base object to delta - * objects against. - */ - unsigned no_try_delta:1; - unsigned tagged:1; /* near the very tip of refs */ - unsigned filled:1; /* assigned write-order */ -}; - /* - * Objects we are going to pack are collected in objects array (dynamically - * expanded). nr_objects nr_alloc controls this array. They are stored - * in the order we see -- typically rev-list --objects order that gives us - * nice minimum seek order. + * Objects we are going to pack are collected in the `to_pack` structure. + * It contains an array (dynamically expanded) of the object data, and a map + * that can resolve SHA1s to their position in the array. */ -static struct object_entry *objects; +static struct packing_data to_pack; + static struct pack_idx_entry **written_list; -static uint32_t nr_objects, nr_alloc, nr_result, nr_written; +static uint32_t nr_result, nr_written; static int non_empty; static int reuse_delta = 1, reuse_object = 1; @@ -90,21 +64,11 @@ static unsigned long cache_max_small_delta_size = 1000; static unsigned long window_memory_limit = 0; /* - * The object names in objects array are hashed with this hashtable, - * to help looking up the entry by object name. - * This hashtable is built after all the objects are seen. - */ -static int *object_ix; -static int object_ix_hashsz; -static struct object_entry *locate_object_entry(const unsigned char *sha1); - -/* * stats */ static uint32_t written, written_delta; static uint32_t reused, reused_delta; - static void *get_delta(struct object_entry *entry) { unsigned long size, base_size, delta_size; @@ -553,12 +517,12 @@ static int mark_tagged(const char *path, const unsigned char *sha1, int
[PATCH v3 02/21] revindex: Export new APIs
From: Vicent Marti tan...@gmail.com Allow users to efficiently lookup consecutive entries that are expected to be found on the same revindex by exporting `find_revindex_position`: this function takes a pointer to revindex itself, instead of looking up the proper revindex for a given packfile on each call. Signed-off-by: Vicent Marti tan...@gmail.com Signed-off-by: Jeff King p...@peff.net --- pack-revindex.c | 38 +- pack-revindex.h | 8 2 files changed, 33 insertions(+), 13 deletions(-) diff --git a/pack-revindex.c b/pack-revindex.c index b4d2b35..0bb13b1 100644 --- a/pack-revindex.c +++ b/pack-revindex.c @@ -16,11 +16,6 @@ * get the object sha1 from the main index. */ -struct pack_revindex { - struct packed_git *p; - struct revindex_entry *revindex; -}; - static struct pack_revindex *pack_revindex; static int pack_revindex_hashsz; @@ -201,15 +196,14 @@ static void create_pack_revindex(struct pack_revindex *rix) sort_revindex(rix-revindex, num_ent, p-pack_size); } -struct revindex_entry *find_pack_revindex(struct packed_git *p, off_t ofs) +struct pack_revindex *revindex_for_pack(struct packed_git *p) { int num; - unsigned lo, hi; struct pack_revindex *rix; - struct revindex_entry *revindex; if (!pack_revindex_hashsz) init_pack_revindex(); + num = pack_revindex_ix(p); if (num 0) die(internal error: pack revindex fubar); @@ -217,21 +211,39 @@ struct revindex_entry *find_pack_revindex(struct packed_git *p, off_t ofs) rix = pack_revindex[num]; if (!rix-revindex) create_pack_revindex(rix); - revindex = rix-revindex; - lo = 0; - hi = p-num_objects + 1; + return rix; +} + +int find_revindex_position(struct pack_revindex *pridx, off_t ofs) +{ + int lo = 0; + int hi = pridx-p-num_objects + 1; + struct revindex_entry *revindex = pridx-revindex; + do { unsigned mi = lo + (hi - lo) / 2; if (revindex[mi].offset == ofs) { - return revindex + mi; + return mi; } else if (ofs revindex[mi].offset) hi = mi; else lo = mi + 1; } while (lo hi); + error(bad offset for revindex); - return NULL; + return -1; +} + +struct revindex_entry *find_pack_revindex(struct packed_git *p, off_t ofs) +{ + struct pack_revindex *pridx = revindex_for_pack(p); + int pos = find_revindex_position(pridx, ofs); + + if (pos 0) + return NULL; + + return pridx-revindex + pos; } void discard_revindex(void) diff --git a/pack-revindex.h b/pack-revindex.h index 8d5027a..866ca9c 100644 --- a/pack-revindex.h +++ b/pack-revindex.h @@ -6,6 +6,14 @@ struct revindex_entry { unsigned int nr; }; +struct pack_revindex { + struct packed_git *p; + struct revindex_entry *revindex; +}; + +struct pack_revindex *revindex_for_pack(struct packed_git *p); +int find_revindex_position(struct pack_revindex *pridx, off_t ofs); + struct revindex_entry *find_pack_revindex(struct packed_git *p, off_t ofs); void discard_revindex(void); -- 1.8.5.rc0.443.g2df7f3f -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 09/21] documentation: add documentation for the bitmap format
From: Vicent Marti tan...@gmail.com This is the technical documentation for the JGit-compatible Bitmap v1 on-disk format. Signed-off-by: Vicent Marti tan...@gmail.com Signed-off-by: Jeff King p...@peff.net --- Documentation/technical/bitmap-format.txt | 131 ++ 1 file changed, 131 insertions(+) create mode 100644 Documentation/technical/bitmap-format.txt diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt new file mode 100644 index 000..7a86bd7 --- /dev/null +++ b/Documentation/technical/bitmap-format.txt @@ -0,0 +1,131 @@ +GIT bitmap v1 format + + + - A header appears at the beginning: + + 4-byte signature: {'B', 'I', 'T', 'M'} + + 2-byte version number (network byte order) + The current implementation only supports version 1 + of the bitmap index (the same one as JGit). + + 2-byte flags (network byte order) + + The following flags are supported: + + - BITMAP_OPT_FULL_DAG (0x1) REQUIRED + This flag must always be present. It implies that the bitmap + index has been generated for a packfile with full closure + (i.e. where every single object in the packfile can find +its parent links inside the same packfile). This is a + requirement for the bitmap index format, also present in JGit, + that greatly reduces the complexity of the implementation. + + 4-byte entry count (network byte order) + + The total count of entries (bitmapped commits) in this bitmap index. + + 20-byte checksum + + The SHA1 checksum of the pack this bitmap index belongs to. + + - 4 EWAH bitmaps that act as type indexes + + Type indexes are serialized after the hash cache in the shape + of four EWAH bitmaps stored consecutively (see Appendix A for + the serialization format of an EWAH bitmap). + + There is a bitmap for each Git object type, stored in the following + order: + + - Commits + - Trees + - Blobs + - Tags + + In each bitmap, the `n`th bit is set to true if the `n`th object + in the packfile is of that type. + + The obvious consequence is that the OR of all 4 bitmaps will result + in a full set (all bits set), and the AND of all 4 bitmaps will + result in an empty bitmap (no bits set). + + - N entries with compressed bitmaps, one for each indexed commit + + Where `N` is the total amount of entries in this bitmap index. + Each entry contains the following: + + - 4-byte object position (network byte order) + The position **in the index for the packfile** where the + bitmap for this commit is found. + + - 1-byte XOR-offset + The xor offset used to compress this bitmap. For an entry + in position `x`, a XOR offset of `y` means that the actual + bitmap representing this commit is composed by XORing the + bitmap for this entry with the bitmap in entry `x-y` (i.e. + the bitmap `y` entries before this one). + + Note that this compression can be recursive. In order to + XOR this entry with a previous one, the previous entry needs + to be decompressed first, and so on. + + The hard-limit for this offset is 160 (an entry can only be + xor'ed against one of the 160 entries preceding it). This + number is always positive, and hence entries are always xor'ed + with **previous** bitmaps, not bitmaps that will come afterwards + in the index. + + - 1-byte flags for this bitmap + At the moment the only available flag is `0x1`, which hints + that this bitmap can be re-used when rebuilding bitmap indexes + for the repository. + + - The compressed bitmap itself, see Appendix A. + +== Appendix A: Serialization format for an EWAH bitmap + +Ewah bitmaps are serialized in the same protocol as the JAVAEWAH +library, making them backwards compatible with the JGit +implementation: + + - 4-byte number of bits of the resulting UNCOMPRESSED bitmap + + - 4-byte number of words of the COMPRESSED bitmap, when stored + + - N x 8-byte words, as specified by the previous field + +
[PATCH v3 10/21] pack-bitmap: add support for bitmap indexes
From: Vicent Marti tan...@gmail.com A bitmap index is a `.bitmap` file that can be found inside `$GIT_DIR/objects/pack/`, next to its corresponding packfile, and contains precalculated reachability information for selected commits. The full specification of the format for these bitmap indexes can be found in `Documentation/technical/bitmap-format.txt`. For a given commit SHA1, if it happens to be available in the bitmap index, its bitmap will represent every single object that is reachable from the commit itself. The nth bit in the bitmap is the nth object in the packfile; if it's set to 1, the object is reachable. By using the bitmaps available in the index, this commit implements several new functions: - `prepare_bitmap_git` - `prepare_bitmap_walk` - `traverse_bitmap_commit_list` - `reuse_partial_packfile_from_bitmap` The `prepare_bitmap_walk` function tries to build a bitmap of all the objects that can be reached from the commit roots of a given `rev_info` struct by using the following algorithm: - If all the interesting commits for a revision walk are available in the index, the resulting reachability bitmap is the bitwise OR of all the individual bitmaps. - When the full set of WANTs is not available in the index, we perform a partial revision walk using the commits that don't have bitmaps as roots, and limiting the revision walk as soon as we reach a commit that has a corresponding bitmap. The earlier OR'ed bitmap with all the indexed commits can now be completed as this walk progresses, so the end result is the full reachability list. - For revision walks with a HAVEs set (a set of commits that are deemed uninteresting), first we perform the same method as for the WANTs, but using our HAVEs as roots, in order to obtain a full reachability bitmap of all the uninteresting commits. This bitmap then can be used to: a) limit the subsequent walk when building the WANTs bitmap b) finding the final set of interesting commits by performing an AND-NOT of the WANTs and the HAVEs. If `prepare_bitmap_walk` runs successfully, the resulting bitmap is stored and the equivalent of a `traverse_commit_list` call can be performed by using `traverse_bitmap_commit_list`; the bitmap version of this call yields the objects straight from the packfile index (without having to look them up or parse them) and hence is several orders of magnitude faster. As an extra optimization, when `prepare_bitmap_walk` succeeds, the `reuse_partial_packfile_from_bitmap` call can be attempted: it will find the amount of objects at the beginning of the on-disk packfile that can be reused as-is, and return an offset into the packfile. The source packfile can then be loaded and the bytes up to `offset` can be written directly to the result without having to consider the entires inside the packfile individually. If the `prepare_bitmap_walk` call fails (e.g. because no bitmap files are available), the `rev_info` struct is left untouched, and can be used to perform a manual rev-walk using `traverse_commit_list`. Hence, this new set of functions are a generic API that allows to perform the equivalent of git rev-list --objects [roots...] [^uninteresting...] for any set of commits, even if they don't have specific bitmaps generated for them. In further patches, we'll use this bitmap traversal optimization to speed up the `pack-objects` and `rev-list` commands. Signed-off-by: Vicent Marti tan...@gmail.com Signed-off-by: Jeff King p...@peff.net --- Makefile | 2 + khash.h | 338 pack-bitmap.c | 970 ++ pack-bitmap.h | 43 +++ 4 files changed, 1353 insertions(+) create mode 100644 khash.h create mode 100644 pack-bitmap.c create mode 100644 pack-bitmap.h diff --git a/Makefile b/Makefile index 64a1ed7..b983d78 100644 --- a/Makefile +++ b/Makefile @@ -699,6 +699,7 @@ LIB_H += object.h LIB_H += pack-objects.h LIB_H += pack-revindex.h LIB_H += pack.h +LIB_H += pack-bitmap.h LIB_H += parse-options.h LIB_H += patch-ids.h LIB_H += pathspec.h @@ -837,6 +838,7 @@ LIB_OBJS += notes-cache.o LIB_OBJS += notes-merge.o LIB_OBJS += notes-utils.o LIB_OBJS += object.o +LIB_OBJS += pack-bitmap.o LIB_OBJS += pack-check.o LIB_OBJS += pack-objects.o LIB_OBJS += pack-revindex.o diff --git a/khash.h b/khash.h new file mode 100644 index 000..57ff603 --- /dev/null +++ b/khash.h @@ -0,0 +1,338 @@ +/* The MIT License + + Copyright (c) 2008, 2009, 2011 by Attractive Chaos attrac...@live.co.uk + + Permission is hereby granted, free of charge, to any person obtaining + a copy of this software and associated documentation files (the + Software), to deal in the Software without restriction, including + without limitation the rights to use, copy, modify, merge, publish, + distribute, sublicense, and/or sell copies of the Software, and to + permit persons to whom the Software is
[PATCH v3 08/21] ewah: compressed bitmap implementation
From: Vicent Marti tan...@gmail.com EWAH is a word-aligned compressed variant of a bitset (i.e. a data structure that acts as a 0-indexed boolean array for many entries). It uses a 64-bit run-length encoding (RLE) compression scheme, trading some compression for better processing speed. The goal of this word-aligned implementation is not to achieve the best compression, but rather to improve query processing time. As it stands right now, this EWAH implementation will always be more efficient storage-wise than its uncompressed alternative. EWAH arrays will be used as the on-disk format to store reachability bitmaps for all objects in a repository while keeping reasonable sizes, in the same way that JGit does. This EWAH implementation is a mostly straightforward port of the original `javaewah` library that JGit currently uses. The library is self-contained and has been embedded whole (4 files) inside the `ewah` folder to ease redistribution. The library is re-licensed under the GPLv2 with the permission of Daniel Lemire, the original author. The source code for the C version can be found on GitHub: https://github.com/vmg/libewok The original Java implementation can also be found on GitHub: https://github.com/lemire/javaewah Signed-off-by: Vicent Marti tan...@gmail.com Signed-off-by: Jeff King p...@peff.net --- Makefile | 11 +- ewah/bitmap.c | 221 ewah/ewah_bitmap.c | 726 + ewah/ewah_io.c | 193 ++ ewah/ewah_rlw.c| 115 + ewah/ewok.h| 235 + ewah/ewok_rlw.h| 114 + 7 files changed, 1613 insertions(+), 2 deletions(-) create mode 100644 ewah/bitmap.c create mode 100644 ewah/ewah_bitmap.c create mode 100644 ewah/ewah_io.c create mode 100644 ewah/ewah_rlw.c create mode 100644 ewah/ewok.h create mode 100644 ewah/ewok_rlw.h diff --git a/Makefile b/Makefile index 48ff0bd..64a1ed7 100644 --- a/Makefile +++ b/Makefile @@ -667,6 +667,8 @@ LIB_H += diff.h LIB_H += diffcore.h LIB_H += dir.h LIB_H += exec_cmd.h +LIB_H += ewah/ewok.h +LIB_H += ewah/ewok_rlw.h LIB_H += fetch-pack.h LIB_H += fmt-merge-msg.h LIB_H += fsck.h @@ -800,6 +802,10 @@ LIB_OBJS += dir.o LIB_OBJS += editor.o LIB_OBJS += entry.o LIB_OBJS += environment.o +LIB_OBJS += ewah/bitmap.o +LIB_OBJS += ewah/ewah_bitmap.o +LIB_OBJS += ewah/ewah_io.o +LIB_OBJS += ewah/ewah_rlw.o LIB_OBJS += exec_cmd.o LIB_OBJS += fetch-pack.o LIB_OBJS += fsck.o @@ -2474,8 +2480,9 @@ profile-clean: $(RM) $(addsuffix *.gcno,$(addprefix $(PROFILE_DIR)/, $(object_dirs))) clean: profile-clean coverage-clean - $(RM) *.o *.res block-sha1/*.o ppc/*.o compat/*.o compat/*/*.o xdiff/*.o vcs-svn/*.o \ - builtin/*.o $(LIB_FILE) $(XDIFF_LIB) $(VCSSVN_LIB) + $(RM) *.o *.res block-sha1/*.o ppc/*.o compat/*.o compat/*/*.o + $(RM) xdiff/*.o vcs-svn/*.o ewah/*.o builtin/*.o + $(RM) $(LIB_FILE) $(XDIFF_LIB) $(VCSSVN_LIB) $(RM) $(ALL_PROGRAMS) $(SCRIPT_LIB) $(BUILT_INS) git$X $(RM) $(TEST_PROGRAMS) $(NO_INSTALL) $(RM) -r bin-wrappers $(dep_dirs) diff --git a/ewah/bitmap.c b/ewah/bitmap.c new file mode 100644 index 000..710e58c --- /dev/null +++ b/ewah/bitmap.c @@ -0,0 +1,221 @@ +/** + * Copyright 2013, GitHub, Inc + * Copyright 2009-2013, Daniel Lemire, Cliff Moon, + * David McIntosh, Robert Becho, Google Inc. and Veronika Zenz + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. + */ +#include git-compat-util.h +#include ewok.h + +#define MASK(x) ((eword_t)1 (x % BITS_IN_WORD)) +#define BLOCK(x) (x / BITS_IN_WORD) + +struct bitmap *bitmap_new(void) +{ + struct bitmap *bitmap = ewah_malloc(sizeof(struct bitmap)); + bitmap-words = ewah_calloc(32, sizeof(eword_t)); + bitmap-word_alloc = 32; + return bitmap; +} + +void bitmap_set(struct bitmap *self, size_t pos) +{ + size_t block = BLOCK(pos); + + if (block = self-word_alloc) { + size_t old_size = self-word_alloc; + self-word_alloc = block * 2; + self-words = ewah_realloc(self-words, + self-word_alloc * sizeof(eword_t)); + + memset(self-words + old_size, 0x0, +
[PATCH v3 12/21] rev-list: add bitmap mode to speed up object lists
From: Vicent Marti tan...@gmail.com The bitmap reachability index used to speed up the counting objects phase during `pack-objects` can also be used to optimize a normal rev-list if the only thing required are the SHA1s of the objects during the list (i.e., not the path names at which trees and blobs were found). Calling `git rev-list --objects --use-bitmap-index [committish]` will perform an object iteration based on a bitmap result instead of actually walking the object graph. These are some example timings for `torvalds/linux` (warm cache, best-of-five): $ time git rev-list --objects master /dev/null real0m34.191s user0m33.904s sys 0m0.268s $ time git rev-list --objects --use-bitmap-index master /dev/null real0m1.041s user0m0.976s sys 0m0.064s Likewise, using `git rev-list --count --use-bitmap-index` will speed up the counting operation by building the resulting bitmap and performing a fast popcount (number of bits set on the bitmap) on the result. Here are some sample timings of different ways to count commits in `torvalds/linux`: $ time git rev-list master | wc -l 399882 real0m6.524s user0m6.060s sys 0m3.284s $ time git rev-list --count master 399882 real0m4.318s user0m4.236s sys 0m0.076s $ time git rev-list --use-bitmap-index --count master 399882 real0m0.217s user0m0.176s sys 0m0.040s This also respects negative refs, so you can use it to count a slice of history: $ time git rev-list --count v3.0..master 144843 real0m1.971s user0m1.932s sys 0m0.036s $ time git rev-list --use-bitmap-index --count v3.0..master real0m0.280s user0m0.220s sys 0m0.056s Though note that the closer the endpoints, the less it helps. In the traversal case, we have fewer commits to cross, so we take less time. But the bitmap time is dominated by generating the pack revindex, which is constant with respect to the refs given. Note that you cannot yet get a fast --left-right count of a symmetric difference (e.g., --count --left-right master...topic). The slow part of that walk actually happens during the merge-base determination when we parse master...topic. Even though a count does not actually need to know the real merge base (it only needs to take the symmetric difference of the bitmaps), the revision code would require some refactoring to handle this case. Additionally, a `--test-bitmap` flag has been added that will perform the same rev-list manually (i.e. using a normal revwalk) and using bitmaps, and verify that the results are the same. This can be used to exercise the bitmap code, and also to verify that the contents of the .bitmap file are sane. Signed-off-by: Vicent Marti tan...@gmail.com Signed-off-by: Jeff King p...@peff.net --- Documentation/git-rev-list.txt | 1 + Documentation/rev-list-options.txt | 8 builtin/rev-list.c | 39 ++ 3 files changed, 48 insertions(+) diff --git a/Documentation/git-rev-list.txt b/Documentation/git-rev-list.txt index 045b37b..7a1585d 100644 --- a/Documentation/git-rev-list.txt +++ b/Documentation/git-rev-list.txt @@ -55,6 +55,7 @@ SYNOPSIS [ \--reverse ] [ \--walk-reflogs ] [ \--no-walk ] [ \--do-walk ] +[ \--use-bitmap-index ] commit... [ \-- paths... ] DESCRIPTION diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt index ec86d09..f7c8a4d 100644 --- a/Documentation/rev-list-options.txt +++ b/Documentation/rev-list-options.txt @@ -274,6 +274,14 @@ See also linkgit:git-reflog[1]. Output excluded boundary commits. Boundary commits are prefixed with `-`. +ifdef::git-rev-list[] +--use-bitmap-index:: + + Try to speed up the traversal using the pack bitmap index (if + one is available). Note that when traversing with `--objects`, + trees and blobs will not have their associated path printed. +endif::git-rev-list[] + -- History Simplification diff --git a/builtin/rev-list.c b/builtin/rev-list.c index 0745e2d..9f92905 100644 --- a/builtin/rev-list.c +++ b/builtin/rev-list.c @@ -3,6 +3,8 @@ #include diff.h #include revision.h #include list-objects.h +#include pack.h +#include pack-bitmap.h #include builtin.h #include log-tree.h #include graph.h @@ -257,6 +259,18 @@ static int show_bisect_vars(struct rev_list_info *info, int reaches, int all) return 0; } +static int show_object_fast( + const unsigned char *sha1, + enum object_type type, + int exclude, + uint32_t name_hash, + struct packed_git *found_pack, + off_t found_offset) +{ + fprintf(stdout, %s\n, sha1_to_hex(sha1)); + return 1; +} + int
[PATCH v3 11/21] pack-objects: use bitmaps when packing objects
From: Vicent Marti tan...@gmail.com In this patch, we use the bitmap API to perform the `Counting Objects` phase in pack-objects, rather than a traditional walk through the object graph. For a reasonably-packed large repo, the time to fetch and clone is often dominated by the full-object revision walk during the Counting Objects phase. Using bitmaps can reduce the CPU time required on the server (and therefore start sending the actual pack data with less delay). For bitmaps to be used, the following must be true: 1. We must be packing to stdout (as a normal `pack-objects` from `upload-pack` would do). 2. There must be a .bitmap index containing at least one of the have objects that the client is asking for. 3. Bitmaps must be enabled (they are enabled by default, but can be disabled by setting `pack.usebitmaps` to false, or by using `--no-use-bitmap-index` on the command-line). If any of these is not true, we fall back to doing a normal walk of the object graph. Here are some sample timings from a full pack of `torvalds/linux` (i.e. something very similar to what would be generated for a clone of the repository) that show the speedup produced by various methods: [existing graph traversal] $ time git pack-objects --all --stdout --no-use-bitmap-index \ /dev/null /dev/null Counting objects: 3237103, done. Compressing objects: 100% (508752/508752), done. Total 3237103 (delta 2699584), reused 3237103 (delta 2699584) real0m44.111s user0m42.396s sys 0m3.544s [bitmaps only, without partial pack reuse; note that pack reuse is automatic, so timing this required a patch to disable it] $ time git pack-objects --all --stdout /dev/null /dev/null Counting objects: 3237103, done. Compressing objects: 100% (508752/508752), done. Total 3237103 (delta 2699584), reused 3237103 (delta 2699584) real0m5.413s user0m5.604s sys 0m1.804s [bitmaps with pack reuse (what you get with this patch)] $ time git pack-objects --all --stdout /dev/null /dev/null Reusing existing pack: 3237103, done. Total 3237103 (delta 0), reused 0 (delta 0) real0m1.636s user0m1.460s sys 0m0.172s Signed-off-by: Vicent Marti tan...@gmail.com Signed-off-by: Jeff King p...@peff.net --- Documentation/config.txt | 6 ++ builtin/pack-objects.c | 169 --- 2 files changed, 150 insertions(+), 25 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index ab26963..a981369 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -1858,6 +1858,12 @@ pack.packSizeLimit:: Common unit suffixes of 'k', 'm', or 'g' are supported. +pack.useBitmaps:: + When true, git will use pack bitmaps (if available) when packing + to stdout (e.g., during the server side of a fetch). Defaults to + true. You should not generally need to turn this off unless + you are debugging pack bitmaps. + pager.cmd:: If the value is boolean, turns on or off pagination of the output of a particular Git subcommand when writing to a tty. diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index faf746b..f04cce5 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -19,6 +19,7 @@ #include refs.h #include streaming.h #include thread-utils.h +#include pack-bitmap.h static const char *pack_usage[] = { N_(git pack-objects --stdout [options...] [ ref-list | object-list]), @@ -57,12 +58,23 @@ static struct progress *progress_state; static int pack_compression_level = Z_DEFAULT_COMPRESSION; static int pack_compression_seen; +static struct packed_git *reuse_packfile; +static uint32_t reuse_packfile_objects; +static off_t reuse_packfile_offset; + +static int use_bitmap_index = 1; + static unsigned long delta_cache_size = 0; static unsigned long max_delta_cache_size = 256 * 1024 * 1024; static unsigned long cache_max_small_delta_size = 1000; static unsigned long window_memory_limit = 0; +enum { + OBJECT_ENTRY_EXCLUDE = (1 0), + OBJECT_ENTRY_NO_TRY_DELTA = (1 1) +}; + /* * stats */ @@ -678,6 +690,49 @@ static struct object_entry **compute_write_order(void) return wo; } +static off_t write_reused_pack(struct sha1file *f) +{ + uint8_t buffer[8192]; + off_t to_write; + int fd; + + if (!is_pack_valid(reuse_packfile)) + return 0; + + fd = git_open_noatime(reuse_packfile-pack_name); + if (fd 0) + return 0; + + if (lseek(fd, sizeof(struct pack_header), SEEK_SET) == -1) { + close(fd); + return 0; + } + + if (reuse_packfile_offset 0) + reuse_packfile_offset = reuse_packfile-pack_size - 20; + + to_write = reuse_packfile_offset - sizeof(struct pack_header); + + while
[PATCH v3 15/21] repack: turn exts array into array-of-struct
This is slightly more verbose, but will let us annotate the extensions with further options in future commits. Signed-off-by: Jeff King p...@peff.net --- builtin/repack.c | 17 +++-- 1 file changed, 11 insertions(+), 6 deletions(-) diff --git a/builtin/repack.c b/builtin/repack.c index 2e88975..a176de2 100644 --- a/builtin/repack.c +++ b/builtin/repack.c @@ -115,7 +115,12 @@ static void remove_redundant_pack(const char *dir_name, const char *base_name) int cmd_repack(int argc, const char **argv, const char *prefix) { - const char *exts[] = {.pack, .idx}; + struct { + const char *name; + } exts[] = { + {.pack}, + {.idx}, + }; struct child_process cmd; struct string_list_item *item; struct argv_array cmd_args = ARGV_ARRAY_INIT; @@ -261,14 +266,14 @@ int cmd_repack(int argc, const char **argv, const char *prefix) for (ext = 0; ext ARRAY_SIZE(exts); ext++) { char *fname, *fname_old; fname = mkpathdup(%s/%s%s, packdir, - item-string, exts[ext]); + item-string, exts[ext].name); if (!file_exists(fname)) { free(fname); continue; } fname_old = mkpath(%s/old-%s%s, packdir, - item-string, exts[ext]); + item-string, exts[ext].name); if (file_exists(fname_old)) if (unlink(fname_old)) failed = 1; @@ -319,9 +324,9 @@ int cmd_repack(int argc, const char **argv, const char *prefix) char *fname, *fname_old; struct stat statbuffer; fname = mkpathdup(%s/pack-%s%s, - packdir, item-string, exts[ext]); + packdir, item-string, exts[ext].name); fname_old = mkpathdup(%s-%s%s, - packtmp, item-string, exts[ext]); + packtmp, item-string, exts[ext].name); if (!stat(fname_old, statbuffer)) { statbuffer.st_mode = ~(S_IWUSR | S_IWGRP | S_IWOTH); chmod(fname_old, statbuffer.st_mode); @@ -340,7 +345,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix) fname = mkpath(%s/old-pack-%s%s, packdir, item-string, - exts[ext]); + exts[ext].name); if (remove_path(fname)) warning(_(removing '%s' failed), fname); } -- 1.8.5.rc0.443.g2df7f3f -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 14/21] repack: stop using magic number for ARRAY_SIZE(exts)
We have a static array of extensions, but hardcode the size of the array in our loops. Let's pull out this magic number, which will make it easier to change. Signed-off-by: Jeff King p...@peff.net --- builtin/repack.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/builtin/repack.c b/builtin/repack.c index a0ff5c7..2e88975 100644 --- a/builtin/repack.c +++ b/builtin/repack.c @@ -115,7 +115,7 @@ static void remove_redundant_pack(const char *dir_name, const char *base_name) int cmd_repack(int argc, const char **argv, const char *prefix) { - const char *exts[2] = {.pack, .idx}; + const char *exts[] = {.pack, .idx}; struct child_process cmd; struct string_list_item *item; struct argv_array cmd_args = ARGV_ARRAY_INIT; @@ -258,7 +258,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix) */ failed = 0; for_each_string_list_item(item, names) { - for (ext = 0; ext 2; ext++) { + for (ext = 0; ext ARRAY_SIZE(exts); ext++) { char *fname, *fname_old; fname = mkpathdup(%s/%s%s, packdir, item-string, exts[ext]); @@ -315,7 +315,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix) /* Now the ones with the same name are out of the way... */ for_each_string_list_item(item, names) { - for (ext = 0; ext 2; ext++) { + for (ext = 0; ext ARRAY_SIZE(exts); ext++) { char *fname, *fname_old; struct stat statbuffer; fname = mkpathdup(%s/pack-%s%s, @@ -335,7 +335,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix) /* Remove the old- files */ for_each_string_list_item(item, names) { - for (ext = 0; ext 2; ext++) { + for (ext = 0; ext ARRAY_SIZE(exts); ext++) { char *fname; fname = mkpath(%s/old-pack-%s%s, packdir, -- 1.8.5.rc0.443.g2df7f3f -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 16/21] repack: handle optional files created by pack-objects
We ask pack-objects to pack to a set of temporary files, and then rename them into place. Some files that pack-objects creates may be optional (like a .bitmap file), in which case we would not want to call rename(). We already call stat() and make the chmod optional if the file cannot be accessed. We could simply skip the rename step in this case, but that would be a minor regression in noticing problems with non-optional files (like the .pack and .idx files). Instead, we can now annotate extensions as optional, and skip them if they don't exist (and otherwise rely on rename() to barf). Signed-off-by: Jeff King p...@peff.net --- builtin/repack.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/builtin/repack.c b/builtin/repack.c index a176de2..8b7dfd0 100644 --- a/builtin/repack.c +++ b/builtin/repack.c @@ -117,6 +117,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix) { struct { const char *name; + unsigned optional:1; } exts[] = { {.pack}, {.idx}, @@ -323,6 +324,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix) for (ext = 0; ext ARRAY_SIZE(exts); ext++) { char *fname, *fname_old; struct stat statbuffer; + int exists = 0; fname = mkpathdup(%s/pack-%s%s, packdir, item-string, exts[ext].name); fname_old = mkpathdup(%s-%s%s, @@ -330,9 +332,12 @@ int cmd_repack(int argc, const char **argv, const char *prefix) if (!stat(fname_old, statbuffer)) { statbuffer.st_mode = ~(S_IWUSR | S_IWGRP | S_IWOTH); chmod(fname_old, statbuffer.st_mode); + exists = 1; + } + if (exists || !exts[ext].optional) { + if (rename(fname_old, fname)) + die_errno(_(renaming '%s' failed), fname_old); } - if (rename(fname_old, fname)) - die_errno(_(renaming '%s' failed), fname_old); free(fname); free(fname_old); } -- 1.8.5.rc0.443.g2df7f3f -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 18/21] count-objects: recognize .bitmap in garbage-checking
From: Nguyễn Thái Ngọc Duy pclo...@gmail.com Count-objects will report any garbage files in the packs directory, including files whose extensions it does not know (case 1), and files whose matching .pack file is missing (case 2). Without having learned about .bitmap files, the current code reports all such files as garbage (case 1), even if their pack exists. Instead, they should be treated as case 2. Signed-off-by: Nguyễn Thái Ngọc Duy pclo...@gmail.com Signed-off-by: Jeff King p...@peff.net --- sha1_file.c | 1 + 1 file changed, 1 insertion(+) diff --git a/sha1_file.c b/sha1_file.c index 5557bd9..d810b58 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -1194,6 +1194,7 @@ static void prepare_packed_git_one(char *objdir, int local) if (has_extension(de-d_name, .idx) || has_extension(de-d_name, .pack) || + has_extension(de-d_name, .bitmap) || has_extension(de-d_name, .keep)) string_list_append(garbage, path); else -- 1.8.5.rc0.443.g2df7f3f -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 17/21] repack: consider bitmaps when performing repacks
From: Vicent Marti tan...@gmail.com Since `pack-objects` will write a `.bitmap` file next to the `.pack` and `.idx` files, this commit teaches `git-repack` to consider the new bitmap indexes (if they exist) when performing repack operations. This implies moving old bitmap indexes out of the way if we are repacking a repository that already has them, and moving the newly generated bitmap indexes into the `objects/pack` directory, next to their corresponding packfiles. Since `git repack` is now capable of handling these `.bitmap` files, a normal `git gc` run on a repository that has `pack.writebitmaps` set to true in its config file will generate bitmap indexes as part of the garbage collection process. Alternatively, `git repack` can be called with the `-b` switch to explicitly generate bitmap indexes if you are experimenting and don't want them on all the time. Signed-off-by: Vicent Marti tan...@gmail.com Signed-off-by: Jeff King p...@peff.net --- Documentation/git-repack.txt | 9 - builtin/repack.c | 9 - 2 files changed, 16 insertions(+), 2 deletions(-) diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt index 509cf73..002cfd5 100644 --- a/Documentation/git-repack.txt +++ b/Documentation/git-repack.txt @@ -9,7 +9,7 @@ git-repack - Pack unpacked objects in a repository SYNOPSIS [verse] -'git repack' [-a] [-A] [-d] [-f] [-F] [-l] [-n] [-q] [--window=n] [--depth=n] +'git repack' [-a] [-A] [-d] [-f] [-F] [-l] [-n] [-q] [-b] [--window=n] [--depth=n] DESCRIPTION --- @@ -110,6 +110,13 @@ other objects in that pack they already have locally. The default is unlimited, unless the config variable `pack.packSizeLimit` is set. +-b:: +--write-bitmap-index:: + Write a reachability bitmap index as part of the repack. This + only makes sense when used with `-a` or `-A`, as the bitmaps + must be able to refer to all reachable objects. This option + overrides the setting of `pack.writebitmaps`. + Configuration - diff --git a/builtin/repack.c b/builtin/repack.c index 8b7dfd0..239f278 100644 --- a/builtin/repack.c +++ b/builtin/repack.c @@ -94,7 +94,7 @@ static void get_non_kept_pack_filenames(struct string_list *fname_list) static void remove_redundant_pack(const char *dir_name, const char *base_name) { - const char *exts[] = {.pack, .idx, .keep}; + const char *exts[] = {.pack, .idx, .keep, .bitmap}; int i; struct strbuf buf = STRBUF_INIT; size_t plen; @@ -121,6 +121,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix) } exts[] = { {.pack}, {.idx}, + {.bitmap, 1}, }; struct child_process cmd; struct string_list_item *item; @@ -143,6 +144,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix) int no_update_server_info = 0; int quiet = 0; int local = 0; + int write_bitmap = -1; struct option builtin_repack_options[] = { OPT_BIT('a', NULL, pack_everything, @@ -161,6 +163,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix) OPT__QUIET(quiet, N_(be quiet)), OPT_BOOL('l', local, local, N_(pass --local to git-pack-objects)), + OPT_BOOL('b', write-bitmap-index, write_bitmap, + N_(write bitmap index)), OPT_STRING(0, unpack-unreachable, unpack_unreachable, N_(approxidate), N_(with -A, do not loosen objects older than this)), OPT_INTEGER(0, window, window, @@ -202,6 +206,9 @@ int cmd_repack(int argc, const char **argv, const char *prefix) argv_array_pushf(cmd_args, --no-reuse-delta); if (no_reuse_object) argv_array_pushf(cmd_args, --no-reuse-object); + if (write_bitmap = 0) + argv_array_pushf(cmd_args, --%swrite-bitmap-index, +write_bitmap ? : no-); if (pack_everything ALL_INTO_ONE) { get_non_kept_pack_filenames(existing_packs); -- 1.8.5.rc0.443.g2df7f3f -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 13/21] pack-objects: implement bitmap writing
From: Vicent Marti tan...@gmail.com This commit extends more the functionality of `pack-objects` by allowing it to write out a `.bitmap` index next to any written packs, together with the `.idx` index that currently gets written. If bitmap writing is enabled for a given repository (either by calling `pack-objects` with the `--write-bitmap-index` flag or by having `pack.writebitmaps` set to `true` in the config) and pack-objects is writing a packfile that would normally be indexed (i.e. not piping to stdout), we will attempt to write the corresponding bitmap index for the packfile. Bitmap index writing happens after the packfile and its index has been successfully written to disk (`finish_tmp_packfile`). The process is performed in several steps: 1. `bitmap_writer_set_checksum`: this call stores the partial checksum for the packfile being written; the checksum will be written in the resulting bitmap index to verify its integrity 2. `bitmap_writer_build_type_index`: this call uses the array of `struct object_entry` that has just been sorted when writing out the actual packfile index to disk to generate 4 type-index bitmaps (one for each object type). These bitmaps have their nth bit set if the given object is of the bitmap's type. E.g. the nth bit of the Commits bitmap will be 1 if the nth object in the packfile index is a commit. This is a very cheap operation because the bitmap writing code has access to the metadata stored in the `struct object_entry` array, and hence the real type for each object in the packfile. 3. `bitmap_writer_reuse_bitmaps`: if there exists an existing bitmap index for one of the packfiles we're trying to repack, this call will efficiently rebuild the existing bitmaps so they can be reused on the new index. All the existing bitmaps will be stored in a `reuse` hash table, and the commit selection phase will prioritize these when selecting, as they can be written directly to the new index without having to perform a revision walk to fill the bitmap. This can greatly speed up the repack of a repository that already has bitmaps. 4. `bitmap_writer_select_commits`: if bitmap writing is enabled for a given `pack-objects` run, the sequence of commits generated during the Counting Objects phase will be stored in an array. We then use that array to build up the list of selected commits. Writing a bitmap in the index for each object in the repository would be cost-prohibitive, so we use a simple heuristic to pick the commits that will be indexed with bitmaps. The current heuristics are a simplified version of JGit's original implementation. We select a higher density of commits depending on their age: the 100 most recent commits are always selected, after that we pick 1 commit of each 100, and the gap increases as the commits grow older. On top of that, we make sure that every single branch that has not been merged (all the tips that would be required from a clone) gets their own bitmap, and when selecting commits between a gap, we tend to prioritize the commit with the most parents. Do note that there is no right/wrong way to perform commit selection; different selection algorithms will result in different commits being selected, but there's no such thing as missing a commit. The bitmap walker algorithm implemented in `prepare_bitmap_walk` is able to adapt to missing bitmaps by performing manual walks that complete the bitmap: the ideal selection algorithm, however, would select the commits that are more likely to be used as roots for a walk in the future (e.g. the tips of each branch, and so on) to ensure a bitmap for them is always available. 5. `bitmap_writer_build`: this is the computationally expensive part of bitmap generation. Based on the list of commits that were selected in the previous step, we perform several incremental walks to generate the bitmap for each commit. The walks begin from the oldest commit, and are built up incrementally for each branch. E.g. consider this dag where A, B, C, D, E, F are the selected commits, and a, b, c, e are a chunk of simplified history that will not receive bitmaps. A---a---B--b--C--c--D \ E--e--F We start by building the bitmap for A, using A as the root for a revision walk and marking all the objects that are reachable until the walk is over. Once this bitmap is stored, we reuse the bitmap walker to perform the walk for B, assuming that once we reach A again, the walk will be terminated because A has already been SEEN on the previous walk. This process
[PATCH v3 19/21] t: add basic bitmap functionality tests
Now that we can read and write bitmaps, we can exercise them with some basic functionality tests. These tests aren't particularly useful for seeing the benefit, as the test repo is too small for it to make a difference. However, we can at least check that using bitmaps does not break anything. Signed-off-by: Jeff King p...@peff.net --- t/t5310-pack-bitmaps.sh | 138 1 file changed, 138 insertions(+) create mode 100755 t/t5310-pack-bitmaps.sh diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh new file mode 100755 index 000..bd743f4 --- /dev/null +++ b/t/t5310-pack-bitmaps.sh @@ -0,0 +1,138 @@ +#!/bin/sh + +test_description='exercise basic bitmap functionality' +. ./test-lib.sh + +test_expect_success 'setup repo with moderate-sized history' ' + for i in $(test_seq 1 10); do + test_commit $i + done + git checkout -b other HEAD~5 + for i in $(test_seq 1 10); do + test_commit side-$i + done + git checkout master + blob=$(echo tagged-blob | git hash-object -w --stdin) + git tag tagged-blob $blob + git config pack.writebitmaps true +' + +test_expect_success 'full repack creates bitmaps' ' + git repack -ad + ls .git/objects/pack/ | grep bitmap output + test_line_count = 1 output +' + +test_expect_success 'rev-list --test-bitmap verifies bitmaps' ' + git rev-list --test-bitmap HEAD +' + +rev_list_tests() { + state=$1 + + test_expect_success counting commits via bitmap ($state) ' + git rev-list --count HEAD expect + git rev-list --use-bitmap-index --count HEAD actual + test_cmp expect actual + ' + + test_expect_success counting partial commits via bitmap ($state) ' + git rev-list --count HEAD~5..HEAD expect + git rev-list --use-bitmap-index --count HEAD~5..HEAD actual + test_cmp expect actual + ' + + test_expect_success counting non-linear history ($state) ' + git rev-list --count other...master expect + git rev-list --use-bitmap-index --count other...master actual + test_cmp expect actual + ' + + test_expect_success enumerate --objects ($state) ' + git rev-list --objects --use-bitmap-index HEAD tmp + cut -d -f1 tmp tmp2 + sort tmp2 actual + git rev-list --objects HEAD tmp + cut -d -f1 tmp tmp2 + sort tmp2 expect + test_cmp expect actual + ' + + test_expect_success bitmap --objects handles non-commit objects ($state) ' + git rev-list --objects --use-bitmap-index HEAD tagged-blob actual + grep $blob actual + ' +} + +rev_list_tests 'full bitmap' + +test_expect_success 'clone from bitmapped repository' ' + git clone --no-local --bare . clone.git + git rev-parse HEAD expect + git --git-dir=clone.git rev-parse HEAD actual + test_cmp expect actual +' + +test_expect_success 'setup further non-bitmapped commits' ' + for i in $(test_seq 1 10); do + test_commit further-$i + done +' + +rev_list_tests 'partial bitmap' + +test_expect_success 'fetch (partial bitmap)' ' + git --git-dir=clone.git fetch origin master:master + git rev-parse HEAD expect + git --git-dir=clone.git rev-parse HEAD actual + test_cmp expect actual +' + +test_expect_success 'incremental repack cannot create bitmaps' ' + test_commit more-1 + test_must_fail git repack -d +' + +test_expect_success 'incremental repack can disable bitmaps' ' + test_commit more-2 + git repack -d --no-write-bitmap-index +' + +test_expect_success 'full repack, reusing previous bitmaps' ' + git repack -ad + ls .git/objects/pack/ | grep bitmap output + test_line_count = 1 output +' + +test_expect_success 'fetch (full bitmap)' ' + git --git-dir=clone.git fetch origin master:master + git rev-parse HEAD expect + git --git-dir=clone.git rev-parse HEAD actual + test_cmp expect actual +' + +test_lazy_prereq JGIT ' + type jgit +' + +test_expect_success JGIT 'we can read jgit bitmaps' ' + git clone . compat-jgit + ( + cd compat-jgit + rm -f .git/objects/pack/*.bitmap + jgit gc + git rev-list --test-bitmap HEAD + ) +' + +test_expect_success JGIT 'jgit can read our bitmaps' ' + git clone . compat-us.git + ( + cd compat-us.git + git repack -adb + # jgit gc will barf if it does not like our bitmaps + jgit gc + ) +' + +test_done -- 1.8.5.rc0.443.g2df7f3f -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More
[PATCH v3 20/21] t/perf: add tests for pack bitmaps
This adds a few basic perf tests for the pack bitmap code to show off its improvements. The tests are: 1. How long does it take to do a repack (it gets slower with bitmaps, since we have to do extra work)? 2. How long does it take to do a clone (it gets faster with bitmaps)? 3. How does a small fetch perform when we've just repacked? 4. How does a clone perform when we haven't repacked since a week of pushes? Here are results against linux.git: Test origin/master this tree --- 5310.2: repack to disk33.64(32.64+2.04) 67.67(66.75+1.84) +101.2% 5310.3: simulated clone 30.49(29.47+2.05) 1.20(1.10+0.10) -96.1% 5310.4: simulated fetch 3.49(6.79+0.06) 5.57(22.35+0.07) +59.6% 5310.6: partial bitmap36.70(43.87+1.81) 8.18(21.92+0.73) -77.7% You can see that we do take longer to repack, but we do way better for further clones. A small fetch performs a bit worse, as we spend way more time on delta compression (note the heavy user CPU time, as we have 8 threads) due to the lack of name hashes for the bitmapped objects. The final test shows how the bitmaps degrade over time between packs. There's still a significant speedup over the non-bitmap case, but we don't do quite as well (we have to spend time accessing the new objects the old fashioned way, including delta compression). Signed-off-by: Jeff King p...@peff.net --- t/perf/p5310-pack-bitmaps.sh | 56 1 file changed, 56 insertions(+) create mode 100755 t/perf/p5310-pack-bitmaps.sh diff --git a/t/perf/p5310-pack-bitmaps.sh b/t/perf/p5310-pack-bitmaps.sh new file mode 100755 index 000..ac036d0 --- /dev/null +++ b/t/perf/p5310-pack-bitmaps.sh @@ -0,0 +1,56 @@ +#!/bin/sh + +test_description='Tests pack performance using bitmaps' +. ./perf-lib.sh + +test_perf_large_repo + +# note that we do everything through config, +# since we want to be able to compare bitmap-aware +# git versus non-bitmap git +test_expect_success 'setup bitmap config' ' + git config pack.writebitmaps true +' + +test_perf 'repack to disk' ' + git repack -ad +' + +test_perf 'simulated clone' ' + git pack-objects --stdout --all /dev/null /dev/null +' + +test_perf 'simulated fetch' ' + have=$(git rev-list HEAD --until=1.week.ago -1) + { + echo HEAD + echo ^$have + } | git pack-objects --revs --stdout /dev/null +' + +test_expect_success 'create partial bitmap state' ' + # pick a commit to represent the repo tip in the past + cutoff=$(git rev-list HEAD --until=1.week.ago -1) + orig_tip=$(git rev-parse HEAD) + + # now kill off all of the refs and pretend we had + # just the one tip + rm -rf .git/logs .git/refs/* .git/packed-refs + git update-ref HEAD $cutoff + + # and then repack, which will leave us with a nice + # big bitmap pack of the old history, and all of + # the new history will be loose, as if it had been pushed + # up incrementally and exploded via unpack-objects + git repack -Ad + + # and now restore our original tip, as if the pushes + # had happened + git update-ref HEAD $orig_tip +' + +test_perf 'partial bitmap' ' + git pack-objects --stdout --all /dev/null /dev/null +' + +test_done -- 1.8.5.rc0.443.g2df7f3f -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 64-bit support.
On Thu, 14 Nov 2013 16:58:31 +0400 Лежанкин Иван abys...@gmail.com wrote: Do you plan to implement the 64-bit support in git? - Right now I have a problems sometimes with a huge repo and renaming detection. If I merge more than 32768 files at once, then the renaming detection fails, because of limitation inside git. The limitation is put by max 32-bit value. I tweaked sources locally, to use 64-bit value as a number of merging files - generally, it works. But I'm not so brave to try to replace it everywhere in git. $ apt-cache policy git git: Installed: 1:1.7.10.4-1+wheezy1 Candidate: 1:1.7.10.4-1+wheezy1 Version table: *** 1:1.7.10.4-1+wheezy1 0 500 http://cdn.debian.net/debian/ wheezy/main amd64 Packages 100 /var/lib/dpkg/status Notice the amd64 arch. So what the question really is about? If you have found a place where Git explicitly uses a 32-bit integer where it would better be using a 64-bit one, please propose a patch to discuss. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: can we prevent reflog deletion when branch is deleted?
On 11/14/2013 04:39 PM, Jeff King wrote: On Thu, Nov 14, 2013 at 04:26:46PM +0530, Sitaram Chamarty wrote: I do not know about any particular debate in git circles, but I assume Sitaram is referring to this incident: https://groups.google.com/d/msg/jenkinsci-dev/-myjRIPcVwU/t4nkXONp8qgJ in which a Jenkins dev force-pushed and rewound history on 150 different repos. In this case the reflog made rollback easy, but if he had pushed a deletion, it would be harder. I don't know if they had a reflog on the server side; they used client-side reflogs if I understood correctly. I'm talking about server side (bare repo), assuming the site has core.logAllRefUpdates set. Yes, they did have server-side reflogs (the pushes were to GitHub, and we reflog everything). Client-side reflogs would not be sufficient, as the client who pushed does not record the history he just rewound (he _might_ have it at refs/remotes/origin/master@{1}, but if somebody pushed since his last fetch, then he doesn't). The simplest way to recover is to just have everyone push again (without --force). The history will just silently fast-forward to whoever has the most recent tip. The downside is that you have to wait for that person to actually push. :) I think they started with that, and then eventually GitHub support got wind of it and pulled the last value for each repo out of the server-side reflog for them. Great. But what does github do if the branches were *deleted* by mistake (say someone does a git push --mirror; most likely in a script, for added fun and laughs!) Github may be able to help people recover from that also, but plain Git won't. And that's what I would like to see a change in. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: can we prevent reflog deletion when branch is deleted?
On 11/14/2013 04:47 PM, Luca Milanesio wrote: Would be really useful anyway to have the ability to create a server-side reference based on a SHA-1, using the Git protocol. Alternatively, just fetching a remote repo based on a SHA-1 (not referenced by any ref-spec but still existent) so that you can create a new reference locally and push. That's a security issue. Just to clarify, what I am asking for is the ability to recover on the server, where you have access to the actual files that comprise the repo. sitaram Luca. On 14 Nov 2013, at 11:09, Jeff King p...@peff.net wrote: On Thu, Nov 14, 2013 at 04:26:46PM +0530, Sitaram Chamarty wrote: I do not know about any particular debate in git circles, but I assume Sitaram is referring to this incident: https://groups.google.com/d/msg/jenkinsci-dev/-myjRIPcVwU/t4nkXONp8qgJ in which a Jenkins dev force-pushed and rewound history on 150 different repos. In this case the reflog made rollback easy, but if he had pushed a deletion, it would be harder. I don't know if they had a reflog on the server side; they used client-side reflogs if I understood correctly. I'm talking about server side (bare repo), assuming the site has core.logAllRefUpdates set. Yes, they did have server-side reflogs (the pushes were to GitHub, and we reflog everything). Client-side reflogs would not be sufficient, as the client who pushed does not record the history he just rewound (he _might_ have it at refs/remotes/origin/master@{1}, but if somebody pushed since his last fetch, then he doesn't). The simplest way to recover is to just have everyone push again (without --force). The history will just silently fast-forward to whoever has the most recent tip. The downside is that you have to wait for that person to actually push. :) I think they started with that, and then eventually GitHub support got wind of it and pulled the last value for each repo out of the server-side reflog for them. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: can we prevent reflog deletion when branch is deleted?
- Original Message - From: Jeff King p...@peff.net Sent: Thursday, November 14, 2013 3:14:56 AM Subject: Re: can we prevent reflog deletion when branch is deleted? On Thu, Nov 14, 2013 at 05:48:50AM +0530, Sitaram Chamarty wrote: Is there *any* way we can preserve a reflog for a deleted branch, perhaps under logs/refs/deleted/timestamp/full/ref/name ? At GitHub, we log each change to an audit log in addition to the regular reflog (we also stuff extra data from the environment into the reflog message). So even after a branch is deleted, its audit log entries remain, though you have to pull out the data by hand (git doesn't know about it at all, except as an append-only sink for writing). We recently ran into a similar situation at my $dayjob, so I made our server side update hook log all pushes (including deletes) and added the new log file to logrotate(8) -- note: make sure if logrotate recreates the file that it allows everyone to write to it. I'm sure it's not as comprehensive as Peff's solution, but it's pretty simple for smaller shops that want a little more protection. Here are the relevant excerpts from the script: #!/usr/bin/env python import os, sys, pwd, stat from datetime import datetime def log_push(too_many_changes): log_file = 'push-log.txt' try: f = open(log_file, 'a') try: # In case we just created the file, attempt to chmod it os.chmod(log_file, 0666) except OSError: # chmod will fail if the current user isn't the owner, but # if we've gotten this far we already have write permissions, # so just continue quietly pass # Linux/Mac okay, bad for Windows username = pwd.getpwuid(os.getuid())[0] f.write('%s: %s push by %s of %s from %s to %s\n'% \ (datetime.now().strftime('%Y-%m-%d %H:%M:%S'), 'Failed' if too_many_changes else 'Successful', username, refname, oldsha, newsha)) f.close() except IOError: try: log_stats = os.stat(log_file) # Figure out owner and permissions log_owner = pwd.getpwuid(log_stats.st_uid).pw_name log_perm = oct(stat.S_IMODE(log_stats.st_mode)) print_flush('Unable to open %s for appending. Current owner ' + \ 'is %s and permissions are %s.'%(log_file, log_owner, log_perm)) except: exception,desc,stack = sys.exc_info() print_flush('Unable to open log file. While generating error' + \ ' message encountered error: %s'%(desc)) if len(sys.argv) != 4: print_flush('Usage: %s refname oldsha newsha'%sys.argv[0]) sys.exit(1) refname = sys.argv[1] oldsha = sys.argv[2] newsha = sys.argv[3] if newsha == '0'*40: # Deleted ref, nothing to do log_push(False) sys.exit(0) # ... checking for various rule/style violations ... log_push(too_many_changes) if too_many_changes: sys.exit(1) else: sys.exit(0) -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 64-bit support.
Can you be more specific about the limitation you suspect you are hitting? 32,768 is not the max 32-bit value. On Nov 14, 2013, at 6:58, Лежанкин Иван abys...@gmail.com wrote: Hi! Do you plan to implement the 64-bit support in git? - Right now I have a problems sometimes with a huge repo and renaming detection. If I merge more than 32768 files at once, then the renaming detection fails, because of limitation inside git. The limitation is put by max 32-bit value. I tweaked sources locally, to use 64-bit value as a number of merging files - generally, it works. But I'm not so brave to try to replace it everywhere in git. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 64-bit support.
I hit this limit in file 'diffcore-rename.c': if (rename_limit = 0 || rename_limit 32767) rename_limit = 32767; I just guess, that this limit comes from the O(N^2) complexity of the comparison algorithm. Since the max 32-bit signed value is 2^31, then the 2^15 = 32768 is somehow correlated with its square root, maybe, like 2^(32/2 - 1) - to prevent overflow. I'm trying to prepare the patch right now, that changes the `int rename_limit` = `long rename_limit` and all intermediate variable types. Is it a correct way to do? On 14 November 2013 18:40, Kent R. Spillner kspill...@acm.org wrote: Can you be more specific about the limitation you suspect you are hitting? 32,768 is not the max 32-bit value. On Nov 14, 2013, at 6:58, Лежанкин Иван abys...@gmail.com wrote: Hi! Do you plan to implement the 64-bit support in git? - Right now I have a problems sometimes with a huge repo and renaming detection. If I merge more than 32768 files at once, then the renaming detection fails, because of limitation inside git. The limitation is put by max 32-bit value. I tweaked sources locally, to use 64-bit value as a number of merging files - generally, it works. But I'm not so brave to try to replace it everywhere in git. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 64-bit support.
On Thu, 14 Nov 2013 18:55:52 +0400 Лежанкин Иван abys...@gmail.com wrote: I hit this limit in file 'diffcore-rename.c': if (rename_limit = 0 || rename_limit 32767) rename_limit = 32767; I just guess, that this limit comes from the O(N^2) complexity of the comparison algorithm. Since the max 32-bit signed value is 2^31, then the 2^15 = 32768 is somehow correlated with its square root, maybe, like 2^(32/2 - 1) - to prevent overflow. I'm trying to prepare the patch right now, that changes the `int rename_limit` = `long rename_limit` and all intermediate variable types. Is it a correct way to do? I beleive rename_limit comes from reading the diff.renameLimit configuration variable. The gitconfig(1) manual page hints to look at the -l command-line option of `git diff` which is described this way: -lnum The -M and -C options require O(n^2) processing time where n is the number of potential rename/copy targets. This option prevents rename/copy detection from running if the number of rename/copy targets exceeds the specified number. This description is not too clear, I admit. Looks like you're on the right track but the patch appears to require a more wide impact: * 32767 should be a default limit, applied in the case the user did not specify neither diff.renameLimit nor -l. * If whatever value read from those sources is less than 0, an error should be thrown--it looks strange to just revert it to the default value in this case. * If the user-supplied value is = 0, then just use it, assume the user knows what they are doing. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Fwd: Error with git-svn pushing a rename
Hi guys, I'm using git as a subversion client, which works great so far. But today I tried to push back a rename to the subversion server, which resulted in the following error: perl: subversion/libsvn_subr/dirent_uri.c:2489: svn_fspath__skip_ancestor: Assertion `svn_fspath__is_canonical(child_fspath)' failed. error: git-svn died of signal 6 After searching the web I found a similar problem at stackoverflow: http://stackoverflow.com/questions/17693255/git-svn-dcommit-fails-because-of-assertion-error-svn-fspath-is-canonicalchildj So I tried to downgrade subversion (1.7.x) which didn't help. Then I also tried to downgrade git (1.7.x) which resulted in a different error: 'tempfile' can't be called as a method at /usr/share/perl5/vendor_perl/Git.pm line 1042. So I updated both packages to the current version: $ git --version git version 1.8.4.2 $ svn --version svn, version 1.8.3 (r1516576) compiled Sep 13 2013, 00:34:15 on x86_64-unknown-linux-gnu Copyright (C) 2013 The Apache Software Foundation. This software consists of contributions made by many people; see the NOTICE file for more information. Subversion is open source software, see http://subversion.apache.org/ The following repository access (RA) modules are available: * ra_svn : Module for accessing a repository using the svn network protocol. - with Cyrus SASL authentication - handles 'svn' scheme * ra_local : Module for accessing a repository on local disk. - handles 'file' scheme * ra_serf : Module for accessing a repository via WebDAV protocol using serf. - using serf 1.3.1 - handles 'http' scheme - handles 'https' scheme But now I'm back at the first error. Just for completeness, the error occurs on the following operation: $ git svn dcommit Committing to https://x ... R //SomeFile = //SomeOtherFile perl: subversion/libsvn_subr/dirent_uri.c:2489: svn_fspath__skip_ancestor: Assertion `svn_fspath__is_canonical(child_fspath)' failed. error: git-svn died of signal 6 Any ideas on handling this error? Sorry for the (wrongly sent) first mail (incomplete). Would be happy to hear from you. Greetings Ben -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: can we prevent reflog deletion when branch is deleted?
On 11/14/2013 01:44 PM, Jeff King wrote: On Thu, Nov 14, 2013 at 05:48:50AM +0530, Sitaram Chamarty wrote: Is there *any* way we can preserve a reflog for a deleted branch, perhaps under logs/refs/deleted/timestamp/full/ref/name ? I had patches to do something like this here: http://thread.gmane.org/gmane.comp.version-control.git/201715/focus=201752 but there were definitely some buggy corners, as much of the code assumed you needed to have a ref to have a reflog. I don't even run with it locally anymore. At GitHub, we log each change to an audit log in addition to the regular reflog (we also stuff extra data from the environment into the reflog message). So even after a branch is deleted, its audit log entries remain, though you have to pull out the data by hand (git doesn't know about it at all, except as an append-only sink for writing). And git doesn't use the audit log for connectivity, either, so eventually the objects could be pruned. Just some basic protection -- don't delete the reflog, and instead, rename it to something that preserves the name but in a different namespace. That part is easy. Accessing it seamlessly and handling reflog expiration are a little harder. Not because they're intractable, but just because there are some low-level assumptions in the git code. The patch series I mentioned above mostly works. It probably just needs somebody to go through and find the corner cases. The use cases I am talking about are those where someone deleted something and it was noticed well within Git's the earliest of Git's expire timeouts. So, no need to worry about expiry times and connecting it with object pruning. Really, just the eqvt of a cp or mv of one file is all that most people need. Gitolite's log is the same. So no one who uses Gitolite needs this feature. But people shouldn't have to install Gitolite or anything else just to get this either! -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: can we prevent reflog deletion when branch is deleted?
I can't resist... On 11/14/2013 08:12 PM, Stephen Bash wrote: [snipped some stuff from Peff] [snipped 60 lines of python] In honor of your last name, here's what I would do if I needed to log ref updates (and wasn't using Gitolite): #!/bin/bash # -- use this as a post-receive hook while read old new ref do echo $(date +%F %T): Successful push by $USER of $ref from $old to $new done push-log.txt -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [ANNOUNCE] Git v1.8.5-rc2
On 13-11-13 06:07 PM, Junio C Hamano wrote: A release candidate Git v1.8.5-rc2 is now available for testing at the usual places. [ snip ] Updates since v1.8.4 Foreign interfaces, subsystems and ports. * git-svn used with SVN 1.8.0 when talking over https:// connection dumped core due to a bug in the serf library that SVN uses. Work it around on our side, even though the SVN side is being fixed. * On MacOS X, we detected if the filesystem needs the pre-composed unicode strings workaround, but did not automatically enable it. Now we do. * remote-hg remote helper misbehaved when interacting with a local Hg repository relative to the home directory, e.g. clone hg::~/there. * imap-send ported to OS X uses Apple's security framework instead of OpenSSL one. * Subversion 1.8.0 that was recently released breaks older subversion clients coming over http/https in various ways. Isn't this the same as the serf fixes ([1],[2])? If not, what git change is it referring to? BTW, $ git log --no-merges --oneline v1.8.4..v1.8.5-rc2 -- git-svn.perl perl/Git/SVN/ shows: f849bb6 git-svn: Warn about changing default for --prefix in Git v2.0 60786bd git-svn: fix signed commit parsing 73ffac3 git-svn: fix termination issues for remote svn connections 8ac251b git-svn: allow git-svn fetching to work using serf The release notes don't seem to mention 60786bd. M. [1] http://thread.gmane.org/gmane.comp.version-control.git/229720 [2] http://thread.gmane.org/gmane.comp.version-control.git/233701 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] RelNotes: Spelling grammar fixes.
--- Mostly just tweaks, although I did change the foo^{tag} description a lot. M. Documentation/RelNotes/1.8.5.txt | 158 --- 1 file changed, 80 insertions(+), 78 deletions(-) diff --git a/Documentation/RelNotes/1.8.5.txt b/Documentation/RelNotes/1.8.5.txt index 13b4336..352dbbb 100644 --- a/Documentation/RelNotes/1.8.5.txt +++ b/Documentation/RelNotes/1.8.5.txt @@ -8,7 +8,7 @@ When git push [$there] does not say what to push, we have used the traditional matching semantics so far (all your branches were sent to the remote as long as there already are branches of the same name over there). In Git 2.0, the default will change to the simple -semantics that pushes: +semantics, which pushes: - only the current branch to the branch with the same name, and only when the current branch is set to integrate with that remote @@ -55,7 +55,7 @@ Foreign interfaces, subsystems and ports. * git-svn used with SVN 1.8.0 when talking over https:// connection dumped core due to a bug in the serf library that SVN uses. Work - it around on our side, even though the SVN side is being fixed. + around it on our side, even though the SVN side is being fixed. * On MacOS X, we detected if the filesystem needs the pre-composed unicode strings workaround, but did not automatically enable it. @@ -65,7 +65,7 @@ Foreign interfaces, subsystems and ports. repository relative to the home directory, e.g. clone hg::~/there. * imap-send ported to OS X uses Apple's security framework instead of - OpenSSL one. + OpenSSL's. * Subversion 1.8.0 that was recently released breaks older subversion clients coming over http/https in various ways. @@ -79,22 +79,22 @@ UI, Workflows Features * xdg-open can be used as a browser backend for git web-browse (hence to show git help -w output), when available. - * git grep and git show pays attention to --textconv option + * git grep and git show pay attention to the --textconv option when these commands are told to operate on blob objects (e.g. git - grep -e pattern HEAD:Makefile). + grep -e pattern --textconv HEAD:Makefile). * git replace helper no longer allows an object to be replaced with another object of a different type to avoid confusion (you can - still manually craft such replacement using git update-ref, as an + still manually craft such a replacement using git update-ref, as an escape hatch). - * git status no longer prints dirty status information for + * git status no longer prints the dirty status information of submodules for which submodule.$name.ignore is set to all. * git rebase -i honours core.abbrev when preparing the insn sheet for editing. - * git status during a cherry-pick shows what original commit is + * git status during a cherry-pick shows which original commit is being picked. * Instead of typing four capital letters HEAD, you can say @ now, @@ -102,21 +102,21 @@ UI, Workflows Features * git check-ignore follows the same rule as git add and git status in that the ignore/exclude mechanism does not take effect - on paths that are already tracked. With --no-index option, it + on paths that are already tracked. With the --no-index option, it can be used to diagnose which paths that should have been ignored have been mistakenly added to the index. * Some irrelevant advice messages that are shared with git status output have been removed from the commit log template. - * update-refs learnt a --stdin option to read multiple update + * update-refs learned a --stdin option to read multiple update requests and perform them in an all-or-none fashion. * Just like make -C directory, git -C directory ... tells Git to go there before doing anything else. - * Just like git checkout - knows to check out and git merge - - knows to merge the branch you were previously on, git cherry-pick + * Just like git checkout - knows to check out, and git merge - + knows to merge, the branch you were previously on, git cherry-pick now understands git cherry-pick - to pick from the previous branch. @@ -126,56 +126,58 @@ UI, Workflows Features git status --porcelain instead, as its format is stable and easier to parse. - * Make foo^{tag} to peel a tag to itself, i.e. no-op., and fail if - foo is not a tag. git rev-parse --verify v1.0^{tag} would be - a more convenient way to say test $(git cat-file -t v1.0) = tag. + * The ref syntax foo^{tag} (with the literal string {tag}) peels a + tag ref to itself, i.e. it's a no-op., and fails if + foo is not a tag. git rev-parse --verify v1.0^{tag} is + a more convenient way than test $(git cat-file -t v1.0) = tag to + check if v1.0 is a tag. * git branch -v -v (and git status) did not distinguish among a - branch that does not build on any other branch, a branch that is in - sync with the branch it builds on, and a
[PATCH] branch: fix --verbose output column alignment
Commit f2e0873 (branch: report invalid tracking branch as gone) removed an early return from fill_tracking_info() in the path taken when 'git branch -v' lists a branch in sync with its upstream. This resulted in an unconditionally added space in front of the subject line: $ git branch -v * master f5eb3da commit pushed to upstream topic f935eb6 unpublished topic Instead, only add the trailing space if a decoration have been added. To catch this kind of whitespace breakage in the tests, be a bit less smart when filtering the output through sed. Signed-off-by: Torstein Hegge he...@resisty.net --- builtin/branch.c | 8 +++- t/t6040-tracking-info.sh | 24 +--- 2 files changed, 20 insertions(+), 12 deletions(-) diff --git a/builtin/branch.c b/builtin/branch.c index 0bb0e93..636a16e 100644 --- a/builtin/branch.c +++ b/builtin/branch.c @@ -424,6 +424,7 @@ static void fill_tracking_info(struct strbuf *stat, const char *branch_name, struct branch *branch = branch_get(branch_name); struct strbuf fancy = STRBUF_INIT; int upstream_is_gone = 0; + int added_decoration = 1; switch (stat_tracking_info(branch, ours, theirs)) { case 0: @@ -451,9 +452,13 @@ static void fill_tracking_info(struct strbuf *stat, const char *branch_name, if (upstream_is_gone) { if (show_upstream_ref) strbuf_addf(stat, _([%s: gone]), fancy.buf); + else + added_decoration = 0; } else if (!ours !theirs) { if (show_upstream_ref) strbuf_addf(stat, _([%s]), fancy.buf); + else + added_decoration = 0; } else if (!ours) { if (show_upstream_ref) strbuf_addf(stat, _([%s: behind %d]), fancy.buf, theirs); @@ -474,7 +479,8 @@ static void fill_tracking_info(struct strbuf *stat, const char *branch_name, ours, theirs); } strbuf_release(fancy); - strbuf_addch(stat, ' '); + if (added_decoration) + strbuf_addch(stat, ' '); free(ref); } diff --git a/t/t6040-tracking-info.sh b/t/t6040-tracking-info.sh index ba26cfe..7ac8fd0 100755 --- a/t/t6040-tracking-info.sh +++ b/t/t6040-tracking-info.sh @@ -39,12 +39,14 @@ test_expect_success setup ' advance h ' -script='s/^..\(b.\)[0-9a-f]*\[\([^]]*\)\].*/\1 \2/p' +script='s/^..\(b.\) *[0-9a-f]* \(.*\)$/\1 \2/p' cat expect \EOF -b1 ahead 1, behind 1 -b2 ahead 1, behind 1 -b3 behind 1 -b4 ahead 2 +b1 [ahead 1, behind 1] d +b2 [ahead 1, behind 1] d +b3 [behind 1] b +b4 [ahead 2] f +b5 g +b6 c EOF test_expect_success 'branch -v' ' @@ -57,12 +59,12 @@ test_expect_success 'branch -v' ' ' cat expect \EOF -b1 origin/master: ahead 1, behind 1 -b2 origin/master: ahead 1, behind 1 -b3 origin/master: behind 1 -b4 origin/master: ahead 2 -b5 brokenbase: gone -b6 origin/master +b1 [origin/master: ahead 1, behind 1] d +b2 [origin/master: ahead 1, behind 1] d +b3 [origin/master: behind 1] b +b4 [origin/master: ahead 2] f +b5 [brokenbase: gone] g +b6 [origin/master] c EOF test_expect_success 'branch -vv' ' -- 1.8.5.rc0.216.ge00de29 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v5 00/14] New hash table implementation
Also here: https://github.com/kblees/git/commits/kb/hashmap-v5 Changes since v3: - changed table resizing and grow / shrink thresholds (#02) - hashmap_free: changed free_function to boolean (#02, #06, #07, #09) - swapped xcalloc args (num elements, element size) (#02) - fixed pointer cast formatting (type*) - (type *) (#02) - removed function pointer casts (*fn)(...) - fn(...) (#02) - api-hashmap.txt: clarified removal / replacement of entries (#02) - update-index.c: fixed const-casts (#13, #14) Karsten Jens Lehmann (1): submodule: don't access the .gitmodules cache entry after removing it Karsten Blees (13): add a hashtable implementation that supports O(1) removal buitin/describe.c: use new hash map implementation diffcore-rename.c: move code around to prepare for the next patch diffcore-rename.c: simplify finding exact renames diffcore-rename.c: use new hash map implementation name-hash.c: use new hash map implementation for directories name-hash.c: remove unreferenced directory entries name-hash.c: use new hash map implementation for cache entries name-hash.c: remove cache entries instead of marking them CE_UNHASHED remove old hash.[ch] implementation fix 'git update-index --verbose --again' output builtin/update-index.c: cleanup update_one read-cache.c: fix memory leaks caused by removed cache entries Documentation/technical/api-hash.txt| 52 --- Documentation/technical/api-hashmap.txt | 235 + Makefile| 5 +- builtin/describe.c | 53 +++ builtin/rm.c| 2 +- builtin/update-index.c | 39 +++-- cache.h | 18 +-- diffcore-rename.c | 185 --- hash.c | 110 -- hash.h | 50 --- hashmap.c | 228 hashmap.h | 71 + name-hash.c | 156 +++ read-cache.c| 10 +- resolve-undo.c | 7 +- submodule.c | 25 +--- t/t0011-hashmap.sh | 240 ++ test-hashmap.c | 256 unpack-trees.c | 3 +- 19 files changed, 1216 insertions(+), 529 deletions(-) delete mode 100644 Documentation/technical/api-hash.txt create mode 100644 Documentation/technical/api-hashmap.txt delete mode 100644 hash.c delete mode 100644 hash.h create mode 100644 hashmap.c create mode 100644 hashmap.h create mode 100755 t/t0011-hashmap.sh create mode 100644 test-hashmap.c -- git diff kb/hashmap-v4: diff --git a/Documentation/technical/api-hashmap.txt b/Documentation/technical/api-hashmap.txt index fc46e56..b2280f1 100644 --- a/Documentation/technical/api-hashmap.txt +++ b/Documentation/technical/api-hashmap.txt @@ -42,11 +42,6 @@ parameters that have the same hash code. When looking up an entry, the `key` and `keydata` parameters to hashmap_get and hashmap_remove are always passed as second and third argument, respectively. Otherwise, `keydata` is NULL. -`void (*hashmap_free_fn)(void *entry)`:: - - User-supplied function to free a hashmap_entry. If entries are simply - malloc'ed memory, use stdlib's free() function. - Functions - @@ -76,14 +71,14 @@ If the total number of entries is known in advance, the `initial_size` parameter may be used to preallocate a sufficiently large table and thus prevent expensive resizing. If 0, the table is dynamically resized. -`void hashmap_free(struct hashmap *map, hashmap_free_fn free_function)`:: +`void hashmap_free(struct hashmap *map, int free_entries)`:: Frees a hashmap structure and allocated memory. + `map` is the hashmap to free. + -If `free_function` is specified (not NULL), it is called to free each -hashmap_entry in the map. If entries are simply malloc'ed, use stdlib's free(). +If `free_entries` is true, each hashmap_entry in the map is freed as well +(using stdlib's free()). `void hashmap_entry_init(void *entry, int hash)`:: @@ -127,17 +122,20 @@ call to `hashmap_get` or `hashmap_get_next`. `void *hashmap_put(struct hashmap *map, void *entry)`:: - Adds or replaces a hashmap entry. + Adds or replaces a hashmap entry. If the hashmap contains duplicate + entries equal to the specified entry, only one of them will be replaced. + `map` is the hashmap structure. + -`entry` is the entry to add or update. +`entry` is the entry to add or replace. + -Returns the previous entry, or NULL if not found (i.e. the entry was added). +Returns the replaced entry, or NULL if not found (i.e. the entry was added). `void *hashmap_remove(struct hashmap *map,
[PATCH v5 01/14] submodule: don't access the .gitmodules cache entry after removing it
From: Jens Lehmann jens.lehm...@web.de Commit 5fee995244e introduced the stage_updated_gitmodules() function to add submodule configuration updates to the index. It assumed that even after calling remove_cache_entry_at() the same cache entry would still be valid. This was true in the old days, as cache entries could never be freed, but that is not so sure in the present as there is ongoing work to free removed cache entries, which makes this code segfault. Fix that by calling add_file_to_cache() instead of open coding it. Also remove the could not find .gitmodules in index warning, as that won't happen in regular use cases (and by then just silently adding it to the index we do the right thing). Thanks-to: Karsten Blees karsten.bl...@gmail.com Signed-off-by: Jens Lehmann jens.lehm...@web.de Signed-off-by: Junio C Hamano gits...@pobox.com --- submodule.c | 25 + 1 file changed, 1 insertion(+), 24 deletions(-) diff --git a/submodule.c b/submodule.c index 1905d75..e388487 100644 --- a/submodule.c +++ b/submodule.c @@ -116,30 +116,7 @@ int remove_path_from_gitmodules(const char *path) void stage_updated_gitmodules(void) { - struct strbuf buf = STRBUF_INIT; - struct stat st; - int pos; - struct cache_entry *ce; - int namelen = strlen(.gitmodules); - - pos = cache_name_pos(.gitmodules, namelen); - if (pos 0) { - warning(_(could not find .gitmodules in index)); - return; - } - ce = active_cache[pos]; - ce-ce_flags = namelen; - if (strbuf_read_file(buf, .gitmodules, 0) 0) - die(_(reading updated .gitmodules failed)); - if (lstat(.gitmodules, st) 0) - die_errno(_(unable to stat updated .gitmodules)); - fill_stat_cache_info(ce, st); - ce-ce_mode = ce_mode_from_stat(ce, st.st_mode); - if (remove_cache_entry_at(pos) 0) - die(_(unable to remove .gitmodules from index)); - if (write_sha1_file(buf.buf, buf.len, blob_type, ce-sha1)) - die(_(adding updated .gitmodules failed)); - if (add_cache_entry(ce, ADD_CACHE_OK_TO_ADD|ADD_CACHE_OK_TO_REPLACE)) + if (add_file_to_cache(.gitmodules, 0)) die(_(staging updated .gitmodules failed)); } -- 1.8.5.rc0.333.g5394214 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v5 03/14] buitin/describe.c: use new hash map implementation
Signed-off-by: Karsten Blees bl...@dcon.de Signed-off-by: Junio C Hamano gits...@pobox.com --- builtin/describe.c | 53 - 1 file changed, 24 insertions(+), 29 deletions(-) diff --git a/builtin/describe.c b/builtin/describe.c index 6f62109..0b2cef4 100644 --- a/builtin/describe.c +++ b/builtin/describe.c @@ -6,7 +6,7 @@ #include exec_cmd.h #include parse-options.h #include diff.h -#include hash.h +#include hashmap.h #include argv-array.h #define SEEN (1u 0) @@ -25,7 +25,7 @@ static int longformat; static int first_parent; static int abbrev = -1; /* unspecified */ static int max_candidates = 10; -static struct hash_table names; +static struct hashmap names; static int have_util; static const char *pattern; static int always; @@ -37,7 +37,7 @@ static const char *diff_index_args[] = { }; struct commit_name { - struct commit_name *next; + struct hashmap_entry entry; unsigned char peeled[20]; struct tag *tag; unsigned prio:2; /* annotated tag = 2, tag = 1, head = 0 */ @@ -50,6 +50,12 @@ static const char *prio_names[] = { head, lightweight, annotated, }; +static int commit_name_cmp(const struct commit_name *cn1, + const struct commit_name *cn2, const void *peeled) +{ + return hashcmp(cn1-peeled, peeled ? peeled : cn2-peeled); +} + static inline unsigned int hash_sha1(const unsigned char *sha1) { unsigned int hash; @@ -59,21 +65,9 @@ static inline unsigned int hash_sha1(const unsigned char *sha1) static inline struct commit_name *find_commit_name(const unsigned char *peeled) { - struct commit_name *n = lookup_hash(hash_sha1(peeled), names); - while (n !!hashcmp(peeled, n-peeled)) - n = n-next; - return n; -} - -static int set_util(void *chain, void *data) -{ - struct commit_name *n; - for (n = chain; n; n = n-next) { - struct commit *c = lookup_commit_reference_gently(n-peeled, 1); - if (c) - c-util = n; - } - return 0; + struct commit_name key; + hashmap_entry_init(key, hash_sha1(peeled)); + return hashmap_get(names, key, peeled); } static int replace_name(struct commit_name *e, @@ -118,16 +112,10 @@ static void add_to_known_names(const char *path, struct tag *tag = NULL; if (replace_name(e, prio, sha1, tag)) { if (!e) { - void **pos; e = xmalloc(sizeof(struct commit_name)); hashcpy(e-peeled, peeled); - pos = insert_hash(hash_sha1(peeled), e, names); - if (pos) { - e-next = *pos; - *pos = e; - } else { - e-next = NULL; - } + hashmap_entry_init(e, hash_sha1(peeled)); + hashmap_add(names, e); e-path = NULL; } e-tag = tag; @@ -292,7 +280,14 @@ static void describe(const char *arg, int last_one) fprintf(stderr, _(searching to describe %s\n), arg); if (!have_util) { - for_each_hash(names, set_util, NULL); + struct hashmap_iter iter; + struct commit *c; + struct commit_name *n = hashmap_iter_first(names, iter); + for (; n; n = hashmap_iter_next(iter)) { + c = lookup_commit_reference_gently(n-peeled, 1); + if (c) + c-util = n; + } have_util = 1; } @@ -463,9 +458,9 @@ int cmd_describe(int argc, const char **argv, const char *prefix) return cmd_name_rev(args.argc, args.argv, prefix); } - init_hash(names); + hashmap_init(names, (hashmap_cmp_fn) commit_name_cmp, 0); for_each_rawref(get_name, NULL); - if (!names.nr !always) + if (!names.size !always) die(_(No names found, cannot describe anything.)); if (argc == 0) { -- 1.8.5.rc0.333.g5394214 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v5 04/14] diffcore-rename.c: move code around to prepare for the next patch
No actual code changes, just move hash_filespec up and outdent part of find_identical_files. Signed-off-by: Karsten Blees bl...@dcon.de Signed-off-by: Junio C Hamano gits...@pobox.com --- diffcore-rename.c | 98 +++ 1 file changed, 49 insertions(+), 49 deletions(-) diff --git a/diffcore-rename.c b/diffcore-rename.c index 6c7a72f..008a60c 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -248,6 +248,18 @@ struct file_similarity { struct file_similarity *next; }; +static unsigned int hash_filespec(struct diff_filespec *filespec) +{ + unsigned int hash; + if (!filespec-sha1_valid) { + if (diff_populate_filespec(filespec, 0)) + return 0; + hash_sha1_file(filespec-data, filespec-size, blob, filespec-sha1); + } + memcpy(hash, filespec-sha1, sizeof(hash)); + return hash; +} + static int find_identical_files(struct file_similarity *src, struct file_similarity *dst, struct diff_options *options) @@ -258,46 +270,46 @@ static int find_identical_files(struct file_similarity *src, * Walk over all the destinations ... */ do { - struct diff_filespec *target = dst-filespec; - struct file_similarity *p, *best; - int i = 100, best_score = -1; - - /* -* .. to find the best source match -*/ - best = NULL; - for (p = src; p; p = p-next) { - int score; - struct diff_filespec *source = p-filespec; - - /* False hash collision? */ - if (hashcmp(source-sha1, target-sha1)) - continue; - /* Non-regular files? If so, the modes must match! */ - if (!S_ISREG(source-mode) || !S_ISREG(target-mode)) { - if (source-mode != target-mode) - continue; - } - /* Give higher scores to sources that haven't been used already */ - score = !source-rename_used; - if (source-rename_used options-detect_rename != DIFF_DETECT_COPY) - continue; - score += basename_same(source, target); - if (score best_score) { - best = p; - best_score = score; - if (score == 2) - break; - } + struct diff_filespec *target = dst-filespec; + struct file_similarity *p, *best; + int i = 100, best_score = -1; - /* Too many identical alternatives? Pick one */ - if (!--i) - break; + /* +* .. to find the best source match +*/ + best = NULL; + for (p = src; p; p = p-next) { + int score; + struct diff_filespec *source = p-filespec; + + /* False hash collision? */ + if (hashcmp(source-sha1, target-sha1)) + continue; + /* Non-regular files? If so, the modes must match! */ + if (!S_ISREG(source-mode) || !S_ISREG(target-mode)) { + if (source-mode != target-mode) + continue; } - if (best) { - record_rename_pair(dst-index, best-index, MAX_SCORE); - renames++; + /* Give higher scores to sources that haven't been used already */ + score = !source-rename_used; + if (source-rename_used options-detect_rename != DIFF_DETECT_COPY) + continue; + score += basename_same(source, target); + if (score best_score) { + best = p; + best_score = score; + if (score == 2) + break; } + + /* Too many identical alternatives? Pick one */ + if (!--i) + break; + } + if (best) { + record_rename_pair(dst-index, best-index, MAX_SCORE); + renames++; + } } while ((dst = dst-next) != NULL); return renames; } @@ -343,18 +355,6 @@ static int find_same_files(void *ptr, void *data) return ret; } -static unsigned int hash_filespec(struct diff_filespec *filespec) -{ - unsigned int hash; - if (!filespec-sha1_valid) { - if (diff_populate_filespec(filespec, 0)) - return 0; -
Re: [PATCH v3 0/21] pack bitmaps
On 14/11/13 12:41, Jeff King wrote: Here's another iteration of the pack bitmaps series. Compared to v2, it changes: - misc style/typo fixes - portability fixes from Ramsay and Torsten Unfortunately, I didn't find time this weekend to finish the msvc build fixes. However, after a quick squint at these patches, I think you have almost done it for me! :-D I must have misunderstood the previous discussion, because my patch was written on the assumption that the ewah directory wouldn't be git-ified (e.g. #include git-compat-util.h). So, most of my patch is no longer necessary, given the use of the git compat header (and removal of system headers). I suspect that you only need to add an '#define PRIx64 I64x' definition (Hmm, probably to the compat/mingw.h header). I won't know for sure until I actually try them out, of course. I will wait until these patches land in pu. [Note: the msvc build is still broken, but the failure is not caused by these patches. Unfortunately, the tests in t5310-*.sh fail. However, if I include some debug code, the tests pass ... :-P ] The part of the patch I was still working on was ... - count-objects garbage-reporting patch from Duy - disable bitmaps when is_repository_shallow(); this also covers the case where the client is shallow, since we feed pack-objects a --shallow-file in that case. This used to done by checking !internal_rev_list, but that doesn't apply after cdab485. - ewah sources now properly use git-compat-util.h and do not include system headers - the ewah code uses ewah_malloc, ewah_realloc, and so forth to let the project use a particular allocator (and we want to use xmalloc and friends). And we defined those in pack-bitmap.h, but of course that had no effect on the ewah/*.c files that did not include pack-bitmap.h. Since we are hacking up and git-ifying libewok anyway, we can just set the hardcoded fallback to xmalloc instead of malloc. - the ewah code used gcc's __builtin_ctzll, but did not provide a suitable fallback. We now provide a fallback in C. ... here. I was messing around with several implementations (including the use of msvc compiler intrinsics) with the intention of doing some timing tests etc. [I suspected my C fallback function (a different implementation to yours) would be slightly faster.] ATB, Ramsay Jones -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v5 05/14] diffcore-rename.c: simplify finding exact renames
The find_exact_renames function currently only uses the hash table for grouping, i.e.: 1. add sources 2. add destinations 3. iterate all buckets, per bucket: 4. split sources from destinations 5. iterate destinations, per destination: 6. iterate sources to find best match This can be simplified by utilizing the lookup functionality of the hash table, i.e.: 1. add sources 2. iterate destinations, per destination: 3. lookup sources matching the current destination 4. iterate sources to find best match This saves several iterations and file_similarity allocations for the destinations. Signed-off-by: Karsten Blees bl...@dcon.de Signed-off-by: Junio C Hamano gits...@pobox.com --- diffcore-rename.c | 75 +++ 1 file changed, 20 insertions(+), 55 deletions(-) diff --git a/diffcore-rename.c b/diffcore-rename.c index 008a60c..cfeb408 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -243,7 +243,7 @@ static int score_compare(const void *a_, const void *b_) } struct file_similarity { - int src_dst, index; + int index; struct diff_filespec *filespec; struct file_similarity *next; }; @@ -260,25 +260,21 @@ static unsigned int hash_filespec(struct diff_filespec *filespec) return hash; } -static int find_identical_files(struct file_similarity *src, - struct file_similarity *dst, +static int find_identical_files(struct hash_table *srcs, + int dst_index, struct diff_options *options) { int renames = 0; - /* -* Walk over all the destinations ... -*/ - do { - struct diff_filespec *target = dst-filespec; + struct diff_filespec *target = rename_dst[dst_index].two; struct file_similarity *p, *best; int i = 100, best_score = -1; /* -* .. to find the best source match +* Find the best source match for specified destination. */ best = NULL; - for (p = src; p; p = p-next) { + for (p = lookup_hash(hash_filespec(target), srcs); p; p = p-next) { int score; struct diff_filespec *source = p-filespec; @@ -307,61 +303,28 @@ static int find_identical_files(struct file_similarity *src, break; } if (best) { - record_rename_pair(dst-index, best-index, MAX_SCORE); + record_rename_pair(dst_index, best-index, MAX_SCORE); renames++; } - } while ((dst = dst-next) != NULL); return renames; } -static void free_similarity_list(struct file_similarity *p) +static int free_similarity_list(void *p, void *unused) { while (p) { struct file_similarity *entry = p; - p = p-next; + p = entry-next; free(entry); } + return 0; } -static int find_same_files(void *ptr, void *data) -{ - int ret; - struct file_similarity *p = ptr; - struct file_similarity *src = NULL, *dst = NULL; - struct diff_options *options = data; - - /* Split the hash list up into sources and destinations */ - do { - struct file_similarity *entry = p; - p = p-next; - if (entry-src_dst 0) { - entry-next = src; - src = entry; - } else { - entry-next = dst; - dst = entry; - } - } while (p); - - /* -* If we have both sources *and* destinations, see if -* we can match them up -*/ - ret = (src dst) ? find_identical_files(src, dst, options) : 0; - - /* Free the hashes and return the number of renames found */ - free_similarity_list(src); - free_similarity_list(dst); - return ret; -} - -static void insert_file_table(struct hash_table *table, int src_dst, int index, struct diff_filespec *filespec) +static void insert_file_table(struct hash_table *table, int index, struct diff_filespec *filespec) { void **pos; unsigned int hash; struct file_similarity *entry = xmalloc(sizeof(*entry)); - entry-src_dst = src_dst; entry-index = index; entry-filespec = filespec; entry-next = NULL; @@ -385,24 +348,26 @@ static void insert_file_table(struct hash_table *table, int src_dst, int index, */ static int find_exact_renames(struct diff_options *options) { - int i; + int i, renames = 0; struct hash_table file_table; + /* Add all sources to the hash table */ init_hash(file_table); - preallocate_hash(file_table, rename_src_nr + rename_dst_nr); + preallocate_hash(file_table, rename_src_nr); for (i = 0; i rename_src_nr; i++) - insert_file_table(file_table, -1, i, rename_src[i].p-one); +
[PATCH v5 06/14] diffcore-rename.c: use new hash map implementation
Signed-off-by: Karsten Blees bl...@dcon.de Signed-off-by: Junio C Hamano gits...@pobox.com --- diffcore-rename.c | 48 +--- 1 file changed, 13 insertions(+), 35 deletions(-) diff --git a/diffcore-rename.c b/diffcore-rename.c index cfeb408..9b4f068 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -4,7 +4,7 @@ #include cache.h #include diff.h #include diffcore.h -#include hash.h +#include hashmap.h #include progress.h /* Table of rename/copy destinations */ @@ -243,9 +243,9 @@ static int score_compare(const void *a_, const void *b_) } struct file_similarity { + struct hashmap_entry entry; int index; struct diff_filespec *filespec; - struct file_similarity *next; }; static unsigned int hash_filespec(struct diff_filespec *filespec) @@ -260,21 +260,22 @@ static unsigned int hash_filespec(struct diff_filespec *filespec) return hash; } -static int find_identical_files(struct hash_table *srcs, +static int find_identical_files(struct hashmap *srcs, int dst_index, struct diff_options *options) { int renames = 0; struct diff_filespec *target = rename_dst[dst_index].two; - struct file_similarity *p, *best; + struct file_similarity *p, *best, dst; int i = 100, best_score = -1; /* * Find the best source match for specified destination. */ best = NULL; - for (p = lookup_hash(hash_filespec(target), srcs); p; p = p-next) { + hashmap_entry_init(dst, hash_filespec(target)); + for (p = hashmap_get(srcs, dst, NULL); p; p = hashmap_get_next(srcs, p)) { int score; struct diff_filespec *source = p-filespec; @@ -309,34 +310,15 @@ static int find_identical_files(struct hash_table *srcs, return renames; } -static int free_similarity_list(void *p, void *unused) +static void insert_file_table(struct hashmap *table, int index, struct diff_filespec *filespec) { - while (p) { - struct file_similarity *entry = p; - p = entry-next; - free(entry); - } - return 0; -} - -static void insert_file_table(struct hash_table *table, int index, struct diff_filespec *filespec) -{ - void **pos; - unsigned int hash; struct file_similarity *entry = xmalloc(sizeof(*entry)); entry-index = index; entry-filespec = filespec; - entry-next = NULL; - - hash = hash_filespec(filespec); - pos = insert_hash(hash, entry, table); - /* We already had an entry there? */ - if (pos) { - entry-next = *pos; - *pos = entry; - } + hashmap_entry_init(entry, hash_filespec(filespec)); + hashmap_add(table, entry); } /* @@ -349,11 +331,10 @@ static void insert_file_table(struct hash_table *table, int index, struct diff_f static int find_exact_renames(struct diff_options *options) { int i, renames = 0; - struct hash_table file_table; + struct hashmap file_table; /* Add all sources to the hash table */ - init_hash(file_table); - preallocate_hash(file_table, rename_src_nr); + hashmap_init(file_table, NULL, rename_src_nr); for (i = 0; i rename_src_nr; i++) insert_file_table(file_table, i, rename_src[i].p-one); @@ -361,11 +342,8 @@ static int find_exact_renames(struct diff_options *options) for (i = 0; i rename_dst_nr; i++) renames += find_identical_files(file_table, i, options); - /* Free source file_similarity chains */ - for_each_hash(file_table, free_similarity_list, options); - - /* .. and free the hash data structure */ - free_hash(file_table); + /* Free the hash data structure and entries */ + hashmap_free(file_table, 1); return renames; } -- 1.8.5.rc0.333.g5394214 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v5 07/14] name-hash.c: use new hash map implementation for directories
Signed-off-by: Karsten Blees bl...@dcon.de Signed-off-by: Junio C Hamano gits...@pobox.com --- cache.h | 3 ++- name-hash.c | 77 +++-- 2 files changed, 20 insertions(+), 60 deletions(-) diff --git a/cache.h b/cache.h index d1f3c71..84e9ad6 100644 --- a/cache.h +++ b/cache.h @@ -4,6 +4,7 @@ #include git-compat-util.h #include strbuf.h #include hash.h +#include hashmap.h #include advice.h #include gettext.h #include convert.h @@ -278,7 +279,7 @@ struct index_state { unsigned name_hash_initialized : 1, initialized : 1; struct hash_table name_hash; - struct hash_table dir_hash; + struct hashmap dir_hash; }; extern struct index_state the_index; diff --git a/name-hash.c b/name-hash.c index e5b6e1a..c75fadf 100644 --- a/name-hash.c +++ b/name-hash.c @@ -8,49 +8,28 @@ #define NO_THE_INDEX_COMPATIBILITY_MACROS #include cache.h -/* - * This removes bit 5 if bit 6 is set. - * - * That will make US-ASCII characters hash to their upper-case - * equivalent. We could easily do this one whole word at a time, - * but that's for future worries. - */ -static inline unsigned char icase_hash(unsigned char c) -{ - return c ~((c 0x40) 1); -} - -static unsigned int hash_name(const char *name, int namelen) -{ - unsigned int hash = 0x123; - - while (namelen--) { - unsigned char c = *name++; - c = icase_hash(c); - hash = hash*101 + c; - } - return hash; -} - struct dir_entry { - struct dir_entry *next; + struct hashmap_entry ent; struct dir_entry *parent; struct cache_entry *ce; int nr; unsigned int namelen; }; +static int dir_entry_cmp(const struct dir_entry *e1, + const struct dir_entry *e2, const char *name) +{ + return e1-namelen != e2-namelen || strncasecmp(e1-ce-name, + name ? name : e2-ce-name, e1-namelen); +} + static struct dir_entry *find_dir_entry(struct index_state *istate, const char *name, unsigned int namelen) { - unsigned int hash = hash_name(name, namelen); - struct dir_entry *dir; - - for (dir = lookup_hash(hash, istate-dir_hash); dir; dir = dir-next) - if (dir-namelen == namelen - !strncasecmp(dir-ce-name, name, namelen)) - return dir; - return NULL; + struct dir_entry key; + hashmap_entry_init(key, memihash(name, namelen)); + key.namelen = namelen; + return hashmap_get(istate-dir_hash, key, name); } static struct dir_entry *hash_dir_entry(struct index_state *istate, @@ -84,18 +63,11 @@ static struct dir_entry *hash_dir_entry(struct index_state *istate, dir = find_dir_entry(istate, ce-name, namelen); if (!dir) { /* not found, create it and add to hash table */ - void **pdir; - unsigned int hash = hash_name(ce-name, namelen); - dir = xcalloc(1, sizeof(struct dir_entry)); + hashmap_entry_init(dir, memihash(ce-name, namelen)); dir-namelen = namelen; dir-ce = ce; - - pdir = insert_hash(hash, dir, istate-dir_hash); - if (pdir) { - dir-next = *pdir; - *pdir = dir; - } + hashmap_add(istate-dir_hash, dir); /* recursively add missing parent directories */ dir-parent = hash_dir_entry(istate, ce, namelen); @@ -134,7 +106,7 @@ static void hash_index_entry(struct index_state *istate, struct cache_entry *ce) return; ce-ce_flags |= CE_HASHED; ce-next = NULL; - hash = hash_name(ce-name, ce_namelen(ce)); + hash = memihash(ce-name, ce_namelen(ce)); pos = insert_hash(hash, ce, istate-name_hash); if (pos) { ce-next = *pos; @@ -153,6 +125,7 @@ static void lazy_init_name_hash(struct index_state *istate) return; if (istate-cache_nr) preallocate_hash(istate-name_hash, istate-cache_nr); + hashmap_init(istate-dir_hash, (hashmap_cmp_fn) dir_entry_cmp, 0); for (nr = 0; nr istate-cache_nr; nr++) hash_index_entry(istate, istate-cache[nr]); istate-name_hash_initialized = 1; @@ -247,7 +220,7 @@ struct cache_entry *index_dir_exists(struct index_state *istate, const char *nam struct cache_entry *index_file_exists(struct index_state *istate, const char *name, int namelen, int icase) { - unsigned int hash = hash_name(name, namelen); + unsigned int hash = memihash(name, namelen); struct cache_entry *ce; lazy_init_name_hash(istate); @@ -270,26 +243,12 @@ struct cache_entry *index_name_exists(struct index_state *istate, const char *na return index_file_exists(istate, name, namelen, icase); }
[PATCH v5 08/14] name-hash.c: remove unreferenced directory entries
The new hashmap implementation supports remove, so remove and free directory entries that are no longer referenced by active cache entries. Signed-off-by: Karsten Blees bl...@dcon.de Signed-off-by: Junio C Hamano gits...@pobox.com --- name-hash.c | 15 --- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/name-hash.c b/name-hash.c index c75fadf..effe96d 100644 --- a/name-hash.c +++ b/name-hash.c @@ -86,15 +86,16 @@ static void add_dir_entry(struct index_state *istate, struct cache_entry *ce) static void remove_dir_entry(struct index_state *istate, struct cache_entry *ce) { /* -* Release reference to the directory entry (and parents if 0). -* -* Note: we do not remove / free the entry because there's no -* hash.[ch]::remove_hash and dir-next may point to other entries -* that are still valid, so we must not free the memory. +* Release reference to the directory entry. If 0, remove and continue +* with parent directory. */ struct dir_entry *dir = hash_dir_entry(istate, ce, ce_namelen(ce)); - while (dir dir-nr !(--dir-nr)) - dir = dir-parent; + while (dir !(--dir-nr)) { + struct dir_entry *parent = dir-parent; + hashmap_remove(istate-dir_hash, dir, NULL); + free(dir); + dir = parent; + } } static void hash_index_entry(struct index_state *istate, struct cache_entry *ce) -- 1.8.5.rc0.333.g5394214 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v5 09/14] name-hash.c: use new hash map implementation for cache entries
Note: the ce-next = NULL; in unpack-trees.c::do_add_entry can safely be removed, as ce-next (now ce-ent.next) is always properly initialized in name-hash.c::hash_index_entry. Signed-off-by: Karsten Blees bl...@dcon.de Signed-off-by: Junio C Hamano gits...@pobox.com --- cache.h| 8 +--- name-hash.c| 24 unpack-trees.c | 1 - 3 files changed, 13 insertions(+), 20 deletions(-) diff --git a/cache.h b/cache.h index 84e9ad6..cedc7ae 100644 --- a/cache.h +++ b/cache.h @@ -131,12 +131,12 @@ struct stat_data { }; struct cache_entry { + struct hashmap_entry ent; struct stat_data ce_stat_data; unsigned int ce_mode; unsigned int ce_flags; unsigned int ce_namelen; unsigned char sha1[20]; - struct cache_entry *next; char name[FLEX_ARRAY]; /* more */ }; @@ -203,7 +203,9 @@ static inline void copy_cache_entry(struct cache_entry *dst, unsigned int state = dst-ce_flags CE_STATE_MASK; /* Don't copy hash chain and name */ - memcpy(dst, src, offsetof(struct cache_entry, next)); + memcpy(dst-ce_stat_data, src-ce_stat_data, + offsetof(struct cache_entry, name) - + offsetof(struct cache_entry, ce_stat_data)); /* Restore the hash state */ dst-ce_flags = (dst-ce_flags ~CE_STATE_MASK) | state; @@ -278,7 +280,7 @@ struct index_state { struct cache_time timestamp; unsigned name_hash_initialized : 1, initialized : 1; - struct hash_table name_hash; + struct hashmap name_hash; struct hashmap dir_hash; }; diff --git a/name-hash.c b/name-hash.c index effe96d..488eccf 100644 --- a/name-hash.c +++ b/name-hash.c @@ -100,19 +100,11 @@ static void remove_dir_entry(struct index_state *istate, struct cache_entry *ce) static void hash_index_entry(struct index_state *istate, struct cache_entry *ce) { - void **pos; - unsigned int hash; - if (ce-ce_flags CE_HASHED) return; ce-ce_flags |= CE_HASHED; - ce-next = NULL; - hash = memihash(ce-name, ce_namelen(ce)); - pos = insert_hash(hash, ce, istate-name_hash); - if (pos) { - ce-next = *pos; - *pos = ce; - } + hashmap_entry_init(ce, memihash(ce-name, ce_namelen(ce))); + hashmap_add(istate-name_hash, ce); if (ignore_case !(ce-ce_flags CE_UNHASHED)) add_dir_entry(istate, ce); @@ -124,8 +116,7 @@ static void lazy_init_name_hash(struct index_state *istate) if (istate-name_hash_initialized) return; - if (istate-cache_nr) - preallocate_hash(istate-name_hash, istate-cache_nr); + hashmap_init(istate-name_hash, NULL, istate-cache_nr); hashmap_init(istate-dir_hash, (hashmap_cmp_fn) dir_entry_cmp, 0); for (nr = 0; nr istate-cache_nr; nr++) hash_index_entry(istate, istate-cache[nr]); @@ -221,18 +212,19 @@ struct cache_entry *index_dir_exists(struct index_state *istate, const char *nam struct cache_entry *index_file_exists(struct index_state *istate, const char *name, int namelen, int icase) { - unsigned int hash = memihash(name, namelen); struct cache_entry *ce; + struct hashmap_entry key; lazy_init_name_hash(istate); - ce = lookup_hash(hash, istate-name_hash); + hashmap_entry_init(key, memihash(name, namelen)); + ce = hashmap_get(istate-name_hash, key, NULL); while (ce) { if (!(ce-ce_flags CE_UNHASHED)) { if (same_name(ce, name, namelen, icase)) return ce; } - ce = ce-next; + ce = hashmap_get_next(istate-name_hash, ce); } return NULL; } @@ -250,6 +242,6 @@ void free_name_hash(struct index_state *istate) return; istate-name_hash_initialized = 0; - free_hash(istate-name_hash); + hashmap_free(istate-name_hash, 0); hashmap_free(istate-dir_hash, 1); } diff --git a/unpack-trees.c b/unpack-trees.c index ad3e9a0..f8985d4 100644 --- a/unpack-trees.c +++ b/unpack-trees.c @@ -110,7 +110,6 @@ static void do_add_entry(struct unpack_trees_options *o, struct cache_entry *ce, if (set CE_REMOVE) set |= CE_WT_REMOVE; - ce-next = NULL; ce-ce_flags = (ce-ce_flags ~clear) | set; add_index_entry(o-result, ce, ADD_CACHE_OK_TO_ADD | ADD_CACHE_OK_TO_REPLACE); -- 1.8.5.rc0.333.g5394214 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v5 10/14] name-hash.c: remove cache entries instead of marking them CE_UNHASHED
The new hashmap implementation supports remove, so really remove unused cache entries from the name hashmap instead of just marking them. The CE_UNHASHED flag and CE_STATE_MASK are no longer needed. Keep the CE_HASHED flag to prevent adding entries twice. Signed-off-by: Karsten Blees bl...@dcon.de Signed-off-by: Junio C Hamano gits...@pobox.com --- cache.h| 6 ++ name-hash.c| 46 ++ read-cache.c | 2 +- unpack-trees.c | 2 +- 4 files changed, 26 insertions(+), 30 deletions(-) diff --git a/cache.h b/cache.h index cedc7ae..05b8011 100644 --- a/cache.h +++ b/cache.h @@ -160,7 +160,6 @@ struct cache_entry { #define CE_ADDED (1 19) #define CE_HASHED(1 20) -#define CE_UNHASHED (1 21) #define CE_WT_REMOVE (1 22) /* remove in work directory */ #define CE_CONFLICTED(1 23) @@ -196,11 +195,10 @@ struct pathspec; * Copy the sha1 and stat state of a cache entry from one to * another. But we never change the name, or the hash state! */ -#define CE_STATE_MASK (CE_HASHED | CE_UNHASHED) static inline void copy_cache_entry(struct cache_entry *dst, const struct cache_entry *src) { - unsigned int state = dst-ce_flags CE_STATE_MASK; + unsigned int state = dst-ce_flags CE_HASHED; /* Don't copy hash chain and name */ memcpy(dst-ce_stat_data, src-ce_stat_data, @@ -208,7 +206,7 @@ static inline void copy_cache_entry(struct cache_entry *dst, offsetof(struct cache_entry, ce_stat_data)); /* Restore the hash state */ - dst-ce_flags = (dst-ce_flags ~CE_STATE_MASK) | state; + dst-ce_flags = (dst-ce_flags ~CE_HASHED) | state; } static inline unsigned create_ce_flags(unsigned stage) diff --git a/name-hash.c b/name-hash.c index 488eccf..9a3bd3f 100644 --- a/name-hash.c +++ b/name-hash.c @@ -106,17 +106,29 @@ static void hash_index_entry(struct index_state *istate, struct cache_entry *ce) hashmap_entry_init(ce, memihash(ce-name, ce_namelen(ce))); hashmap_add(istate-name_hash, ce); - if (ignore_case !(ce-ce_flags CE_UNHASHED)) + if (ignore_case) add_dir_entry(istate, ce); } +static int cache_entry_cmp(const struct cache_entry *ce1, + const struct cache_entry *ce2, const void *remove) +{ + /* +* For remove_name_hash, find the exact entry (pointer equality); for +* index_name_exists, find all entries with matching hash code and +* decide whether the entry matches in same_name. +*/ + return remove ? !(ce1 == ce2) : 0; +} + static void lazy_init_name_hash(struct index_state *istate) { int nr; if (istate-name_hash_initialized) return; - hashmap_init(istate-name_hash, NULL, istate-cache_nr); + hashmap_init(istate-name_hash, (hashmap_cmp_fn) cache_entry_cmp, + istate-cache_nr); hashmap_init(istate-dir_hash, (hashmap_cmp_fn) dir_entry_cmp, 0); for (nr = 0; nr istate-cache_nr; nr++) hash_index_entry(istate, istate-cache[nr]); @@ -125,31 +137,19 @@ static void lazy_init_name_hash(struct index_state *istate) void add_name_hash(struct index_state *istate, struct cache_entry *ce) { - /* if already hashed, add reference to directory entries */ - if (ignore_case (ce-ce_flags CE_STATE_MASK) == CE_STATE_MASK) - add_dir_entry(istate, ce); - - ce-ce_flags = ~CE_UNHASHED; if (istate-name_hash_initialized) hash_index_entry(istate, ce); } -/* - * We don't actually *remove* it, we can just mark it invalid so that - * we won't find it in lookups. - * - * Not only would we have to search the lists (simple enough), but - * we'd also have to rehash other hash buckets in case this makes the - * hash bucket empty (common). So it's much better to just mark - * it. - */ void remove_name_hash(struct index_state *istate, struct cache_entry *ce) { - /* if already hashed, release reference to directory entries */ - if (ignore_case (ce-ce_flags CE_STATE_MASK) == CE_HASHED) - remove_dir_entry(istate, ce); + if (!istate-name_hash_initialized || !(ce-ce_flags CE_HASHED)) + return; + ce-ce_flags = ~CE_HASHED; + hashmap_remove(istate-name_hash, ce, ce); - ce-ce_flags |= CE_UNHASHED; + if (ignore_case) + remove_dir_entry(istate, ce); } static int slow_same_name(const char *name1, int len1, const char *name2, int len2) @@ -220,10 +220,8 @@ struct cache_entry *index_file_exists(struct index_state *istate, const char *na hashmap_entry_init(key, memihash(name, namelen)); ce = hashmap_get(istate-name_hash, key, NULL); while (ce) { - if (!(ce-ce_flags CE_UNHASHED)) { - if (same_name(ce, name, namelen,
[PATCH v5 11/14] remove old hash.[ch] implementation
Signed-off-by: Karsten Blees bl...@dcon.de Signed-off-by: Junio C Hamano gits...@pobox.com --- Documentation/technical/api-hash.txt | 52 - Makefile | 2 - cache.h | 1 - hash.c | 110 --- hash.h | 50 test-hashmap.c | 84 -- 6 files changed, 299 deletions(-) delete mode 100644 Documentation/technical/api-hash.txt delete mode 100644 hash.c delete mode 100644 hash.h diff --git a/Documentation/technical/api-hash.txt b/Documentation/technical/api-hash.txt deleted file mode 100644 index e5061e0..000 --- a/Documentation/technical/api-hash.txt +++ /dev/null @@ -1,52 +0,0 @@ -hash API - - -The hash API is a collection of simple hash table functions. Users are expected -to implement their own hashing. - -Data Structures - -`struct hash_table`:: - - The hash table structure. The `array` member points to the hash table - entries. The `size` member counts the total number of valid and invalid - entries in the table. The `nr` member keeps track of the number of - valid entries. - -`struct hash_table_entry`:: - - An opaque structure representing an entry in the hash table. The `hash` - member is the entry's hash key and the `ptr` member is the entry's - value. - -Functions -- - -`init_hash`:: - - Initialize the hash table. - -`free_hash`:: - - Release memory associated with the hash table. - -`insert_hash`:: - - Insert a pointer into the hash table. If an entry with that hash - already exists, a pointer to the existing entry's value is returned. - Otherwise NULL is returned. This allows callers to implement - chaining, etc. - -`lookup_hash`:: - - Lookup an entry in the hash table. If an entry with that hash exists - the entry's value is returned. Otherwise NULL is returned. - -`for_each_hash`:: - - Call a function for each entry in the hash table. The function is - expected to take the entry's value as its only argument and return an - int. If the function returns a negative int the loop is aborted - immediately. Otherwise, the return value is accumulated and the sum - returned upon completion of the loop. diff --git a/Makefile b/Makefile index 75bb1a6..d566768 100644 --- a/Makefile +++ b/Makefile @@ -673,7 +673,6 @@ LIB_H += git-compat-util.h LIB_H += gpg-interface.h LIB_H += graph.h LIB_H += grep.h -LIB_H += hash.h LIB_H += hashmap.h LIB_H += help.h LIB_H += http.h @@ -805,7 +804,6 @@ LIB_OBJS += gettext.o LIB_OBJS += gpg-interface.o LIB_OBJS += graph.o LIB_OBJS += grep.o -LIB_OBJS += hash.o LIB_OBJS += hashmap.o LIB_OBJS += help.o LIB_OBJS += hex.o diff --git a/cache.h b/cache.h index 05b8011..27067b8 100644 --- a/cache.h +++ b/cache.h @@ -3,7 +3,6 @@ #include git-compat-util.h #include strbuf.h -#include hash.h #include hashmap.h #include advice.h #include gettext.h diff --git a/hash.c b/hash.c deleted file mode 100644 index 749ecfe..000 --- a/hash.c +++ /dev/null @@ -1,110 +0,0 @@ -/* - * Some generic hashing helpers. - */ -#include cache.h -#include hash.h - -/* - * Look up a hash entry in the hash table. Return the pointer to - * the existing entry, or the empty slot if none existed. The caller - * can then look at the (*ptr) to see whether it existed or not. - */ -static struct hash_table_entry *lookup_hash_entry(unsigned int hash, const struct hash_table *table) -{ - unsigned int size = table-size, nr = hash % size; - struct hash_table_entry *array = table-array; - - while (array[nr].ptr) { - if (array[nr].hash == hash) - break; - nr++; - if (nr = size) - nr = 0; - } - return array + nr; -} - - -/* - * Insert a new hash entry pointer into the table. - * - * If that hash entry already existed, return the pointer to - * the existing entry (and the caller can create a list of the - * pointers or do anything else). If it didn't exist, return - * NULL (and the caller knows the pointer has been inserted). - */ -static void **insert_hash_entry(unsigned int hash, void *ptr, struct hash_table *table) -{ - struct hash_table_entry *entry = lookup_hash_entry(hash, table); - - if (!entry-ptr) { - entry-ptr = ptr; - entry-hash = hash; - table-nr++; - return NULL; - } - return entry-ptr; -} - -static void grow_hash_table(struct hash_table *table) -{ - unsigned int i; - unsigned int old_size = table-size, new_size; - struct hash_table_entry *old_array = table-array, *new_array; - - new_size = alloc_nr(old_size); - new_array = xcalloc(sizeof(struct hash_table_entry), new_size); -
[PATCH v5 12/14] fix 'git update-index --verbose --again' output
'git update-index --verbose' consistently reports paths relative to the work-tree root. The only exception is the '--again' option, which reports paths relative to the current working directory. Change do_reupdate to use non-prefixed paths. Signed-off-by: Karsten Blees bl...@dcon.de --- builtin/update-index.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/builtin/update-index.c b/builtin/update-index.c index e3a10d7..d180d80 100644 --- a/builtin/update-index.c +++ b/builtin/update-index.c @@ -579,7 +579,7 @@ static int do_reupdate(int ac, const char **av, * or worse yet 'allow_replace', active_nr may decrease. */ save_nr = active_nr; - update_one(ce-name + prefix_length, prefix, prefix_length); + update_one(ce-name, NULL, 0); if (save_nr != active_nr) goto redo; } -- 1.8.5.rc0.333.g5394214 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v5 13/14] builtin/update-index.c: cleanup update_one
do_reupdate calls update_one with a cache_entry.name, there's no need for the extra sanitation / normalization that happens in prefix_path. cmd_update_index calls update_one with an already prefixed path, no need to prefix_path twice. Remove the extra prefix_path from update_one. Also remove the now unused 'prefix' and 'prefix_length' parameters. As of d089eba setup: sanitize absolute and funny paths in get_pathspec(), prefix_path uncoditionally returns a copy, even if the passed in path isn't changed. Lets unconditionally free() the result. Signed-off-by: Karsten Blees bl...@dcon.de --- builtin/update-index.c | 36 +++- 1 file changed, 15 insertions(+), 21 deletions(-) diff --git a/builtin/update-index.c b/builtin/update-index.c index d180d80..c8f0d5f 100644 --- a/builtin/update-index.c +++ b/builtin/update-index.c @@ -274,36 +274,32 @@ static void chmod_path(int flip, const char *path) die(git update-index: cannot chmod %cx '%s', flip, path); } -static void update_one(const char *path, const char *prefix, int prefix_length) +static void update_one(const char *path) { - const char *p = prefix_path(prefix, prefix_length, path); - if (!verify_path(p)) { + if (!verify_path(path)) { fprintf(stderr, Ignoring path %s\n, path); - goto free_return; + return; } if (mark_valid_only) { - if (mark_ce_flags(p, CE_VALID, mark_valid_only == MARK_FLAG)) + if (mark_ce_flags(path, CE_VALID, mark_valid_only == MARK_FLAG)) die(Unable to mark file %s, path); - goto free_return; + return; } if (mark_skip_worktree_only) { - if (mark_ce_flags(p, CE_SKIP_WORKTREE, mark_skip_worktree_only == MARK_FLAG)) + if (mark_ce_flags(path, CE_SKIP_WORKTREE, mark_skip_worktree_only == MARK_FLAG)) die(Unable to mark file %s, path); - goto free_return; + return; } if (force_remove) { - if (remove_file_from_cache(p)) + if (remove_file_from_cache(path)) die(git update-index: unable to remove %s, path); report(remove '%s', path); - goto free_return; + return; } - if (process_path(p)) + if (process_path(path)) die(Unable to process path %s, path); report(add '%s', path); - free_return: - if (p path || p path + strlen(path)) - free((char *)p); } static void read_index_info(int line_termination) @@ -579,7 +575,7 @@ static int do_reupdate(int ac, const char **av, * or worse yet 'allow_replace', active_nr may decrease. */ save_nr = active_nr; - update_one(ce-name, NULL, 0); + update_one(ce-name); if (save_nr != active_nr) goto redo; } @@ -836,11 +832,10 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) setup_work_tree(); p = prefix_path(prefix, prefix_length, path); - update_one(p, NULL, 0); + update_one(p); if (set_executable_bit) chmod_path(set_executable_bit, p); - if (p path || p path + strlen(path)) - free((char *)p); + free((char *)p); ctx.argc--; ctx.argv++; break; @@ -879,11 +874,10 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) strbuf_swap(buf, nbuf); } p = prefix_path(prefix, prefix_length, buf.buf); - update_one(p, NULL, 0); + update_one(p); if (set_executable_bit) chmod_path(set_executable_bit, p); - if (p buf.buf || p buf.buf + buf.len) - free((char *)p); + free((char *)p); } strbuf_release(nbuf); strbuf_release(buf); -- 1.8.5.rc0.333.g5394214 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v5 14/14] read-cache.c: fix memory leaks caused by removed cache entries
When cache_entry structs are removed from index_state.cache, they are not properly freed. Freeing those entries wasn't possible before because we couldn't remove them from index_state.name_hash. Now that we _do_ remove the entries from name_hash, we can also free them. Add 'free(cache_entry)' to all call sites of name-hash.c::remove_name_hash in read-cache.c (we could free() directly in remove_name_hash(), but name-hash.c isn't concerned with cache_entry allocation at all). Accessing a cache_entry after removing it from the index is now no longer allowed, as the memory has been freed. The following functions need minor fixes (typically by copying ce-name before use): - builtin/rm.c::cmd_rm - builtin/update-index.c::do_reupdate - read-cache.c::read_index_unmerged - resolve-undo.c::unmerge_index_entry_at Signed-off-by: Karsten Blees bl...@dcon.de --- builtin/rm.c | 2 +- builtin/update-index.c | 5 - read-cache.c | 8 ++-- resolve-undo.c | 7 +-- 4 files changed, 16 insertions(+), 6 deletions(-) diff --git a/builtin/rm.c b/builtin/rm.c index 3a0e0ea..171f37c 100644 --- a/builtin/rm.c +++ b/builtin/rm.c @@ -311,7 +311,7 @@ int cmd_rm(int argc, const char **argv, const char *prefix) if (!match_pathspec_depth(pathspec, ce-name, ce_namelen(ce), 0, seen)) continue; ALLOC_GROW(list.entry, list.nr + 1, list.alloc); - list.entry[list.nr].name = ce-name; + list.entry[list.nr].name = xstrdup(ce-name); list.entry[list.nr].is_submodule = S_ISGITLINK(ce-ce_mode); if (list.entry[list.nr++].is_submodule !is_staging_gitmodules_ok()) diff --git a/builtin/update-index.c b/builtin/update-index.c index c8f0d5f..00313f3 100644 --- a/builtin/update-index.c +++ b/builtin/update-index.c @@ -559,6 +559,7 @@ static int do_reupdate(int ac, const char **av, const struct cache_entry *ce = active_cache[pos]; struct cache_entry *old = NULL; int save_nr; + char *path; if (ce_stage(ce) || !ce_path_match(ce, pathspec)) continue; @@ -575,7 +576,9 @@ static int do_reupdate(int ac, const char **av, * or worse yet 'allow_replace', active_nr may decrease. */ save_nr = active_nr; - update_one(ce-name); + path = xstrdup(ce-name); + update_one(path); + free(path); if (save_nr != active_nr) goto redo; } diff --git a/read-cache.c b/read-cache.c index 00af9ad..3f735f3 100644 --- a/read-cache.c +++ b/read-cache.c @@ -47,6 +47,7 @@ static void replace_index_entry(struct index_state *istate, int nr, struct cache struct cache_entry *old = istate-cache[nr]; remove_name_hash(istate, old); + free(old); set_index_entry(istate, nr, ce); istate-cache_changed = 1; } @@ -478,6 +479,7 @@ int remove_index_entry_at(struct index_state *istate, int pos) record_resolve_undo(istate, ce); remove_name_hash(istate, ce); + free(ce); istate-cache_changed = 1; istate-cache_nr--; if (pos = istate-cache_nr) @@ -499,8 +501,10 @@ void remove_marked_cache_entries(struct index_state *istate) unsigned int i, j; for (i = j = 0; i istate-cache_nr; i++) { - if (ce_array[i]-ce_flags CE_REMOVE) + if (ce_array[i]-ce_flags CE_REMOVE) { remove_name_hash(istate, ce_array[i]); + free(ce_array[i]); + } else ce_array[j++] = ce_array[i]; } @@ -1894,7 +1898,7 @@ int read_index_unmerged(struct index_state *istate) new_ce-ce_mode = ce-ce_mode; if (add_index_entry(istate, new_ce, 0)) return error(%s: cannot drop to stage #0, -ce-name); +new_ce-name); i = index_name_pos(istate, new_ce-name, len); } return unmerged; diff --git a/resolve-undo.c b/resolve-undo.c index c09b006..49ebaaf 100644 --- a/resolve-undo.c +++ b/resolve-undo.c @@ -119,6 +119,7 @@ int unmerge_index_entry_at(struct index_state *istate, int pos) struct string_list_item *item; struct resolve_undo_info *ru; int i, err = 0, matched; + char *name; if (!istate-resolve_undo) return pos; @@ -138,20 +139,22 @@ int unmerge_index_entry_at(struct index_state *istate, int pos) if (!ru) return pos; matched = ce-ce_flags CE_MATCHED; + name = xstrdup(ce-name); remove_index_entry_at(istate, pos); for (i = 0; i 3; i++) { struct cache_entry *nce; if (!ru-mode[i])
[PATCH v5 02/14] add a hashtable implementation that supports O(1) removal
The existing hashtable implementation (in hash.[ch]) uses open addressing (i.e. resolve hash collisions by distributing entries across the table). Thus, removal is difficult to implement with less than O(n) complexity. Resolving collisions of entries with identical hashes (e.g. via chaining) is left to the client code. Add a hashtable implementation that supports O(1) removal and is slightly easier to use due to builtin entry chaining. Supports all basic operations init, free, get, add, remove and iteration. Also includes ready-to-use hash functions based on the public domain FNV-1 algorithm (http://www.isthe.com/chongo/tech/comp/fnv). The per-entry data structure (hashmap_entry) is piggybacked in front of the client's data structure to save memory. See test-hashmap.c for usage examples. The hashtable is resized by a factor of four when 80% full. With these settings, average memory consumption is about 2/3 of hash.[ch], and insertion is about twice as fast due to less frequent resizing. Lookups are also slightly faster, because entries are strictly confined to their bucket (i.e. no data of other buckets needs to be traversed). Signed-off-by: Karsten Blees bl...@dcon.de Signed-off-by: Junio C Hamano gits...@pobox.com --- Documentation/technical/api-hashmap.txt | 235 ++ Makefile| 3 + hashmap.c | 228 + hashmap.h | 71 +++ t/t0011-hashmap.sh | 240 ++ test-hashmap.c | 340 6 files changed, 1117 insertions(+) create mode 100644 Documentation/technical/api-hashmap.txt create mode 100644 hashmap.c create mode 100644 hashmap.h create mode 100755 t/t0011-hashmap.sh create mode 100644 test-hashmap.c diff --git a/Documentation/technical/api-hashmap.txt b/Documentation/technical/api-hashmap.txt new file mode 100644 index 000..b2280f1 --- /dev/null +++ b/Documentation/technical/api-hashmap.txt @@ -0,0 +1,235 @@ +hashmap API +=== + +The hashmap API is a generic implementation of hash-based key-value mappings. + +Data Structures +--- + +`struct hashmap`:: + + The hash table structure. ++ +The `size` member keeps track of the total number of entries. The `cmpfn` +member is a function used to compare two entries for equality. The `table` and +`tablesize` members store the hash table and its size, respectively. + +`struct hashmap_entry`:: + + An opaque structure representing an entry in the hash table, which must + be used as first member of user data structures. Ideally it should be + followed by an int-sized member to prevent unused memory on 64-bit + systems due to alignment. ++ +The `hash` member is the entry's hash code and the `next` member points to the +next entry in case of collisions (i.e. if multiple entries map to the same +bucket). + +`struct hashmap_iter`:: + + An iterator structure, to be used with hashmap_iter_* functions. + +Types +- + +`int (*hashmap_cmp_fn)(const void *entry, const void *entry_or_key, const void *keydata)`:: + + User-supplied function to test two hashmap entries for equality. Shall + return 0 if the entries are equal. ++ +This function is always called with non-NULL `entry` / `entry_or_key` +parameters that have the same hash code. When looking up an entry, the `key` +and `keydata` parameters to hashmap_get and hashmap_remove are always passed +as second and third argument, respectively. Otherwise, `keydata` is NULL. + +Functions +- + +`unsigned int strhash(const char *buf)`:: +`unsigned int strihash(const char *buf)`:: +`unsigned int memhash(const void *buf, size_t len)`:: +`unsigned int memihash(const void *buf, size_t len)`:: + + Ready-to-use hash functions for strings, using the FNV-1 algorithm (see + http://www.isthe.com/chongo/tech/comp/fnv). ++ +`strhash` and `strihash` take 0-terminated strings, while `memhash` and +`memihash` operate on arbitrary-length memory. ++ +`strihash` and `memihash` are case insensitive versions. + +`void hashmap_init(struct hashmap *map, hashmap_cmp_fn equals_function, size_t initial_size)`:: + + Initializes a hashmap structure. ++ +`map` is the hashmap to initialize. ++ +The `equals_function` can be specified to compare two entries for equality. +If NULL, entries are considered equal if their hash codes are equal. ++ +If the total number of entries is known in advance, the `initial_size` +parameter may be used to preallocate a sufficiently large table and thus +prevent expensive resizing. If 0, the table is dynamically resized. + +`void hashmap_free(struct hashmap *map, int free_entries)`:: + + Frees a hashmap structure and allocated memory. ++ +`map` is the hashmap to free. ++ +If `free_entries` is true, each hashmap_entry in the map is freed as well +(using stdlib's free()). + +`void
Re: What's cooking in git.git (Oct 2013, #06; Fri, 25)
Am 29.10.2013 10:09, schrieb Karsten Blees: On Mon, Oct 28, 2013 at 10:04 PM, Vicent Martí tan...@gmail.com wrote: On Mon, Oct 28, 2013 at 8:45 PM, Karsten Blees karsten.bl...@gmail.com wrote: Regarding performance, khash uses open addressing, which requires more key comparisons (O(1/(1-load_factor))) than chaining (O(1+load_factor)). However, any measurable differences will most likely be dwarfed by IO costs in this particular use case. I don't think this is true. If you actually run a couple insertion and lookup benchmarks comparing the two implementations, you'll find khash to be around ~30% faster for most workloads (venturing a guess from past experience). I am obviously not the author of khash, but I've ... Just out of curiosity, I added performance test code for khash to the test in my current hashmap patch series [1]. It turns out that khash is by far the slowest of the bunch, especially with many collisions. Again, I don't think that performance matters all that much (or in other words: _any_ hash table implementation will probably be fast enough compared to the rest that's going on). Its more a question of whether we really need two different hash table implementations (and a queasy feeling about the macro kludge in khash.h...). Khash doesn't store the hash codes along with the entries (as both hash.[ch] and hashmap.[ch] do), so it needs to re-calculate hash codes on every resize. For a fair comparison, the khash test uses keys with pre-calculated hash codes in the key structure. This should be similar to a hash function that just copies 4 bytes from a sha1 key. Khash maps with more complex hash functions will be slower (see khstr). The khstr test uses khash's predefined string map and khash's X31 hash function for strings (therefore no separate values for different hash functions here). The table is similar to what I posted for hashmap-v2 [2] (i.e. real time in seconds for 1,000 rounds á 100,000 entries). I just turned it around a bit to make room for khash columns. test | hash_fn | hashmap | hash | khash | khstr | -+-+-+-+-++ | FNV | 2.429 | 14.366 | 11.780 | 18.677 | | FNV x2 | 2.946 | 14.558 | 10.922 || add | i | 1.708 | 7.419 | 4.132 || | ix2 | 1.791 | 8.565 | 4.502 || | i/10| 1.555 | 1.805 | 344.691 || | i/10 x2 | 1.543 | 1.808 | 319.559 || -+-+-+-+-++ | FNV | 1.822 | 3.452 | 4.922 | 8.309 | get | FNV x2 | 2.298 | 3.194 | 4.473 || 100% | i | 1.252 | 1.344 | 0.944 || hits | ix2 | 1.286 | 1.434 | 1.220 || | i/10| 6.720 | 5.138 | 281.815 || | i/10 x2 | 6.297 | 5.188 | 257.021 || -+-+-+-+-++ | FNV | 1.023 | 3.949 | 4.115 | 4.878 | get | FNV x2 | 1.538 | 3.915 | 4.571 || 10% | i | 0.654 | 397.457 | 38.125 || hits | ix2 | 0.718 | 0.722 | 9.111 || | i/10| 1.128 | 30.235 | 60.376 || | i/10 x2 | 1.260 | 1.082 | 43.354 || -+-+-+-+-++ [1] https://github.com/kblees/git/commits/kb/hashmap-v5-khash [2] http://article.gmane.org/gmane.comp.version-control.git/235290 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] config: arbitrary number of matches for --unset and --replace-all
Jeff King p...@peff.net writes: This code is weird to follow because of the fall-throughs. I do not think you have introduced any bugs with your patch, but it seems weird to me that we set the offset at the top of the hunk. If we hit the conditional in the bottom half, we do actually increment storer.seen, but only _after_ having overwritten the value from above (with the same value, no less). But if we do not follow that code path, we may end up here: @@ -1272,6 +1275,9 @@ static int store_aux(const char *key, const char *value, void *cb) if (strrchr(key, '.') - key == store.baselen !strncmp(key, store.key, store.baselen)) { store.state = SECTION_SEEN; +ALLOC_GROW(store.offset, + store.seen+1, + store.offset_alloc); store.offset[store.seen] = cf-do_ftell(cf); } } where we overwrite it again, but do not update store.seen. Or we may trigger neither, and leave the function with our offset stored, but store.seen not incremented. It's doubly strange that we write in this hunk without any protection against overflow. I was too lazy to think about it long enough to come up with a possible example that triggers this, and instead just put in the defensive ALLOC_GROW(). But if you can trigger it, it will probably cause the algorithm to go off the rails because it overwrote store.state and possibly even store.seen. -- Thomas Rast t...@thomasrast.ch -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 0/21] pack bitmaps
On Thu, Nov 14, 2013 at 07:19:38PM +, Ramsay Jones wrote: Unfortunately, I didn't find time this weekend to finish the msvc build fixes. However, after a quick squint at these patches, I think you have almost done it for me! :-D I must have misunderstood the previous discussion, because my patch was written on the assumption that the ewah directory wouldn't be git-ified (e.g. #include git-compat-util.h). I think it was up for debate at some point, but we did decide to go ahead and git-ify. Please feel free to submit further fixups if you need them. - the ewah code used gcc's __builtin_ctzll, but did not provide a suitable fallback. We now provide a fallback in C. ... here. I was messing around with several implementations (including the use of msvc compiler intrinsics) with the intention of doing some timing tests etc. [I suspected my C fallback function (a different implementation to yours) would be slightly faster.] Yeah, I looked around for several implementations, and ultimately wrote one that was the most readable to me. The one I found shortest and most inscrutable was: return popcount((x -x) - 1); the details of which I still haven't worked through in my head. ;) I do think on most platforms that intrinsics or inline assembler are the way to go. My main goal was to get something correct that would let it compile everywhere, and then people can use that as a base for optimizing. Patches welcome. :) -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 64-bit support.
On Thu, Nov 14, 2013 at 07:11:31PM +0400, Konstantin Khomoutov wrote: I just guess, that this limit comes from the O(N^2) complexity of the comparison algorithm. Since the max 32-bit signed value is 2^31, then the 2^15 = 32768 is somehow correlated with its square root, maybe, like 2^(32/2 - 1) - to prevent overflow. I'm trying to prepare the patch right now, that changes the `int rename_limit` = `long rename_limit` and all intermediate variable types. Is it a correct way to do? Yes, the 32768 limit is there to prevent 32-bit overflow of the squared value. We can use a uint64_t and bump the overflow limit to 2^32 if we are actually bumping up against this. I beleive rename_limit comes from reading the diff.renameLimit configuration variable. The gitconfig(1) manual page hints to look at the -l command-line option of `git diff` which is described this way: Yes, though there are also builtin defaults, depending on the operation (400 for diffs, 1000 for merges). Looks like you're on the right track but the patch appears to require a more wide impact: * 32767 should be a default limit, applied in the case the user did not specify neither diff.renameLimit nor -l. That is probably too high for a default limit. The point is that we are making an O(n^2) matrix, and consuming CPU and memory in line with that. The defaults were determined empirically on reasonable times (and are now based on old hardware, so it may be time to bump them up). The idea was that 32767 was unreasonably high, and you would not want to spend that much CPU time anyway. On my shiny new machine, doing a 2048^2 matrix takes 5s. A 32767^2 matrix would be 21 minutes. And it was also expected that you would simply not have that big a matrix anyway (you might 32767 candidates on one side, but you will not have a square bigger than rename_limit^2, which is what this is checking). * If whatever value read from those sources is less than 0, an error should be thrown--it looks strange to just revert it to the default value in this case. It's not the default, though. The default is something like 400. So saying -1 (or 0) is a way of saying turn off the limit without having to guess at an arbitrarily high number. * If the user-supplied value is = 0, then just use it, assume the user knows what they are doing. At some point we have to deal with integer overflow. It was assumed that a 32-bit max was big enough and that nobody would have a real world case that hit that. I'd be curious to hear if that has actually been hit, or if there is simply confusion over how the rename-limit config works (the limit is the square root of the max matrix size). Here's what the 64-bit patch could look like (I notice that we are not using unsigned 32-bit integers, either, and we probably should be): --- diff --git a/diffcore-rename.c b/diffcore-rename.c index 6c7a72f..8356a62 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -427,11 +427,16 @@ static void record_if_better(struct diff_score m[], struct diff_score *o) * 1 if we need to disable inexact rename detection; * 2 if we would be under the limit if we were given -C instead of -C -C. */ -static int too_many_rename_candidates(int num_create, +static int too_many_rename_candidates(uint64_t num_create, struct diff_options *options) { - int rename_limit = options-rename_limit; - int num_src = rename_src_nr; + /* +* signed to unsigned conversion here is intentional, and +* converts limits 0 into large (effectively infinite) +* limits +*/ + uint64_t rename_limit = options-rename_limit; + uint64_t num_src = rename_src_nr; int i; options-needed_rename_limit = 0; @@ -442,11 +447,10 @@ static int too_many_rename_candidates(int num_create, * *num_create * num_src rename_limit * rename_limit * -* but handles the potential overflow case specially (and we -* assume at least 32-bit integers) +* but handles the potential overflow case specially. */ - if (rename_limit = 0 || rename_limit 32767) - rename_limit = 32767; + if (!rename_limit || rename_limit 4294967295) + rename_limit = 4294967295; if ((num_create = rename_limit || num_src = rename_limit) (num_create * num_src = rename_limit * rename_limit)) return 0; -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2] State correct usage of literal examples in man pages in the coding standards
The man pages contain inconsistent usage of backticks vs. single quotes around options, commands, etc. that are in paragraphs. This commit states that backticks should always be used around literal examples. This commit states that -- and friends should not be escaped (e.g. use `--pretty=oneline` instead of `\--pretty=oneline`). This commit also states correct usage for typesetting command usage examples with inline substitutions. Thanks-to: Ramkumar Ramachandra artag...@gmail.com Thanks-to: Stuart Rackham srack...@gmail.com Thanks-to: Junio C Hamano gits...@pobox.com Signed-off-by: Jason St. John jstj...@purdue.edu --- This is a resubmit of a previous patch: http://marc.info/?l=gitm=138431653412495w=2 I added a Thanks-to for Stuart Rackham because he's listed as the author of the AsciiDoc User Guide, which part of this commit relies heavily on. Documentation/CodingGuidelines | 34 +++--- 1 file changed, 31 insertions(+), 3 deletions(-) diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines index a600e35..ef67b53 100644 --- a/Documentation/CodingGuidelines +++ b/Documentation/CodingGuidelines @@ -260,9 +260,11 @@ Writing Documentation: Every user-visible change should be reflected in the documentation. The same general rule as for code applies -- imitate the existing - conventions. A few commented examples follow to provide reference - when writing or modifying command usage strings and synopsis sections - in the manual pages: + conventions. + + A few commented examples follow to provide reference when writing or + modifying command usage strings and synopsis sections in the manual + pages: Placeholders are spelled in lowercase and enclosed in angle brackets: file @@ -312,3 +314,29 @@ Writing Documentation: Use 'git' (all lowercase) when talking about commands i.e. something the user would type into a shell and use 'Git' (uppercase first letter) when talking about the version control system and its properties. + + A few commented examples follow to provide reference when writing or + modifying paragraphs or option/command explanations that contain options + or commands: + + Literal examples (e.g. use of command-line options, command names, and + configuration variables) are typeset in monospace, and if you can use + `backticks around word phrases`, do so. + `--pretty=oneline` + `git rev-list` + `remote.pushdefault` + + Word phrases enclosed in `backtick characters` are rendered literally + and will not be further expanded. The use of `backticks` to achieve the + previous rule means that literal examples should not use AsciiDoc + escapes. + Correct: + `--pretty=oneline` + Incorrect: + `\--pretty=oneline` + + If some place in the documentation needs to typeset a command usage + example with inline substitutions, it is fine to use +monospaced and + inline substituted text+ instead of `monospaced literal text`, and with + the former, the part that should not get substituted must be + quoted/escaped. -- 1.8.4.2 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 0/21] pack bitmaps
On 14/11/13 21:33, Jeff King wrote: On Thu, Nov 14, 2013 at 07:19:38PM +, Ramsay Jones wrote: Unfortunately, I didn't find time this weekend to finish the msvc build fixes. However, after a quick squint at these patches, I think you have almost done it for me! :-D I must have misunderstood the previous discussion, because my patch was written on the assumption that the ewah directory wouldn't be git-ified (e.g. #include git-compat-util.h). I think it was up for debate at some point, but we did decide to go ahead and git-ify. Please feel free to submit further fixups if you need them. Yep, will do; at present it looks like that one-liner. - the ewah code used gcc's __builtin_ctzll, but did not provide a suitable fallback. We now provide a fallback in C. ... here. I was messing around with several implementations (including the use of msvc compiler intrinsics) with the intention of doing some timing tests etc. [I suspected my C fallback function (a different implementation to yours) would be slightly faster.] Yeah, I looked around for several implementations, and ultimately wrote one that was the most readable to me. The one I found shortest and most inscrutable was: return popcount((x -x) - 1); the details of which I still haven't worked through in my head. ;) Yeah, I stumbled over that one too! I do think on most platforms that intrinsics or inline assembler are the way to go. My main goal was to get something correct that would let it compile everywhere, and then people can use that as a base for optimizing. Patches welcome. :) Indeed, I can happily leave that to another day (or to someone else more motivated ;-) ATB, Ramsay Jones -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCHv3 1/2] Fix single quotes, AsciiDoc escaping, and other formatting issues
rev-list-options.txt: -- Remove blank lines after some options subheadings to fix syntax highlighting in Vim -- Typeset literal options, commands, and path names in monospace -- Remove AsciiDoc escapes with literal -- Replace some double quotes with proper AsciiDoc quotes (e.g. ``foo'') Signed-off-by: Jason St. John jstj...@purdue.edu --- This is a resubmit of a previous patch: http://marc.info/?l=gitm=138431995913311w=2 I decided to remove the blank lines after option subheadings because the syntax highlighting in Vim actually looks better with them removed. I'm not sure how this affects line-rewrapping of the body text in Emacs. I split the previous patch into two. This patch deals strictly with formatting issues and backticks/literals. The second patch contains the grammatical and typo fixes. Documentation/rev-list-options.txt | 226 + 1 file changed, 78 insertions(+), 148 deletions(-) diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt index ec86d09..eb4b6bf 100644 --- a/Documentation/rev-list-options.txt +++ b/Documentation/rev-list-options.txt @@ -18,33 +18,27 @@ ordering and formatting options, such as `--reverse`. -number:: -n number:: --max-count=number:: - Limit the number of commits to output. --skip=number:: - Skip 'number' commits before starting to show the commit output. --since=date:: --after=date:: - Show commits more recent than a specific date. --until=date:: --before=date:: - Show commits older than a specific date. ifdef::git-rev-list[] --max-age=timestamp:: --min-age=timestamp:: - Limit the commits output to specified time range. endif::git-rev-list[] --author=pattern:: --committer=pattern:: - Limit the commits output to ones with author/committer header lines that match the specified pattern (regular expression). With more than one `--author=pattern`, @@ -52,7 +46,6 @@ endif::git-rev-list[] chosen (similarly for multiple `--committer=pattern`). --grep-reflog=pattern:: - Limit the commits output to ones with reflog entries that match the specified pattern (regular expression). With more than one `--grep-reflog`, commits whose reflog message @@ -60,7 +53,6 @@ endif::git-rev-list[] error to use this option unless `--walk-reflogs` is in use. --grep=pattern:: - Limit the commits output to ones with log message that matches the specified pattern (regular expression). With more than one `--grep=pattern`, commits whose message @@ -71,46 +63,38 @@ When `--show-notes` is in effect, the message from the notes as if it is part of the log message. --all-match:: - Limit the commits output to ones that match all given --grep, + Limit the commits output to ones that match all given `--grep`, instead of ones that match at least one. -i:: --regexp-ignore-case:: - Match the regexp limiting patterns without regard to letters case. --basic-regexp:: - Consider the limiting patterns to be basic regular expressions; this is the default. -E:: --extended-regexp:: - Consider the limiting patterns to be extended regular expressions instead of the default basic regular expressions. -F:: --fixed-strings:: - Consider the limiting patterns to be fixed strings (don't interpret pattern as a regular expression). --perl-regexp:: - Consider the limiting patterns to be Perl-compatible regexp. Requires libpcre to be compiled in. --remove-empty:: - Stop when a given path disappears from the tree. --merges:: - Print only merge commits. This is exactly the same as `--min-parents=2`. --no-merges:: - Do not print commits with more than one parent. This is exactly the same as `--max-parents=1`. @@ -118,7 +102,6 @@ if it is part of the log message. --max-parents=number:: --no-min-parents:: --no-max-parents:: - Show only commits which have at least (or at most) that many parent commits. In particular, `--max-parents=1` is the same as `--no-merges`, `--min-parents=2` is the same as `--merges`. `--max-parents=0` @@ -138,31 +121,26 @@ parents) and `--max-parents=-1` (negative numbers denote no upper limit). brought in to your history by such a merge. --not:: - Reverses the meaning of the '{caret}' prefix (or lack thereof) - for all following revision specifiers, up to the next '--not'. + for all following revision specifiers, up to the next `--not`. --all:: - Pretend as if all the refs in `refs/` are listed on the command line as 'commit'. --branches[=pattern]:: - Pretend as if all the refs in `refs/heads` are listed on the command line as 'commit'. If 'pattern' is given, limit branches to ones matching given shell glob. If pattern
[PATCHv3 2/2] Documentation/rev-list-options.txt: fix some grammatical issues and typos
Various fixes: -- fix typos (e.g. show - shown) -- use regular expression(s) instead of regexp where appropriate -- reword some sentences for easier reading -- fix/improve some grammatical issues (e.g. comma usage) -- add missing articles (e.g. the) -- change E-mail to email Signed-off-by: Jason St. John jstj...@purdue.edu --- I changed E-mail to email because that is clearly the dominant form: http://www.google.com/trends/explore?q=#q=%22email%22%2C%20%22E-mail%22cmpt=q Documentation/rev-list-options.txt | 27 ++- 1 file changed, 14 insertions(+), 13 deletions(-) diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt index eb4b6bf..2991d70 100644 --- a/Documentation/rev-list-options.txt +++ b/Documentation/rev-list-options.txt @@ -68,7 +68,8 @@ if it is part of the log message. -i:: --regexp-ignore-case:: - Match the regexp limiting patterns without regard to letters case. + Match the regular expression limiting patterns without regard to letter + case. --basic-regexp:: Consider the limiting patterns to be basic regular expressions; @@ -85,7 +86,7 @@ if it is part of the log message. pattern as a regular expression). --perl-regexp:: - Consider the limiting patterns to be Perl-compatible regexp. + Consider the limiting patterns to be Perl-compatible regular expressions. Requires libpcre to be compiled in. --remove-empty:: @@ -191,9 +192,9 @@ endif::git-rev-list[] For example, if you have two branches, `A` and `B`, a usual way to list all commits on only one side of them is with `--left-right` (see the example below in the description of -the `--left-right` option). It however shows the commits that were cherry-picked -from the other branch (for example, ``3rd on b'' may be cherry-picked -from branch A). With this option, such pairs of commits are +the `--left-right` option). However, it shows the commits that were +cherry-picked from the other branch (for example, ``3rd on b'' may be +cherry-picked from branch A). With this option, such pairs of commits are excluded from the output. --left-only:: @@ -447,7 +448,7 @@ The effect of this is best shown by way of comparing to `-' --- + -Note the major differences in `N`, `P` and `Q` over `--full-history`: +Note the major differences in `N`, `P`, and `Q` over `--full-history`: + -- * `N`'s parent list had `I` removed, because it is an ancestor of the @@ -467,7 +468,7 @@ Finally, there is a fifth simplification mode available: Limit the displayed commits to those directly on the ancestry chain between the ``from'' and ``to'' commits in the given commit range. I.e. only display commits that are ancestor of the ``to'' - commit, and descendants of the ``from'' commit. + commit and descendants of the ``from'' commit. + As an example use case, consider the following commit history: + @@ -631,9 +632,9 @@ These options are mostly targeted for packing of Git repositories. --no-walk[=(sorted|unsorted)]:: Only show the given commits, but do not traverse their ancestors. This has no effect if a range is specified. If the argument - `unsorted` is given, the commits are show in the order they were + `unsorted` is given, the commits are shown in the order they were given on the command line. Otherwise (if `sorted` or no argument - was given), the commits are show in reverse chronological order + was given), the commits are shown in reverse chronological order by commit time. --do-walk:: @@ -656,7 +657,7 @@ include::pretty-options.txt[] --date=(relative|local|default|iso|rfc|short|raw):: Only takes effect for dates shown in human-readable format, such as when using `--pretty`. `log.date` config variable sets a default - value for log command's `--date` option. + value for the log command's `--date` option. + `--date=relative` shows dates relative to the current time, e.g. ``2 hours ago''. @@ -666,9 +667,9 @@ e.g. ``2 hours ago''. `--date=iso` (or `--date=iso8601`) shows timestamps in ISO 8601 format. + `--date=rfc` (or `--date=rfc2822`) shows timestamps in RFC 2822 -format, often found in E-mail messages. +format, often found in email messages. + -`--date=short` shows only date but not time, in `-MM-DD` format. +`--date=short` shows only the date, but not the time, in `-MM-DD` format. + `--date=raw` shows the date in the internal raw Git format `%s %z` format. + @@ -749,7 +750,7 @@ ifndef::git-rev-list[] Diff Formatting ~~~ -Below are listed options that control the formatting of diff output. +Listed below are options that control the formatting of diff output. Some of them are specific to linkgit:git-rev-list[1], however other diff options may be given. See linkgit:git-diff-files[1] for
Re: [PATCH 2/2] Fix minor grammatical and other formatting issues in the git log man page
On Wed, Nov 13, 2013 at 4:56 PM, Junio C Hamano gits...@pobox.com wrote: Jason St. John jstj...@purdue.edu writes: Documentation/git-log.txt: -- replace single quotes around options/commands with backticks -- use single quotes around references to sections -- replaced some double quotes with proper AsciiDoc quotes (e.g. ``foo'') -- use backticks around files and file paths -- use title case when referring to section headings -- use backticks around option arguments/defaults Signed-off-by: Jason St. John jstj...@purdue.edu --- When working on this commit, I noticed a difference in how options and option descriptions are separated (e.g. with a blank line or not). At least with Vim's syntax highlighting, if there is a blank line between the option and its description, the text block is all colored the same; however, if there isn't a blank line, then the text block is not specially colored. Is there an existing convention for how this should be done? I do not think we have a written rule or convention (and I do not know if we want one). While reading the text in the source form (and the point of choosing AsciiDoc was to be able to read the docs without formatting), I personally have a slight preference to immediately follow the body text to the label in the labelled list, and a blank line after the item, i.e. item label:: This describes the item. next item label:: This describes the next item. as it makes it clear that the body belongs to the heading that precedes it. But it does help to have a blank between the label and the beginning of the body when reflowing the body with fill-paragraph, i.e. item label:: This describes the item. You say that it is also easier on Vim to have the blank line there, so perhaps we may want to aim for updating the documentation over time to consistently do so. I dunno. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html As I stated in my recent resubmit[1], I decided to remove the blank lines after option subheadings because the syntax highlighting in Vim actually looks better with the blank lines removed. As such, I would prefer that we go with the option of removing these blank lines going forward. If we are in agreement on this, should I send in a patch for CodingGuidelines to state this? [1] http://marc.info/?l=gitm=138447927208462w=2 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] branch: fix --verbose output column alignment
2013/11/15 Torstein Hegge he...@resisty.net: Commit f2e0873 (branch: report invalid tracking branch as gone) removed an early return from fill_tracking_info() in the path taken when 'git branch -v' lists a branch in sync with its upstream. This resulted in an unconditionally added space in front of the subject line: $ git branch -v * master f5eb3da commit pushed to upstream topic f935eb6 unpublished topic Thank you for catching this. Confirmed that the output of git branch -v is not aligned well. This example may be more clear ;-) $ git branch -v branch1f0ec0da [ahead 1, behind 2] divert from upstream branch2f5eb3da commit pushed to upstream branch3f935eb6 unpublished topic Instead, only add the trailing space if a decoration have been added. To catch this kind of whitespace breakage in the tests, be a bit less smart when filtering the output through sed. Signed-off-by: Torstein Hegge he...@resisty.net --- builtin/branch.c | 8 +++- t/t6040-tracking-info.sh | 24 +--- 2 files changed, 20 insertions(+), 12 deletions(-) diff --git a/builtin/branch.c b/builtin/branch.c index 0bb0e93..636a16e 100644 --- a/builtin/branch.c +++ b/builtin/branch.c @@ -424,6 +424,7 @@ static void fill_tracking_info(struct strbuf *stat, const char *branch_name, struct branch *branch = branch_get(branch_name); struct strbuf fancy = STRBUF_INIT; int upstream_is_gone = 0; + int added_decoration = 1; switch (stat_tracking_info(branch, ours, theirs)) { case 0: @@ -451,9 +452,13 @@ static void fill_tracking_info(struct strbuf *stat, const char *branch_name, if (upstream_is_gone) { if (show_upstream_ref) strbuf_addf(stat, _([%s: gone]), fancy.buf); + else + added_decoration = 0; } else if (!ours !theirs) { if (show_upstream_ref) strbuf_addf(stat, _([%s]), fancy.buf); + else + added_decoration = 0; } else if (!ours) { if (show_upstream_ref) strbuf_addf(stat, _([%s: behind %d]), fancy.buf, theirs); @@ -474,7 +479,8 @@ static void fill_tracking_info(struct strbuf *stat, const char *branch_name, ours, theirs); } strbuf_release(fancy); - strbuf_addch(stat, ' '); + if (added_decoration) + strbuf_addch(stat, ' '); free(ref); } How about add strbuf_addch(stat, ' '); to each condition directly, like this: if (upstream_is_gone) { if (show_upstream_ref) { strbuf_addf(stat, _([%s: gone]), fancy.buf); strbuf_addch(stat, ' '); } } else if (!ours !theirs) { if (show_upstream_ref) { strbuf_addf(stat, _([%s]), fancy.buf); strbuf_addch(stat, ' '); } } else if (!ours) { if (show_upstream_ref) strbuf_addf(stat, _([%s: behind %d]), fancy.buf, theirs); else strbuf_addf(stat, _([behind %d]), theirs); strbuf_addch(stat, ' '); } else if (!theirs) { if (show_upstream_ref) strbuf_addf(stat, _([%s: ahead %d]), fancy.buf, ours); else strbuf_addf(stat, _([ahead %d]), ours); strbuf_addch(stat, ' '); } else { if (show_upstream_ref) strbuf_addf(stat, _([%s: ahead %d, behind %d]), fancy.buf, ours, theirs); else strbuf_addf(stat, _([ahead %d, behind %d]), ours, theirs); strbuf_addch(stat, ' '); } strbuf_release(fancy); free(ref); -- Jiang Xin -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] Fix minor grammatical and other formatting issues in the git log man page
On Thu, Nov 14, 2013 at 8:44 PM, Jason St. John jstj...@purdue.edu wrote: On Wed, Nov 13, 2013 at 4:56 PM, Junio C Hamano gits...@pobox.com wrote: Jason St. John jstj...@purdue.edu writes: Documentation/git-log.txt: -- replace single quotes around options/commands with backticks -- use single quotes around references to sections -- replaced some double quotes with proper AsciiDoc quotes (e.g. ``foo'') -- use backticks around files and file paths -- use title case when referring to section headings -- use backticks around option arguments/defaults Signed-off-by: Jason St. John jstj...@purdue.edu --- When working on this commit, I noticed a difference in how options and option descriptions are separated (e.g. with a blank line or not). At least with Vim's syntax highlighting, if there is a blank line between the option and its description, the text block is all colored the same; however, if there isn't a blank line, then the text block is not specially colored. Is there an existing convention for how this should be done? I do not think we have a written rule or convention (and I do not know if we want one). While reading the text in the source form (and the point of choosing AsciiDoc was to be able to read the docs without formatting), I personally have a slight preference to immediately follow the body text to the label in the labelled list, and a blank line after the item, i.e. item label:: This describes the item. next item label:: This describes the next item. as it makes it clear that the body belongs to the heading that precedes it. But it does help to have a blank between the label and the beginning of the body when reflowing the body with fill-paragraph, i.e. item label:: This describes the item. You say that it is also easier on Vim to have the blank line there, so perhaps we may want to aim for updating the documentation over time to consistently do so. I dunno. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html As I stated in my recent resubmit[1], I decided to remove the blank lines after option subheadings because the syntax highlighting in Vim actually looks better with the blank lines removed. As such, I would prefer that we go with the option of removing these blank lines going forward. If we are in agreement on this, should I send in a patch for CodingGuidelines to state this? [1] http://marc.info/?l=gitm=138447927208462w=2 I forgot to mention that if we do go with this, then I will need to resubmit this patch. Sorry for the extra email. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: What's cooking in git.git (Nov 2013, #04; Wed, 13)
On Wed, Nov 13, 2013 at 03:07:54PM -0800, Junio C Hamano wrote: * nd/liteal-pathspecs (2013-10-28) 1 commit (merged to 'next' on 2013-11-01 at 1a91775) + pathspec: stop --*-pathspecs impact on internal parse_pathspec() uses Will cook in 'next'. I think we want this to be part of v1.8.5. It is a fix for a regression that appeared in master post-1.8.4: $ git.v1.8.4 --literal-pathspecs blame Makefile | wc -l 2596 $ git.v1.8.5-rc2 --literal-pathspecs blame Makefile | wc -l fatal: Makefile: pathspec magic not supported by this command: 'literal' 0 Sorry to mention it so late into the -rc cycle, but I just noticed that the patch hadn't graduated. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Add a bugzilla website
Hello, I just wanted to ask a question: why is there no bugzilla website for git ? It's better to put bugs into such a tool to follow there evolution than to manage bugs via a developpment mailing list ... Best regards, YC -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html