[PATCH 00/38] pack version 4 basic functionalities
After the initial posting here:

  http://news.gmane.org/group/gmane.comp.version-control.git/thread=233061

this is a repost plus the basic read side working, at least to validate
the write side and the pack format itself.  And many many bug fixes.

This can also be fetched here:

  git://git.linaro.org/people/nico/git

I consider the actual pack format definition final as implemented by
this code.

TODO:

- index-pack support
- native tree walk support
- native commit graph walk support
- better heuristics when creating tree delta encoding
- integration with pack-objects
- transfer protocol backward compatibility
- thin pack completion
- figure out unexplained runtime performance issues

However, as I mentioned already, I've put more time on this project
lately than I actually had available.  I really wanted to bring this
project far enough to be able to kick it out the door for others to
take over, and there we are.  I'm always available for design
discussions and code review.  But don't expect much additional code
from me at this point.

@junio: I'm hoping you can take this branch as is, and apply any
further patches on top.

The diffstat goes like this:

 Makefile        |    3 +
 cache.h         |   11 +
 hex.c           |   11 +
 pack-check.c    |    4 +-
 pack-revindex.c |    7 +-
 pack-write.c    |    6 +-
 packv4-create.c | 1105 +++++++++++++++++++++
 packv4-parse.c  |  408 ++++++++
 packv4-parse.h  |    9 +
 sha1_file.c     |  110 +-
 10 files changed, 1648 insertions(+), 26 deletions(-)

Enjoy !

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 34/38] pack v4: code to retrieve a path component
Because the path dictionary table is located right after the name
dictionary table, we currently need to load the latter to find the
former.

Signed-off-by: Nicolas Pitre <n...@fluxnic.net>
---
 cache.h        |  2 ++
 packv4-parse.c | 36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 38 insertions(+)

diff --git a/cache.h b/cache.h
index 6ce327e..5f2147a 100644
--- a/cache.h
+++ b/cache.h
@@ -1030,6 +1030,8 @@ extern struct packed_git {
 	int version;
 	int index_version;
 	struct packv4_dict *name_dict;
+	off_t name_dict_end;
+	struct packv4_dict *path_dict;
 	time_t mtime;
 	int pack_fd;
 	unsigned pack_local:1,
diff --git a/packv4-parse.c b/packv4-parse.c
index 431f47e..b80b73e 100644
--- a/packv4-parse.c
+++ b/packv4-parse.c
@@ -114,6 +114,7 @@ static void load_name_dict(struct packed_git *p)
 	if (!names)
 		die("bad pack name dictionary in %s", p->pack_name);
 	p->name_dict = names;
+	p->name_dict_end = offset;
 }
 
 const unsigned char *get_nameref(struct packed_git *p, const unsigned char **srcp)
@@ -131,6 +132,41 @@ const unsigned char *get_nameref(struct packed_git *p, const unsigned char **srcp)
 	return p->name_dict->data + p->name_dict->offsets[index];
 }
 
+static void load_path_dict(struct packed_git *p)
+{
+	off_t offset;
+	struct packv4_dict *paths;
+
+	/*
+	 * For now we need to load the name dictionary to know where
+	 * it ends and therefore where the path dictionary starts.
+	 */
+	if (!p->name_dict)
+		load_name_dict(p);
+
+	offset = p->name_dict_end;
+	paths = load_dict(p, &offset);
+	if (!paths)
+		die("bad pack path dictionary in %s", p->pack_name);
+	p->path_dict = paths;
+}
+
+const unsigned char *get_pathref(struct packed_git *p, const unsigned char **srcp)
+{
+	unsigned int index;
+
+	if (!p->path_dict)
+		load_path_dict(p);
+
+	index = decode_varint(srcp);
+	if (index < 1 || index - 1 >= p->path_dict->nb_entries) {
+		error("%s: index overflow", __func__);
+		return NULL;
+	}
+	index -= 1;
+	return p->path_dict->data + p->path_dict->offsets[index];
+}
+
 void *pv4_get_commit(struct packed_git *p, struct pack_window **w_curs,
 		     off_t offset, unsigned long size)
 {
-- 
1.8.4.38.g317e65b
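The 1-based index convention used by get_pathref() above can be sketched in isolation. This is a minimal stand-in rather than the patch's own code: decode_varint follows the add-1 prefix varint from git's varint.c, and the dictionary here is a hypothetical flat table.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Add-1 prefix varint as in git's varint.c: the MSB marks a
 * continuation byte, and each continuation adds 1 so that every
 * value has exactly one encoding. */
static uintmax_t decode_varint(const unsigned char **bufp)
{
	const unsigned char *buf = *bufp;
	unsigned char c = *buf++;
	uintmax_t val = c & 127;

	while (c & 128) {
		val += 1;
		c = *buf++;
		val = (val << 7) + (c & 127);
	}
	*bufp = buf;
	return val;
}

/* Hypothetical flat dictionary lookup mirroring get_pathref(): the
 * decoded index is 1-based, with 0 reserved (later used as the
 * inline-entry escape hatch), so entry i is stored as i + 1. */
static const char *lookup_dict(const char *data, const unsigned int *offsets,
			       unsigned int nb_entries,
			       const unsigned char **srcp)
{
	uintmax_t index = decode_varint(srcp);

	if (index < 1 || index - 1 >= nb_entries)
		return NULL;	/* index overflow */
	return data + offsets[index - 1];
}
```

A varint byte of 2 thus selects the second dictionary entry, while 0 falls outside the table on purpose.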
[PATCH 28/38] pack v4: code to load and prepare a pack dictionary table for use
Signed-off-by: Nicolas Pitre <n...@fluxnic.net>
---
 packv4-parse.c | 77 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 77 insertions(+)

diff --git a/packv4-parse.c b/packv4-parse.c
index 299fc48..26894bc 100644
--- a/packv4-parse.c
+++ b/packv4-parse.c
@@ -28,3 +28,80 @@ const unsigned char *get_sha1ref(struct packed_git *p,
 
 	return sha1;
 }
+
+struct packv4_dict {
+	const unsigned char *data;
+	unsigned int nb_entries;
+	unsigned int offsets[FLEX_ARRAY];
+};
+
+static struct packv4_dict *load_dict(struct packed_git *p, off_t *offset)
+{
+	struct pack_window *w_curs = NULL;
+	off_t curpos = *offset;
+	unsigned long dict_size, avail;
+	unsigned char *src, *data;
+	const unsigned char *cp;
+	git_zstream stream;
+	struct packv4_dict *dict;
+	int nb_entries, i, st;
+
+	/* get uncompressed dictionary data size */
+	src = use_pack(p, &w_curs, curpos, &avail);
+	cp = src;
+	dict_size = decode_varint(&cp);
+	if (dict_size < 3) {
+		error("bad dict size");
+		return NULL;
+	}
+	curpos += cp - src;
+
+	data = xmallocz(dict_size);
+	memset(&stream, 0, sizeof(stream));
+	stream.next_out = data;
+	stream.avail_out = dict_size + 1;
+
+	git_inflate_init(&stream);
+	do {
+		src = use_pack(p, &w_curs, curpos, &stream.avail_in);
+		stream.next_in = src;
+		st = git_inflate(&stream, Z_FINISH);
+		curpos += stream.next_in - src;
+	} while ((st == Z_OK || st == Z_BUF_ERROR) && stream.avail_out);
+	git_inflate_end(&stream);
+	unuse_pack(&w_curs);
+	if (st != Z_STREAM_END || stream.total_out != dict_size) {
+		error("pack dictionary bad");
+		free(data);
+		return NULL;
+	}
+
+	/* count number of entries */
+	nb_entries = 0;
+	cp = data;
+	while (cp < data + dict_size - 3) {
+		cp += 2;				/* prefix bytes */
+		cp += strlen((const char *)cp);		/* entry string */
+		cp += 1;				/* terminating NUL */
+		nb_entries++;
+	}
+	if (cp - data != dict_size) {
+		error("dict size mismatch");
+		free(data);
+		return NULL;
+	}
+
+	dict = xmalloc(sizeof(*dict) + nb_entries * sizeof(dict->offsets[0]));
+	dict->data = data;
+	dict->nb_entries = nb_entries;
+
+	cp = data;
+	for (i = 0; i < nb_entries; i++) {
+		dict->offsets[i] = cp - data;
+		cp += 2;
+		cp += strlen((const char *)cp) + 1;
+	}
+
+	*offset = curpos;
+	return dict;
+}
-- 
1.8.4.38.g317e65b
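The entry-counting loop in load_dict() relies on the dictionary entry layout: two prefix bytes followed by a NUL-terminated string. Here is a self-contained sketch of that scan (a hypothetical helper, not the patch's code; it assumes dict_size >= 3 just as load_dict() checks before counting):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Walk a pack v4 dictionary buffer in which every entry is 2 prefix
 * bytes (tree entry mode, or author timezone) plus a NUL-terminated
 * string.  Returns the number of entries, or -1 when the buffer does
 * not divide cleanly into such entries. */
static int count_dict_entries(const unsigned char *data, size_t dict_size)
{
	const unsigned char *cp = data;
	int nb_entries = 0;

	while (cp < data + dict_size - 3) {
		cp += 2;				/* prefix bytes */
		cp += strlen((const char *)cp) + 1;	/* string + NUL */
		nb_entries++;
	}
	if ((size_t)(cp - data) != dict_size)
		return -1;	/* dict size mismatch */
	return nb_entries;
}
```

The final size check is what catches a truncated or padded dictionary, mirroring the "dict size mismatch" error in the patch.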
[PATCH 32/38] pack v4: parse delta base reference
There is only one type of delta with pack v4.  The base reference
encoding already handles either an offset (via the pack index) or a
literal SHA1.  We assume in the literal SHA1 case that the object
lives in the same pack, just like with previous pack versions.

Signed-off-by: Nicolas Pitre <n...@fluxnic.net>
---
 sha1_file.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/sha1_file.c b/sha1_file.c
index 67eb903..f3bfa28 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1686,7 +1686,19 @@ static off_t get_delta_base(struct packed_git *p,
 	 * that is assured. An OFS_DELTA longer than the hash size
 	 * is stupid, as then a REF_DELTA would be smaller to store.
 	 */
-	if (type == OBJ_OFS_DELTA) {
+	if (p->version >= 4) {
+		if (base_info[0] != 0) {
+			const unsigned char *cp = base_info;
+			unsigned int base_index = decode_varint(&cp);
+			if (!base_index || base_index - 1 >= p->num_objects)
+				return 0; /* out of bounds */
+			*curpos += cp - base_info;
+			base_offset = nth_packed_object_offset(p, base_index - 1);
+		} else {
+			base_offset = find_pack_entry_one(base_info + 1, p);
+			*curpos += 21;
+		}
+	} else if (type == OBJ_OFS_DELTA) {
 		const unsigned char *cp = base_info;
 		base_offset = decode_varint(&cp);
 		base_offset = delta_obj_offset - base_offset;
-- 
1.8.4.38.g317e65b
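The two base-reference forms described above can be sketched with a small stand-alone parser. The names here are hypothetical; the real code then feeds the index into nth_packed_object_offset() or the literal SHA1 into find_pack_entry_one(), both omitted from this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Same add-1 prefix varint as git's varint.c. */
static uintmax_t decode_varint(const unsigned char **bufp)
{
	const unsigned char *buf = *bufp;
	unsigned char c = *buf++;
	uintmax_t val = c & 127;

	while (c & 128) {
		val += 1;
		c = *buf++;
		val = (val << 7) + (c & 127);
	}
	*bufp = buf;
	return val;
}

/* Classify a pack v4 delta base reference: a non-zero first byte is
 * a 1-based varint index into the pack's SHA1 table; a zero byte is
 * followed by a literal 20-byte SHA1.  Returns the number of bytes
 * consumed, or 0 for an out-of-bounds index.  *indexp receives the
 * 0-based object index, or -1 in the literal SHA1 case. */
static int parse_base_ref(const unsigned char *buf, unsigned int nb_objects,
			  long *indexp)
{
	if (buf[0] != 0) {
		const unsigned char *cp = buf;
		uintmax_t index = decode_varint(&cp);

		if (!index || index - 1 >= nb_objects)
			return 0;	/* out of bounds */
		*indexp = (long)(index - 1);
		return (int)(cp - buf);
	}
	*indexp = -1;		/* literal SHA1 follows the 0 byte */
	return 1 + 20;
}
```

Note how the leading zero byte doubles as the escape marker, the same trick the series uses elsewhere for inline dictionary entries.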
[PATCH 33/38] pack v4: we can read commit objects now
Signed-off-by: Nicolas Pitre <n...@fluxnic.net>
---
 Makefile       |  1 +
 packv4-parse.c |  1 +
 packv4-parse.h |  7 +++++++
 sha1_file.c    | 10 ++++++++++
 4 files changed, 19 insertions(+)
 create mode 100644 packv4-parse.h

diff --git a/Makefile b/Makefile
index ba6cafc..22fc276 100644
--- a/Makefile
+++ b/Makefile
@@ -702,6 +702,7 @@ LIB_H += notes.h
 LIB_H += object.h
 LIB_H += pack-revindex.h
 LIB_H += pack.h
+LIB_H += packv4-parse.h
 LIB_H += parse-options.h
 LIB_H += patch-ids.h
 LIB_H += pathspec.h
diff --git a/packv4-parse.c b/packv4-parse.c
index bca1a97..431f47e 100644
--- a/packv4-parse.c
+++ b/packv4-parse.c
@@ -9,6 +9,7 @@
  */
 
 #include "cache.h"
+#include "packv4-parse.h"
 #include "varint.h"
 
 const unsigned char *get_sha1ref(struct packed_git *p,
diff --git a/packv4-parse.h b/packv4-parse.h
new file mode 100644
index 0000000..40aa75a
--- /dev/null
+++ b/packv4-parse.h
@@ -0,0 +1,7 @@
+#ifndef PACKV4_PARSE_H
+#define PACKV4_PARSE_H
+
+void *pv4_get_commit(struct packed_git *p, struct pack_window **w_curs,
+		     off_t offset, unsigned long size);
+
+#endif
diff --git a/sha1_file.c b/sha1_file.c
index f3bfa28..b57d9f8 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -19,6 +19,7 @@
 #include "tree-walk.h"
 #include "refs.h"
 #include "pack-revindex.h"
+#include "packv4-parse.h"
 #include "sha1-lookup.h"
 #include "bulk-checkin.h"
 #include "streaming.h"
@@ -2172,6 +2173,15 @@ void *unpack_entry(struct packed_git *p, off_t obj_offset,
 		break;
 	case OBJ_COMMIT:
 	case OBJ_TREE:
+		if (p->version >= 4 && !base_from_cache) {
+			if (type == OBJ_COMMIT) {
+				data = pv4_get_commit(p, w_curs, curpos, size);
+			} else {
+				die("no pack v4 tree parsing yet");
+			}
+			break;
+		}
+		/* fall through */
 	case OBJ_BLOB:
 	case OBJ_TAG:
 		if (!base_from_cache)
-- 
1.8.4.38.g317e65b
[PATCH 38/38] packv4-create: add a command line argument to limit tree copy sequences
Because there is no delta object cache for tree objects yet, walking
tree entries may result in a lot of recursion.  Let's add
--min-tree-copy=N where N is the minimum number of copied entries in
a single copy sequence allowed for encoding tree deltas.  The default
is 1.  Specifying 0 disables tree deltas entirely.

This allows for experimenting with the delta width to see its
influence on pack size vs runtime access cost.

Signed-off-by: Nicolas Pitre <n...@fluxnic.net>
---
 packv4-create.c | 27 ++++++++++++++++++---------
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/packv4-create.c b/packv4-create.c
index 9d6ffc0..34dcebf 100644
--- a/packv4-create.c
+++ b/packv4-create.c
@@ -19,6 +19,7 @@
 static int pack_compression_seen;
 static int pack_compression_level = Z_DEFAULT_COMPRESSION;
+static int min_tree_copy = 1;
 
 struct data_entry {
 	unsigned offset;
 	unsigned size;
@@ -441,7 +442,7 @@ void *pv4_encode_tree(void *_buffer, unsigned long *sizep,
 	if (!size)
 		return NULL;
 
-	if (!delta_size)
+	if (!delta_size || !min_tree_copy)
 		delta = NULL;
 
 	/*
@@ -530,7 +531,6 @@ void *pv4_encode_tree(void *_buffer, unsigned long *sizep,
 			cp += encode_varint(copy_count, cp);
 			if (first_delta)
 				cp += encode_sha1ref(delta_sha1, cp);
-			copy_count = 0;
 
 			/*
 			 * Now let's make sure this is going to take less
@@ -538,12 +538,14 @@ void *pv4_encode_tree(void *_buffer, unsigned long *sizep,
 			 * created in parallel. If so we dump the copy
 			 * sequence over those entries in the output buffer.
 			 */
-			if (cp - copy_buf < out - &buffer[copy_pos]) {
+			if (copy_count >= min_tree_copy &&
+			    cp - copy_buf < out - &buffer[copy_pos]) {
 				out = buffer + copy_pos;
 				memcpy(out, copy_buf, cp - copy_buf);
 				out += cp - copy_buf;
 				first_delta = 0;
 			}
+			copy_count = 0;
 		}
 
 		if (end - out < 48) {
@@ -574,7 +576,8 @@ void *pv4_encode_tree(void *_buffer, unsigned long *sizep,
 		cp += encode_varint(copy_count, cp);
 		if (first_delta)
 			cp += encode_sha1ref(delta_sha1, cp);
-		if (cp - copy_buf < out - &buffer[copy_pos]) {
+		if (copy_count >= min_tree_copy &&
+		    cp - copy_buf < out - &buffer[copy_pos]) {
 			out = buffer + copy_pos;
 			memcpy(out, copy_buf, cp - copy_buf);
 			out += cp - copy_buf;
@@ -1078,14 +1081,24 @@ static int git_pack_config(const char *k, const char *v, void *cb)
 
 int main(int argc, char *argv[])
 {
-	if (argc != 3) {
-		fprintf(stderr, "Usage: %s src_packfile dst_packfile\n", argv[0]);
+	char *src_pack, *dst_pack;
+
+	if (argc == 3) {
+		src_pack = argv[1];
+		dst_pack = argv[2];
+	} else if (argc == 4 && !prefixcmp(argv[1], "--min-tree-copy=")) {
+		min_tree_copy = atoi(argv[1] + strlen("--min-tree-copy="));
+		src_pack = argv[2];
+		dst_pack = argv[3];
+	} else {
+		fprintf(stderr, "Usage: %s [--min-tree-copy=n] src_packfile dst_packfile\n", argv[0]);
 		exit(1);
 	}
+
 	git_config(git_pack_config, NULL);
 	if (!pack_compression_seen && core_compression_seen)
 		pack_compression_level = core_compression_level;
-	process_one_pack(argv[1], argv[2]);
+	process_one_pack(src_pack, dst_pack);
 	if (0)
 		dict_dump();
 	return 0;
-- 
1.8.4.38.g317e65b
[PATCH 35/38] pack v4: decode tree objects
For now we recreate the whole tree object in its canonical form.
Eventually, the core code should grow some ability to walk packv4
tree entries directly, which would be way more efficient.  Not only
would that avoid double tree entry parsing, but the pack v4 encoding
allows for getting at child objects without going through the SHA1
search.

Signed-off-by: Nicolas Pitre <n...@fluxnic.net>
---
 packv4-parse.c | 137 ++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 131 insertions(+), 6 deletions(-)

diff --git a/packv4-parse.c b/packv4-parse.c
index b80b73e..04eab46 100644
--- a/packv4-parse.c
+++ b/packv4-parse.c
@@ -151,19 +151,15 @@ static void load_path_dict(struct packed_git *p)
 	p->path_dict = paths;
 }
 
-const unsigned char *get_pathref(struct packed_git *p, const unsigned char **srcp)
+const unsigned char *get_pathref(struct packed_git *p, unsigned int index)
 {
-	unsigned int index;
-
 	if (!p->path_dict)
 		load_path_dict(p);
 
-	index = decode_varint(srcp);
-	if (index < 1 || index - 1 >= p->path_dict->nb_entries) {
+	if (index >= p->path_dict->nb_entries) {
 		error("%s: index overflow", __func__);
 		return NULL;
 	}
-	index -= 1;
 	return p->path_dict->data + p->path_dict->offsets[index];
 }
 
@@ -240,3 +236,132 @@ void *pv4_get_commit(struct packed_git *p, struct pack_window **w_curs,
 
 	return dst;
 }
+
+static int decode_entries(struct packed_git *p, struct pack_window **w_curs,
+			  off_t offset, unsigned int start, unsigned int count,
+			  unsigned char **dstp, unsigned long *sizep, int hdr)
+{
+	unsigned long avail;
+	unsigned int nb_entries;
+	const unsigned char *src, *scp;
+	off_t copy_objoffset = 0;
+
+	src = use_pack(p, w_curs, offset, &avail);
+	scp = src;
+
+	if (hdr) {
+		/* we need to skip over the object header */
+		while (*scp & 128)
+			if (++scp - src >= avail - 20)
+				return -1;
+		/* let's still make sure this is actually a tree */
+		if ((*scp++ & 0xf) != OBJ_TREE)
+			return -1;
+	}
+
+	nb_entries = decode_varint(&scp);
+	if (scp == src || start > nb_entries || count > nb_entries - start)
+		return -1;
+	offset += scp - src;
+	avail -= scp - src;
+	src = scp;
+
+	while (count) {
+		unsigned int what;
+
+		if (avail < 20) {
+			src = use_pack(p, w_curs, offset, &avail);
+			if (avail < 20)
+				return -1;
+		}
+		scp = src;
+
+		what = decode_varint(&scp);
+		if (scp == src)
+			return -1;
+
+		if (!(what & 1) && start != 0) {
+			/*
+			 * This is a single entry and we have to skip it.
+			 * The path index was parsed and is in 'what'.
+			 * Skip over the SHA1 index.
+			 */
+			while (*scp++ & 128);
+			start--;
+		} else if (!(what & 1) && start == 0) {
+			/*
+			 * This is an actual tree entry to recreate.
+			 */
+			const unsigned char *path, *sha1;
+			unsigned mode;
+			int len;
+
+			path = get_pathref(p, what >> 1);
+			sha1 = get_sha1ref(p, &scp);
+			if (!path || !sha1)
+				return -1;
+			mode = (path[0] << 8) | path[1];
+			len = snprintf((char *)*dstp, *sizep, "%o %s%c",
+					mode, path + 2, '\0');
+			if (len + 20 > *sizep)
+				return -1;
+			hashcpy(*dstp + len, sha1);
+			*dstp += len + 20;
+			*sizep -= len + 20;
+			count--;
+		} else if (what & 1) {
+			/*
+			 * Copy from another tree object.
+			 */
+			unsigned int copy_start, copy_count;
+
+			copy_start = what >> 1;
+			copy_count = decode_varint(&scp);
+			if (!copy_count)
+				return -1;
+
+			/*
+			 * The LSB of copy_count is a flag indicating if
+			 * a third value is provided to specify the source
+			 * object.  This may be omitted when it doesn't
+			 * change, but has to be specified at least for the
+			 * first copy
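The entry-recreation branch above packs the file mode into the path entry's two prefix bytes before emitting the canonical `"<octal mode> <path>\0" + SHA1` form. A standalone sketch of that one step (hypothetical helper; git's hashcpy() is replaced by a plain memcpy()):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Recreate one canonical tree entry from a pack v4 path-dictionary
 * entry: the 2 prefix bytes carry the file mode, and the path is the
 * NUL-terminated string after them.  Returns the number of bytes
 * written to dst, or -1 if dst is too small. */
static int emit_tree_entry(const unsigned char *path_entry,
			   const unsigned char *sha1,
			   unsigned char *dst, unsigned long avail)
{
	unsigned mode = (path_entry[0] << 8) | path_entry[1];
	/* "%c" with '\0' embeds the NUL separator in the output */
	int len = snprintf((char *)dst, avail, "%o %s%c",
			   mode, (const char *)(path_entry + 2), '\0');

	if (len < 0 || (unsigned long)len + 20 > avail)
		return -1;	/* destination too small */
	memcpy(dst + len, sha1, 20);	/* stands in for hashcpy() */
	return len + 20;
}
```

For a regular file, the prefix bytes 0x81 0xA4 decode to mode 0100644, so the entry comes out exactly as `git cat-file` would show it in a raw tree.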
[PATCH 31/38] sha1_file.c: make use of decode_varint()
... replacing the equivalent open-coded loop.

Signed-off-by: Nicolas Pitre <n...@fluxnic.net>
---
 sha1_file.c | 14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index a298933..67eb903 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1687,20 +1687,12 @@ static off_t get_delta_base(struct packed_git *p,
 	 * is stupid, as then a REF_DELTA would be smaller to store.
 	 */
 	if (type == OBJ_OFS_DELTA) {
-		unsigned used = 0;
-		unsigned char c = base_info[used++];
-		base_offset = c & 127;
-		while (c & 128) {
-			base_offset += 1;
-			if (!base_offset || MSB(base_offset, 7))
-				return 0; /* overflow */
-			c = base_info[used++];
-			base_offset = (base_offset << 7) + (c & 127);
-		}
+		const unsigned char *cp = base_info;
+		base_offset = decode_varint(&cp);
 		base_offset = delta_obj_offset - base_offset;
 		if (base_offset <= 0 || base_offset >= delta_obj_offset)
 			return 0; /* out of bound */
-		*curpos += used;
+		*curpos += cp - base_info;
 	} else if (type == OBJ_REF_DELTA) {
 		/* The base entry _must_ be in the same pack */
 		base_offset = find_pack_entry_one(base_info, p);
-- 
1.8.4.38.g317e65b
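The open-coded loop being removed is git's offset-style varint decoder. For reference, here is a self-contained sketch of the matching encode/decode pair, mirroring varint.c: the add-1 step on continuation bytes makes every encoding unique (128 is 0x80 0x00, and 0x81 0x00 would decode to 256, not 128).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Encode into buf (at most 16 bytes), building the bytes backwards
 * in a scratch buffer.  Returns the number of bytes written. */
static int encode_varint(uintmax_t value, unsigned char *buf)
{
	unsigned char varint[16];
	unsigned pos = sizeof(varint) - 1;

	varint[pos] = value & 127;
	while (value >>= 7)
		varint[--pos] = 128 | (--value & 127);
	memcpy(buf, varint + pos, sizeof(varint) - pos);
	return sizeof(varint) - pos;
}

/* Decode, advancing *bufp past the consumed bytes. */
static uintmax_t decode_varint(const unsigned char **bufp)
{
	const unsigned char *buf = *bufp;
	unsigned char c = *buf++;
	uintmax_t val = c & 127;

	while (c & 128) {
		val += 1;
		c = *buf++;
		val = (val << 7) + (c & 127);
	}
	*bufp = buf;
	return val;
}
```

Values up to 127 take one byte, which is why the dictionaries are sorted by usage frequency: the hottest entries get the shortest indices.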
[PATCH 29/38] pack v4: code to retrieve a name
The name dictionary is loaded if not already done.  We know it is
located right after the SHA1 table (20 bytes per object) which is
itself right after the 12-byte header.  Then the index is parsed
from the input buffer and a pointer to the corresponding entry is
returned.

Signed-off-by: Nicolas Pitre <n...@fluxnic.net>
---
 cache.h        |  3 +++
 packv4-parse.c | 24 ++++++++++++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/cache.h b/cache.h
index 59d9ba7..6ce327e 100644
--- a/cache.h
+++ b/cache.h
@@ -1015,6 +1015,8 @@ struct pack_window {
 	unsigned int inuse_cnt;
 };
 
+struct packv4_dict;
+
 extern struct packed_git {
 	struct packed_git *next;
 	struct pack_window *windows;
@@ -1027,6 +1029,7 @@ extern struct packed_git {
 	unsigned char *bad_object_sha1;
 	int version;
 	int index_version;
+	struct packv4_dict *name_dict;
 	time_t mtime;
 	int pack_fd;
 	unsigned pack_local:1,
diff --git a/packv4-parse.c b/packv4-parse.c
index 26894bc..074e107 100644
--- a/packv4-parse.c
+++ b/packv4-parse.c
@@ -105,3 +105,27 @@ static struct packv4_dict *load_dict(struct packed_git *p, off_t *offset)
 	*offset = curpos;
 	return dict;
 }
+
+static void load_name_dict(struct packed_git *p)
+{
+	off_t offset = 12 + p->num_objects * 20;
+	struct packv4_dict *names = load_dict(p, &offset);
+	if (!names)
+		die("bad pack name dictionary in %s", p->pack_name);
+	p->name_dict = names;
+}
+
+const unsigned char *get_nameref(struct packed_git *p, const unsigned char **srcp)
+{
+	unsigned int index;
+
+	if (!p->name_dict)
+		load_name_dict(p);
+
+	index = decode_varint(srcp);
+	if (index >= p->name_dict->nb_entries) {
+		error("%s: index overflow", __func__);
+		return NULL;
+	}
+	return p->name_dict->data + p->name_dict->offsets[index];
+}
-- 
1.8.4.38.g317e65b
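The layout arithmetic in load_name_dict() can be stated as a tiny helper. This is a sketch of the computation only, not git code: a 12-byte pack header, then num_objects SHA1 entries of 20 bytes each, then the name dictionary.

```c
#include <assert.h>
#include <stdint.h>

/* Offset of the pack v4 name dictionary inside the pack file: it
 * starts right after the 12-byte header and the sorted SHA1 table
 * (20 bytes per object).  Widened before the multiply so a huge
 * pack cannot overflow 32-bit arithmetic. */
static int64_t name_dict_offset(uint32_t num_objects)
{
	return 12 + (int64_t)num_objects * 20;
}
```

This is also why the path dictionary needs the name_dict_end bookkeeping from patch 34: only the name dictionary's start is computable from the header alone.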
[PATCH 37/38] pack v4: introduce escape hatches in the name and path indexes
If the path or name index is zero, the entry data is to be found
inline rather than being located in the dictionary table.  This is
there to allow easy completion of thin packs without having to add
new table entries, which would have required a full rewrite of the
pack data.

Signed-off-by: Nicolas Pitre <n...@fluxnic.net>
---
 packv4-create.c |  6 +++---
 packv4-parse.c  | 28 ++++++++++++++++++++++------
 2 files changed, 25 insertions(+), 9 deletions(-)

diff --git a/packv4-create.c b/packv4-create.c
index fd16222..9d6ffc0 100644
--- a/packv4-create.c
+++ b/packv4-create.c
@@ -343,7 +343,7 @@ void *pv4_encode_commit(void *buffer, unsigned long *sizep)
 	index = dict_add_entry(commit_name_table, tz_val, in, end - in);
 	if (index < 0)
 		goto bad_dict;
-	out += encode_varint(index, out);
+	out += encode_varint(index + 1, out);
 	time = strtoul(end, &end, 10);
 	if (!end || end[0] != ' ' || end[6] != '\n')
 		goto bad_data;
@@ -361,7 +361,7 @@ void *pv4_encode_commit(void *buffer, unsigned long *sizep)
 	index = dict_add_entry(commit_name_table, tz_val, in, end - in);
 	if (index < 0)
 		goto bad_dict;
-	out += encode_varint(index, out);
+	out += encode_varint(index + 1, out);
 	time = strtoul(end, &end, 10);
 	if (!end || end[0] != ' ' || end[6] != '\n')
 		goto bad_data;
@@ -561,7 +561,7 @@ void *pv4_encode_tree(void *_buffer, unsigned long *sizep,
 			free(buffer);
 			return NULL;
 		}
-		out += encode_varint(index << 1, out);
+		out += encode_varint((index + 1) << 1, out);
 		out += encode_sha1ref(name_entry.sha1, out);
 	}
 
diff --git a/packv4-parse.c b/packv4-parse.c
index 4c218d2..6db4ed3 100644
--- a/packv4-parse.c
+++ b/packv4-parse.c
@@ -125,11 +125,19 @@ const unsigned char *get_nameref(struct packed_git *p, const unsigned char **srcp)
 		load_name_dict(p);
 
 	index = decode_varint(srcp);
-	if (index >= p->name_dict->nb_entries) {
+
+	if (!index) {
+		/* the entry data is inline */
+		const unsigned char *data = *srcp;
+		*srcp += 2 + strlen((const char *)*srcp + 2) + 1;
+		return data;
+	}
+
+	if (index - 1 >= p->name_dict->nb_entries) {
 		error("%s: index overflow", __func__);
 		return NULL;
 	}
-	return p->name_dict->data + p->name_dict->offsets[index];
+	return p->name_dict->data + p->name_dict->offsets[index - 1];
 }
 
 static void load_path_dict(struct packed_git *p)
@@ -151,16 +159,24 @@ static void load_path_dict(struct packed_git *p)
 	p->path_dict = paths;
 }
 
-const unsigned char *get_pathref(struct packed_git *p, unsigned int index)
+const unsigned char *get_pathref(struct packed_git *p, unsigned int index,
+				 const unsigned char **srcp)
 {
 	if (!p->path_dict)
 		load_path_dict(p);
 
-	if (index >= p->path_dict->nb_entries) {
+	if (!index) {
+		/* the entry data is inline */
+		const unsigned char *data = *srcp;
+		*srcp += 2 + strlen((const char *)*srcp + 2) + 1;
+		return data;
+	}
+
+	if (index - 1 >= p->path_dict->nb_entries) {
 		error("%s: index overflow", __func__);
 		return NULL;
 	}
-	return p->path_dict->data + p->path_dict->offsets[index];
+	return p->path_dict->data + p->path_dict->offsets[index - 1];
 }
 
 void *pv4_get_commit(struct packed_git *p, struct pack_window **w_curs,
@@ -296,7 +312,7 @@ static int decode_entries(struct packed_git *p, struct pack_window **w_curs,
 			unsigned mode;
 			int len;
 
-			path = get_pathref(p, what >> 1);
+			path = get_pathref(p, what >> 1, &scp);
 			sha1 = get_sha1ref(p, &scp);
 			if (!path || !sha1)
 				return -1;
-- 
1.8.4.38.g317e65b
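The escape-hatch handling shared by get_nameref() and get_pathref() can be sketched on its own. This is a hypothetical function with the dictionary reduced to a flat table; an inline entry has the same shape as a dictionary entry (2 prefix bytes + NUL-terminated string), so the caller can treat both return values identically.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Index 0 means the entry is stored inline in the object stream and
 * *srcp must be advanced past it; any other value is a 1-based
 * reference into the dictionary table. */
static const unsigned char *inline_or_dict(unsigned int index,
					   const unsigned char **srcp,
					   const unsigned char *dict_data,
					   const unsigned int *offsets,
					   unsigned int nb_entries)
{
	if (!index) {
		/* the entry data is inline */
		const unsigned char *data = *srcp;
		*srcp += 2 + strlen((const char *)*srcp + 2) + 1;
		return data;
	}
	if (index - 1 >= nb_entries)
		return NULL;	/* index overflow */
	return dict_data + offsets[index - 1];
}
```

Thin-pack completion can then append objects whose names or paths are not in the dictionaries without rewriting the tables, at the cost of a slightly larger entry.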
[PATCH 36/38] pack v4: get tree objects
Signed-off-by: Nicolas Pitre <n...@fluxnic.net>
---
 packv4-parse.c | 25 +++++++++++++++++++++++++
 packv4-parse.h |  2 ++
 sha1_file.c    |  2 +-
 3 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/packv4-parse.c b/packv4-parse.c
index 04eab46..4c218d2 100644
--- a/packv4-parse.c
+++ b/packv4-parse.c
@@ -365,3 +365,28 @@ static int decode_entries(struct packed_git *p, struct pack_window **w_curs,
 
 	return 0;
 }
+
+void *pv4_get_tree(struct packed_git *p, struct pack_window **w_curs,
+		   off_t offset, unsigned long size)
+{
+	unsigned long avail;
+	unsigned int nb_entries;
+	unsigned char *dst, *dcp;
+	const unsigned char *src, *scp;
+	int ret;
+
+	src = use_pack(p, w_curs, offset, &avail);
+	scp = src;
+	nb_entries = decode_varint(&scp);
+	if (scp == src)
+		return NULL;
+
+	dst = xmallocz(size);
+	dcp = dst;
+	ret = decode_entries(p, w_curs, offset, 0, nb_entries, &dcp, &size, 0);
+	if (ret < 0 || size != 0) {
+		free(dst);
+		return NULL;
+	}
+	return dst;
+}
diff --git a/packv4-parse.h b/packv4-parse.h
index 40aa75a..5f9d809 100644
--- a/packv4-parse.h
+++ b/packv4-parse.h
@@ -3,5 +3,7 @@
 
 void *pv4_get_commit(struct packed_git *p, struct pack_window **w_curs,
 		     off_t offset, unsigned long size);
+void *pv4_get_tree(struct packed_git *p, struct pack_window **w_curs,
+		   off_t offset, unsigned long size);
 
 #endif
diff --git a/sha1_file.c b/sha1_file.c
index b57d9f8..79e1293 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2177,7 +2177,7 @@ void *unpack_entry(struct packed_git *p, off_t obj_offset,
 			if (type == OBJ_COMMIT) {
 				data = pv4_get_commit(p, w_curs, curpos, size);
 			} else {
-				die("no pack v4 tree parsing yet");
+				data = pv4_get_tree(p, w_curs, curpos, size);
 			}
 			break;
 		}
-- 
1.8.4.38.g317e65b
[PATCH 10/38] pack v4: commit object encoding
This goes as follows:

- Tree reference: either variable length encoding of the index into
  the SHA1 table or the literal SHA1 prefixed by 0 (see
  encode_sha1ref()).

- Parent count: variable length encoding of the number of parents.
  This is normally going to occupy a single byte but doesn't have to.

- List of parent references: a list of encode_sha1ref() encoded
  references, or nothing if the parent count was zero.

- Author reference: variable length encoding of an index into the
  author identifier dictionary table which also covers the time zone.
  To make the overall encoding efficient, the author table is sorted
  by usage frequency so the most used names are first and require the
  shortest index encoding.

- Author time stamp: variable length encoded.  Year 2038 ready!

- Committer reference: same as author reference.

- Committer time stamp: same as author time stamp.

The remainder of the canonical commit object content is then zlib
compressed and appended to the above.

Rationale: the most important commit object data is densely encoded
while requiring no zlib inflate processing on access, and all SHA1
references are most likely to be direct indices into the pack index
file, requiring no SHA1 search into the pack index file.

Signed-off-by: Nicolas Pitre <n...@fluxnic.net>
---
 packv4-create.c | 119 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 119 insertions(+)

diff --git a/packv4-create.c b/packv4-create.c
index 12527c0..d4a79f4 100644
--- a/packv4-create.c
+++ b/packv4-create.c
@@ -14,6 +14,9 @@
 #include "pack.h"
 #include "varint.h"
 
+
+static int pack_compression_level = Z_DEFAULT_COMPRESSION;
+
 struct data_entry {
 	unsigned offset;
 	unsigned size;
@@ -274,6 +277,122 @@ static int encode_sha1ref(const unsigned char *sha1, unsigned char *buf)
 	return 1 + 20;
 }
 
+/*
+ * This converts a canonical commit object buffer into its
+ * tightly packed representation using the already populated
+ * and sorted commit_name_table dictionary.  The parsing is
+ * strict so to ensure the canonical version may always be
+ * regenerated and produce the same hash.
+ */
+void *pv4_encode_commit(void *buffer, unsigned long *sizep)
+{
+	unsigned long size = *sizep;
+	char *in, *tail, *end;
+	unsigned char *out;
+	unsigned char sha1[20];
+	int nb_parents, index, tz_val;
+	unsigned long time;
+	z_stream stream;
+	int status;
+
+	/*
+	 * It is guaranteed that the output is always going to be smaller
+	 * than the input.  We could even do this conversion in place.
+	 */
+	in = buffer;
+	tail = in + size;
+	buffer = xmalloc(size);
+	out = buffer;
+
+	/* parse the "tree" line */
+	if (in + 46 >= tail || memcmp(in, "tree ", 5) || in[45] != '\n')
+		goto bad_data;
+	if (get_sha1_lowhex(in + 5, sha1) < 0)
+		goto bad_data;
+	in += 46;
+	out += encode_sha1ref(sha1, out);
+
+	/* count how many "parent" lines */
+	nb_parents = 0;
+	while (in + 48 < tail && !memcmp(in, "parent ", 7) && in[47] == '\n') {
+		nb_parents++;
+		in += 48;
+	}
+	out += encode_varint(nb_parents, out);
+
+	/* rewind and parse the "parent" lines */
+	in -= 48 * nb_parents;
+	while (nb_parents--) {
+		if (get_sha1_lowhex(in + 7, sha1))
+			goto bad_data;
+		out += encode_sha1ref(sha1, out);
+		in += 48;
+	}
+
+	/* parse the "author" line */
+	/* it must be at least "author x <x> 0 +0000\n" i.e. 21 chars */
+	if (in + 21 >= tail || memcmp(in, "author ", 7))
+		goto bad_data;
+	in += 7;
+	end = get_nameend_and_tz(in, &tz_val);
+	if (!end)
+		goto bad_data;
+	index = dict_add_entry(commit_name_table, tz_val, in, end - in);
+	if (index < 0)
+		goto bad_dict;
+	out += encode_varint(index, out);
+	time = strtoul(end, &end, 10);
+	if (!end || end[0] != ' ' || end[6] != '\n')
+		goto bad_data;
+	out += encode_varint(time, out);
+	in = end + 7;
+
+	/* parse the "committer" line */
+	/* it must be at least "committer x <x> 0 +0000\n" i.e. 24 chars */
+	if (in + 24 >= tail || memcmp(in, "committer ", 7))
+		goto bad_data;
+	in += 10;
+	end = get_nameend_and_tz(in, &tz_val);
+	if (!end)
+		goto bad_data;
+	index = dict_add_entry(commit_name_table, tz_val, in, end - in);
+	if (index < 0)
+		goto bad_dict;
+	out += encode_varint(index, out);
+	time = strtoul(end, &end, 10);
+	if (!end || end[0] != ' ' || end[6] != '\n')
+		goto bad_data;
+	out += encode_varint(time, out);
+	in = end + 7;
+
+	/* finally, deflate the remaining data */
+	memset(&stream, 0, sizeof(stream));
+	deflateInit(&stream,
[PATCH 01/38] pack v4: initial pack dictionary structure and code
Signed-off-by: Nicolas Pitre <n...@fluxnic.net>
---
 packv4-create.c | 137 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 137 insertions(+)
 create mode 100644 packv4-create.c

diff --git a/packv4-create.c b/packv4-create.c
new file mode 100644
index 0000000..2de6d41
--- /dev/null
+++ b/packv4-create.c
@@ -0,0 +1,137 @@
+/*
+ * packv4-create.c: management of dictionary tables used in pack v4
+ *
+ * (C) Nicolas Pitre <n...@fluxnic.net>
+ *
+ * This code is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include "cache.h"
+
+struct data_entry {
+	unsigned offset;
+	unsigned hits;
+};
+
+struct dict_table {
+	char *data;
+	unsigned ptr;
+	unsigned size;
+	struct data_entry *entry;
+	unsigned nb_entries;
+	unsigned max_entries;
+	unsigned *hash;
+	unsigned hash_size;
+};
+
+struct dict_table *create_dict_table(void)
+{
+	return xcalloc(sizeof(struct dict_table), 1);
+}
+
+void destroy_dict_table(struct dict_table *t)
+{
+	free(t->data);
+	free(t->entry);
+	free(t->hash);
+	free(t);
+}
+
+static int locate_entry(struct dict_table *t, const char *str)
+{
+	int i = 0;
+	const unsigned char *s = (const unsigned char *) str;
+
+	while (*s)
+		i = i * 111 + *s++;
+	i = (unsigned)i % t->hash_size;
+
+	while (t->hash[i]) {
+		unsigned n = t->hash[i] - 1;
+		if (!strcmp(str, t->data + t->entry[n].offset))
+			return n;
+		if (++i >= t->hash_size)
+			i = 0;
+	}
+	return -1 - i;
+}
+
+static void rehash_entries(struct dict_table *t)
+{
+	unsigned n;
+
+	t->hash_size *= 2;
+	if (t->hash_size < 1024)
+		t->hash_size = 1024;
+	t->hash = xrealloc(t->hash, t->hash_size * sizeof(*t->hash));
+	memset(t->hash, 0, t->hash_size * sizeof(*t->hash));
+
+	for (n = 0; n < t->nb_entries; n++) {
+		int i = locate_entry(t, t->data + t->entry[n].offset);
+		if (i < 0)
+			t->hash[-1 - i] = n + 1;
+	}
+}
+
+int dict_add_entry(struct dict_table *t, const char *str)
+{
+	int i, len = strlen(str) + 1;
+
+	if (t->ptr + len >= t->size) {
+		t->size = (t->size + len + 1024) * 3 / 2;
+		t->data = xrealloc(t->data, t->size);
+	}
+	memcpy(t->data + t->ptr, str, len);
+
+	i = (t->nb_entries) ? locate_entry(t, t->data + t->ptr) : -1;
+	if (i >= 0) {
+		t->entry[i].hits++;
+		return i;
+	}
+
+	if (t->nb_entries >= t->max_entries) {
+		t->max_entries = (t->max_entries + 1024) * 3 / 2;
+		t->entry = xrealloc(t->entry, t->max_entries * sizeof(*t->entry));
+	}
+	t->entry[t->nb_entries].offset = t->ptr;
+	t->entry[t->nb_entries].hits = 1;
+	t->ptr += len + 1;
+	t->nb_entries++;
+
+	if (t->hash_size * 3 <= t->nb_entries * 4)
+		rehash_entries(t);
+	else
+		t->hash[-1 - i] = t->nb_entries;
+
+	return t->nb_entries - 1;
+}
+
+static int cmp_dict_entries(const void *a_, const void *b_)
+{
+	const struct data_entry *a = a_;
+	const struct data_entry *b = b_;
+	int diff = b->hits - a->hits;
+	if (!diff)
+		diff = a->offset - b->offset;
+	return diff;
+}
+
+static void sort_dict_entries_by_hits(struct dict_table *t)
+{
+	qsort(t->entry, t->nb_entries, sizeof(*t->entry), cmp_dict_entries);
+	t->hash_size = (t->nb_entries * 4 / 3) / 2;
+	rehash_entries(t);
+}
+
+void dict_dump(struct dict_table *t)
+{
+	int i;
+
+	sort_dict_entries_by_hits(t);
+	for (i = 0; i < t->nb_entries; i++)
+		printf("%d\t%s\n",
+			t->entry[i].hits,
+			t->data + t->entry[i].offset);
+}
-- 
1.8.4.38.g317e65b
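The dict hash table above stores entry_index + 1 in its slots so that 0 can mean "empty", and locate_entry() returns the negated free-slot position on a miss. A minimal fixed-size sketch of just that convention (a hypothetical simplified table; the real code grows its storage and rehashes on load):

```c
#include <assert.h>
#include <string.h>

#define SLOTS 8

/* Same 111-multiplier string hash as locate_entry() in the patch. */
static unsigned str_hash(const char *str)
{
	unsigned i = 0;

	while (*str)
		i = i * 111 + (unsigned char)*str++;
	return i % SLOTS;
}

/* Linear probing; hash[i] == 0 means empty, otherwise it holds the
 * entry index + 1.  On a miss, return -1 - i so the caller knows
 * which slot is free. */
static int lookup(const unsigned *hash, const char **entries, const char *str)
{
	unsigned i = str_hash(str);

	while (hash[i]) {
		unsigned n = hash[i] - 1;
		if (!strcmp(str, entries[n]))
			return (int)n;
		if (++i >= SLOTS)
			i = 0;
	}
	return -1 - (int)i;
}

/* Insert-or-bump: returns the (possibly pre-existing) entry index. */
static int add(unsigned *hash, const char **entries, int *nb, const char *str)
{
	int i = lookup(hash, entries, str);

	if (i >= 0)
		return i;		/* already present */
	entries[*nb] = str;
	hash[-1 - i] = ++*nb;		/* store index + 1 */
	return *nb - 1;
}
```

Encoding the free slot in the negative return value is what lets dict_add_entry() avoid hashing the string a second time when inserting.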
[PATCH 22/38] pack index v3
This is a minor change over pack index v2. Since pack v4 already contains
the sorted SHA1 table, it is omitted from the index file.

Signed-off-by: Nicolas Pitre n...@fluxnic.net
---
 pack-write.c    |  6 +++++-
 packv4-create.c | 10 +++++++++-
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/pack-write.c b/pack-write.c
index ca9e63b..631007e 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -87,6 +87,8 @@ const char *write_idx_file(const char *index_name, struct pack_idx_entry **objec
 	/* if last object's offset is >= 2^31 we should use index V2 */
 	index_version = need_large_offset(last_obj_offset, opts) ? 2 : opts->version;
+	if (index_version < opts->version)
+		index_version = opts->version;
 
 	/* index versions 2 and above need a header */
 	if (index_version >= 2) {
@@ -127,7 +129,9 @@ const char *write_idx_file(const char *index_name, struct pack_idx_entry **objec
 			uint32_t offset = htonl(obj->offset);
 			sha1write(f, &offset, 4);
 		}
-		sha1write(f, obj->sha1, 20);
+		/* Pack v4 (using index v3) carries the SHA1 table already */
+		if (index_version < 3)
+			sha1write(f, obj->sha1, 20);
 		git_SHA1_Update(&ctx, obj->sha1, 20);
 		if ((opts->flags & WRITE_IDX_STRICT) &&
 		    (i && !hashcmp(list[-2]->sha1, obj->sha1)))
diff --git a/packv4-create.c b/packv4-create.c
index a9e9002..22cdf8e 100644
--- a/packv4-create.c
+++ b/packv4-create.c
@@ -1014,8 +1014,10 @@ static void process_one_pack(char *src_pack, char *dst_pack)
 	struct packed_git *p;
 	struct sha1file *f;
 	struct pack_idx_entry *objs, **p_objs;
+	struct pack_idx_option idx_opts;
 	unsigned i, nr_objects;
 	off_t written = 0;
+	unsigned char pack_sha1[20];
 
 	p = open_pack(src_pack);
 	if (!p)
@@ -1041,11 +1043,17 @@ static void process_one_pack(char *src_pack, char *dst_pack)
 	for (i = 0; i < nr_objects; i++) {
 		off_t obj_pos = written;
 		struct pack_idx_entry *obj = p_objs[i];
+		crc32_begin(f);
 		written += packv4_write_object(f, p, obj);
 		obj->offset = obj_pos;
+		obj->crc32 = crc32_end(f);
 	}
 
-	sha1close(f, NULL, CSUM_CLOSE | CSUM_FSYNC);
+	sha1close(f, pack_sha1, CSUM_CLOSE | CSUM_FSYNC);
+
+	reset_pack_idx_option(&idx_opts);
+	idx_opts.version = 3;
+	write_idx_file(dst_pack, p_objs, nr_objects, &idx_opts, pack_sha1);
 }
 
 static int git_pack_config(const char *k, const char *v, void *cb)
-- 
1.8.4.38.g317e65b
[PATCH 20/38] pack v4: honor pack.compression config option
Signed-off-by: Nicolas Pitre n...@fluxnic.net
---
 packv4-create.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/packv4-create.c b/packv4-create.c
index c8d3053..45f8427 100644
--- a/packv4-create.c
+++ b/packv4-create.c
@@ -16,6 +16,7 @@
 #include "varint.h"
 
+static int pack_compression_seen;
 static int pack_compression_level = Z_DEFAULT_COMPRESSION;
 
 struct data_entry {
@@ -1047,12 +1048,30 @@ static void process_one_pack(char *src_pack, char *dst_pack)
 	sha1close(f, NULL, CSUM_CLOSE | CSUM_FSYNC);
 }
 
+static int git_pack_config(const char *k, const char *v, void *cb)
+{
+	if (!strcmp(k, "pack.compression")) {
+		int level = git_config_int(k, v);
+		if (level == -1)
+			level = Z_DEFAULT_COMPRESSION;
+		else if (level < 0 || level > Z_BEST_COMPRESSION)
+			die("bad pack compression level %d", level);
+		pack_compression_level = level;
+		pack_compression_seen = 1;
+		return 0;
+	}
+	return git_default_config(k, v, cb);
+}
+
 int main(int argc, char *argv[])
 {
 	if (argc != 3) {
 		fprintf(stderr, "Usage: %s src_packfile dst_packfile\n", argv[0]);
 		exit(1);
 	}
+	git_config(git_pack_config, NULL);
+	if (!pack_compression_seen && core_compression_seen)
+		pack_compression_level = core_compression_level;
 	process_one_pack(argv[1], argv[2]);
 	if (0)
 		dict_dump();
-- 
1.8.4.38.g317e65b
[PATCH 21/38] pack v4: relax commit parsing a bit
At least commit af25e94d4dcfb9608846242fabdd4e6014e5c9f0 in the Linux
kernel repository has an author line with an empty name part, i.e.
"author <...> 1120285620 -0700", so let's relax the minimum-length
checks accordingly.

Signed-off-by: Nicolas Pitre n...@fluxnic.net
---
 packv4-create.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/packv4-create.c b/packv4-create.c
index 45f8427..a9e9002 100644
--- a/packv4-create.c
+++ b/packv4-create.c
@@ -158,12 +158,12 @@ static char *get_nameend_and_tz(char *from, int *tz_val)
 	char *end, *tz;
 
 	tz = strchr(from, '\n');
-	/* let's assume the smallest possible string to be "x <x> 0 +0000\n" */
-	if (!tz || tz - from < 13)
+	/* let's assume the smallest possible string to be "<x> 0 +0000\n" */
+	if (!tz || tz - from < 11)
 		return NULL;
 	tz -= 4;
 	end = tz - 4;
-	while (end - from > 5 && *end != ' ')
+	while (end - from > 3 && *end != ' ')
 		end--;
 	if (end[-1] != '>' || end[0] != ' ' || tz[-2] != ' ')
 		return NULL;
-- 
1.8.4.38.g317e65b
[PATCH 14/38] pack v4: object headers
In pack v4 the object size and type are encoded differently from pack v3.

The object size uses the same efficient variable-length number encoding
already used elsewhere. The object type has 4 bits allocated to it,
compared to 3 bits in pack v3. This should be quite sufficient for the
foreseeable future, especially since pack v4 has only one type of delta
object instead of two.

Signed-off-by: Nicolas Pitre n...@fluxnic.net
---
 packv4-create.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/packv4-create.c b/packv4-create.c
index 61b70c8..6098062 100644
--- a/packv4-create.c
+++ b/packv4-create.c
@@ -635,6 +635,33 @@ static unsigned long packv4_write_tables(struct sha1file *f, unsigned nr_objects
 	return written;
 }
 
+static int write_object_header(struct sha1file *f, enum object_type type, unsigned long size)
+{
+	unsigned char buf[16];
+	uint64_t val;
+	int len;
+
+	/*
+	 * We really have only one kind of delta object.
+	 */
+	if (type == OBJ_OFS_DELTA)
+		type = OBJ_REF_DELTA;
+
+	/*
+	 * We allocate 4 bits in the LSB for the object type which should
+	 * be good for quite a while, given that we effectively encode
+	 * only 5 object types: commit, tree, blob, delta, tag.
+	 */
+	val = size;
+	if (MSB(val, 4))
+		die("fixme: the code doesn't currently cope with big sizes");
+	val <<= 4;
+	val |= type;
+	len = encode_varint(val, buf);
+	sha1write(f, buf, len);
+	return len;
+}
+
 static struct packed_git *open_pack(const char *path)
 {
 	char arg[PATH_MAX];
-- 
1.8.4.38.g317e65b
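For readers following along without the full tree, here is a small self-contained sketch of the header scheme described above: the type sits in the 4 LSBs, the size is shifted left by 4, and the combined value is varint-encoded. The varint routines mirror git's varint.c; the `demo_*` and `pv4_header_selftest` names are invented for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Variable-length encoding as in git's varint.c: 7 bits per byte,
 * MSB set on all but the last byte, with an offset-1 scheme that
 * gives every value a unique shortest encoding. */
static int encode_varint(uintmax_t value, unsigned char *buf)
{
	unsigned char varint[16];
	unsigned pos = sizeof(varint) - 1;
	varint[pos] = value & 127;
	while (value >>= 7)
		varint[--pos] = 128 | (--value & 127);
	memcpy(buf, varint + pos, sizeof(varint) - pos);
	return sizeof(varint) - pos;
}

static uintmax_t decode_varint(const unsigned char **bufp)
{
	const unsigned char *buf = *bufp;
	unsigned char c = *buf++;
	uintmax_t val = c & 127;
	while (c & 128) {
		val += 1;
		c = *buf++;
		val = (val << 7) + (c & 127);
	}
	*bufp = buf;
	return val;
}

/* Pack a (type, size) pair the way the patch describes: size << 4 | type. */
static int demo_write_header(unsigned char *buf, unsigned type, uintmax_t size)
{
	return encode_varint((size << 4) | (type & 0xf), buf);
}

static void demo_read_header(const unsigned char *buf, unsigned *type, uintmax_t *size)
{
	const unsigned char *cp = buf;
	uintmax_t val = decode_varint(&cp);
	*type = val & 0xf;
	*size = val >> 4;
}

int pv4_header_selftest(void)
{
	unsigned char buf[16];
	unsigned type;
	uintmax_t size;
	/* a 1000-byte commit (type 1) fits in a two-byte varint */
	int len = demo_write_header(buf, 1, 1000);
	demo_read_header(buf, &type, &size);
	return (len == 2 && type == 1 && size == 1000) ? 0 : -1;
}
```

Note how the round-trip recovers both fields exactly; a pack v3 header would instead spend a full byte on the type/size split with its own continuation scheme.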
[PATCH 03/38] pack v4: scan tree objects
Let's read a pack to feed our dictionary with all the path strings
contained in all the tree objects. Dump the resulting dictionary,
sorted by frequency, to stdout.

Signed-off-by: Nicolas Pitre n...@fluxnic.net
---
 Makefile        |   1 +
 packv4-create.c | 137 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 138 insertions(+)

diff --git a/Makefile b/Makefile
index 3588ca1..4716113 100644
--- a/Makefile
+++ b/Makefile
@@ -550,6 +550,7 @@ PROGRAM_OBJS += shell.o
 PROGRAM_OBJS += show-index.o
 PROGRAM_OBJS += upload-pack.o
 PROGRAM_OBJS += remote-testsvn.o
+PROGRAM_OBJS += packv4-create.o
 
 # Binary suffix, set to .exe for Windows builds
 X =
diff --git a/packv4-create.c b/packv4-create.c
index 2de6d41..00762a5 100644
--- a/packv4-create.c
+++ b/packv4-create.c
@@ -9,6 +9,8 @@
  */
 
 #include "cache.h"
+#include "object.h"
+#include "tree-walk.h"
 
 struct data_entry {
 	unsigned offset;
@@ -125,6 +127,22 @@ static void sort_dict_entries_by_hits(struct dict_table *t)
 	rehash_entries(t);
 }
 
+static struct dict_table *tree_path_table;
+
+static int add_tree_dict_entries(void *buf, unsigned long size)
+{
+	struct tree_desc desc;
+	struct name_entry name_entry;
+
+	if (!tree_path_table)
+		tree_path_table = create_dict_table();
+
+	init_tree_desc(&desc, buf, size);
+	while (tree_entry(&desc, &name_entry))
+		dict_add_entry(tree_path_table, name_entry.path);
+	return 0;
+}
+
 void dict_dump(struct dict_table *t)
 {
 	int i;
@@ -135,3 +153,122 @@ void dict_dump(struct dict_table *t)
 			t->entry[i].hits,
 			t->data + t->entry[i].offset);
 }
+
+struct idx_entry
+{
+	off_t			offset;
+	const unsigned char	*sha1;
+};
+
+static int sort_by_offset(const void *e1, const void *e2)
+{
+	const struct idx_entry *entry1 = e1;
+	const struct idx_entry *entry2 = e2;
+	if (entry1->offset < entry2->offset)
+		return -1;
+	if (entry1->offset > entry2->offset)
+		return 1;
+	return 0;
+}
+
+static int create_pack_dictionaries(struct packed_git *p)
+{
+	uint32_t nr_objects, i;
+	struct idx_entry *objects;
+
+	nr_objects = p->num_objects;
+	objects = xmalloc((nr_objects + 1) * sizeof(*objects));
+	objects[nr_objects].offset = p->index_size - 40;
+	for (i = 0; i < nr_objects; i++) {
+		objects[i].sha1 = nth_packed_object_sha1(p, i);
+		objects[i].offset = nth_packed_object_offset(p, i);
+	}
+	qsort(objects, nr_objects, sizeof(*objects), sort_by_offset);
+
+	for (i = 0; i < nr_objects; i++) {
+		void *data;
+		enum object_type type;
+		unsigned long size;
+		struct object_info oi = {};
+
+		oi.typep = &type;
+		oi.sizep = &size;
+		if (packed_object_info(p, objects[i].offset, &oi) < 0)
+			die("cannot get type of %s from %s",
+			    sha1_to_hex(objects[i].sha1), p->pack_name);
+
+		switch (type) {
+		case OBJ_TREE:
+			break;
+		default:
+			continue;
+		}
+		data = unpack_entry(p, objects[i].offset, &type, &size);
+		if (!data)
+			die("cannot unpack %s from %s",
+			    sha1_to_hex(objects[i].sha1), p->pack_name);
+		if (check_sha1_signature(objects[i].sha1, data, size, typename(type)))
+			die("packed %s from %s is corrupt",
+			    sha1_to_hex(objects[i].sha1), p->pack_name);
+		if (add_tree_dict_entries(data, size) < 0)
+			die("can't process %s object %s",
+			    typename(type), sha1_to_hex(objects[i].sha1));
+		free(data);
+	}
+	free(objects);
+
+	return 0;
+}
+
+static int process_one_pack(const char *path)
+{
+	char arg[PATH_MAX];
+	int len;
+	struct packed_git *p;
+
+	len = strlcpy(arg, path, PATH_MAX);
+	if (len >= PATH_MAX)
+		return error("name too long: %s", path);
+
+	/*
+	 * In addition to "foo.idx" we accept "foo.pack" and "foo";
+	 * normalize these forms to "foo.idx" for add_packed_git().
+	 */
+	if (has_extension(arg, ".pack")) {
+		strcpy(arg + len - 5, ".idx");
+		len--;
+	} else if (!has_extension(arg, ".idx")) {
+		if (len + 4 >= PATH_MAX)
+			return error("name too long: %s.idx", arg);
+		strcpy(arg + len, ".idx");
+		len += 4;
+	}
+
+	/*
+	 * add_packed_git() uses our buffer (containing "foo.idx") to
+	 * build the pack filename ("foo.pack").  Make sure it fits.
+	 */
+	if (len + 1 >= PATH_MAX) {
+		arg[len - 4] = '\0';
+
[PATCH 16/38] pack v4: object writing
This adds the missing code to finally be able to produce a complete pack
file version 4. We trap commit and tree objects as those have a
completely new encoding. Other object types are copied almost unchanged.

As we go, the pack index entries are updated in place to store the new
object offsets once they're written to the destination file. This will
be needed later for writing the pack index file.

Signed-off-by: Nicolas Pitre n...@fluxnic.net
---
 packv4-create.c | 74 ++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 71 insertions(+), 3 deletions(-)

diff --git a/packv4-create.c b/packv4-create.c
index b0e344f..5d76234 100644
--- a/packv4-create.c
+++ b/packv4-create.c
@@ -722,6 +722,59 @@ static unsigned long copy_object_data(struct sha1file *f, struct packed_git *p,
 	return written;
 }
 
+static off_t packv4_write_object(struct sha1file *f, struct packed_git *p,
+				 struct pack_idx_entry *obj)
+{
+	void *src, *result;
+	struct object_info oi = {};
+	enum object_type type;
+	unsigned long size;
+	unsigned int hdrlen;
+
+	oi.typep = &type;
+	oi.sizep = &size;
+	if (packed_object_info(p, obj->offset, &oi) < 0)
+		die("cannot get type of %s from %s",
+		    sha1_to_hex(obj->sha1), p->pack_name);
+
+	/* Some objects are copied without decompression */
+	switch (type) {
+	case OBJ_COMMIT:
+	case OBJ_TREE:
+		break;
+	default:
+		return copy_object_data(f, p, obj->offset);
+	}
+
+	/* The rest is converted into their new format */
+	src = unpack_entry(p, obj->offset, &type, &size);
+	if (!src)
+		die("cannot unpack %s from %s",
+		    sha1_to_hex(obj->sha1), p->pack_name);
+	if (check_sha1_signature(obj->sha1, src, size, typename(type)))
+		die("packed %s from %s is corrupt",
+		    sha1_to_hex(obj->sha1), p->pack_name);
+
+	hdrlen = write_object_header(f, type, size);
+	switch (type) {
+	case OBJ_COMMIT:
+		result = pv4_encode_commit(src, &size);
+		break;
+	case OBJ_TREE:
+		result = pv4_encode_tree(src, &size);
+		break;
+	default:
+		die("unexpected object type %d", type);
+	}
+	free(src);
+	if (!result)
+		die("can't convert %s object %s",
+		    typename(type), sha1_to_hex(obj->sha1));
+	sha1write(f, result, size);
+	free(result);
+	return hdrlen + size;
+}
+
 static struct packed_git *open_pack(const char *path)
 {
 	char arg[PATH_MAX];
@@ -780,7 +833,8 @@ static void process_one_pack(char *src_pack, char *dst_pack)
 	struct packed_git *p;
 	struct sha1file *f;
 	struct pack_idx_entry *objs, **p_objs;
-	unsigned nr_objects;
+	unsigned i, nr_objects;
+	off_t written = 0;
 
 	p = open_pack(src_pack);
 	if (!p)
@@ -791,12 +845,26 @@ static void process_one_pack(char *src_pack, char *dst_pack)
 	p_objs = sort_objs_by_offset(objs, nr_objects);
 
 	create_pack_dictionaries(p, p_objs);
+	sort_dict_entries_by_hits(commit_name_table);
+	sort_dict_entries_by_hits(tree_path_table);
 
 	f = packv4_open(dst_pack);
 	if (!f)
 		die("unable to open destination pack");
-	packv4_write_header(f, nr_objects);
-	packv4_write_tables(f, nr_objects, objs);
+	written += packv4_write_header(f, nr_objects);
+	written += packv4_write_tables(f, nr_objects, objs);
+
+	/* Let's write objects out, updating the object index list in place */
+	all_objs = objs;
+	all_objs_nr = nr_objects;
+	for (i = 0; i < nr_objects; i++) {
+		off_t obj_pos = written;
+		struct pack_idx_entry *obj = p_objs[i];
+		written += packv4_write_object(f, p, obj);
+		obj->offset = obj_pos;
+	}
+
+	sha1close(f, NULL, CSUM_CLOSE | CSUM_FSYNC);
 }
 
 int main(int argc, char *argv[])
-- 
1.8.4.38.g317e65b
[PATCH 07/38] pack v4: move to struct pack_idx_entry and get rid of our own struct idx_entry
Let's create a struct pack_idx_entry list sorted by SHA1, which will be
useful later. The offset-sorted list is now a separate indirect list.

Signed-off-by: Nicolas Pitre n...@fluxnic.net
---
 packv4-create.c | 72 +++++++++++++++++++++++++++++++++------------------------
 1 file changed, 42 insertions(+), 30 deletions(-)

diff --git a/packv4-create.c b/packv4-create.c
index 20d97a4..012129b 100644
--- a/packv4-create.c
+++ b/packv4-create.c
@@ -11,6 +11,7 @@
 #include "cache.h"
 #include "object.h"
 #include "tree-walk.h"
+#include "pack.h"
 
 struct data_entry {
 	unsigned offset;
@@ -244,46 +245,53 @@ static void dict_dump(void)
 	dump_dict_table(tree_path_table);
 }
 
-struct idx_entry
+static struct pack_idx_entry *get_packed_object_list(struct packed_git *p)
 {
-	off_t			offset;
-	const unsigned char	*sha1;
-};
+	unsigned i, nr_objects = p->num_objects;
+	struct pack_idx_entry *objects;
+
+	objects = xmalloc((nr_objects + 1) * sizeof(*objects));
+	objects[nr_objects].offset = p->pack_size - 20;
+	for (i = 0; i < nr_objects; i++) {
+		hashcpy(objects[i].sha1, nth_packed_object_sha1(p, i));
+		objects[i].offset = nth_packed_object_offset(p, i);
+	}
+
+	return objects;
+}
 
 static int sort_by_offset(const void *e1, const void *e2)
 {
-	const struct idx_entry *entry1 = e1;
-	const struct idx_entry *entry2 = e2;
-	if (entry1->offset < entry2->offset)
+	const struct pack_idx_entry * const *entry1 = e1;
+	const struct pack_idx_entry * const *entry2 = e2;
+	if ((*entry1)->offset < (*entry2)->offset)
 		return -1;
-	if (entry1->offset > entry2->offset)
+	if ((*entry1)->offset > (*entry2)->offset)
 		return 1;
 	return 0;
 }
 
-static struct idx_entry *get_packed_object_list(struct packed_git *p)
+static struct pack_idx_entry **sort_objs_by_offset(struct pack_idx_entry *list,
+						   unsigned nr_objects)
 {
-	uint32_t nr_objects, i;
-	struct idx_entry *objects;
+	unsigned i;
+	struct pack_idx_entry **sorted;
 
-	nr_objects = p->num_objects;
-	objects = xmalloc((nr_objects + 1) * sizeof(*objects));
-	objects[nr_objects].offset = p->index_size - 40;
-	for (i = 0; i < nr_objects; i++) {
-		objects[i].sha1 = nth_packed_object_sha1(p, i);
-		objects[i].offset = nth_packed_object_offset(p, i);
-	}
-	qsort(objects, nr_objects, sizeof(*objects), sort_by_offset);
+	sorted = xmalloc((nr_objects + 1) * sizeof(*sorted));
+	for (i = 0; i < nr_objects + 1; i++)
+		sorted[i] = &list[i];
+	qsort(sorted, nr_objects + 1, sizeof(*sorted), sort_by_offset);
 
-	return objects;
+	return sorted;
 }
 
 static int create_pack_dictionaries(struct packed_git *p,
-				    struct idx_entry *objects)
+				    struct pack_idx_entry **obj_list)
 {
 	unsigned int i;
 
 	for (i = 0; i < p->num_objects; i++) {
+		struct pack_idx_entry *obj = obj_list[i];
 		void *data;
 		enum object_type type;
 		unsigned long size;
@@ -292,9 +300,9 @@ static int create_pack_dictionaries(struct packed_git *p,
 
 		oi.typep = &type;
 		oi.sizep = &size;
-		if (packed_object_info(p, objects[i].offset, &oi) < 0)
+		if (packed_object_info(p, obj->offset, &oi) < 0)
 			die("cannot get type of %s from %s",
-			    sha1_to_hex(objects[i].sha1), p->pack_name);
+			    sha1_to_hex(obj->sha1), p->pack_name);
 
 		switch (type) {
 		case OBJ_COMMIT:
@@ -306,16 +314,16 @@ static int create_pack_dictionaries(struct packed_git *p,
 		default:
 			continue;
 		}
-		data = unpack_entry(p, objects[i].offset, &type, &size);
+		data = unpack_entry(p, obj->offset, &type, &size);
 		if (!data)
 			die("cannot unpack %s from %s",
-			    sha1_to_hex(objects[i].sha1), p->pack_name);
-		if (check_sha1_signature(objects[i].sha1, data, size, typename(type)))
+			    sha1_to_hex(obj->sha1), p->pack_name);
+		if (check_sha1_signature(obj->sha1, data, size, typename(type)))
 			die("packed %s from %s is corrupt",
-			    sha1_to_hex(objects[i].sha1), p->pack_name);
+			    sha1_to_hex(obj->sha1), p->pack_name);
 		if (add_dict_entries(data, size) < 0)
 			die("can't process %s object %s",
-			    typename(type), sha1_to_hex(objects[i].sha1));
+			    typename(type), sha1_to_hex(obj->sha1));
 		free(data);
 	}
[PATCH 04/38] pack v4: add tree entry mode support to dictionary entries
Augment dict entries with a 16-bit prefix in order to store the file
mode value of tree entries.

Signed-off-by: Nicolas Pitre n...@fluxnic.net
---
 packv4-create.c | 56 ++++++++++++++++++++++++++++++++++----------------------
 1 file changed, 36 insertions(+), 20 deletions(-)

diff --git a/packv4-create.c b/packv4-create.c
index 00762a5..eccd9fc 100644
--- a/packv4-create.c
+++ b/packv4-create.c
@@ -14,11 +14,12 @@
 
 struct data_entry {
 	unsigned offset;
+	unsigned size;
 	unsigned hits;
 };
 
 struct dict_table {
-	char *data;
+	unsigned char *data;
 	unsigned ptr;
 	unsigned size;
 	struct data_entry *entry;
@@ -41,18 +42,19 @@ void destroy_dict_table(struct dict_table *t)
 	free(t);
 }
 
-static int locate_entry(struct dict_table *t, const char *str)
+static int locate_entry(struct dict_table *t, const void *data, int size)
 {
-	int i = 0;
-	const unsigned char *s = (const unsigned char *) str;
+	int i = 0, len = size;
+	const unsigned char *p = data;
 
-	while (*s)
-		i = i * 111 + *s++;
+	while (len--)
+		i = i * 111 + *p++;
 	i = (unsigned)i % t->hash_size;
 
 	while (t->hash[i]) {
 		unsigned n = t->hash[i] - 1;
-		if (!strcmp(str, t->data + t->entry[n].offset))
+		if (t->entry[n].size == size &&
+		    memcmp(t->data + t->entry[n].offset, data, size) == 0)
 			return n;
 		if (++i >= t->hash_size)
 			i = 0;
@@ -71,23 +73,28 @@ static void rehash_entries(struct dict_table *t)
 	memset(t->hash, 0, t->hash_size * sizeof(*t->hash));
 
 	for (n = 0; n < t->nb_entries; n++) {
-		int i = locate_entry(t, t->data + t->entry[n].offset);
+		int i = locate_entry(t, t->data + t->entry[n].offset,
+				     t->entry[n].size);
 		if (i < 0)
 			t->hash[-1 - i] = n + 1;
 	}
 }
 
-int dict_add_entry(struct dict_table *t, const char *str)
+int dict_add_entry(struct dict_table *t, int val, const char *str)
 {
-	int i, len = strlen(str) + 1;
+	int i, val_len = 2, str_len = strlen(str) + 1;
 
-	if (t->ptr + len >= t->size) {
-		t->size = (t->size + len + 1024) * 3 / 2;
+	if (t->ptr + val_len + str_len > t->size) {
+		t->size = (t->size + val_len + str_len + 1024) * 3 / 2;
 		t->data = xrealloc(t->data, t->size);
 	}
-	memcpy(t->data + t->ptr, str, len);
-	i = (t->nb_entries) ? locate_entry(t, t->data + t->ptr) : -1;
+	t->data[t->ptr] = val >> 8;
+	t->data[t->ptr + 1] = val;
+	memcpy(t->data + t->ptr + val_len, str, str_len);
+
+	i = (t->nb_entries) ?
+		locate_entry(t, t->data + t->ptr, val_len + str_len) : -1;
 	if (i >= 0) {
 		t->entry[i].hits++;
 		return i;
@@ -98,8 +105,9 @@ int dict_add_entry(struct dict_table *t, const char *str)
 		t->entry = xrealloc(t->entry, t->max_entries * sizeof(*t->entry));
 	}
 	t->entry[t->nb_entries].offset = t->ptr;
+	t->entry[t->nb_entries].size = val_len + str_len;
 	t->entry[t->nb_entries].hits = 1;
-	t->ptr += len + 1;
+	t->ptr += val_len + str_len;
 	t->nb_entries++;
 
 	if (t->hash_size * 3 <= t->nb_entries * 4)
@@ -139,7 +147,8 @@ static int add_tree_dict_entries(void *buf, unsigned long size)
 
 	init_tree_desc(&desc, buf, size);
 	while (tree_entry(&desc, &name_entry))
-		dict_add_entry(tree_path_table, name_entry.path);
+		dict_add_entry(tree_path_table, name_entry.mode,
+			       name_entry.path);
 	return 0;
 }
 
@@ -148,10 +157,16 @@ void dict_dump(struct dict_table *t)
 	int i;
 
 	sort_dict_entries_by_hits(t);
-	for (i = 0; i < t->nb_entries; i++)
-		printf("%d\t%s\n",
-			t->entry[i].hits,
-			t->data + t->entry[i].offset);
+	for (i = 0; i < t->nb_entries; i++) {
+		int16_t val;
+		uint16_t uval;
+		val = t->data[t->entry[i].offset] << 8;
+		val |= t->data[t->entry[i].offset + 1];
+		uval = val;
+		printf("%d\t%d\t%o\t%s\n",
+			t->entry[i].hits, val, uval,
+			t->data + t->entry[i].offset + 2);
+	}
 }
 
 struct idx_entry
@@ -170,6 +185,7 @@ static int sort_by_offset(const void *e1, const void *e2)
 		return 1;
 	return 0;
 }
+
 static int create_pack_dictionaries(struct packed_git *p)
 {
 	uint32_t nr_objects, i;
-- 
1.8.4.38.g317e65b
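The 2-byte prefix added by this patch is simply the tree-entry mode stored big-endian in front of the NUL-terminated path. A tiny standalone sketch of that pack/unpack round-trip (the `demo_*` names are illustrative, not from the patch):

```c
#include <assert.h>
#include <stdint.h>

/* Store a 16-bit tree-entry mode big-endian in front of the path,
 * mirroring what dict_add_entry() above does with its val_len bytes. */
static void demo_put_mode(unsigned char *p, uint16_t mode)
{
	p[0] = mode >> 8;	/* high byte first */
	p[1] = mode & 0xff;
}

static uint16_t demo_get_mode(const unsigned char *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

int pv4_mode_selftest(void)
{
	unsigned char buf[2];
	demo_put_mode(buf, 0100644);	/* regular-file mode, octal */
	return demo_get_mode(buf) == 0100644 ? 0 : -1;
}
```

All git tree modes (0100644, 0100755, 040000, 0120000, 0160000) fit in 16 bits, which is why two bytes suffice here.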
[PATCH 26/38] pack v4: object header decode
For this we need the pack version. However, only open_packed_git_1()
has been audited for pack v4 so far, hence the version validation is
not added to pack_version_ok() just yet.

Signed-off-by: Nicolas Pitre n...@fluxnic.net
---
 cache.h     |  1 +
 sha1_file.c | 14 ++++++++++++--
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/cache.h b/cache.h
index c939b60..59d9ba7 100644
--- a/cache.h
+++ b/cache.h
@@ -1025,6 +1025,7 @@ extern struct packed_git {
 	uint32_t num_objects;
 	uint32_t num_bad_objects;
 	unsigned char *bad_object_sha1;
+	int version;
 	int index_version;
 	time_t mtime;
 	int pack_fd;
diff --git a/sha1_file.c b/sha1_file.c
index 5c63781..a298933 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -10,6 +10,7 @@
 #include "string-list.h"
 #include "delta.h"
 #include "pack.h"
+#include "varint.h"
 #include "blob.h"
 #include "commit.h"
 #include "run-command.h"
@@ -845,10 +846,11 @@ static int open_packed_git_1(struct packed_git *p)
 		return error("file %s is far too short to be a packfile", p->pack_name);
 	if (hdr.hdr_signature != htonl(PACK_SIGNATURE))
 		return error("file %s is not a GIT packfile", p->pack_name);
-	if (!pack_version_ok(hdr.hdr_version))
+	if (!pack_version_ok(hdr.hdr_version) && hdr.hdr_version != htonl(4))
 		return error("packfile %s is version %"PRIu32" and not supported"
 			" (try upgrading GIT to a newer version)",
 			p->pack_name, ntohl(hdr.hdr_version));
+	p->version = ntohl(hdr.hdr_version);
 
 	/* Verify the pack matches its index. */
 	if (p->num_objects != ntohl(hdr.hdr_entries))
@@ -1725,7 +1727,15 @@ int unpack_object_header(struct packed_git *p,
 	 * insane, so we know won't exceed what we have been given.
 	 */
 	base = use_pack(p, w_curs, *curpos, &left);
-	used = unpack_object_header_buffer(base, left, &type, sizep);
+	if (p->version < 4) {
+		used = unpack_object_header_buffer(base, left, &type, sizep);
+	} else {
+		const unsigned char *cp = base;
+		uintmax_t val = decode_varint(&cp);
+		used = cp - base;
+		type = val & 0xf;
+		*sizep = val >> 4;
+	}
 	if (!used) {
 		type = OBJ_BAD;
 	} else
-- 
1.8.4.38.g317e65b
[PATCH 23/38] packv4-create: normalize pack name to properly generate the pack index file name
Signed-off-by: Nicolas Pitre n...@fluxnic.net
---
 packv4-create.c | 73 ++++++++++++++++++++++++++-------------------------------
 1 file changed, 34 insertions(+), 39 deletions(-)

diff --git a/packv4-create.c b/packv4-create.c
index 22cdf8e..c23c791 100644
--- a/packv4-create.c
+++ b/packv4-create.c
@@ -956,56 +956,46 @@ static off_t packv4_write_object(struct sha1file *f, struct packed_git *p,
 	return hdrlen + size;
 }
 
-static struct packed_git *open_pack(const char *path)
+static char *normalize_pack_name(const char *path)
 {
-	char arg[PATH_MAX];
+	char buf[PATH_MAX];
 	int len;
-	struct packed_git *p;
 
-	len = strlcpy(arg, path, PATH_MAX);
-	if (len >= PATH_MAX) {
-		error("name too long: %s", path);
-		return NULL;
-	}
+	len = strlcpy(buf, path, PATH_MAX);
+	if (len >= PATH_MAX - 6)
+		die("name too long: %s", path);
 
 	/*
 	 * In addition to "foo.idx" we accept "foo.pack" and "foo";
-	 * normalize these forms to "foo.idx" for add_packed_git().
+	 * normalize these forms to "foo.pack".
 	 */
-	if (has_extension(arg, ".pack")) {
-		strcpy(arg + len - 5, ".idx");
-		len--;
-	} else if (!has_extension(arg, ".idx")) {
-		if (len + 4 >= PATH_MAX) {
-			error("name too long: %s.idx", arg);
-			return NULL;
-		}
-		strcpy(arg + len, ".idx");
-		len += 4;
+	if (has_extension(buf, ".idx")) {
+		strcpy(buf + len - 4, ".pack");
+		len++;
+	} else if (!has_extension(buf, ".pack")) {
+		strcpy(buf + len, ".pack");
+		len += 5;
 	}
 
-	/*
-	 * add_packed_git() uses our buffer (containing "foo.idx") to
-	 * build the pack filename ("foo.pack").  Make sure it fits.
-	 */
-	if (len + 1 >= PATH_MAX) {
-		arg[len - 4] = '\0';
-		error("name too long: %s.pack", arg);
-		return NULL;
-	}
+	return xstrdup(buf);
+}
 
-	p = add_packed_git(arg, len, 1);
-	if (!p) {
-		error("packfile %s not found.", arg);
-		return NULL;
-	}
+static struct packed_git *open_pack(const char *path)
+{
+	char *packname = normalize_pack_name(path);
+	int len = strlen(packname);
+	struct packed_git *p;
+
+	strcpy(packname + len - 5, ".idx");
+	p = add_packed_git(packname, len - 1, 1);
+	if (!p)
+		die("packfile %s not found.", packname);
 	install_packed_git(p);
-	if (open_pack_index(p)) {
-		error("packfile %s index not opened", p->pack_name);
-		return NULL;
-	}
+	if (open_pack_index(p))
+		die("packfile %s index not opened", p->pack_name);
 
+	free(packname);
 	return p;
 }
 
@@ -1017,6 +1007,7 @@ static void process_one_pack(char *src_pack, char *dst_pack)
 	struct pack_idx_option idx_opts;
 	unsigned i, nr_objects;
 	off_t written = 0;
+	char *packname;
 	unsigned char pack_sha1[20];
 
 	p = open_pack(src_pack);
@@ -1031,7 +1022,8 @@ static void process_one_pack(char *src_pack, char *dst_pack)
 	sort_dict_entries_by_hits(commit_name_table);
 	sort_dict_entries_by_hits(tree_path_table);
 
-	f = packv4_open(dst_pack);
+	packname = normalize_pack_name(dst_pack);
+	f = packv4_open(packname);
 	if (!f)
 		die("unable to open destination pack");
 	written += packv4_write_header(f, nr_objects);
@@ -1053,7 +1045,10 @@ static void process_one_pack(char *src_pack, char *dst_pack)
 	reset_pack_idx_option(&idx_opts);
 	idx_opts.version = 3;
-	write_idx_file(dst_pack, p_objs, nr_objects, &idx_opts, pack_sha1);
+	strcpy(packname + strlen(packname) - 5, ".idx");
+	write_idx_file(packname, p_objs, nr_objects, &idx_opts, pack_sha1);
+
+	free(packname);
 }
 
 static int git_pack_config(const char *k, const char *v, void *cb)
-- 
1.8.4.38.g317e65b
[PATCH 08/38] pack v4: basic SHA1 reference encoding
The SHA1 reference is either an index into the SHA1 table using the
variable-length number encoding, or the literal 20-byte SHA1 prefixed
with a 0 byte. The index value 0 discriminates between an actual index
value and a literal SHA1; therefore, when the index is used, its value
must be increased by 1.

Signed-off-by: Nicolas Pitre n...@fluxnic.net
---
 packv4-create.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/packv4-create.c b/packv4-create.c
index 012129b..12527c0 100644
--- a/packv4-create.c
+++ b/packv4-create.c
@@ -12,6 +12,7 @@
 #include "object.h"
 #include "tree-walk.h"
 #include "pack.h"
+#include "varint.h"
 
 struct data_entry {
 	unsigned offset;
@@ -245,6 +246,34 @@ static void dict_dump(void)
 	dump_dict_table(tree_path_table);
 }
 
+/*
+ * Encode an object SHA1 reference with either an object index into the
+ * pack SHA1 table incremented by 1, or the literal SHA1 value prefixed
+ * with a zero byte if the needed SHA1 is not available in the table.
+ */
+static struct pack_idx_entry *all_objs;
+static unsigned all_objs_nr;
+static int encode_sha1ref(const unsigned char *sha1, unsigned char *buf)
+{
+	unsigned lo = 0, hi = all_objs_nr;
+
+	do {
+		unsigned mi = (lo + hi) / 2;
+		int cmp = hashcmp(all_objs[mi].sha1, sha1);
+
+		if (cmp == 0)
+			return encode_varint(mi + 1, buf);
+		if (cmp > 0)
+			hi = mi;
+		else
+			lo = mi+1;
+	} while (lo < hi);
+
+	*buf++ = 0;
+	hashcpy(buf, sha1);
+	return 1 + 20;
+}
-- 
1.8.4.38.g317e65b
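The matching read side for this encoding lands in packv4-parse.c later in the series; as a standalone illustration of the scheme, here is a hedged sketch of a decoder: a non-zero varint means "table index minus 1", a leading 0 byte means a literal 20-byte SHA1 follows. The varint helpers mirror git's varint.c; `demo_decode_sha1ref` and the selftest are invented names, not code from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* varint helpers as in git's varint.c */
static int encode_varint(uintmax_t value, unsigned char *buf)
{
	unsigned char varint[16];
	unsigned pos = sizeof(varint) - 1;
	varint[pos] = value & 127;
	while (value >>= 7)
		varint[--pos] = 128 | (--value & 127);
	memcpy(buf, varint + pos, sizeof(varint) - pos);
	return sizeof(varint) - pos;
}

static uintmax_t decode_varint(const unsigned char **bufp)
{
	const unsigned char *buf = *bufp;
	unsigned char c = *buf++;
	uintmax_t val = c & 127;
	while (c & 128) {
		val += 1;
		c = *buf++;
		val = (val << 7) + (c & 127);
	}
	*bufp = buf;
	return val;
}

/* Returns the SHA1-table index (>= 0), or -1 after copying the
 * literal SHA1 that followed the discriminating 0 byte. */
static int demo_decode_sha1ref(const unsigned char **srcp, unsigned char *sha1_out)
{
	uintmax_t index = decode_varint(srcp);
	if (index)
		return (int)(index - 1);
	memcpy(sha1_out, *srcp, 20);
	*srcp += 20;
	return -1;
}

int pv4_sha1ref_selftest(void)
{
	unsigned char buf[32], sha1[20], out[20];
	const unsigned char *cp;
	int i, len;

	/* index reference: object #5 encodes as varint(6) */
	len = encode_varint(6, buf);
	cp = buf;
	if (demo_decode_sha1ref(&cp, out) != 5 || cp != buf + len)
		return -1;

	/* literal reference: a 0 byte followed by the raw SHA1 */
	for (i = 0; i < 20; i++)
		sha1[i] = (unsigned char)i;
	buf[0] = 0;
	memcpy(buf + 1, sha1, 20);
	cp = buf;
	if (demo_decode_sha1ref(&cp, out) != -1 || memcmp(out, sha1, 20))
		return -1;
	return 0;
}
```

The +1 bias is what frees up the value 0 to act as the escape for a literal SHA1, at the cost of making the largest index one varint step longer.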
[PATCH 17/38] pack v4: tree object delta encoding
In order to be able to quickly walk tree objects, let's encode their
delta as a range of entries copied from another tree object. To
discriminate a copy sequence from a regular entry, the entry index LSB
is reserved to indicate a copy sequence; therefore the actual index of
a path component is shifted left one bit.

The encoding allows for the base object to change, so multiple base
objects can be borrowed from. The code doesn't try to exploit this
possibility at the moment though.

The code isn't optimal at the moment as it doesn't consider the case
where a copy sequence could be larger than the local sequence it means
to replace.

Signed-off-by: Nicolas Pitre n...@fluxnic.net
---
 packv4-create.c | 108 +++++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 103 insertions(+), 5 deletions(-)

diff --git a/packv4-create.c b/packv4-create.c
index 5d76234..6830a0a 100644
--- a/packv4-create.c
+++ b/packv4-create.c
@@ -394,24 +394,53 @@
 bad:
 	return NULL;
 }
 
+static int compare_tree_entries(struct name_entry *e1, struct name_entry *e2)
+{
+	int len1 = tree_entry_len(e1);
+	int len2 = tree_entry_len(e2);
+	int len = len1 < len2 ? len1 : len2;
+	unsigned char c1, c2;
+	int cmp;
+
+	cmp = memcmp(e1->path, e2->path, len);
+	if (cmp)
+		return cmp;
+	c1 = e1->path[len];
+	c2 = e2->path[len];
+	if (!c1 && S_ISDIR(e1->mode))
+		c1 = '/';
+	if (!c2 && S_ISDIR(e2->mode))
+		c2 = '/';
+	return c1 - c2;
+}
+
 /*
  * This converts a canonical tree object buffer into its
  * tightly packed representation using the already populated
  * and sorted tree_path_table dictionary.  The parsing is
  * strict so to ensure the canonical version may always be
  * regenerated and produce the same hash.
+ *
+ * If a delta buffer is provided, we may encode multiple ranges of tree
+ * entries against that buffer.
  */
-void *pv4_encode_tree(void *_buffer, unsigned long *sizep)
+void *pv4_encode_tree(void *_buffer, unsigned long *sizep,
+		      void *delta, unsigned long delta_size,
+		      const unsigned char *delta_sha1)
 {
 	unsigned long size = *sizep;
 	unsigned char *in, *out, *end, *buffer = _buffer;
-	struct tree_desc desc;
-	struct name_entry name_entry;
+	struct tree_desc desc, delta_desc;
+	struct name_entry name_entry, delta_entry;
 	int nb_entries;
+	unsigned int copy_start, copy_count = 0, delta_pos = 0, first_delta = 1;
 
 	if (!size)
 		return NULL;
 
+	if (!delta_size)
+		delta = NULL;
+
 	/*
 	 * We can't make sure the result will always be smaller than the
 	 * input. The smallest possible entry is "0 x\0" plus a 20-byte SHA1
@@ -434,9 +463,42 @@ void *pv4_encode_tree(void *_buffer, unsigned long *sizep)
 	out += encode_varint(nb_entries, out);
 	init_tree_desc(&desc, in, size);
 
+	if (delta) {
+		init_tree_desc(&delta_desc, delta, delta_size);
+		if (!tree_entry(&delta_desc, &delta_entry))
+			delta = NULL;
+	}
+
 	while (tree_entry(&desc, &name_entry)) {
 		int pathlen, index;
 
+		/*
+		 * Try to match entries against our delta object.
+		 */
+		if (delta) {
+			int ret;
+
+			do {
+				ret = compare_tree_entries(&name_entry, &delta_entry);
+				if (ret <= 0 || copy_count != 0)
+					break;
+				delta_pos++;
+				if (!tree_entry(&delta_desc, &delta_entry))
+					delta = NULL;
+			} while (delta);
+
+			if (ret == 0 && name_entry.mode == delta_entry.mode &&
+			    hashcmp(name_entry.sha1, delta_entry.sha1) == 0) {
+				if (!copy_count)
+					copy_start = delta_pos;
+				copy_count++;
+				delta_pos++;
+				if (!tree_entry(&delta_desc, &delta_entry))
+					delta = NULL;
+				continue;
+			}
+		}
+
 		if (end - out < 48) {
 			unsigned long sofar = out - buffer;
 			buffer = xrealloc(buffer, (sofar + 48)*2);
 			end = buffer + (sofar + 48)*2;
 			out = buffer + sofar;
 		}
 
+		if (copy_count) {
+			/*
+			 * Let's write a sequence indicating we're copying
+			 * entries from another object:
+			 *
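To make the LSB discriminator concrete, here is a small hypothetical sketch (not the byte-exact pack v4 wire format, which the complete patch defines): a plain entry stores its path-dictionary index shifted left one bit, while a copy sequence sets the LSB on the start index and is followed by an entry count. The varint helpers mirror git's varint.c; the `demo_*` names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* varint helpers as in git's varint.c */
static int encode_varint(uintmax_t value, unsigned char *buf)
{
	unsigned char varint[16];
	unsigned pos = sizeof(varint) - 1;
	varint[pos] = value & 127;
	while (value >>= 7)
		varint[--pos] = 128 | (--value & 127);
	memcpy(buf, varint + pos, sizeof(varint) - pos);
	return sizeof(varint) - pos;
}

static uintmax_t decode_varint(const unsigned char **bufp)
{
	const unsigned char *buf = *bufp;
	unsigned char c = *buf++;
	uintmax_t val = c & 127;
	while (c & 128) {
		val += 1;
		c = *buf++;
		val = (val << 7) + (c & 127);
	}
	*bufp = buf;
	return val;
}

/* A plain tree entry: path index shifted left, LSB clear. */
static int demo_emit_entry(unsigned char *out, unsigned path_index)
{
	return encode_varint((uintmax_t)path_index << 1, out);
}

/* A copy sequence: start index with LSB set, then the entry count. */
static int demo_emit_copy(unsigned char *out, unsigned start, unsigned count)
{
	int len = encode_varint(((uintmax_t)start << 1) | 1, out);
	return len + encode_varint(count, out + len);
}

int pv4_treewalk_selftest(void)
{
	unsigned char buf[32];
	const unsigned char *cp = buf;
	uintmax_t v;
	int len = demo_emit_entry(buf, 7);

	len += demo_emit_copy(buf + len, 3, 4);

	v = decode_varint(&cp);		/* plain entry: LSB clear */
	if ((v & 1) || (v >> 1) != 7)
		return -1;
	v = decode_varint(&cp);		/* copy sequence: LSB set */
	if (!(v & 1) || (v >> 1) != 3)
		return -1;
	if (decode_varint(&cp) != 4)	/* entry count */
		return -1;
	return (cp == buf + len) ? 0 : -1;
}
```

The point of the tag bit is that a reader walking a tree can skip or borrow whole entry ranges from the base object without inflating it, which is what makes the "native tree walk support" TODO item feasible.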
[PATCH 13/38] pack v4: creation code
Let's actually open the destination pack file and write the header and the
tables. The header isn't much different from pack v3, except for the pack
version number of course.

The first table is the sorted SHA1 table normally found in the pack index
file. With pack v4 we write this table in the main pack file instead, as it
is referenced by index from subsequent objects in the pack. Doing so has
many advantages:

- The SHA1 references used to be duplicated on disk: once in the pack index
  file, and then at least once or more within commit and tree objects
  referencing them. The only SHA1 not listed more than once this way is the
  one for a branch tip commit object, and those are normally very few. Now
  all that SHA1 data is represented only once.

- The SHA1 references found in commit and tree objects can be obtained
  directly on disk without having to deflate those objects first.

The SHA1 table size is simply the number of objects multiplied by 20. The
commit and path dictionary tables are then written right after the SHA1
table.
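As a rough sketch of the resulting front-of-pack layout (assuming the standard 12-byte pack header; both helper names are hypothetical, not part of the patch), the Nth SHA1 table entry and the start of the dictionary tables can be located by simple arithmetic:

```c
#include <assert.h>
#include <stdint.h>

/* "PACK" signature + version + nr_objects, 4 bytes each. */
#define PACK_HDR_SIZE 12

/* Offset of the Nth entry of the sorted SHA1 table inside the pack
 * file (the real reader adds this to the mmap'ed base pointer). */
static uint64_t sha1_entry_offset(uint32_t nth)
{
	return PACK_HDR_SIZE + (uint64_t)nth * 20;
}

/* The commit and path dictionary tables follow the SHA1 table, whose
 * size is just nr_objects * 20 as described above. */
static uint64_t dict_tables_offset(uint32_t nr_objects)
{
	return PACK_HDR_SIZE + (uint64_t)nr_objects * 20;
}
```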
Signed-off-by: Nicolas Pitre n...@fluxnic.net --- packv4-create.c | 60 - 1 file changed, 55 insertions(+), 5 deletions(-) diff --git a/packv4-create.c b/packv4-create.c index 92d3662..61b70c8 100644 --- a/packv4-create.c +++ b/packv4-create.c @@ -593,6 +593,48 @@ static unsigned long write_dict_table(struct sha1file *f, struct dict_table *t) return hdrlen + datalen; } +static struct sha1file * packv4_open(char *path) +{ + int fd; + + fd = open(path, O_CREAT|O_EXCL|O_WRONLY, 0600); + if (fd 0) + die_errno(unable to create '%s', path); + return sha1fd(fd, path); +} + +static unsigned int packv4_write_header(struct sha1file *f, unsigned nr_objects) +{ + struct pack_header hdr; + + hdr.hdr_signature = htonl(PACK_SIGNATURE); + hdr.hdr_version = htonl(4); + hdr.hdr_entries = htonl(nr_objects); + sha1write(f, hdr, sizeof(hdr)); + + return sizeof(hdr); +} + +static unsigned long packv4_write_tables(struct sha1file *f, unsigned nr_objects, +struct pack_idx_entry *objs) +{ + unsigned i; + unsigned long written = 0; + + /* The sorted list of object SHA1's is always first */ + for (i = 0; i nr_objects; i++) + sha1write(f, objs[i].sha1, 20); + written = 20 * nr_objects; + + /* Then the commit dictionary table */ + written += write_dict_table(f, commit_name_table); + + /* Followed by the path component dictionary table */ + written += write_dict_table(f, tree_path_table); + + return written; +} + static struct packed_git *open_pack(const char *path) { char arg[PATH_MAX]; @@ -646,9 +688,10 @@ static struct packed_git *open_pack(const char *path) return p; } -static void process_one_pack(char *src_pack) +static void process_one_pack(char *src_pack, char *dst_pack) { struct packed_git *p; + struct sha1file *f; struct pack_idx_entry *objs, **p_objs; unsigned nr_objects; @@ -661,15 +704,22 @@ static void process_one_pack(char *src_pack) p_objs = sort_objs_by_offset(objs, nr_objects); create_pack_dictionaries(p, p_objs); + + f = packv4_open(dst_pack); + if (!f) + die(unable to open 
destination pack");
+	packv4_write_header(f, nr_objects);
+	packv4_write_tables(f, nr_objects, objs);
 }
 
 int main(int argc, char *argv[])
 {
-	if (argc != 2) {
-		fprintf(stderr, "Usage: %s packfile\n", argv[0]);
+	if (argc != 3) {
+		fprintf(stderr, "Usage: %s src_packfile dst_packfile\n", argv[0]);
 		exit(1);
 	}
-	process_one_pack(argv[1]);
-	dict_dump();
+	process_one_pack(argv[1], argv[2]);
+	if (0)
+		dict_dump();
 	return 0;
 }
-- 
1.8.4.38.g317e65b
[PATCH 02/38] export packed_object_info()
Signed-off-by: Nicolas Pitre n...@fluxnic.net
---
 cache.h     | 1 +
 sha1_file.c | 4 ++--
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/cache.h b/cache.h
index 85b544f..b6634c4 100644
--- a/cache.h
+++ b/cache.h
@@ -1160,6 +1160,7 @@ struct object_info {
 	} u;
 };
 extern int sha1_object_info_extended(const unsigned char *, struct object_info *);
+extern int packed_object_info(struct packed_git *, off_t, struct object_info *);
 
 /* Dumb servers support */
 extern int update_server_info(int);

diff --git a/sha1_file.c b/sha1_file.c
index 8e27db1..c2020d0 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1782,8 +1782,8 @@ unwind:
 	goto out;
 }
 
-static int packed_object_info(struct packed_git *p, off_t obj_offset,
-			      struct object_info *oi)
+int packed_object_info(struct packed_git *p, off_t obj_offset,
+		       struct object_info *oi)
 {
 	struct pack_window *w_curs = NULL;
 	unsigned long size;
-- 
1.8.4.38.g317e65b
[PATCH 09/38] introduce get_sha1_lowhex()
This is like get_sha1_hex() but stricter, accepting lowercase letters only.

Signed-off-by: Nicolas Pitre n...@fluxnic.net
---
 cache.h |  3 +++
 hex.c   | 11 +++
 2 files changed, 14 insertions(+)

diff --git a/cache.h b/cache.h
index b6634c4..4231dfa 100644
--- a/cache.h
+++ b/cache.h
@@ -850,8 +850,11 @@ extern int for_each_abbrev(const char *prefix, each_abbrev_fn, void *);
  * Return 0 on success.  Reading stops if a NUL is encountered in the
  * input, so it is safe to pass this function an arbitrary
  * null-terminated string.
+ *
+ * The "low" version accepts numbers and lowercase letters only.
  */
 extern int get_sha1_hex(const char *hex, unsigned char *sha1);
+extern int get_sha1_lowhex(const char *hex, unsigned char *sha1);
 
 extern char *sha1_to_hex(const unsigned char *sha1);	/* static buffer result! */
 extern int read_ref_full(const char *refname, unsigned char *sha1,

diff --git a/hex.c b/hex.c
index 9ebc050..1d7eae1 100644
--- a/hex.c
+++ b/hex.c
@@ -56,6 +56,17 @@ int get_sha1_hex(const char *hex, unsigned char *sha1)
 	return 0;
 }
 
+int get_sha1_lowhex(const char *hex, unsigned char *sha1)
+{
+	int i;
+
+	/* uppercase letters (as well as '\0') have bit 5 clear */
+	for (i = 0; i < 20; i++)
+		if (!(hex[i] & 0x20))
+			return -1;
+	return get_sha1_hex(hex, sha1);
+}
+
 char *sha1_to_hex(const unsigned char *sha1)
 {
 	static int bufno;
-- 
1.8.4.38.g317e65b
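The bit trick behind this check can be illustrated on its own. In ASCII, digits '0'-'9' and lowercase 'a'-'f' all have bit 5 (0x20) set, while uppercase 'A'-'F' and '\0' have it clear, so rejecting any byte with bit 5 clear filters out uppercase (and short) input before the normal hex parsing runs. The helper below is a sketch of that pre-filter, not the actual git function:

```c
#include <assert.h>

/* Pre-filter only: plenty of non-hex characters also have bit 5 set,
 * so the real get_sha1_lowhex() still runs get_sha1_hex() afterwards
 * to validate that each byte is an actual hex digit. */
static int has_bit5_set(char c)
{
	return (c & 0x20) != 0;
}
```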
[PATCH 27/38] pack v4: code to obtain a SHA1 from a sha1ref
Let's start actually parsing pack v4 data. Here's the first item.

Signed-off-by: Nicolas Pitre n...@fluxnic.net
---
 Makefile       |  1 +
 packv4-parse.c | 30 ++
 2 files changed, 31 insertions(+)
 create mode 100644 packv4-parse.c

diff --git a/Makefile b/Makefile
index 4716113..ba6cafc 100644
--- a/Makefile
+++ b/Makefile
@@ -838,6 +838,7 @@ LIB_OBJS += object.o
 LIB_OBJS += pack-check.o
 LIB_OBJS += pack-revindex.o
 LIB_OBJS += pack-write.o
+LIB_OBJS += packv4-parse.o
 LIB_OBJS += pager.o
 LIB_OBJS += parse-options.o
 LIB_OBJS += parse-options-cb.o

diff --git a/packv4-parse.c b/packv4-parse.c
new file mode 100644
index 000..299fc48
--- /dev/null
+++ b/packv4-parse.c
@@ -0,0 +1,30 @@
+/*
+ * Code to parse pack v4 object encoding
+ *
+ * (C) Nicolas Pitre n...@fluxnic.net
+ *
+ * This code is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include "cache.h"
+#include "varint.h"
+
+const unsigned char *get_sha1ref(struct packed_git *p,
+				 const unsigned char **bufp)
+{
+	const unsigned char *sha1;
+
+	if (!**bufp) {
+		sha1 = *bufp + 1;
+		*bufp += 21;
+	} else {
+		unsigned int index = decode_varint(bufp);
+		if (index < 1 || index - 1 > p->num_objects)
+			die("bad index in %s", __func__);
+		sha1 = p->sha1_table + (index - 1) * 20;
+	}
+
+	return sha1;
+}
-- 
1.8.4.38.g317e65b
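For illustration, here is a minimal sketch of the literal (non-indexed) sha1ref case only, with hypothetical helper names: a leading zero byte means "literal 20-byte SHA1 follows", anything else is a varint of table index + 1. The real writer-side encode_sha1ref() also handles the indexed case shown in get_sha1ref() above.

```c
#include <assert.h>
#include <string.h>

/* Writer side, literal case: reserve token 0, then the raw SHA1. */
static int encode_sha1ref_literal(const unsigned char *sha1, unsigned char *buf)
{
	buf[0] = 0;
	memcpy(buf + 1, sha1, 20);
	return 21;
}

/* Reader side, mirroring the !**bufp branch of get_sha1ref():
 * skip the zero marker, return the SHA1, advance past it. */
static const unsigned char *get_sha1ref_literal(const unsigned char **bufp)
{
	const unsigned char *sha1 = *bufp + 1;

	*bufp += 21;
	return sha1;
}
```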
[PATCH 18/38] pack v4: load delta candidate for encoding tree objects
The SHA1 of the base object is retrieved and the corresponding object is loaded in memory for pv4_encode_tree() to look at. Simple but effective. Obviously this relies on the delta matching already performed during the pack v3 delta search. Some native delta search for pack v4 could be investigated eventually. Signed-off-by: Nicolas Pitre n...@fluxnic.net --- packv4-create.c | 63 ++--- 1 file changed, 60 insertions(+), 3 deletions(-) diff --git a/packv4-create.c b/packv4-create.c index 6830a0a..15c5959 100644 --- a/packv4-create.c +++ b/packv4-create.c @@ -820,18 +820,56 @@ static unsigned long copy_object_data(struct sha1file *f, struct packed_git *p, return written; } +static unsigned char *get_delta_base(struct packed_git *p, off_t offset, +unsigned char *sha1_buf) +{ + struct pack_window *w_curs = NULL; + enum object_type type; + unsigned long avail, size; + int hdrlen; + unsigned char *src; + const unsigned char *base_sha1 = NULL; ; + + src = use_pack(p, w_curs, offset, avail); + hdrlen = unpack_object_header_buffer(src, avail, type, size); + + if (type == OBJ_OFS_DELTA) { + const unsigned char *cp = src + hdrlen; + off_t base_offset = decode_varint(cp); + base_offset = offset - base_offset; + if (base_offset = 0 || base_offset = offset) { + error(delta offset out of bound); + } else { + struct revindex_entry *revidx; + revidx = find_pack_revindex(p, base_offset); + base_sha1 = nth_packed_object_sha1(p, revidx-nr); + } + } else if (type == OBJ_REF_DELTA) { + base_sha1 = src + hdrlen; + } else + error(expected to get a delta but got a %s, typename(type)); + + unuse_pack(w_curs); + + if (!base_sha1) + return NULL; + hashcpy(sha1_buf, base_sha1); + return sha1_buf; +} + static off_t packv4_write_object(struct sha1file *f, struct packed_git *p, struct pack_idx_entry *obj) { void *src, *result; struct object_info oi = {}; - enum object_type type; + enum object_type type, packed_type; unsigned long size; unsigned int hdrlen; oi.typep = type; oi.sizep = size; - if 
(packed_object_info(p, obj-offset, oi) 0) + packed_type = packed_object_info(p, obj-offset, oi); + if (packed_type 0) die(cannot get type of %s from %s, sha1_to_hex(obj-sha1), p-pack_name); @@ -859,7 +897,26 @@ static off_t packv4_write_object(struct sha1file *f, struct packed_git *p, result = pv4_encode_commit(src, size); break; case OBJ_TREE: - result = pv4_encode_tree(src, size, NULL, 0, NULL); + if (packed_type != OBJ_TREE) { + unsigned char sha1_buf[20], *ref_sha1; + void *ref; + enum object_type ref_type; + unsigned long ref_size; + + ref_sha1 = get_delta_base(p, obj-offset, sha1_buf); + if (!ref_sha1) + die(unable to get delta base sha1 for %s, + sha1_to_hex(obj-sha1)); + ref = read_sha1_file(ref_sha1, ref_type, ref_size); + if (!ref || ref_type != OBJ_TREE) + die(cannot obtain delta base for %s, + sha1_to_hex(obj-sha1)); + result = pv4_encode_tree(src, size, +ref, ref_size, ref_sha1); + free(ref); + } else { + result = pv4_encode_tree(src, size, NULL, 0, NULL); + } break; default: die(unexpected object type %d, type); -- 1.8.4.38.g317e65b -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 05/38] pack v4: add commit object parsing
Let's create another dictionary table to hold the author and committer entries. We use the same table format used for tree entries where the 16 bit data prefix is conveniently used to store the timezone value. In order to copy straight from a commit object buffer, dict_add_entry() is modified to get the string length as the provided string pointer is not always be null terminated. Signed-off-by: Nicolas Pitre n...@fluxnic.net --- packv4-create.c | 98 +++-- 1 file changed, 89 insertions(+), 9 deletions(-) diff --git a/packv4-create.c b/packv4-create.c index eccd9fc..5c08871 100644 --- a/packv4-create.c +++ b/packv4-create.c @@ -1,5 +1,5 @@ /* - * packv4-create.c: management of dictionary tables used in pack v4 + * packv4-create.c: creation of dictionary tables and objects used in pack v4 * * (C) Nicolas Pitre n...@fluxnic.net * @@ -80,9 +80,9 @@ static void rehash_entries(struct dict_table *t) } } -int dict_add_entry(struct dict_table *t, int val, const char *str) +int dict_add_entry(struct dict_table *t, int val, const char *str, int str_len) { - int i, val_len = 2, str_len = strlen(str) + 1; + int i, val_len = 2; if (t-ptr + val_len + str_len t-size) { t-size = (t-size + val_len + str_len + 1024) * 3 / 2; @@ -92,6 +92,7 @@ int dict_add_entry(struct dict_table *t, int val, const char *str) t-data[t-ptr] = val 8; t-data[t-ptr + 1] = val; memcpy(t-data + t-ptr + val_len, str, str_len); + t-data[t-ptr + val_len + str_len] = 0; i = (t-nb_entries) ? 
locate_entry(t, t-data + t-ptr, val_len + str_len) : -1; @@ -107,7 +108,7 @@ int dict_add_entry(struct dict_table *t, int val, const char *str) t-entry[t-nb_entries].offset = t-ptr; t-entry[t-nb_entries].size = val_len + str_len; t-entry[t-nb_entries].hits = 1; - t-ptr += val_len + str_len; + t-ptr += val_len + str_len + 1; t-nb_entries++; if (t-hash_size * 3 = t-nb_entries * 4) @@ -135,8 +136,73 @@ static void sort_dict_entries_by_hits(struct dict_table *t) rehash_entries(t); } +static struct dict_table *commit_name_table; static struct dict_table *tree_path_table; +/* + * Parse the author/committer line from a canonical commit object. + * The 'from' argument points right after the author or committer + * string. The time zone is parsed and stored in *tz_val. The returned + * pointer is right after the end of the email address which is also just + * before the time value, or NULL if a parsing error is encountered. + */ +static char *get_nameend_and_tz(char *from, int *tz_val) +{ + char *end, *tz; + + tz = strchr(from, '\n'); + /* let's assume the smallest possible string to be x x 0 +\n */ + if (!tz || tz - from 13) + return NULL; + tz -= 4; + end = tz - 4; + while (end - from 5 *end != ' ') + end--; + if (end[-1] != '' || end[0] != ' ' || tz[-2] != ' ') + return NULL; + *tz_val = (tz[0] - '0') * 1000 + + (tz[1] - '0') * 100 + + (tz[2] - '0') * 10 + + (tz[3] - '0'); + switch (tz[-1]) { + default:return NULL; + case '+': break; + case '-': *tz_val = -*tz_val; + } + return end; +} + +static int add_commit_dict_entries(void *buf, unsigned long size) +{ + char *name, *end = NULL; + int tz_val; + + if (!commit_name_table) + commit_name_table = create_dict_table(); + + /* parse and add author info */ + name = strstr(buf, \nauthor ); + if (name) { + name += 8; + end = get_nameend_and_tz(name, tz_val); + } + if (!name || !end) + return -1; + dict_add_entry(commit_name_table, tz_val, name, end - name); + + /* parse and add committer info */ + name = strstr(end, \ncommitter 
); + if (name) { + name += 11; + end = get_nameend_and_tz(name, tz_val); + } + if (!name || !end) + return -1; + dict_add_entry(commit_name_table, tz_val, name, end - name); + + return 0; +} + static int add_tree_dict_entries(void *buf, unsigned long size) { struct tree_desc desc; @@ -146,13 +212,16 @@ static int add_tree_dict_entries(void *buf, unsigned long size) tree_path_table = create_dict_table(); init_tree_desc(desc, buf, size); - while (tree_entry(desc, name_entry)) + while (tree_entry(desc, name_entry)) { + int pathlen = tree_entry_len(name_entry); dict_add_entry(tree_path_table, name_entry.mode, - name_entry.path); + name_entry.path, pathlen); + } + return 0; } -void dict_dump(struct dict_table *t) +void
[PATCH 15/38] pack v4: object data copy
Blob and tag objects have no particular changes except for their object header. Delta objects are also copied as is, except for their delta base reference which is converted to the new way as used elsewhere in pack v4 encoding i.e. an index into the SHA1 table or a literal SHA1 prefixed by 0 if not found in the table (see encode_sha1ref). This is true for both REF_DELTA as well as OFS_DELTA. Object payload is validated against the recorded CRC32 in the source pack index file when possible before being copied. Signed-off-by: Nicolas Pitre n...@fluxnic.net --- packv4-create.c | 60 + 1 file changed, 60 insertions(+) diff --git a/packv4-create.c b/packv4-create.c index 6098062..b0e344f 100644 --- a/packv4-create.c +++ b/packv4-create.c @@ -12,6 +12,7 @@ #include object.h #include tree-walk.h #include pack.h +#include pack-revindex.h #include varint.h @@ -662,6 +663,65 @@ static int write_object_header(struct sha1file *f, enum object_type type, unsign return len; } +static unsigned long copy_object_data(struct sha1file *f, struct packed_git *p, + off_t offset) +{ + struct pack_window *w_curs = NULL; + struct revindex_entry *revidx; + enum object_type type; + unsigned long avail, size, datalen, written; + int hdrlen, reflen, idx_nr; + unsigned char *src, buf[24]; + + revidx = find_pack_revindex(p, offset); + idx_nr = revidx-nr; + datalen = revidx[1].offset - offset; + + src = use_pack(p, w_curs, offset, avail); + hdrlen = unpack_object_header_buffer(src, avail, type, size); + + written = write_object_header(f, type, size); + + if (type == OBJ_OFS_DELTA) { + const unsigned char *cp = src + hdrlen; + off_t base_offset = decode_varint(cp); + hdrlen = cp - src; + base_offset = offset - base_offset; + if (base_offset = 0 || base_offset = offset) + die(delta offset out of bound); + revidx = find_pack_revindex(p, base_offset); + reflen = encode_sha1ref(nth_packed_object_sha1(p, revidx-nr), buf); + sha1write(f, buf, reflen); + written += reflen; + } else if (type == 
OBJ_REF_DELTA) { + reflen = encode_sha1ref(src + hdrlen, buf); + hdrlen += 20; + sha1write(f, buf, reflen); + written += reflen; + } + + if (p-index_version 1 + check_pack_crc(p, w_curs, offset, datalen, idx_nr)) + die(bad CRC for object at offset %PRIuMAX in %s, + (uintmax_t)offset, p-pack_name); + + offset += hdrlen; + datalen -= hdrlen; + + while (datalen) { + src = use_pack(p, w_curs, offset, avail); + if (avail datalen) + avail = datalen; + sha1write(f, src, avail); + written += avail; + offset += avail; + datalen -= avail; + } + unuse_pack(w_curs); + + return written; +} + static struct packed_git *open_pack(const char *path) { char arg[PATH_MAX]; -- 1.8.4.38.g317e65b -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 25/38] pack v4: initial pack index v3 support on the read side
A bit crud but good enough for now. Signed-off-by: Nicolas Pitre n...@fluxnic.net --- cache.h | 1 + pack-check.c| 4 +++- pack-revindex.c | 7 --- sha1_file.c | 56 ++-- 4 files changed, 58 insertions(+), 10 deletions(-) diff --git a/cache.h b/cache.h index 4231dfa..c939b60 100644 --- a/cache.h +++ b/cache.h @@ -1021,6 +1021,7 @@ extern struct packed_git { off_t pack_size; const void *index_data; size_t index_size; + const unsigned char *sha1_table; uint32_t num_objects; uint32_t num_bad_objects; unsigned char *bad_object_sha1; diff --git a/pack-check.c b/pack-check.c index 63a595c..8200f24 100644 --- a/pack-check.c +++ b/pack-check.c @@ -25,6 +25,7 @@ int check_pack_crc(struct packed_git *p, struct pack_window **w_curs, { const uint32_t *index_crc; uint32_t data_crc = crc32(0, NULL, 0); + unsigned sha1_table; do { unsigned long avail; @@ -36,8 +37,9 @@ int check_pack_crc(struct packed_git *p, struct pack_window **w_curs, len -= avail; } while (len); + sha1_table = p-index_version 3 ? (p-num_objects * (20/4)) : 0; index_crc = p-index_data; - index_crc += 2 + 256 + p-num_objects * (20/4) + nr; + index_crc += 2 + 256 + sha1_table + nr; return data_crc != ntohl(*index_crc); } diff --git a/pack-revindex.c b/pack-revindex.c index b4d2b35..739a568 100644 --- a/pack-revindex.c +++ b/pack-revindex.c @@ -170,9 +170,10 @@ static void create_pack_revindex(struct pack_revindex *rix) index += 4 * 256; if (p-index_version 1) { - const uint32_t *off_32 = - (uint32_t *)(index + 8 + p-num_objects * (20 + 4)); - const uint32_t *off_64 = off_32 + p-num_objects; + const uint32_t *off_32, *off_64; + unsigned sha1 = p-index_version 3 ? 
20 : 0; + off_32 = (uint32_t *)(index + 8 + p-num_objects * (sha1 + 4)); + off_64 = off_32 + p-num_objects; for (i = 0; i num_ent; i++) { uint32_t off = ntohl(*off_32++); if (!(off 0x8000)) { diff --git a/sha1_file.c b/sha1_file.c index c2020d0..5c63781 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -504,7 +504,7 @@ static int check_packed_git_idx(const char *path, struct packed_git *p) hdr = idx_map; if (hdr-idx_signature == htonl(PACK_IDX_SIGNATURE)) { version = ntohl(hdr-idx_version); - if (version 2 || version 2) { + if (version 2 || version 3) { munmap(idx_map, idx_size); return error(index file %s is version %PRIu32 and is not supported by this binary @@ -539,12 +539,13 @@ static int check_packed_git_idx(const char *path, struct packed_git *p) munmap(idx_map, idx_size); return error(wrong index v1 file size in %s, path); } - } else if (version == 2) { + } else if (version == 2 || version == 3) { + unsigned long min_size, max_size; /* * Minimum size: * - 8 bytes of header * - 256 index entries 4 bytes each -* - 20-byte sha1 entry * nr +* - 20-byte sha1 entry * nr (version 2 only) * - 4-byte crc entry * nr * - 4-byte offset entry * nr * - 20-byte SHA1 of the packfile @@ -553,8 +554,10 @@ static int check_packed_git_idx(const char *path, struct packed_git *p) * variable sized table containing 8-byte entries * for offsets larger than 2^31. 
*/ - unsigned long min_size = 8 + 4*256 + nr*(20 + 4 + 4) + 20 + 20; - unsigned long max_size = min_size; + min_size = 8 + 4*256 + nr*(20 + 4 + 4) + 20 + 20; + if (version != 2) + min_size -= nr*20; + max_size = min_size; if (nr) max_size += (nr - 1)*8; if (idx_size min_size || idx_size max_size) { @@ -573,6 +576,36 @@ static int check_packed_git_idx(const char *path, struct packed_git *p) } } + if (version = 3) { + /* the SHA1 table is located in the main pack file */ + void *pack_map; + struct pack_header *pack_hdr; + + fd = git_open_noatime(p-pack_name); + if (fd 0) { + munmap(idx_map, idx_size); + return error(unable to open %s, p-pack_name); + } + if (fstat(fd, st) != 0 || xsize_t(st.st_size) 12 + nr*20) { + close(fd); +
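The size-check arithmetic in check_packed_git_idx() above can be summarized by a small helper (hypothetical name): index v3 simply drops the per-object 20-byte SHA1 entries, since those now live in the pack file itself.

```c
#include <assert.h>

/* Minimum index file size: 8-byte header, 256 4-byte fanout entries,
 * per-object SHA1 + CRC + offset entries, plus two trailing 20-byte
 * SHA1s.  For index v3 the SHA1 table is not in the index file. */
static unsigned long min_idx_size(int version, unsigned long nr)
{
	unsigned long size = 8 + 4*256 + nr*(20 + 4 + 4) + 20 + 20;

	if (version != 2)
		size -= nr*20;	/* v3: SHA1 table lives in the pack */
	return size;
}
```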
[PATCH 19/38] packv4-create: optimize delta encoding
Make sure the copy sequence is smaller than the list of tree entries it is meant to replace. We do so by encoding tree entries in parallel with the delta entry comparison, and then comparing the length of both sequences. Signed-off-by: Nicolas Pitre n...@fluxnic.net --- packv4-create.c | 65 +++-- 1 file changed, 45 insertions(+), 20 deletions(-) diff --git a/packv4-create.c b/packv4-create.c index 15c5959..c8d3053 100644 --- a/packv4-create.c +++ b/packv4-create.c @@ -433,7 +433,8 @@ void *pv4_encode_tree(void *_buffer, unsigned long *sizep, struct tree_desc desc, delta_desc; struct name_entry name_entry, delta_entry; int nb_entries; - unsigned int copy_start, copy_count = 0, delta_pos = 0, first_delta = 1; + unsigned int copy_start = 0, copy_count = 0, copy_pos = 0, copy_end = 0; + unsigned int delta_pos = 0, first_delta = 1; if (!size) return NULL; @@ -489,24 +490,23 @@ void *pv4_encode_tree(void *_buffer, unsigned long *sizep, if (ret == 0 name_entry.mode == delta_entry.mode hashcmp(name_entry.sha1, delta_entry.sha1) == 0) { - if (!copy_count) + if (!copy_count) { copy_start = delta_pos; + copy_pos = out - buffer; + copy_end = 0; + } copy_count++; delta_pos++; if (!tree_entry(delta_desc, delta_entry)) delta = NULL; - continue; - } - } + } else + copy_end = 1; + } else + copy_end = 1; - if (end - out 48) { - unsigned long sofar = out - buffer; - buffer = xrealloc(buffer, (sofar + 48)*2); - end = buffer + (sofar + 48)*2; - out = buffer + sofar; - } + if (copy_count copy_end) { + unsigned char copy_buf[48], *cp = copy_buf; - if (copy_count) { /* * Let's write a sequence indicating we're copying * entries from another object: @@ -524,12 +524,31 @@ void *pv4_encode_tree(void *_buffer, unsigned long *sizep, */ copy_start = (copy_start 1) | 1; copy_count = (copy_count 1) | first_delta; - out += encode_varint(copy_start, out); - out += encode_varint(copy_count, out); + cp += encode_varint(copy_start, cp); + cp += encode_varint(copy_count, cp); if (first_delta) - out += 
encode_sha1ref(delta_sha1, out); + cp += encode_sha1ref(delta_sha1, cp); copy_count = 0; - first_delta = 0; + + /* +* Now let's make sure this is going to take less +* space than the corresponding direct entries we've +* created in parallel. If so we dump the copy +* sequence over those entries in the output buffer. +*/ + if (cp - copy_buf out - buffer[copy_pos]) { + out = buffer + copy_pos; + memcpy(out, copy_buf, cp - copy_buf); + out += cp - copy_buf; + first_delta = 0; + } + } + + if (end - out 48) { + unsigned long sofar = out - buffer; + buffer = xrealloc(buffer, (sofar + 48)*2); + end = buffer + (sofar + 48)*2; + out = buffer + sofar; } pathlen = tree_entry_len(name_entry); @@ -545,13 +564,19 @@ void *pv4_encode_tree(void *_buffer, unsigned long *sizep, } if (copy_count) { - /* flush the trailing copy */ + /* process the trailing copy */ + unsigned char copy_buf[48], *cp = copy_buf; copy_start = (copy_start 1) | 1; copy_count = (copy_count 1) | first_delta; - out += encode_varint(copy_start, out); - out += encode_varint(copy_count, out); + cp += encode_varint(copy_start, cp); + cp += encode_varint(copy_count, cp); if (first_delta) -
[PATCH 30/38] pack v4: code to recreate a canonical commit object
Usage of snprintf() is possibly not the most efficient approach. For example we could simply copy the needed strings and generate the SHA1 hex strings directly into the destination buffer. But such optimizations may come later. Signed-off-by: Nicolas Pitre n...@fluxnic.net --- packv4-parse.c | 74 ++ 1 file changed, 74 insertions(+) diff --git a/packv4-parse.c b/packv4-parse.c index 074e107..bca1a97 100644 --- a/packv4-parse.c +++ b/packv4-parse.c @@ -129,3 +129,77 @@ const unsigned char *get_nameref(struct packed_git *p, const unsigned char **src } return p-name_dict-data + p-name_dict-offsets[index]; } + +void *pv4_get_commit(struct packed_git *p, struct pack_window **w_curs, +off_t offset, unsigned long size) +{ + unsigned long avail; + git_zstream stream; + int len, st; + unsigned int nb_parents; + unsigned char *dst, *dcp; + const unsigned char *src, *scp, *sha1, *name; + unsigned long time; + int16_t tz; + + dst = xmallocz(size); + dcp = dst; + + src = use_pack(p, w_curs, offset, avail); + scp = src; + + sha1 = get_sha1ref(p, scp); + len = snprintf((char *)dcp, size, tree %s\n, sha1_to_hex(sha1)); + dcp += len; + size -= len; + + nb_parents = decode_varint(scp); + while (nb_parents--) { + sha1 = get_sha1ref(p, scp); + len = snprintf((char *)dcp, size, parent %s\n, sha1_to_hex(sha1)); + if (len = size) + die(overflow in %s, __func__); + dcp += len; + size -= len; + } + + name = get_nameref(p, scp); + tz = (name[0] 8) | name[1]; + time = decode_varint(scp); + len = snprintf((char *)dcp, size, author %s %lu %+05d\n, name+2, time, tz); + if (len = size) + die(overflow in %s, __func__); + dcp += len; + size -= len; + + name = get_nameref(p, scp); + tz = (name[0] 8) | name[1]; + time = decode_varint(scp); + len = snprintf((char *)dcp, size, committer %s %lu %+05d\n, name+2, time, tz); + if (len = size) + die(overflow in %s, __func__); + dcp += len; + size -= len; + + if (scp - src avail) + die(overflow in %s, __func__); + offset += scp - src; + + memset(stream, 0, 
sizeof(stream));
+	stream.next_out = dcp;
+	stream.avail_out = size + 1;
+	git_inflate_init(&stream);
+	do {
+		src = use_pack(p, w_curs, offset, &stream.avail_in);
+		stream.next_in = (unsigned char *)src;
+		st = git_inflate(&stream, Z_FINISH);
+		offset += stream.next_in - src;
+	} while ((st == Z_OK || st == Z_BUF_ERROR) && stream.avail_out);
+	git_inflate_end(&stream);
+	if (st != Z_STREAM_END || stream.total_out != size) {
+		free(dst);
+		return NULL;
+	}
+
+	return dst;
+}
-- 
1.8.4.38.g317e65b
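The overflow discipline used with snprintf() in pv4_get_commit() can be sketched as follows (emit_parent_line is a hypothetical helper, not part of the patch): since the canonical object size is recorded up front, every formatted header line is checked against the remaining room rather than trusted blindly.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format one "parent <40-hex>\n" line into dst, returning the number
 * of bytes written, or -1 if it would not fit in the room left for the
 * canonical object (which would mean corrupt pack data). */
static int emit_parent_line(char *dst, size_t room, const char *hex)
{
	int len = snprintf(dst, room, "parent %s\n", hex);

	if (len < 0 || (size_t)len >= room)
		return -1;	/* would overflow the recorded size */
	return len;
}
```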
[PATCH 11/38] pack v4: tree object encoding
This goes as follows: - Number of tree entries: variable length encoded. Then for each tree entry: - Path component reference: variable length encoded index into the path dictionary table which also covers the entry mode. To make the overall encoding efficient, the path table is already sorted by usage frequency so the most used path names are first and require the shortest index encoding. - SHA1 reference: either variable length encoding of the index into the SHA1 table or the literal SHA1 prefixed by 0 (see encode_sha1ref()). Rationale: all the tree object data is densely encoded while requiring no zlib inflate processing on access, and all SHA1 references are most likely to be direct indices into the pack index file requiring no SHA1 search. Path filtering can be accomplished on the path index directly without any string comparison during the tree traversal. Still lacking is some kind of delta encoding for multiple tree objects with only small differences between them. But that'll come later. Signed-off-by: Nicolas Pitre n...@fluxnic.net --- packv4-create.c | 66 + 1 file changed, 66 insertions(+) diff --git a/packv4-create.c b/packv4-create.c index d4a79f4..b91ee0b 100644 --- a/packv4-create.c +++ b/packv4-create.c @@ -393,6 +393,72 @@ bad: return NULL; } +/* + * This converts a canonical tree object buffer into its + * tightly packed representation using the already populated + * and sorted tree_path_table dictionary. The parsing is + * strict so to ensure the canonical version may always be + * regenerated and produce the same hash. + */ +void *pv4_encode_tree(void *_buffer, unsigned long *sizep) +{ + unsigned long size = *sizep; + unsigned char *in, *out, *end, *buffer = _buffer; + struct tree_desc desc; + struct name_entry name_entry; + int nb_entries; + + if (!size) + return NULL; + + /* +* We can't make sure the result will always be smaller than the +* input. The smallest possible entry is 0 x\040 byte SHA1 +* or 44 bytes. 
The output entry may have a realistic path index
+ * encoding using up to 3 bytes, and a non indexable SHA1 meaning
+ * 41 bytes. And the output data already has the nb_entries
+ * headers. In practice the output size will be significantly
+ * smaller but for now let's make it simple.
+ */
+	in = buffer;
+	out = xmalloc(size + 48);
+	end = out + size + 48;
+	buffer = out;
+
+	/* let's count how many entries there are */
+	init_tree_desc(&desc, in, size);
+	nb_entries = 0;
+	while (tree_entry(&desc, &name_entry))
+		nb_entries++;
+	out += encode_varint(nb_entries, out);
+
+	init_tree_desc(&desc, in, size);
+	while (tree_entry(&desc, &name_entry)) {
+		int pathlen, index;
+
+		if (end - out < 48) {
+			unsigned long sofar = out - buffer;
+			buffer = xrealloc(buffer, (sofar + 48)*2);
+			end = buffer + (sofar + 48)*2;
+			out = buffer + sofar;
+		}
+
+		pathlen = tree_entry_len(&name_entry);
+		index = dict_add_entry(tree_path_table, name_entry.mode,
+				       name_entry.path, pathlen);
+		if (index < 0) {
+			error("missing tree dict entry");
+			free(buffer);
+			return NULL;
+		}
+		out += encode_varint(index, out);
+		out += encode_sha1ref(name_entry.sha1, out);
+	}
+
+	*sizep = out - buffer;
+	return buffer;
+}
+
 static struct pack_idx_entry *get_packed_object_list(struct packed_git *p)
 {
 	unsigned i, nr_objects = p->num_objects;
-- 
1.8.4.38.g317e65b
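The output-buffer growth pattern used in this encoding loop — make sure at least 48 bytes of headroom exist before writing an entry, roughly doubling the allocation otherwise — can be sketched standalone (struct outbuf and ensure_room are hypothetical names for what the patch does inline with buffer/out/end):

```c
#include <assert.h>
#include <stdlib.h>

struct outbuf {
	unsigned char *buffer;	/* start of allocation */
	unsigned char *out;	/* current write position */
	unsigned char *end;	/* one past the allocation */
};

/* Guarantee at least 'need' writable bytes at b->out, growing the
 * buffer geometrically and re-basing the write pointer, exactly as
 * the inline xrealloc() sequence in pv4_encode_tree() does. */
static void ensure_room(struct outbuf *b, long need)
{
	if (b->end - b->out < need) {
		long sofar = b->out - b->buffer;
		long newsize = (sofar + need) * 2;

		b->buffer = realloc(b->buffer, newsize);
		b->end = b->buffer + newsize;
		b->out = b->buffer + sofar;
	}
}
```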
[PATCH 24/38] packv4-create: add progress display
Signed-off-by: Nicolas Pitre n...@fluxnic.net --- packv4-create.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/packv4-create.c b/packv4-create.c index c23c791..fd16222 100644 --- a/packv4-create.c +++ b/packv4-create.c @@ -13,6 +13,7 @@ #include tree-walk.h #include pack.h #include pack-revindex.h +#include progress.h #include varint.h @@ -627,8 +628,10 @@ static struct pack_idx_entry **sort_objs_by_offset(struct pack_idx_entry *list, static int create_pack_dictionaries(struct packed_git *p, struct pack_idx_entry **obj_list) { + struct progress *progress_state; unsigned int i; + progress_state = start_progress(Scanning objects, p-num_objects); for (i = 0; i p-num_objects; i++) { struct pack_idx_entry *obj = obj_list[i]; void *data; @@ -637,6 +640,8 @@ static int create_pack_dictionaries(struct packed_git *p, struct object_info oi = {}; int (*add_dict_entries)(void *, unsigned long); + display_progress(progress_state, i+1); + oi.typep = type; oi.sizep = size; if (packed_object_info(p, obj-offset, oi) 0) @@ -666,6 +671,7 @@ static int create_pack_dictionaries(struct packed_git *p, free(data); } + stop_progress(progress_state); return 0; } @@ -1009,6 +1015,7 @@ static void process_one_pack(char *src_pack, char *dst_pack) off_t written = 0; char *packname; unsigned char pack_sha1[20]; + struct progress *progress_state; p = open_pack(src_pack); if (!p) @@ -1030,6 +1037,7 @@ static void process_one_pack(char *src_pack, char *dst_pack) written += packv4_write_tables(f, nr_objects, objs); /* Let's write objects out, updating the object index list in place */ + progress_state = start_progress(Writing objects, nr_objects); all_objs = objs; all_objs_nr = nr_objects; for (i = 0; i nr_objects; i++) { @@ -1039,7 +1047,9 @@ static void process_one_pack(char *src_pack, char *dst_pack) written += packv4_write_object(f, p, obj); obj-offset = obj_pos; obj-crc32 = crc32_end(f); + display_progress(progress_state, i+1); } + stop_progress(progress_state); sha1close(f, pack_sha1, 
CSUM_CLOSE | CSUM_FSYNC);
-- 
1.8.4.38.g317e65b
[PATCH 06/38] pack v4: split the object list and dictionary creation
Signed-off-by: Nicolas Pitre <n...@fluxnic.net>
---
 packv4-create.c | 58 ++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 44 insertions(+), 14 deletions(-)

diff --git a/packv4-create.c b/packv4-create.c
index 5c08871..20d97a4 100644
--- a/packv4-create.c
+++ b/packv4-create.c
@@ -261,7 +261,7 @@ static int sort_by_offset(const void *e1, const void *e2)
 	return 0;
 }
 
-static int create_pack_dictionaries(struct packed_git *p)
+static struct idx_entry *get_packed_object_list(struct packed_git *p)
 {
 	uint32_t nr_objects, i;
 	struct idx_entry *objects;
@@ -275,7 +275,15 @@
 	}
 	qsort(objects, nr_objects, sizeof(*objects), sort_by_offset);
 
-	for (i = 0; i < nr_objects; i++) {
+	return objects;
+}
+
+static int create_pack_dictionaries(struct packed_git *p,
+				    struct idx_entry *objects)
+{
+	unsigned int i;
+
+	for (i = 0; i < p->num_objects; i++) {
 		void *data;
 		enum object_type type;
 		unsigned long size;
@@ -310,20 +318,21 @@
 			 typename(type), sha1_to_hex(objects[i].sha1));
 		free(data);
 	}
-	free(objects);
 
 	return 0;
 }
 
-static int process_one_pack(const char *path)
+static struct packed_git *open_pack(const char *path)
 {
 	char arg[PATH_MAX];
 	int len;
 	struct packed_git *p;
 
 	len = strlcpy(arg, path, PATH_MAX);
-	if (len >= PATH_MAX)
-		return error("name too long: %s", path);
+	if (len >= PATH_MAX) {
+		error("name too long: %s", path);
+		return NULL;
+	}
 
 	/*
 	 * In addition to "foo.idx" we accept "foo.pack" and "foo";
@@ -333,8 +342,10 @@
 		strcpy(arg + len - 5, ".idx");
 		len--;
 	} else if (!has_extension(arg, ".idx")) {
-		if (len + 4 >= PATH_MAX)
-			return error("name too long: %s.idx", arg);
+		if (len + 4 >= PATH_MAX) {
+			error("name too long: %s.idx", arg);
+			return NULL;
+		}
 		strcpy(arg + len, ".idx");
 		len += 4;
 	}
@@ -345,17 +356,36 @@
 	 */
 	if (len + 1 >= PATH_MAX) {
 		arg[len - 4] = '\0';
-		return error("name too long: %s.pack", arg);
+		error("name too long: %s.pack", arg);
+		return NULL;
 	}
 
 	p = add_packed_git(arg, len, 1);
-	if (!p)
-		return error("packfile %s not found.", arg);
+	if (!p) {
+		error("packfile %s not found.", arg);
+		return NULL;
+	}
 	install_packed_git(p);
-	if (open_pack_index(p))
-		return error("packfile %s index not opened", p->pack_name);
-	return create_pack_dictionaries(p);
+	if (open_pack_index(p)) {
+		error("packfile %s index not opened", p->pack_name);
+		return NULL;
+	}
+
+	return p;
+}
+
+static void process_one_pack(char *src_pack)
+{
+	struct packed_git *p;
+	struct idx_entry *objs;
+
+	p = open_pack(src_pack);
+	if (!p)
+		die("unable to open source pack");
+
+	objs = get_packed_object_list(p);
+	create_pack_dictionaries(p, objs);
 }
 
 int main(int argc, char *argv[])
-- 
1.8.4.38.g317e65b
Re: What's cooking in git.git (Sep 2013, #01; Tue, 3)
Jens Lehmann <jens.lehm...@web.de> writes:

> Am 04.09.2013 21:19, schrieb Junio C Hamano:
>> Matthieu Moy <matthieu@grenoble-inp.fr> writes:
>>> Junio C Hamano <gits...@pobox.com> writes:
>>>
>>>> * mm/status-without-comment-char (2013-08-29) 5 commits
>>>>  - status: introduce status.displayCommentChar to disable display of #
>>>>  - SQUASH: do not fprintf() random strings
>>>>  - get rid of "git submodule summary --for-status"
>>>>  - wt-status: use argv_array API
>>>>  - builtin/stripspace.c: fix broken indentation
>>>>
>>>> Allow "git status" to omit the prefix to make its output a comment
>>>> in a commit log editor, which is not necessary for human
>>>> consumption.
>>>
>>> I'm waiting for the situation of "git submodule --for-status" to be
>>> settled to send a reroll. Don't merge this for now, and let
>>> bc/submodule-status-ignored converge.
>>
>> Thanks. I had an impression that bc/submodule-status-ignored was
>> still being discussed. Has what I have in 'pu' settled?
>
> Almost (see my other mail concerning bc/submodule-status-ignored). If
> the removal of the --for-status option is reverted from patch #3 here
> (which I outlined in $gmane/233764) this series can be merged without
> really depending on bc/submodule-status-ignored (even though the
> temporarily unused --for-status option may look a bit strange until
> the latter is merged).

I'll also need to reword the commit messages.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/
Re: [PATCH v3 2/2] submodule: don't print status output with ignore=all
Jens Lehmann <jens.lehm...@web.de> writes:

> Am 04.09.2013 08:31, schrieb Matthieu Moy:
>> brian m. carlson <sand...@crustytoothpaste.net> writes:
>>> Tests are included which verify that this change has no effect on
>>> "git submodule summary" without the --for-status option.
>>
>> I still don't understand why this is needed.
>
> To avoid a change in behavior for "git submodule summary", as that
> never honored the submodule.*.ignore nor the diff.ignoreSubmodules
> setting (and I don't think it ever should).

I don't get it. If the goal is to keep the old behavior, then "git
status" shouldn't be changed either. Fixing bugs needs to change the
behavior. IOW, why was it a bug that "git status" showed ignored
submodules and not a bug that "git submodule summary" did the same?

> Fine by me, what would you propose to clarify that? (Though I have
> the suspicion that the explanation will be three years late ;-)

I have no idea, as I do not understand the reason myself yet. I'm not
a frequent user of submodules and not a user of the ignore option at
all, so I can't tell what's best. I'd just like the new behavior to be
justified somewhere.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/
Re: [PATCH] git-gui: Modify push dialog to support Gerrit review
Am 9/2/2013 10:54, schrieb Joergen Edelbo:
> Changes done: Remove selection of branches to push - push always
> HEAD. This can be justified by the fact that this is by far the most
> common thing to do.

What are your plans to support a topic-based workflow? By far the most
common thing to happen is that someone forgets to push completed
topics. With this change, aren't those people forced to relinquish
their current work because they have to checkout the completed topics
to push them?

-- 
Hannes
Re: [PATCH 31/38] sha1_file.c: make use of decode_varint()
On Thu, Sep 05, 2013 at 02:19:54AM -0400, Nicolas Pitre wrote:
> ... replacing the equivalent open coded loop.
>
> Signed-off-by: Nicolas Pitre <n...@fluxnic.net>
> ---
>  sha1_file.c | 14 +++-----------
>  1 file changed, 3 insertions(+), 11 deletions(-)
>
> diff --git a/sha1_file.c b/sha1_file.c
> index a298933..67eb903 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -1687,20 +1687,12 @@ static off_t get_delta_base(struct packed_git *p,
>  	 * is stupid, as then a REF_DELTA would be smaller to store.
>  	 */
>  	if (type == OBJ_OFS_DELTA) {
> -		unsigned used = 0;
> -		unsigned char c = base_info[used++];
> -		base_offset = c & 127;
> -		while (c & 128) {
> -			base_offset += 1;
> -			if (!base_offset || MSB(base_offset, 7))
> -				return 0; /* overflow */
> -			c = base_info[used++];
> -			base_offset = (base_offset << 7) + (c & 127);
> -		}
> +		const unsigned char *cp = base_info;
> +		base_offset = decode_varint(&cp);
>  		base_offset = delta_obj_offset - base_offset;
>  		if (base_offset <= 0 || base_offset >= delta_obj_offset)
>  			return 0; /* out of bound */
> -		*curpos += used;
> +		*curpos += cp - base_info;
>  	} else if (type == OBJ_REF_DELTA) {
>  		/* The base entry _must_ be in the same pack */
>  		base_offset = find_pack_entry_one(base_info, p);
> -- 
> 1.8.4.38.g317e65b

This patch seems to be a cleanup independent from pack v4; it applies
cleanly on master and passes all tests in itself.

Best,
Gábor
Re: [PATCH v2] git p4: implement view spec wildcards with p4 where
2013/9/4 Junio C Hamano <gits...@pobox.com>:
> kazuki saitoh <ksaitoh...@gmail.com> writes:
>> Currently, git p4 does not support many of the view wildcards, such
>> as * and %%n. It only knows the common ... mapping, and exclusions.
>>
>> Redo the entire wildcard code around the idea of directly querying
>> the p4 server for the mapping. For each commit, invoke "p4 where"
>> with committed file paths as args and use the client mapping to
>> decide where the file goes in git.
>>
>> This simplifies a lot of code, and adds support for all wildcards
>> supported by p4. Downside is that there is probably a 20%-ish
>> slowdown with this approach.
>>
>> [pw: redo code and tests]
>>
>> Signed-off-by: Kazuki Saitoh <ksaitoh...@gmail.com>
>> Signed-off-by: Pete Wyckoff <p...@padd.com>
>
> This was whitespace-damaged in the message I received, so what is
> queued is after manual fix-up. Please double check what will be
> queued on 'pu' later and Ack, and then I'll move it to 'next' and
> 'master'.

Indeed, some line is broken by line wrapping. I use the Gmail web
interface and didn't read "git format-patch --help". Sorry for the
inconvenience.

I checked the 'pu' branch and it looks correct.

Acked-by: Kazuki Saitoh <ksaitoh...@gmail.com>

Thanks.
--- git-p4.py | 223 +- 1 file changed, 59 insertions(+), 164 deletions(-) diff --git a/git-p4.py b/git-p4.py index 31e71ff..1793e86 100755 --- a/git-p4.py +++ b/git-p4.py @@ -780,11 +780,14 @@ def getClientSpec(): # dictionary of all client parameters entry = specList[0] +# the //client/ name +client_name = entry[Client] + # just the keys that start with View view_keys = [ k for k in entry.keys() if k.startswith(View) ] # hold this new View -view = View() +view = View(client_name) # append the lines, in order, to the view for view_num in range(len(view_keys)): @@ -1555,8 +1558,8 @@ class P4Submit(Command, P4UserMap): for b in body: labelTemplate += \t + b + \n labelTemplate += View:\n -for mapping in clientSpec.mappings: -labelTemplate += \t%s\n % mapping.depot_side.path +for depot_side in clientSpec.mappings: +labelTemplate += \t%s\n % depot_side if self.dry_run: print Would create p4 label %s for tag % name @@ -1568,7 +1571,7 @@ class P4Submit(Command, P4UserMap): # Use the label p4_system([tag, -l, name] + - [%s@%s % (mapping.depot_side.path, changelist) for mapping in clientSpec.mappings]) + [%s@%s % (depot_side, changelist) for depot_side in clientSpec.mappings]) if verbose: print created p4 label for tag %s % name @@ -1796,117 +1799,16 @@ class View(object): Represent a p4 view (p4 help views), and map files in a repo according to the view. -class Path(object): -A depot or client path, possibly containing wildcards. - The only one supported is ... at the end, currently. - Initialize with the full path, with //depot or //client. 
- -def __init__(self, path, is_depot): -self.path = path -self.is_depot = is_depot -self.find_wildcards() -# remember the prefix bit, useful for relative mappings -m = re.match((//[^/]+/), self.path) -if not m: -die(Path %s does not start with //prefix/ % self.path) -prefix = m.group(1) -if not self.is_depot: -# strip //client/ on client paths -self.path = self.path[len(prefix):] - -def find_wildcards(self): -Make sure wildcards are valid, and set up internal - variables. - -self.ends_triple_dot = False -# There are three wildcards allowed in p4 views -# (see p4 help views). This code knows how to -# handle ... (only at the end), but cannot deal with -# %%n or *. Only check the depot_side, as p4 should -# validate that the client_side matches too. -if re.search(r'%%[1-9]', self.path): -die(Can't handle %%n wildcards in view: %s % self.path) -if self.path.find(*) = 0: -die(Can't handle * wildcards in view: %s % self.path) -triple_dot_index = self.path.find(...) -if triple_dot_index = 0: -if triple_dot_index != len(self.path) - 3: -die(Can handle only single ... wildcard, at end: %s % -self.path) -self.ends_triple_dot = True - -def ensure_compatible(self, other_path): -Make sure the wildcards agree. -if self.ends_triple_dot != other_path.ends_triple_dot: -
Re: [PATCH 0/3] Reject non-ff pulls by default
On Wed, Sep 04, 2013 at 03:59:18PM -0700, Junio C Hamano wrote:
> Are there cases where you do not want to either rebase nor merge? If
> so what do you want to do after "git pull" fetches from the other
> side? Nothing?

One other thing that I can see being useful occasionally is:

    git rebase @{u}@{1} --onto @{u}

which allows local commits to be replayed onto a rewritten upstream
branch. Although I agree with your side note below that people doing
this may be better off fetching and then updating their local branch,
particularly if @{1} is not the correct reflog entry for the upstream
when they created the branch.

>     Side note: a knee-jerk response to a "yes" answer to the last
>     question from me has always been "then why are you running 'git
>     pull' in the first place". The next paragraph is my attempt to
>     extend my imagination a bit, stepping outside that reaction.
Re: [PATCH v3 2/2] submodule: don't print status output with ignore=all
Matthieu Moy <matthieu@grenoble-inp.fr> writes:

> Jens Lehmann <jens.lehm...@web.de> writes:
>> Fine by me, what would you propose to clarify that? (Though I have
>> the suspicion that the explanation will be three years late ;-)
>
> I have no idea, as I do not understand the reason myself yet. I'm not
> a frequent user of submodules and not a user of the ignore option at
> all, so I can't tell what's best. I'd just like the new behavior to
> be justified somewhere.

I also think that the documentation in Documentation/config.txt could
be updated. Perhaps adding something like "To view the summary for
submodules ignored by 'git status', one can use 'git submodule
summary', which shows a similar output but does not honor this
option."

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/
RE: [PATCH] git-gui: Modify push dialog to support Gerrit review
> -----Original Message-----
> From: Johannes Sixt [mailto:j.s...@viscovery.net]
> Sent: 5. september 2013 10:57
>
> Please do not top-post.
>
> Am 9/5/2013 10:29, schrieb Jørgen Edelbo:
>>> -----Original Message-----
>>> From: Johannes Sixt
>>> Am 9/2/2013 10:54, schrieb Joergen Edelbo:
>>>> Changes done: Remove selection of branches to push - push always
>>>> HEAD. This can be justified by the fact that this is by far the
>>>> most common thing to do.
>>>
>>> What are your plans to support a topic-based workflow? By far the
>>> most common thing to happen is that someone forgets to push
>>> completed topics. With this change, aren't those people forced to
>>> relinquish their current work because they have to checkout the
>>> completed topics to push them?
>>
>> I am not quite sure what your concern is.
>
> When I have completed topics A and B, but forgot to push them, and
> now I am working on topic C, how do I push topics A and B? You say I
> can only push HEAD. I understand this that I have to stop work on C
> (perhaps commit or stash any unfinished work), then checkout A, push
> it, checkout B, push it, checkout C and unstash the unfinished work.
> If my understanding is correct, the new restriction is a no-go.

Ok, this way of working is not supported. It just never occurred to me
that you would work this way. Forgetting to push something that you
have just completed is very far from what I am used to. I think it
comes most natural to push what you have done before changing topic.
The reason I make a commit is to get it out of the door.

> -- 
> Hannes

- Jørgen
Re: [PATCH] git-gui: Modify push dialog to support Gerrit review
Am 9/5/2013 11:18, schrieb Jørgen Edelbo:
> Forgetting to push something that you have just completed is very far
> from what I am used to.

Forgetting to push is just one of many reasons why a branch that is
not equal to HEAD is not yet pushed... The new restriction is just too
tight.

-- 
Hannes
[PATCH v4 2/5] wt-status: use argv_array API
No behavior change, but two slight code reorganizations:
argv_array_push doesn't accept NULL strings, and it duplicates its
argument, hence summary_limit must be written to before being inserted
into argv.

Signed-off-by: Matthieu Moy <matthieu@imag.fr>
Signed-off-by: Junio C Hamano <gits...@pobox.com>
---
 wt-status.c | 26 ++++++++++++++------------
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/wt-status.c b/wt-status.c
index cb24f1f..958a53c 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -8,6 +8,7 @@
 #include "diffcore.h"
 #include "quote.h"
 #include "run-command.h"
+#include "argv-array.h"
 #include "remote.h"
 #include "refs.h"
 #include "submodule.h"
@@ -663,29 +664,30 @@ static void wt_status_print_submodule_summary(struct wt_status *s, int uncommitt
 	char summary_limit[64];
 	char index[PATH_MAX];
 	const char *env[] = { NULL, NULL };
-	const char *argv[8];
-
-	env[0] = index;
-	argv[0] = "submodule";
-	argv[1] = "summary";
-	argv[2] = uncommitted ? "--files" : "--cached";
-	argv[3] = "--for-status";
-	argv[4] = "--summary-limit";
-	argv[5] = summary_limit;
-	argv[6] = uncommitted ? NULL : (s->amend ? "HEAD^" : "HEAD");
-	argv[7] = NULL;
+	struct argv_array argv = ARGV_ARRAY_INIT;
 
 	sprintf(summary_limit, "%d", s->submodule_summary);
 	snprintf(index, sizeof(index), "GIT_INDEX_FILE=%s", s->index_file);
 
+	env[0] = index;
+	argv_array_push(&argv, "submodule");
+	argv_array_push(&argv, "summary");
+	argv_array_push(&argv, uncommitted ? "--files" : "--cached");
+	argv_array_push(&argv, "--for-status");
+	argv_array_push(&argv, "--summary-limit");
+	argv_array_push(&argv, summary_limit);
+	if (!uncommitted)
+		argv_array_push(&argv, s->amend ? "HEAD^" : "HEAD");
+
 	memset(&sm_summary, 0, sizeof(sm_summary));
-	sm_summary.argv = argv;
+	sm_summary.argv = argv.argv;
 	sm_summary.env = env;
 	sm_summary.git_cmd = 1;
 	sm_summary.no_stdin = 1;
 	fflush(s->fp);
 	sm_summary.out = dup(fileno(s->fp));	/* run_command closes it */
 	run_command(&sm_summary);
+	argv_array_clear(&argv);
 }
 
 static void wt_status_print_other(struct wt_status *s,
-- 
1.8.4.4.g70bf5e8.dirty
[PATCH v4 5/5] tests: don't set status.oldStyle file-wide
The previous commit set status.oldStyle file-wide in t7060-wtstatus.sh and t7508-status.sh to make the patch small. However, now that status.oldStyle is not the default, it is better to disable it in tests so that the most common situation is also the most tested. While we're there, move the cat expect EOF blocks inside the tests. Signed-off-by: Matthieu Moy matthieu@imag.fr --- t/t7060-wtstatus.sh | 113 --- t/t7508-status.sh | 889 ++-- 2 files changed, 493 insertions(+), 509 deletions(-) diff --git a/t/t7060-wtstatus.sh b/t/t7060-wtstatus.sh index ddebd28..7d467c0 100755 --- a/t/t7060-wtstatus.sh +++ b/t/t7060-wtstatus.sh @@ -4,10 +4,6 @@ test_description='basic work tree status reporting' . ./test-lib.sh -test_expect_success 'use status.oldStyle by default ' ' - git config --global status.oldStyle true -' - test_expect_success setup ' git config --global advice.statusuoption false test_commit A @@ -33,20 +29,19 @@ test_expect_success 'Report new path with conflict' ' test_cmp expect actual ' -cat expect EOF -# On branch side -# You have unmerged paths. -# (fix conflicts and run git commit) -# -# Unmerged paths: -# (use git add/rm file... as appropriate to mark resolution) -# -# deleted by us: foo -# +test_expect_success 'M/D conflict does not segfault' ' + cat expect EOF +On branch side +You have unmerged paths. + (fix conflicts and run git commit) + +Unmerged paths: + (use git add/rm file... as appropriate to mark resolution) + + deleted by us: foo + no changes added to commit (use git add and/or git commit -a) EOF - -test_expect_success 'M/D conflict does not segfault' ' mkdir mdconflict ( cd mdconflict @@ -139,19 +134,19 @@ test_expect_success 'status when conflicts with add and rm advice (deleted by th test_commit on_second main.txt on_second test_commit master conflict.txt master test_must_fail git merge second_branch - cat expected -\EOF - # On branch master - # You have unmerged paths. 
- # (fix conflicts and run git commit) - # - # Unmerged paths: - # (use git add/rm file... as appropriate to mark resolution) - # - # both added: conflict.txt - # deleted by them:main.txt - # - no changes added to commit (use git add and/or git commit -a) - EOF + cat expected \EOF +On branch master +You have unmerged paths. + (fix conflicts and run git commit) + +Unmerged paths: + (use git add/rm file... as appropriate to mark resolution) + + both added: conflict.txt + deleted by them:main.txt + +no changes added to commit (use git add and/or git commit -a) +EOF git status --untracked-files=no actual test_i18ncmp expected actual ' @@ -172,20 +167,20 @@ test_expect_success 'prepare for conflicts' ' test_expect_success 'status when conflicts with add and rm advice (both deleted)' ' test_must_fail git merge conflict - cat expected -\EOF - # On branch conflict_second - # You have unmerged paths. - # (fix conflicts and run git commit) - # - # Unmerged paths: - # (use git add/rm file... as appropriate to mark resolution) - # - # both deleted: main.txt - # added by them: sub_master.txt - # added by us:sub_second.txt - # - no changes added to commit (use git add and/or git commit -a) - EOF + cat expected \EOF +On branch conflict_second +You have unmerged paths. + (fix conflicts and run git commit) + +Unmerged paths: + (use git add/rm file... as appropriate to mark resolution) + + both deleted: main.txt + added by them: sub_master.txt + added by us:sub_second.txt + +no changes added to commit (use git add and/or git commit -a) +EOF git status --untracked-files=no actual test_i18ncmp expected actual ' @@ -196,22 +191,22 @@ test_expect_success 'status when conflicts with only rm advice (both deleted)' ' test_must_fail git merge conflict git add sub_master.txt git add sub_second.txt - cat expected -\EOF - # On branch conflict_second - # You have unmerged paths. 
- # (fix conflicts and run git commit) - # - # Changes to be committed: - # - # new file: sub_master.txt - # - # Unmerged paths: - # (use git rm file... to mark resolution) - # - # both deleted: main.txt - # - # Untracked files not listed (use -u option to show untracked files) - EOF + cat expected \EOF +On branch conflict_second +You have unmerged paths. + (fix conflicts and run git commit) + +Changes to be committed: + + new file: sub_master.txt +
[PATCH v4 0/5] Disable git status comment prefix
Compared to v2, this changes essentially:

* The prefix is actually disabled by default in this version. As a
  consequence, the option is renamed to status.oldStyle.

* Since this is the default, the tests are updated to test the new
  defaults. In a first patch, I'm setting status.oldStyle=true in test
  files that require it (to keep the patch short), and the last patch
  actually updates the test expected results.

  This was actually useful as I did find (and fix) a few bugs updating
  the tests:

  - the --columns option was still showing the comment prefix

  - "git commit --dry-run" and failed "git commit" were still
    displaying comments to stdout.

* The --for-status option is kept as a no-op.

Matthieu Moy (5):
  builtin/stripspace.c: fix broken indentation
  wt-status: use argv_array API
  submodule summary: ignore --for-status option
  status: disable display of '#' comment prefix by default
  tests: don't set status.oldStyle file-wide

 Documentation/config.txt           |   7 +
 builtin/commit.c                   |  10 +
 builtin/stripspace.c               |   8 +-
 git-submodule.sh                   |  13 +-
 t/t3001-ls-files-others-exclude.sh |   2 +-
 t/t7060-wtstatus.sh                | 109 +++--
 t/t7401-submodule-summary.sh       |  12 +-
 t/t7508-status.sh                  | 944 +++++++++++++++----------------
 wt-status.c                        |  85 +++-
 wt-status.h                        |   1 +
 10 files changed, 640 insertions(+), 551 deletions(-)

-- 
1.8.4.4.g70bf5e8.dirty
Re: [PATCH 0/3] Reject non-ff pulls by default
On Thu, Sep 05, 2013 at 07:01:03AM -0400, John Szakmeister wrote:
> On Wed, Sep 4, 2013 at 6:59 PM, Junio C Hamano <gits...@pobox.com> wrote:
> [snip]
>> When "git pull" stops because what was fetched in FETCH_HEAD does
>> not fast-forward, then what did _you_ do (and with the knowledge
>> you currently have, what would you do)?
>>
>> In a single project, would you choose to sometimes rebase and
>> sometimes merge, and if so, what is the choice depend on? When I am
>> on these selected branches, I want to merge, but on other branches
>> I want to rebase?
>
> Our team isn't quite proficient enough yet to have a completely
> rebase workflow... though we might have less of a problem if we did.
> So, several interesting points. Most of the time, `git pull` would
> be a fast-forward merge. We typically perform the merges of topic
> branches server-side--we have a build server who checks to make sure
> the result would be successful--and we just hit the big green button
> on the Merge button for the pull request (we use GitHub Enterprise
> at the moment). However, nearly as often, we just merge the branch
> locally because someone on the team is doing some manual testing,
> and it's just convenient to finish the process on the command line.
>
> What occasionally happens is that you merge the topic locally, but
> someone else has introduced a new commit to master. We try to
> preserve the mainline ordering of commits, so `git pull` doing a
> merge underneath the hood is undesirable (it moves the newly
> introduced commit off to the side). Rebasing your current master
> branch is not the answer either, because it picks up the commits
> introduced by the topic branch and rebases those too--at least with
> the -p option, and without it, the results are just as bad. Instead,
> we want to unfold our work, fast-forward merge the upstream, and
> then replay our actions--namely remerge the topic branch.
>
> It often ends up translating to this:
>
>     $ git reset --hard HEAD~1
>     $ git merge --ff-only @{u}
>     $ git merge topic
>     $ git push
>
> So what I really want isn't quite rebase. I'm not sure any of the
> proposed solutions would work. It'd be really nice to replay only
> the mainline commits, without affecting commits introduced from a
> topic branch.

Does "git rebase --preserve-merges" do what you want here?
Re: [PATCH v4 0/5] Disable git status comment prefix
Oops, this series forgot to update t7512-status-help.sh, which now fails. I'll send a reroll that updates it later (patch below). diff --git a/t/t7512-status-help.sh b/t/t7512-status-help.sh index 31a798f..0688d58 100755 --- a/t/t7512-status-help.sh +++ b/t/t7512-status-help.sh @@ -25,18 +25,18 @@ test_expect_success 'prepare for conflicts' ' test_expect_success 'status when conflicts unresolved' ' test_must_fail git merge master - cat expected -\EOF - # On branch conflicts - # You have unmerged paths. - # (fix conflicts and run git commit) - # - # Unmerged paths: - # (use git add file... to mark resolution) - # - # both modified: main.txt - # - no changes added to commit (use git add and/or git commit -a) - EOF + cat expected \EOF +On branch conflicts +You have unmerged paths. + (fix conflicts and run git commit) + +Unmerged paths: + (use git add file... to mark resolution) + + both modified: main.txt + +no changes added to commit (use git add and/or git commit -a) +EOF git status --untracked-files=no actual test_i18ncmp expected actual ' @@ -47,17 +47,17 @@ test_expect_success 'status when conflicts resolved before commit' ' test_must_fail git merge master echo one main.txt git add main.txt - cat expected -\EOF - # On branch conflicts - # All conflicts fixed but you are still merging. - # (use git commit to conclude merge) - # - # Changes to be committed: - # - # modified: main.txt - # - # Untracked files not listed (use -u option to show untracked files) - EOF + cat expected \EOF +On branch conflicts +All conflicts fixed but you are still merging. 
+ (use git commit to conclude merge) + +Changes to be committed: + + modified: main.txt + +Untracked files not listed (use -u option to show untracked files) +EOF git status --untracked-files=no actual test_i18ncmp expected actual ' @@ -76,21 +76,21 @@ test_expect_success 'status when rebase in progress before resolving conflicts' test_when_finished git rebase --abort ONTO=$(git rev-parse --short HEAD^^) test_must_fail git rebase HEAD^ --onto HEAD^^ - cat expected -EOF - # rebase in progress; onto $ONTO - # You are currently rebasing branch '\''rebase_conflicts'\'' on '\''$ONTO'\''. - # (fix conflicts and then run git rebase --continue) - # (use git rebase --skip to skip this patch) - # (use git rebase --abort to check out the original branch) - # - # Unmerged paths: - # (use git reset HEAD file... to unstage) - # (use git add file... to mark resolution) - # - # both modified: main.txt - # - no changes added to commit (use git add and/or git commit -a) - EOF + cat expected EOF +rebase in progress; onto $ONTO +You are currently rebasing branch '\''rebase_conflicts'\'' on '\''$ONTO'\''. + (fix conflicts and then run git rebase --continue) + (use git rebase --skip to skip this patch) + (use git rebase --abort to check out the original branch) + +Unmerged paths: + (use git reset HEAD file... to unstage) + (use git add file... to mark resolution) + + both modified: main.txt + +no changes added to commit (use git add and/or git commit -a) +EOF git status --untracked-files=no actual test_i18ncmp expected actual ' @@ -103,18 +103,18 @@ test_expect_success 'status when rebase in progress before rebase --continue' ' test_must_fail git rebase HEAD^ --onto HEAD^^ echo three main.txt git add main.txt - cat expected -EOF - # rebase in progress; onto $ONTO - # You are currently rebasing branch '\''rebase_conflicts'\'' on '\''$ONTO'\''. - # (all conflicts fixed: run git rebase --continue) - # - # Changes to be committed: - # (use git reset HEAD file... 
to unstage) - # - # modified: main.txt - # - # Untracked files not listed (use -u option to show untracked files) - EOF + cat expected EOF +rebase in progress; onto $ONTO +You are currently rebasing branch '\''rebase_conflicts'\'' on '\''$ONTO'\''. + (all conflicts fixed: run git rebase --continue) + +Changes to be committed: + (use git reset HEAD file... to unstage) + + modified: main.txt + +Untracked files not listed (use -u option to show untracked files) +EOF git status --untracked-files=no actual test_i18ncmp expected actual ' @@ -135,21 +135,21 @@ test_expect_success 'status during rebase -i when conflicts unresolved' ' test_when_finished git rebase --abort ONTO=$(git rev-parse --short rebase_i_conflicts) test_must_fail git rebase -i rebase_i_conflicts - cat expected -EOF - # rebase
Re: [PATCH v4 06/11] replace: bypass the type check if -f option is used
On Wed, Sep 4, 2013 at 10:44 PM, Junio C Hamano <gits...@pobox.com> wrote:
> Christian Couder <chrisc...@tuxfamily.org> writes:
>> If the -f option, which means '--force', is used, we can allow an
>> object to be replaced with one of a different type, as the user
>> should know what (s)he is doing.
>>
>> Signed-off-by: Christian Couder <chrisc...@tuxfamily.org>
>> ---
>
> This does not matter in a larger picture, but between 1/11 and this
> patch, there is a window where an operation that has been useful in
> some workflows becomes unavailable to the user. For future
> reference, it would be better to do this as a part of 1/11, to make
> sure that there always is an escape hatch available to the users.

Ok, I will squash patches 6/11, 7/11 and 8/11 with 1/11, 2/11 and 3/11
respectively.

Thanks,
Christian.
Re: [PATCH v4 10/11] Documentation/replace: list long option names
On Wed, Sep 4, 2013 at 10:45 PM, Junio C Hamano gits...@pobox.com wrote: Christian Couder chrisc...@tuxfamily.org writes: Signed-off-by: Christian Couder chrisc...@tuxfamily.org --- Documentation/git-replace.txt | 3 +++ 1 file changed, 3 insertions(+) diff --git a/Documentation/git-replace.txt b/Documentation/git-replace.txt index a2bd2ee..414000e 100644 --- a/Documentation/git-replace.txt +++ b/Documentation/git-replace.txt @@ -54,13 +54,16 @@ achieve the same effect as the `--no-replace-objects` option. OPTIONS --- -f:: +--force:: If an existing replace ref for the same object exists, it will be overwritten (instead of failing). -d:: +--delete:: Delete existing replace refs for the given objects. -l pattern:: +--list pattern:: List replace refs for objects that match the given pattern (or all if no pattern is given). Typing git replace without arguments, also lists all replace This should be in the same commit as what adds them. Ok, I will squash 10/11 into 9/11. Thanks, Christian. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/3] Reject non-ff pulls by default
Junio C Hamano gits...@pobox.com writes: Peff already covered (1)---it is highly doubtful that a merge is almost always wrong. In fact, if that _were_ the case, we should simply be defaulting to rebase, not failing the command and asking between merge and rebase like jc/pull-training-wheel topic did. We simply do not know what the user wants, as it heavily depends on the project, so we ask the user to choose one (and stick to it). From my experience leading the first large project using git at BBN, evolving a workflow (most work on topic branches, which are rebased, banning 'git pull'-created merge commits, and explicit merge commits to preserve --first-parent, basically), and seeing many people struggle to learn all this, my take is that a user who does not understand non-ff merge vs ff-merge vs rebase will end up doing the wrong thing. So two thoughts: In the glorious future, perhaps git could have a way to express a machine-parseable representation of the workflow and rules for a repo, so that these choices could be made accordingly. In the current world, I think it makes sense to error out when there are multiple reasonable choices depending on workflow. One of my team members, Richard Hansen, has argued to us that 'git pull' is harmful, essentially because it creates non-ff merges sometimes, while our rules say those should be rebased out. So we use [alias] up = !git remote update -p && git merge --ff-only @{u} which acts like pull if ff merge works, and otherwise errors out. I think the key question is: can a user who doesn't really understand the implications of ff vs non-ff merges and the local workflow rules actually function ok, or do they need to stop and go back and understand. I'm in the you just have to take the time to understand camp, which led to us having a semi-custom syllabus from GitHub training covering rebase, explicit vs ff merges and the consequences for first-parent history, etc.
Therefore, I think git pull should do the update (perhaps of just the remote corresponding to @{u}, perhaps without -p) and a --ff-only merge, absent a configuration asking for non-ff merge or rebase. (Arguably, an ff merge is a degenerate case of rebase and also of the ff/non-ff merge, so it's safe with either policy.) Greg
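The --ff-only policy discussed above boils down to an ancestry test: merging the fetched tip is a fast-forward exactly when the current tip is an ancestor of it. A toy model of that check (a hand-built commit graph, not git's actual merge-base machinery):

```python
# Toy model of the --ff-only rule: merging `theirs` into `ours` is a
# fast-forward iff `ours` is an ancestor of (or equal to) `theirs`.
# Illustration only; git computes this via merge bases, not this walk.

def is_ancestor(parents, a, b):
    """Return True if commit `a` is an ancestor of (or equal to) `b`."""
    stack, seen = [b], set()
    while stack:
        c = stack.pop()
        if c == a:
            return True
        if c in seen:
            continue
        seen.add(c)
        stack.extend(parents.get(c, []))
    return False

def ff_merge(parents, ours, theirs):
    """Return the new tip for a --ff-only merge, or raise if non-ff."""
    if is_ancestor(parents, ours, theirs):
        return theirs           # fast-forward: just move the ref
    if is_ancestor(parents, theirs, ours):
        return ours             # already up to date
    raise RuntimeError("not possible to fast-forward, aborting")

# A --- B --- C        (remote tip C)
#        \
#         D            (diverged local tip D)
parents = {"B": ["A"], "C": ["B"], "D": ["B"]}
print(ff_merge(parents, "B", "C"))   # local at B fast-forwards to C
```

With the local tip at D, `ff_merge(parents, "D", "C")` raises instead of creating a merge commit, which is exactly the "stop and review" behavior the alias is after.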
Zero padded file modes...
I went to clone a repository from GitHub today and discovered something interesting: :: git clone https://github.com/liebke/incanter.git Cloning into 'incanter'... remote: Counting objects: 10457, done. remote: Compressing objects: 100% (3018/3018), done. error: object 4946e1ba09ba5655202a7a5d81ae106b08411061: contains zero-padded file modes fatal: Error in object fatal: index-pack failed At first, it surprised me that no one has seen the issue before, but then I remembered I have transfer.fsckObjects=true in my config. Turning it off, I was able to clone. Running `git fsck` I see: :: git fsck Checking object directories: 100% (256/256), done. warning in tree 4946e1ba09ba5655202a7a5d81ae106b08411061: contains zero-padded file modes warning in tree 553c5e006e53a8360126f053c3ade3d1d063c2f5: contains zero-padded file modes warning in tree 0a2e7f55d7f8e1fa5469e6d83ff20365881eed1a: contains zero-padded file modes Checking objects: 100% (10560/10560), done. So there appear to be several instances of the issue in the tree. Looking in the archives, I ran across this thread: http://comments.gmane.org/gmane.comp.version-control.git/143288 In there, Nicolas Pitre says: This is going to screw up pack v4 (yes, someday I'll have the time to make it real). I don't know if this is still true, but given that patches are being sent out about it, I thought it relevant. Also, searching on the issue, you'll find that a number of repositories have suffered from this problem, and it appears the only fix will result in different commit ids. Given all of that, should Git be updated to cope with zero padded modes? Or is there some way of fixing the issue that doesn't involve changing commit ids? Thanks! -John
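The reason any fix must change commit ids is that the zero-padded mode is part of the hashed tree content. A small sketch that hashes two one-entry raw tree objects differing only in mode padding, the way git hashes objects (the 20-byte entry sha is a zero placeholder, not a real object):

```python
import hashlib

def tree_oid(mode, name, entry_sha):
    """Hash a one-entry tree object the way git does:
    sha1(b'tree <len>\\0' + b'<mode> <name>\\0' + 20-byte binary sha)."""
    body = mode + b" " + name + b"\x00" + entry_sha
    header = b"tree " + str(len(body)).encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()

entry_sha = bytes(20)  # placeholder 20-byte sha, for illustration only

good = tree_oid(b"40000", b"subdir", entry_sha)   # canonical dir mode
bad = tree_oid(b"040000", b"subdir", entry_sha)   # zero-padded variant

print(good != bad)  # different tree ids, so every commit above changes too
```

Since the parent tree embeds this tree's sha, and commits embed the root tree's sha, the padding difference ripples all the way up to the commit ids.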
[PATCH v2] cherry-pick: allow - as abbreviation of '@{-1}'
- abbreviation is handy for cherry-pick like checkout and merge. It's also good for uniformity that a - stands as the name of the previous branch where a branch name is accepted and it could not mean any other things like stdin. Signed-off-by: Hiroshige Umino hiroshig...@gmail.com --- builtin/revert.c | 2 ++ t/t3500-cherry.sh | 15 +++ 2 files changed, 17 insertions(+) diff --git a/builtin/revert.c b/builtin/revert.c index 8e87acd..52c35e7 100644 --- a/builtin/revert.c +++ b/builtin/revert.c @@ -202,6 +202,8 @@ int cmd_cherry_pick(int argc, const char **argv, const char *prefix) memset(opts, 0, sizeof(opts)); opts.action = REPLAY_PICK; git_config(git_default_config, NULL); + if (!strcmp(argv[1], -)) + argv[1] = @{-1}; parse_args(argc, argv, opts); res = sequencer_pick_revisions(opts); if (res 0) diff --git a/t/t3500-cherry.sh b/t/t3500-cherry.sh index f038f34..547dbf8 100755 --- a/t/t3500-cherry.sh +++ b/t/t3500-cherry.sh @@ -55,4 +55,19 @@ test_expect_success \ expr $(echo $(git cherry master my-topic-branch) ) : + [^ ]* - .* ' +test_expect_success \ +'cherry-pick - does not work initially' \ +'test_must_fail git cherry-pick - +' + +test_expect_success \ +'cherry-pick the commit in the previous branch' \ +'git branch other + test_commit commit-to-pick newfile content + echo content expected + git checkout other + git cherry-pick - + test_cmp expected newfile +' + test_done -- 1.8.3.4 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/3] Reject non-ff pulls by default
On 2013-09-04 18:59, Junio C Hamano wrote: Philip Oakley philipoak...@iee.org writes: From: Junio C Hamano gits...@pobox.com John Keeping j...@keeping.me.uk writes: I think there are two distinct uses for pull, which boil down to: (1) git pull ... Peff already covered (1)---it is highly doubtful that a merge is almost always wrong. In fact, if that _were_ the case, we should simply be defaulting to rebase, not failing the command and asking between merge and rebase like jc/pull-training-wheel topic did. We simply do not know what the user wants, as it heavily depends on the project, so we ask the user to choose one (and stick to it). We only offer a limited list. It won't be sufficient for all use cases. It wasn't for me. Very interesting. Tell us more. I'm a bit late to the discussion, but I wanted to chime in. I detest 'git pull' and discourage everyone I meet from using it. See: http://stackoverflow.com/questions/15316601/why-is-git-pull-considered-harmful for my reasons. Instead, I encourage people to do this: git config --global alias.up '!git remote update -p; git merge --ff-only @{u}' and tell them to run 'git up' whenever they would be tempted to use a plain 'git pull'. I usually work with a central repository with topic branches. I follow this rule of thumb:

* When merging a same-named branch (e.g., origin/foo into foo, foo into origin/foo), it should always be a fast-forward. This may require rebasing.
* When merging a differently-named branch (e.g., feature.xyz into master), it should never be a fast-forward.

In distributed workflows, I think of 'git pull collaborator-repo their-branch' as merging a differently-named branch (I wouldn't be merging if they hadn't told me that a separate feature they were working on is complete), so I generally want the merge commit. But when I do a 'git pull' without extra arguments, I'm updating a same-named branch so I never want a merge.
When merging a differently-named branch, I prefer the merge --no-ff to be preceded by a rebase to get a nice, pretty graph:

* merge feature.xyz -> master
|\
| * xyz part 3/3
| * xyz part 2/3
| * xyz part 1/3
|/
* merge feature.foo
|\
| * foo part 2/2
| * foo part 1/2
|/
* merge feature.bar
|\
...

The explicit merge has several benefits:

* It clearly communicates to others that the feature is done.
* It makes it easier to revert the entire feature by reverting the merge if necessary.
* It allows our continuous integration tool to skip over the work-in-progress commits and test only complete features.
* It makes it easier to review the entire feature in one diff.
* 'git log --first-parent' shows a high-level summary of the changes over time, while a normal 'git log' shows the details.

When git pull stops because what was fetched in FETCH_HEAD does not fast-forward, then what did _you_ do (and with the knowledge you currently have, what would you do)? I stop and review what's going on, then make a decision:

* usually it's a rebase
* sometimes it's a rebase --onto (because the branch was force-updated to undo a particularly bad commit)
* sometimes it's a rebase -p (because there's an explicit merge of a different branch that I want to keep)
* sometimes it's a reset --hard (my changes were made obsolete by a different upstream change)
* sometimes it's a merge
* sometimes I do nothing. This is a fairly regular pattern: I'm in the middle of working on something that I know will conflict with some changes that were just pushed upstream, and I want to finish my changes before starting the rebase.

My collaborator contacts me and asks, Would you take a look at the changes I just pushed? If I type 'git pull' out of habit to get the commits, then I'll make a mess of my work-in-progress work tree. If I type 'git up' out of habit, then the merge --ff-only will fail as expected and I can quickly review the commits without messing with my work tree or HEAD.
Even if I always rebase or always merge, I want to briefly review what changed in the remote branch *before* I start the rebase. This helps me understand the conflicts I might encounter. Thus, ff-only always works for me. I might have to type a second merge or rebase command, but that's OK -- it gives me an opportunity to think about what I want first. Non-ff merges are rare enough that the interruption isn't annoying at all. In a single project, would you choose to sometimes rebase and sometimes merge, and if so, what does the choice depend on? When I am on these selected branches, I want to merge, but on other branches I want to rebase? My choice depends on the circumstances of the divergence. It's never as simple as branch X always has this policy while branch Y has that policy. Are there cases where you do not want to either
[PATCH] Documentation/git-merge.txt: fix formatting of example block
You need at least four dashes in a line to have it recognized as listing block delimiter by asciidoc. Signed-off-by: Andreas Schwab sch...@linux-m68k.org --- Documentation/git-merge.txt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Documentation/git-merge.txt b/Documentation/git-merge.txt index 8c7f2f6..a74c371 100644 --- a/Documentation/git-merge.txt +++ b/Documentation/git-merge.txt @@ -186,11 +186,11 @@ In such a case, you can unwrap the tag yourself before feeding it to `git merge`, or pass `--ff-only` when you do not have any work on your own. e.g. + git fetch origin git merge v1.2.3^0 git merge --ff-only v1.2.3 + HOW CONFLICTS ARE PRESENTED -- 1.8.4 -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 And now for something completely different. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Zero padded file modes...
On Thu, Sep 05, 2013 at 10:00:39AM -0400, John Szakmeister wrote: I went to clone a repository from GitHub today and discovered something interesting: :: git clone https://github.com/liebke/incanter.git Cloning into 'incanter'... remote: Counting objects: 10457, done. remote: Compressing objects: 100% (3018/3018), done. error: object 4946e1ba09ba5655202a7a5d81ae106b08411061: contains zero-padded file modes fatal: Error in object fatal: index-pack failed Yep. These were mostly caused by a bug in Grit that is long-fixed. But the objects remain in many histories. It would have been painful to rewrite them back then, and it would be even more painful now. This is going to screw up pack v4 (yes, someday I'll have the time to make it real). I don't know if this is still true, but given that patches are being sent out about it, I thought it relevant. I haven't looked carefully at the pack v4 patches yet, but I suspect that yes, it's still a problem. The premise of pack v4 is that we can do better by not storing the raw git object bytes, but rather storing specialized representations of the various components. For example, by using an integer to store the mode rather than the ascii representation. But that representation does not represent the oops, I have a 0-padded mode quirk. And we have to be able to recover the original object, byte for byte, from the v4 representation (to verify sha1, or to generate a loose object or v2 pack). There are basically two solutions: 1. Add a single-bit flag for I am 0-padded in the real data. We could probably even squeeze it into the same integer. 2. Have a classic section of the pack that stores the raw object bytes. For objects which do not match our expectations, store them raw instead of in v4 format. They will not get the benefit of v4 optimizations, but if they are the minority of objects, that will only end up with a slight slow-down. As I said, I have not looked carefully at the v4 patches, so maybe they handle this case already.
But of the two solutions, I prefer (2). Doing (1) can solve _this_ problem, but it complicates the format, and does nothing for any future compatibility issues. Whereas (2) is easy to implement, since it is basically just pack v2 (and implementations would need a pack v2 reader anyway). -Peff
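Option (1) above could look something like this: steal the low bit of the encoded mode integer as an "original was zero-padded" flag, so the canonical bytes can be reproduced exactly on the way out. This is purely a sketch of the idea being discussed, not anything in the actual v4 patches, and it assumes the single-leading-zero form of the quirk:

```python
# Sketch of Peff's option (1): squeeze a "was zero-padded" flag into
# the low bit of the integer-encoded mode.  Hypothetical encoding.

def encode_mode(mode, zero_padded):
    """Shift the real mode up one bit; low bit records the quirk."""
    return (mode << 1) | (1 if zero_padded else 0)

def decode_mode(value):
    """Recover the exact ascii byte sequence the original tree used."""
    mode, padded = value >> 1, value & 1
    text = oct(mode)[2:].encode()       # e.g. 0o40000 -> b"40000"
    return b"0" + text if padded else text

assert decode_mode(encode_mode(0o40000, False)) == b"40000"
assert decode_mode(encode_mode(0o40000, True)) == b"040000"
```

The decode side can then regenerate byte-identical tree content for sha1 verification, which is the whole constraint driving this discussion.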
Re: Zero padded file modes...
On Thu, Sep 5, 2013 at 10:36 PM, Jeff King p...@peff.net wrote: This is going to screw up pack v4 (yes, someday I'll have the time to make it real). I don't know if this is still true, but given that patches are being sent out about it, I thought it relevant. I haven't looked carefully at the pack v4 patches yet, but I suspect that yes, it's still a problem. The premise of pack v4 is that we can do better by not storing the raw git object bytes, but rather storing specialized representations of the various components. For example, by using an integer to store the mode rather than the ascii representation. But that representation does not represent the oops, I have a 0-padded mode quirk. And we have to be able to recover the original object, byte for byte, from the v4 representation (to verify sha1, or to generate a loose object or v2 pack). There are basically two solutions: 1. Add a single-bit flag for I am 0-padded in the real data. We could probably even squeeze it into the same integer. 2. Have a classic section of the pack that stores the raw object bytes. For objects which do not match our expectations, store them raw instead of in v4 format. They will not get the benefit of v4 optimizations, but if they are the minority of objects, that will only end up with a slight slow-down. 3. Detect this situation and fall back to v2. 4. Update v4 to allow storing raw tree entries mixing with v4-encoded tree entries. This is something between (1) and (2) As I said, I have not looked carefully at the v4 patches, so maybe they handle this case already. But of the two solutions, I prefer (2). Doing (1) can solve _this_ problem, but it complicates the format, and does nothing for any future compatibility issues. Whereas (2) is easy to implement, since it is basically just pack v2 (and implementations would need a pack v2 reader anyway). I think (4) fits better in v4 design and probably not hard to do. 
Nico recently added a code to embed a tree entry inline, but the mode must be encoded (and can't contain leading zeros). We could have another code to store mode in ascii. This also makes me wonder if we might have similar problems with timezones, which are also specially encoded in v4.. (3) is probably easiest. We need to scan through all tree entries first when creating v4 anyway. If we detect any anomalies, just switch back to v2 generation. The user will be forced to rewrite history in order to take full advantage of v4 (they can have a pack of weird trees in v2 and the rest in v4 pack, but that's not optimal). -- Duy
[PATCH V3] check-ignore: Add option to ignore index contents
check-ignore currently shows how .gitignore rules would treat untracked paths. Tracked paths do not generate useful output. This prevents debugging of why a path became tracked unexpectedly unless that path is first removed from the index with `git rm --cached path`. This option (-i, --no-index) simply bypasses the check for the path being in the index and hence allows tracked paths to be checked too. Whilst this behaviour deviates from the characteristics of `git add` and `git status` its use case is unlikely to cause any user confusion. Test scripts are augmented to check this option against the standard ignores to ensure correct behaviour. Signed-off-by: Dave Williams d...@opensourcesolutions.co.uk --- In V3 I have taken on board comments from Junio Hamano and Eric Sunshine from V2 ($gmane/233660). In particular, Junio queried my approach in builtin/git-check-ignores.c that bypassed the functions that check a path is in the index as well as avoiding reading the index in the first place. In my view removing the bypass makes the code slightly less clear, relies on the_index being initialized and the functions using it to exit quickly when it has no content. Nevertheless I have bowed to his better domain knowledge and after undertaking brief analysis to check the assumptions I have removed the extra conditional steps. This has simplified the patch. The revised code appears to have the same behaviour as before and passes the test script (t/t9-ignors.sh). Regarding the test script I have tidied up the variables containing the separate option switches so they don't contain leading spaces; instead I have added spaces as needed when formatting the command line. This not only improves my patch but also the existing code which was a little inconsistent in this respect. Finally I have rebased from the latest commit on master to pick up unrelated recent changes made to builtin/check-ignores.c and updated my code to be consistent with this.
Hopefully I have put these patch notes in the right place now! Let me know if not. Dave Documentation/git-check-ignore.txt | 7 builtin/check-ignore.c | 6 ++- t/t0008-ignores.sh | 75 +- 3 files changed, 77 insertions(+), 11 deletions(-) diff --git a/Documentation/git-check-ignore.txt b/Documentation/git-check-ignore.txt index d2df487..96c591f 100644 --- a/Documentation/git-check-ignore.txt +++ b/Documentation/git-check-ignore.txt @@ -45,6 +45,13 @@ OPTIONS not be possible to distinguish between paths which match a pattern and those which don't. +-i, --no-index:: + Don't look in the index when undertaking the checks. This can + be used to debug why a path became tracked by e.g. `git add .` + and was not ignored by the rules as expected by the user or when + developing patterns including negation to match a path previously + added with `git add -f`. + OUTPUT -- diff --git a/builtin/check-ignore.c b/builtin/check-ignore.c index 25aa2a5..f58f384 100644 --- a/builtin/check-ignore.c +++ b/builtin/check-ignore.c @@ -5,7 +5,7 @@ #include pathspec.h #include parse-options.h -static int quiet, verbose, stdin_paths, show_non_matching; +static int quiet, verbose, stdin_paths, show_non_matching, no_index; static const char * const check_ignore_usage[] = { git check-ignore [options] pathname..., git check-ignore [options] --stdin list-of-paths, @@ -24,6 +24,8 @@ static const struct option check_ignore_options[] = { N_(terminate input and output records by a NUL character)), OPT_BOOL('n', non-matching, show_non_matching, N_(show non-matching input paths)), + OPT_BOOL('i', no-index, no_index, +N_(ignore index when checking)), OPT_END() }; @@ -157,7 +159,7 @@ int cmd_check_ignore(int argc, const char **argv, const char *prefix) die(_(--non-matching is only valid with --verbose)); /* read_cache() is only necessary so we can watch out for submodules. 
*/ - if (read_cache() < 0) + if (!no_index && read_cache() < 0) die(_(index file corrupt)); memset(dir, 0, sizeof(dir)); diff --git a/t/t0008-ignores.sh b/t/t0008-ignores.sh index c29342d..760c7bf 100755 --- a/t/t0008-ignores.sh +++ b/t/t0008-ignores.sh @@ -66,11 +66,11 @@ test_check_ignore () { init_vars rm -f $HOME/stdout $HOME/stderr $HOME/cmd - echo git $global_args check-ignore $quiet_opt $verbose_opt $non_matching_opt $args \ + echo git $global_args check-ignore $quiet_opt $verbose_opt $non_matching_opt $no_index_opt $args \ $HOME/cmd echo $expect_code $HOME/expected-exit-code test_expect_code $expect_code \ - git $global_args check-ignore $quiet_opt $verbose_opt $non_matching_opt $args \ + git $global_args check-ignore $quiet_opt $verbose_opt
Re: Zero padded file modes...
On 09/05/2013 11:36 AM, Jeff King wrote: [...] I haven't looked carefully at the pack v4 patches yet, but I suspect that yes, it's still a problem. The premise of pack v4 is that we can do better by not storing the raw git object bytes, but rather storing specialized representations of the various components. For example, by using an integer to store the mode rather than the ascii representation. But that representation does not represent the oops, I have a 0-padded mode quirk. And we have to be able to recover the original object, byte for byte, from the v4 representation (to verify sha1, or to generate a loose object or v2 pack). There are basically two solutions: 1. Add a single-bit flag for I am 0-padded in the real data. We could probably even squeeze it into the same integer. 2. Have a classic section of the pack that stores the raw object bytes. For objects which do not match our expectations, store them raw instead of in v4 format. They will not get the benefit of v4 optimizations, but if they are the minority of objects, that will only end up with a slight slow-down. As I said, I have not looked carefully at the v4 patches, so maybe they handle this case already. But of the two solutions, I prefer (2). Doing (1) can solve _this_ problem, but it complicates the format, and does nothing for any future compatibility issues. Whereas (2) is easy to implement, since it is basically just pack v2 (and implementations would need a pack v2 reader anyway). 3. Keep those objects in v2 packs instead of the v4 pack. Transfers would have to be v3 or multi-pack transfers would need to be supported. 4. Don't use v4 packs with projects that have crufty objects. Projects with such objects may choose to pay the cost to upgrade to v4 compatibility. There's nothing that requires the next pack format to support all of the broken stuff that's happened over the years. 
Re: Zero padded file modes...
On Thu, Sep 05, 2013 at 11:18:24PM +0700, Nguyen Thai Ngoc Duy wrote: There are basically two solutions: 1. Add a single-bit flag for I am 0-padded in the real data. We could probably even squeeze it into the same integer. 2. Have a classic section of the pack that stores the raw object bytes. For objects which do not match our expectations, store them raw instead of in v4 format. They will not get the benefit of v4 optimizations, but if they are the minority of objects, that will only end up with a slight slow-down. 3. Detect this situation and fall back to v2. 4. Update v4 to allow storing raw tree entries mixing with v4-encoded tree entries. This is something between (1) and (2) I wouldn't want to do (3). At some point pack v4 may become the standard format, but there will be some repositories which will never be allowed to adopt it. For (4), yes, that could work. But like (1), it only solves problems in tree entries. What happens if we have a quirky commit object that needs the same treatment (e.g., a timezone that does not fit into the commit name dictionary properly)? I think (4) fits better in v4 design and probably not hard to do. Nico recently added a code to embed a tree entry inline, but the mode must be encoded (and can't contain leading zeros). We could have another code to store mode in ascii. This also makes me wonder if we might have similar problems with timezones, which are also specially encoded in v4.. Yeah, that might be more elegant. (3) is probably easiest. We need to scan through all tree entries first when creating v4 anyway. If we detect any anomalies, just switch back to v2 generation. The user will be force to rewrite history in order to take full advantage of v4 (they can have a pack of weird trees in v2 and the rest in v4 pack, but that's not optimal). Splitting across two packs isn't great, though. What if v4 eventually becomes the normal on-the-wire format? 
I'd rather have some method for just embedding what are essentially v2 objects into the v4 pack, which would give us future room for handling these sorts of things. But like I said, I haven't looked closely yet, so maybe there are complications with that. In the meantime, I'll defer to the judgement of people who know what they are talking about. :) -Peff
Re: Zero padded file modes...
On Thu, Sep 5, 2013 at 11:36 AM, Jeff King p...@peff.net wrote: On Thu, Sep 05, 2013 at 10:00:39AM -0400, John Szakmeister wrote: I went to clone a repository from GitHub today and discovered something interesting: :: git clone https://github.com/liebke/incanter.git Cloning into 'incanter'... remote: Counting objects: 10457, done. remote: Compressing objects: 100% (3018/3018), done. error: object 4946e1ba09ba5655202a7a5d81ae106b08411061: contains zero-padded file modes fatal: Error in object fatal: index-pack failed Yep. These were mostly caused by a bug in Grit that is long-fixed. But the objects remain in many histories. It would have been painful to rewrite them back then, and it would be even more painful now. I guess there's still the other side of the question though. Are these repositories busted in the sense that something no longer works? It doesn't appear to be the case, but I've not used it extensively, so I can't say for certain one way or another. In the sense that the content is not strictly compliant, transfer.fsckObjects did its job, but I wonder if fsck needs to be a little more tolerant now (at least with respect to transfer objects)? I can certainly cope with the issue--it's not a problem for me to flip the flag on the command line. I think it'd be nice to have transfer.fsckObjects be the default at some point, considering how little people run fsck otherwise and how long these sorts of issues go undiscovered. Issues like the above seem to stand in the way of that happening though. -John
Re: [PATCH v2] Document pack v4 format
On Thu, 5 Sep 2013, Duy Nguyen wrote: On Thu, Sep 5, 2013 at 12:39 PM, Nicolas Pitre n...@fluxnic.net wrote: Now the pack index v3 probably needs to be improved a little, again to accommodate completion of thin packs. Given that the main SHA1 table is now in the main pack file, it should be possible to still carry a small SHA1 table in the index file that corresponds to the appended objects only. This means that a SHA1 search will have to first use the main SHA1 table in the pack file as it is done now, and if not found then use the SHA1 table in the index file if it exists. And of course nth_packed_object_sha1() will have to be adjusted accordingly. What if the sender prepares the sha-1 table to contain missing objects in advance? The sender should know what base objects are missing. Then we only need to append objects at the receiving end and verify that all new objects are also present in the sha-1 table. I do like this idea very much. And that doesn't increase the thin pack size as the larger SHA1 table will be compensated by a smaller sha1ref encoding in those objects referring to the missing ones. Nicolas
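The two-table lookup Nicolas describes (search the pack's main SHA-1 table first, then the index's appended-objects table) is a simple two-level binary search; a sketch with sorted lists standing in for the on-disk tables (all names and sample values hypothetical):

```python
import bisect

def find_object(main_table, appended_table, sha):
    """Return (table_name, position) for `sha`, searching the pack's main
    SHA-1 table first, then the index file's appended-objects table."""
    for name, table in (("main", main_table), ("appended", appended_table)):
        i = bisect.bisect_left(table, sha)
        if i < len(table) and table[i] == sha:
            return name, i
    return None

main = sorted(["1a" * 20, "3c" * 20, "9f" * 20])
appended = sorted(["bb" * 20])  # objects added when completing a thin pack
print(find_object(main, appended, "bb" * 20))
```

This mirrors the described control flow: the common case hits the main table, and only the (small) appended table is consulted on a miss, with nth_packed_object_sha1()-style position results either way.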
Re: Zero padded file modes...
On Thu, 5 Sep 2013, Jeff King wrote: There are basically two solutions: 1. Add a single-bit flag for I am 0-padded in the real data. We could probably even squeeze it into the same integer. 2. Have a classic section of the pack that stores the raw object bytes. For objects which do not match our expectations, store them raw instead of in v4 format. They will not get the benefit of v4 optimizations, but if they are the minority of objects, that will only end up with a slight slow-down. That is basically what I just suggested. But instead of a special section, simply using a special object type number would do it. I'm even wondering if that couldn't be used for fixing a thin pack instead of the special provision I just added last night. Nicolas
Re: [PATCH v2] Document pack v4 format
On Thu, Sep 5, 2013 at 12:39 PM, Nicolas Pitre n...@fluxnic.net wrote: Now the pack index v3 probably needs to be improved a little, again to accommodate completion of thin packs. Given that the main SHA1 table is now in the main pack file, it should be possible to still carry a small SHA1 table in the index file that corresponds to the appended objects only. This means that a SHA1 search will have to first use the main SHA1 table in the pack file as it is done now, and if not found then use the SHA1 table in the index file if it exists. And of course nth_packed_object_sha1() will have to be adjusted accordingly. What if the sender prepares the sha-1 table to contain missing objects in advance? The sender should know what base objects are missing. Then we only need to append objects at the receiving end and verify that all new objects are also present in the sha-1 table. -- Duy
Re: Zero padded file modes...
On Thu, 5 Sep 2013, Jeff King wrote: On Thu, Sep 05, 2013 at 11:18:24PM +0700, Nguyen Thai Ngoc Duy wrote: There are basically two solutions: 1. Add a single-bit flag for I am 0-padded in the real data. We could probably even squeeze it into the same integer. 2. Have a classic section of the pack that stores the raw object bytes. For objects which do not match our expectations, store them raw instead of in v4 format. They will not get the benefit of v4 optimizations, but if they are the minority of objects, that will only end up with a slight slow-down. 3. Detect this situation and fall back to v2. 4. Update v4 to allow storing raw tree entries mixing with v4-encoded tree entries. This is something between (1) and (2). I wouldn't want to do (3). At some point pack v4 may become the standard format, but there will be some repositories which will never be allowed to adopt it. For (4), yes, that could work. But like (1), it only solves problems in tree entries. What happens if we have a quirky commit object that needs the same treatment (e.g., a timezone that does not fit into the commit name dictionary properly)? I think (4) fits better in the v4 design and probably is not hard to do. Nico recently added a code to embed a tree entry inline, but the mode must be encoded (and can't contain leading zeros). We could have another code to store the mode in ASCII. This also makes me wonder if we might have similar problems with timezones, which are also specially encoded in v4. Yeah, that might be more elegant. (3) is probably easiest. We need to scan through all tree entries first when creating v4 anyway. If we detect any anomalies, just switch back to v2 generation. The user will be forced to rewrite history in order to take full advantage of v4 (they can have a pack of weird trees in v2 and the rest in a v4 pack, but that's not optimal). Splitting across two packs isn't great, though. What if v4 eventually becomes the normal on-the-wire format?
I'd rather have some method for just embedding what are essentially v2 objects into the v4 pack, which would give us future room for handling these sorts of things. But like I said, I haven't looked closely yet, so maybe there are complications with that. In the meantime, I'll defer to the judgement of people who know what they are talking about. :) None of the above is particularly appealing to me. Pack v4 has to enforce some standardization in the object encoding to be efficient. Some compromises have been applied to accommodate the fixing of a thin pack, although I was initially tempted to simply dodge the issue and allow thin packs in a repository. On this particular mode issue, I remember making a fuss at the time when this was discovered because the GitHub implementation did generate such tree objects at the time. So instead of compromising the pack v4 object encoding further, I'd simply suggest adding a special object type which is in fact simply the pack v2 representation i.e. the canonical object version, deflated. Right now pack v4 encodes only 5 object types: commit, tree, blob, delta and tag. Only the commit and tree objects have their representation transcoded. So that means we only need to add native_commit and native_tree object types. Then, anything that doesn't fit the strict expectation for transcoding a tree or a commit object is simply included as is without transcoding, just like in pack v2. Nicolas
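For readers following along: the "canonical object version" Nico refers to is the raw byte stream git hashes. For a tree that is a sequence of `<mode> <name>\0<20-byte binary sha1>` entries, with the mode in ASCII octal and, normally, no zero padding. A quick way to look at those bytes (a sketch; a throwaway repo is created so the commands are self-contained, file names are made up):

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
echo hello >file.txt
git add file.txt
git -c user.name=t -c user.email=t@example.com commit -q -m init
# Dump the canonical tree payload: "<mode> <name>\0<20-byte sha1>" per entry.
# A well-formed regular file entry starts with "100644", not "0100644".
git cat-file tree HEAD | od -c | head -n 3
```

The zero-padded-mode trees discussed in this thread differ from this canonical form only in that the mode field carries a leading "0".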
Re: [PATCH 05/38] pack v4: add commit object parsing
On Thu, 5 Sep 2013, SZEDER Gábor wrote: Hi, On Thu, Sep 05, 2013 at 02:19:28AM -0400, Nicolas Pitre wrote: Let's create another dictionary table to hold the author and committer entries. We use the same table format used for tree entries where the 16 bit data prefix is conveniently used to store the timezone value. In order to copy straight from a commit object buffer, dict_add_entry() is modified to get the string length as the provided string pointer is not always null terminated. Signed-off-by: Nicolas Pitre n...@fluxnic.net --- packv4-create.c | 98 +++-- 1 file changed, 89 insertions(+), 9 deletions(-)

diff --git a/packv4-create.c b/packv4-create.c
index eccd9fc..5c08871 100644
--- a/packv4-create.c
+++ b/packv4-create.c
@@ -1,5 +1,5 @@
 /*
- * packv4-create.c: management of dictionary tables used in pack v4
+ * packv4-create.c: creation of dictionary tables and objects used in pack v4
  *
  * (C) Nicolas Pitre n...@fluxnic.net
  *
@@ -80,9 +80,9 @@ static void rehash_entries(struct dict_table *t)
 	}
 }

-int dict_add_entry(struct dict_table *t, int val, const char *str)
+int dict_add_entry(struct dict_table *t, int val, const char *str, int str_len)
 {
-	int i, val_len = 2, str_len = strlen(str) + 1;
+	int i, val_len = 2;

 	if (t->ptr + val_len + str_len > t->size) {

We need a +1 here on the left side, i.e.

	if (t->ptr + val_len + str_len + 1 > t->size) {

Absolutely, good catch. Sidenote: couldn't we call the 'ptr' field something else, like end_offset or end_idx? It took me some headscratching to figure out why it is OK to compare a pointer to an integer above, or use a pointer without dereferencing as an index into an array below (because ptr is, well, not a pointer after all). Indeed. This is a remnant of an earlier implementation which didn't use realloc() and therefore this used to be a real pointer. Both issues now addressed in my tree. Thanks Nicolas
Re: [PATCH] git-gui: Modify push dialog to support Gerrit review
Jørgen Edelbo j...@napatech.com writes: You say I can only push HEAD. I understand this to mean that I have to stop work on C (perhaps commit or stash any unfinished work), then checkout A, push it, checkout B, push it, checkout C and unstash the unfinished work. If my understanding is correct, the new restriction is a no-go. Ok, this way of working is not supported. It just never occurred to me that you would work this way. Forgetting to push something that you have just completed is very far from what I am used to. I think it comes most natural to push what you have done before changing topic. The reason I make a commit is to get it out of the door. People work in very different ways, and mine and yours are extreme opposites. I almost never push out a commit that is just off the press, I use topic branches heavily and work on multiple topics (either related or unrelated) in parallel all the time, so it is very usual for me to push out more than one branch with a single push---by definition, if we support pushing only the current branch out, working on more than one topic and, after perfecting these, pushing them together cannot be done. If one sets push.default to something other than matching, and does not have any remote.*.push refspec in the configuration, J6t's "I forgot to push while I was on that branch" and my "I deliberately did not push those branches out while I was on them, but now I know I am ready to push them out" cannot be done without explicit refspecs on the command line. But stepping back a bit, I think this "I commit because I want to get it out the door" is coming from the same desire that led to a possible design mistake that made various push.default settings push only the current branch out. The possible design mistake that has been disturbing me is the following. The push.default setting controls two unrelated things, and that is one too many.
Between the matching modes and the rest, it tells what branch(es) are pushed out (either all the branches with the same name or the current branch). That is one thing, and I tend to agree that push only the current branch would be a sane default many people would prefer (and that is the reason we made it the default for Git 2.0). Among the various non matching modes, it also determines where a branch that is pushed out would go (current pushes to the same name, upstream pushes to @{upstream}, etc.). But this "once I choose what branch to push out, the branch being pushed out knows where it wants to go" does not take effect if one explicitly names what to push, e.g. in this sequence where one forgets to push 'frotz' out:

	j6t$ git checkout frotz
	... work work work ...
	j6t$ git checkout xyzzy
	... work work work ...
	... realize 'frotz' is done
	j6t$ git push origin frotz

The last push may work differently from the push in this sequence:

	j6t$ git checkout frotz
	... work work work ...
	j6t$ git push ;# or git push origin

In the latter sequence, the niceties of push.upstream to push 'frotz' out to the frotz@{upstream} (and your git push origin refs/heads/frotz:refs/for/frotz mapping done in git-gui) will take effect, but in the former, the refspec frotz on the command line is taken as a short-hand for frotz:frotz. We may want to teach git push that the command line refspec that names a local branch may not be pushing to the same name but wants to go through the same mapping used when git push is done while the branch is checked out, with some mechanism. It is a separate but very related topic, I think.
Re: [PATCH] git-gui: Modify push dialog to support Gerrit review
Joergen Edelbo j...@napatech.com writes:

+proc get_remote_branch {} {
+	global push_branchtype push_branch push_new
+	set branch {}
+	switch -- $push_branchtype {
+		existing { set branch $push_branch }
+		create   { set branch $push_new }
+	}
+	return $branch
+}
+
+proc get_remote_ref_spec {} {
+	global gerrit_review
+	set push_branch [get_remote_branch]
+	if {$gerrit_review} {
+		return refs/for/$push_branch
+	} else {
+		return refs/heads/$push_branch
+	}
+}

I am puzzled. This may be fine for those who use Git-GUI and nothing else to push, but will not help those who use both Git-GUI and the command line. Isn't the right way to improve the situation to let the command line tools know how the user wants to push things out and just have Git-GUI delegate the choice to the underlying git push? For example, if you are working on a topic 'frotz', and if the location you push is managed by Gerrit, isn't it the case that you always want to push it to refs/for/frotz, whether you are pushing via Git-GUI or from the command line? I think we discussed during the 1.8.4 cycle a configuration like this:

	[branch "frotz"]
		push = refs/heads/frotz:refs/for/frotz

as part of the triangular workflow topic that allows you to specify that when 'frotz' is pushed out, it goes to refs/for/frotz, or something like that, but I do not recall what came out of that.
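The per-branch `branch.<name>.push` configuration Junio mentions was only a proposal at the time, but a per-remote push refspec (`remote.<name>.push`) did and does exist, and Gerrit users commonly use it to get the effect he describes: every plain `git push` of a branch lands in the review namespace, from git-gui and the command line alike. A hedged sketch (the remote name and refspec are illustrative, not from the thread):

```shell
set -e
repo=$(mktemp -d) && git init -q "$repo" && cd "$repo"
# Map every local branch to Gerrit's review namespace on plain "git push".
git config remote.origin.push 'refs/heads/*:refs/for/*'
git config --get remote.origin.push
```

With this in place, `git push origin frotz` would go to refs/for/frotz without git-gui needing its own mapping logic.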
Re: [PATCH] Documentation/git-merge.txt: fix formatting of example block
Thanks. It seems that this was broken from the beginning at 77c72780 (Documentation: merging a tag is a special case, 2013-03-21). Will apply.
Re: [PATCH 1/3] pathspec: catch prepending :(prefix) on pathspec with short magic
Nguyễn Thái Ngọc Duy pclo...@gmail.com writes: :(prefix) is in the long form. Suppose people pass :!foo with '!' being the short form of magic 'bar', the code will happily turn it to :(prefix..)!foo, which makes '!' part of the path and no longer a magic. The correct form must be ':(prefix..,bar)foo', but as so far we haven't had any magic in short form yet (*), the code to convert from short form to long one will be inactive anyway. Let's postpone it until a real short form magic appears. (*) The short form magic '/' is a special case and won't be caught by this die(), which is correct. When '/' magic is detected, prefixlen is set back to 0 and the whole if (prefixlen..) block is skipped. Signed-off-by: Nguyễn Thái Ngọc Duy pclo...@gmail.com --- fixes on top of nd/magic-pathspec. pathspec.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/pathspec.c b/pathspec.c
index d9f4143..62fde50 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -231,7 +231,9 @@ static unsigned prefix_pathspec(struct pathspec_item *item,
 	const char *start = elt;
 	if (prefixlen && !literal_global) {
 		/* Preserve the actual prefix length of each pattern */
-		if (long_magic_end) {
+		if (short_magic)
+			die("BUG: prefixing on short magic is not supported");
+		else if (long_magic_end) {
 			strbuf_add(&sb, start, long_magic_end - start);
 			strbuf_addf(&sb, ",prefix:%d", prefixlen);
 			start = long_magic_end;

Good. I wonder if we should start thinking about removing the big NEEDSWORK comment in front of this function. Also the pathspec_magic[] array was the table-driven way that was meant to be enhanced to fit future needs to parse all supported types of pathspec magic, but it seems that the prefix:12 magic is implemented using custom/special-case code. We may want to see if we want to enrich the parser to fold this to match the table-driven approach better someday---this is not urgent as we are not adding any new pathspec magic now. Will queue. Thanks.
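For readers unfamiliar with the long-form pathspec magic being discussed: magic of the form `:(name)` at the start of a pathspec changes how the rest of it matches. A runnable sketch using `:(top)` (whose short form is the `'/'` special case Duy mentions); the repo layout and file names are made up for illustration:

```shell
set -e
repo=$(mktemp -d) && git init -q "$repo" && cd "$repo"
mkdir sub
echo a >top.c
echo b >sub/inner.c
git add .
cd sub
# ':(top)' anchors the pathspec at the repository root, even when the
# command runs from a subdirectory; --full-name prints root-relative paths.
git ls-files --full-name ':(top)*.c'
```

Without the magic, a pathspec given in `sub/` would be prefixed with `sub/` (the `prefix:` mechanism the patch above is guarding), so `top.c` would not match.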
Re: [PATCH] git-svn: Fix termination issues for remote svn connections
Uli Heller uli.hel...@daemons-point.com writes: When using git-svn in combination with serf-1.2.1 core dumps are created on termination. This is caused by a bug in serf, a fix for the bug exists (see https://code.google.com/p/serf/source/detail?r=2146). Nevertheless, I think it makes sense to fix the issue within the git perl module Ra.pm, too. The change frees the private copy of the remote access object on termination which prevents the error from happening. Note: Since subversion-1.8.0 and later do require serf-1.2.1 or later, the core dumps typically do show up when upgrading to a recent version of subversion. Credits: Jonathan Lambrechts for proposing a fix to Ra.pm. Evgeny Kotkov and Ivan Zhakov for fixing the issue in serf and pointing me to that fix. --- Thanks. Please sign-off your patch. I am Cc'ing Kyle McKay who apparently had some experience working with git-svn with newer svn that can only use serf, hoping that we can get an independent opinion/test just to be sure. Also Cc'ed is Eric Wong who has been the official git-svn area expert, but I understand that Eric hasn't needed to use git-svn for quite a while, so it is perfectly fine if he does not have any comment on this one. We may want to find a volunteer to move git svn forward as a new area expert (aka subsystem maintainer), by the way. perl/Git/SVN/Ra.pm | 5 + 1 file changed, 5 insertions(+)

diff --git a/perl/Git/SVN/Ra.pm b/perl/Git/SVN/Ra.pm
index 75ecc42..78dd346 100644
--- a/perl/Git/SVN/Ra.pm
+++ b/perl/Git/SVN/Ra.pm
@@ -32,6 +32,11 @@ BEGIN {
 	}
 }

+END {
+	$RA = undef;
+	$ra_invalid = 1;
+}
+
 sub _auth_providers () {
 	my @rv = (
 	SVN::Client::get_simple_provider(),
Re: Zero padded file modes...
On Thu, Sep 05, 2013 at 01:09:34PM -0400, Nicolas Pitre wrote: On Thu, 5 Sep 2013, Jeff King wrote: There are basically two solutions: 1. Add a single-bit flag for I am 0-padded in the real data. We could probably even squeeze it into the same integer. 2. Have a classic section of the pack that stores the raw object bytes. For objects which do not match our expectations, store them raw instead of in v4 format. They will not get the benefit of v4 optimizations, but if they are the minority of objects, that will only end up with a slight slow-down. That is basically what I just suggested. But instead of a special section, simply using a special object type number would do it. Yeah, I think we are in agreement. I only suggested a separate section because I hadn't carefully read the v4 patches yet, and didn't know if there was room in the normal sequence. A special object number seems much more elegant. -Peff
Re: [PATCH 37/38] pack v4: introduce escape hatches in the name and path indexes
On Thu, 5 Sep 2013, Nicolas Pitre wrote: If the path or name index is zero, this means the entry data is to be found inline rather than being located in the dictionary table. This is there to allow easy completion of thin packs without having to add new table entries which would have required a full rewrite of the pack data. Signed-off-by: Nicolas Pitre n...@fluxnic.net I'm now dropping this patch. Please also remove this from your documentation patch. I think that we've found a way to better support thin packs. You said: What if the sender prepares the sha-1 table to contain missing objects in advance? The sender should know what base objects are missing. Then we only need to append objects at the receiving end and verify that all new objects are also present in the sha-1 table. So the SHA1 table is covered. Missing objects in a thin pack cannot themselves be deltas. We need their undeltified form at the end of the pack for the pack to be complete. Therefore those missing objects serve only as base objects for other deltas. Although it is possible to have deltified commit objects in pack v2, I don't think this happens very often. There are no deltified commit objects in pack v4. Blob objects are the same in pack v2 and pack v4. No dictionary references are needed. That leaves only tree objects. And because we've also discussed the need to have non transcoded object representations for those odd cases such as zero padded file modes, we might as well simply use that for the appended tree objects already needed to complete a thin pack. At least the strings in tree entries will be compressed that way. Problem solved, and one less special case in the code. What do you think? Nicolas
Re: [PATCH 0/3] Reject non-ff pulls by default
On Thu, Sep 05, 2013 at 12:18:45PM -0700, Junio C Hamano wrote: John Keeping j...@keeping.me.uk writes: On Wed, Sep 04, 2013 at 03:59:18PM -0700, Junio C Hamano wrote: Are there cases where you do not want to either rebase nor merge? If so what do you want to do after git pull fetches from the other side? Nothing? One other thing that I can see being useful occasionally is: git rebase @{u}@{1} --onto @{u} which allows local commits to be replayed onto a rewritten upstream branch. Sure, that would make sense. I somehow thought that rebase by default looked in the reflog to do exactly that. Perhaps I am not remembering correctly. It just does @{upstream} by default, which tends to get messy if the upstream has been rewritten.
Re: [PATCH 0/3] Reject non-ff pulls by default
John Keeping j...@keeping.me.uk writes: On Wed, Sep 04, 2013 at 03:59:18PM -0700, Junio C Hamano wrote: Are there cases where you do not want to either rebase nor merge? If so what do you want to do after git pull fetches from the other side? Nothing? One other thing that I can see being useful occasionally is: git rebase @{u}@{1} --onto @{u} which allows local commits to be replayed onto a rewritten upstream branch. Sure, that would make sense. I somehow thought that rebase by default looked in the reflog to do exactly that. Perhaps I am not remembering correctly.
Re: [PATCH v4 0/5] Disable git status comment prefix
Matthieu Moy matthieu@imag.fr writes: Compared to v2, this changes essentially: * The prefix is actually disabled by default in this version. As a consequence, the option is renamed to status.oldStyle. * Since this is the default, the tests are updated to test the new defaults. In a first patch, I'm setting status.oldStyle=true in test files that require it (to keep the patch short), and the last patch actually updates the test expected results. This was actually useful as I did find (and fix) a few bugs updating the tests: - the --columns option was still showing the comment prefix - git commit --dry-run and failed git commit were still displaying comments to stdout. * The --for-status option is kept as a no-op. Much nicer. Will replace and queue. One caveat, though. The name oldStyle will become problematic, when we want to remove some wart in the output format long after this no comment prefix by default series lands. Some people may expect setting oldStyle=true would give output from the 1.8.4 era, while some others would expect that it would give output from the 1.8.5 era that does not have the comment prefix but still has that wart we will remove down the line. It is a common mistake to think that one is making the final change to the code and that the result of that final change will stay the latest forever. I am sure I've done that a few times and later regretted them.
Re: [PATCH v4 03/11] t6050-replace: test that objects are of the same type
From: Junio C Hamano gits...@pobox.com Christian Couder chrisc...@tuxfamily.org writes:

+test_expect_success 'replaced and replacement objects must be of the same type' '
+	test_must_fail git replace mytag $HASH1 2>err &&
+	grep "mytag. points to a replaced object of type .tag" err &&
+	grep "$HASH1. points to a replacement object of type .commit" err

Hmm, would these messages ever get translated? I think it is sufficient to make sure that the proposed replacement fails for these cases. Ok, I will get rid of the grep statements. Thanks, Christian.
Re: Zero padded file modes...
On Thu, Sep 05, 2013 at 01:13:40PM -0400, John Szakmeister wrote: Yep. These were mostly caused by a bug in Grit that is long-fixed. But the objects remain in many histories. It would have been painful to rewrite them back then, and it would be even more painful now. I guess there's still the other side of the question though. Are these repositories busted in the sense that something no longer works? No, as far as I know, everything still works fine. However, some diffs may be suboptimal, because we may have two different sha1s for the same subtree (so we may descend into the tree unnecessarily only to find that they are equivalent). And by the same token, any scripts doing non-recursive diffs may erroneously mark the trees as differing, even though they do not contain any differing files. But neither is a big problem in practice. If you had two clients in active use which were flip-flopping a sub-tree back and forth between representations, it would be a problem. But we are talking about a few isolated incidents far back in history. It doesn't appear to be the case, but I've not used it extensively so I can't say for certain one way or another. In the sense that the content is not strictly compliant, transfer.fsckObjects did its job, but I wonder if fsck needs to be a little more tolerant now (at least with respect to transfer objects)? Fsck actually treats this as a warning, not an error. It is transfer.fsckObjects (via index-pack --strict) that actually treats warnings as errors. It's possible that this should be loosened to allow through problems marked as FSCK_WARN (with a message, kind of like...a warning). Though it may also make sense to revisit some of the classifications in fsck (e.g., many of the warnings are indicative of seriously broken objects). GitHub uses transfer.fsckObjects, rejecting all warnings[1]. In practice it is not usually a big deal, as people are happy to fix up their objects _before_ they get widely published.
The biggest push-back we get is when somebody tries to re-push history they got from another GitHub repo, and then says "But why are you complaining? You served this crappy broken history?" And it's a fair point. If you are forking (but not joining the existing fork network of) an existing project with irregularities in the history, it's not really an option to simply rewrite the history you are basing on. -Peff [1] Actually, we do let through 0-padded modes with a warning, explicitly because of the problem mentioned above.
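Peff's suggestion to loosen FSCK_WARN handling eventually grew into per-check configuration in later git versions (the `fetch.fsck.<msg-id>` / `receive.fsck.<msg-id>` knobs arrived well after this 2013 thread, so they are shown here as a retrospective sketch, not something available to the participants):

```shell
set -e
repo=$(mktemp -d) && git init -q "$repo" && cd "$repo"
# Check incoming objects on fetch/push...
git config transfer.fsckObjects true
# ...but downgrade this one historical quirk from error to warning,
# which is essentially the policy Peff describes GitHub applying.
git config fetch.fsck.zeroPaddedFilemode warn
git config --get fetch.fsck.zeroPaddedFilemode
```

`zeroPaddedFilemode` is the fsck message ID for exactly the malformed tree modes discussed in this thread.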
Re: [PATCH v4 0/5] Disable git status comment prefix
Junio C Hamano gits...@pobox.com writes: One caveat, though. The name oldStyle will become problematic, when we want to remove some wart in the output format long after this no comment prefix by default series lands. Some people may expect setting oldStyle=true would give output from 1.8.4 era, while some others would expect that it would give output from 1.8.5 era that does not have comment prefix but still has that wart we will remove down the line. I'm fine with any name actually (since it is enabled by default, people don't need to know the name to benefit from the new output). Maybe status.displayCommentPrefix was the best name after all. -- Matthieu Moy http://www-verimag.imag.fr/~moy/
Re: [PATCH] git-svn: Fix termination issues for remote svn connections
Junio C Hamano gits...@pobox.com wrote: Uli Heller uli.hel...@daemons-point.com writes: Nevertheless, I think it makes sense to fix the issue within the git perl module Ra.pm, too. The change frees the private copy of the remote access object on termination which prevents the error from happening. Thanks. Please sign-off your patch. I am Cc'ing Kyle McKay who apparently had some experience working with git-svn with newer svn that can only use serf, hoping that we can get an independent opinion/test just to be sure. Also Cc'ed is Eric Wong who has been the official git-svn area expert, but I understand that Eric hasn't needed to use git-svn for quite a while, so it is perfectly fine if he does not have any comment on this one. We may want to find a volunteer to move git svn forward as a new area expert (aka subsystem maintainer), by the way. Correct, git-svn has the effect of being self-obsoleting. I agree with adding a workaround for broken things, however I suggest a code comment explaining why it is necessary. The commit message is important, too, but might get harder to track down if there's code movement/refactoring in the future.

+END {
+	$RA = undef;
+	$ra_invalid = 1;
+}
Re: [PATCH v2] Document pack v4 format
Nicolas Pitre n...@fluxnic.net writes: On Thu, 5 Sep 2013, Duy Nguyen wrote: On Thu, Sep 5, 2013 at 12:39 PM, Nicolas Pitre n...@fluxnic.net wrote: Now the pack index v3 probably needs to be improved a little, again to accommodate completion of thin packs. Given that the main SHA1 table is now in the main pack file, it should be possible to still carry a small SHA1 table in the index file that corresponds to the appended objects only. This means that a SHA1 search will have to first use the main SHA1 table in the pack file as it is done now, and if not found then use the SHA1 table in the index file if it exists. And of course nth_packed_object_sha1() will have to be adjusted accordingly. What if the sender prepares the sha-1 table to contain missing objects in advance? The sender should know what base objects are missing. Then we only need to append objects at the receiving end and verify that all new objects are also present in the sha-1 table. I do like this idea very much. And that doesn't increase the thin pack size as the larger SHA1 table will be compensated by a smaller sha1ref encoding in those objects referring to the missing ones. Let me see if I understand the proposal correctly. Compared to a normal pack-v4 stream, a thin pack-v4 stream:

- has all the SHA-1 object names involved in the stream in its main object name table---most importantly, names of objects that the thin optimization omits from the pack data body are included;

- uses the SHA-1 object name table offset to refer to other objects, even to ones that the thin stream will not transfer in the pack data body;

- is completed at the receiving end by appending the data for the objects that were not transferred due to the thin optimization.

So the invariant "all objects contained in the pack" in: - A table of sorted SHA-1 object names for all objects contained in the pack.
that appears in Documentation/technical/pack-format.txt is still kept at the end, and more importantly, any object that is mentioned in this table can be reconstructed by using pack data in the same packfile without referencing anything else. Most importantly, if we were to build a v2 .idx file for the resulting .pack, the list of object names in the .idx file would be identical to the object names in this table in the .pack file. If that is the case, I too like this. I briefly wondered if it makes sense to mention objects that are often referred to that do not exist in the pack in this table (e.g. new commits included in this pack refer to a tree object that has not changed for ages---their trees mention this subtree using a SHA-1 reference encoding and being able to name the old, unchanging tree with an index to the object table may save space), but that would break the above invariant in a big way---some objects mentioned in the table may not exist in the packfile itself---and it probably is not a good idea. Unlike that broken idea, the "include names of the objects that will be appended anyway" approach to help fattening a thin pack makes very good sense to me.
Re: [PATCH v4 7/8] update-ref: support multiple simultaneous updates
On 09/04/2013 05:27 PM, Junio C Hamano wrote: I am not saying the above is the best format, but the point is that the mode of the operation defines the structure Great, thanks for your comments. Based on that I've prototyped a new format. Rather than jumping straight to the patch, here is my proposed format documentation for review:

-

With `--stdin`, update-ref reads instructions from standard input and performs all modifications together. Specify commands of the form:

	create SP <ref> SP <newvalue> LF
	update SP <ref> SP <newvalue> [SP <oldvalue>] LF
	delete SP <ref> [SP <oldvalue>] LF
	verify SP <ref> [SP <oldvalue>] LF
	option SP <opt> LF

Quote fields containing whitespace as if they were strings in C source code. Alternatively, use `-z` to specify commands without quoting:

	create SP <ref> NUL <newvalue> NUL
	update SP <ref> NUL <newvalue> NUL [<oldvalue>] NUL
	delete SP <ref> NUL [<oldvalue>] NUL
	verify SP <ref> NUL [<oldvalue>] NUL
	option SP <opt> NUL

Lines of any other format or a repeated <ref> produce an error. Command meanings are:

create::
	Create <ref> with <newvalue> only if it does not exist.

update::
	Update <ref> to be <newvalue>, verifying <oldvalue> if given. Specify a zero <newvalue> to delete a ref and/or a zero <oldvalue> to make sure that a ref does not exist.

delete::
	Delete <ref>, verifying <oldvalue> if given.

verify::
	Verify <ref> against <oldvalue> but do not change it. If <oldvalue> is zero or missing, the ref must not exist.

option::
	Specify an option to take effect for following commands. Valid options are `deref` and `no-deref` to specify whether to dereference symbolic refs.

Use 40 "0" or the empty string to specify a zero value, except that with `-z` an empty <oldvalue> is considered missing. If all <ref>s can be locked with matching <oldvalue>s simultaneously, all modifications are performed. Otherwise, no modifications are performed. Note that while each individual <ref> is updated or deleted atomically, a concurrent reader may still see a subset of the modifications.
Thanks,
-Brad
--
To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
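Brad's proposal above is easy to sanity-check mechanically. The sketch below is a toy line validator for the `--stdin` command grammar he describes; the `validate` function name and its messages are mine, not part of the proposal, and the real parser would of course live in builtin/update-ref.c:

```shell
# Toy validator for the proposed --stdin command grammar.
# It only checks the leading keyword and rough shape of each line;
# it does not lock or update any refs.
validate() {
	while IFS= read -r line; do
		case $line in
		"create "*" "*) echo "create ok" ;;
		"update "*" "*) echo "update ok" ;;
		"delete "*)     echo "delete ok" ;;
		"verify "*)     echo "verify ok" ;;
		"option "*)     echo "option ok" ;;
		*) echo "error: $line" >&2; return 1 ;;
		esac
	done
}

printf 'create refs/heads/topic 1234\ndelete refs/heads/old\n' | validate
# prints:
# create ok
# delete ok
```

Feeding it a line of any other shape returns non-zero, mirroring the "lines of any other format produce an error" rule.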
Re: [PATCH v2] Document pack v4 format
On Thu, 5 Sep 2013, Junio C Hamano wrote:

> Nicolas Pitre <n...@fluxnic.net> writes:
>
> > On Thu, 5 Sep 2013, Duy Nguyen wrote:
> >
> > > On Thu, Sep 5, 2013 at 12:39 PM, Nicolas Pitre <n...@fluxnic.net> wrote:
> > > > Now the pack index v3 probably needs to be improved a little,
> > > > again to accommodate completion of thin packs. Given that the
> > > > main SHA1 table is now in the main pack file, it should be
> > > > possible to still carry a small SHA1 table in the index file
> > > > that corresponds to the appended objects only. This means that
> > > > a SHA1 search will have to first use the main SHA1 table in the
> > > > pack file as it is done now, and if not found then use the SHA1
> > > > table in the index file if it exists. And of course
> > > > nth_packed_object_sha1() will have to be adjusted accordingly.
> > >
> > > What if the sender prepares the sha-1 table to contain missing
> > > objects in advance? The sender should know what base objects are
> > > missing. Then we only need to append objects at the receiving end
> > > and verify that all new objects are also present in the sha-1
> > > table.
> >
> > I do like this idea very much. And that doesn't increase the thin
> > pack size as the larger SHA1 table will be compensated by a smaller
> > sha1ref encoding in those objects referring to the missing ones.
>
> Let me see if I understand the proposal correctly. Compared to a
> normal pack-v4 stream, a thin pack-v4 stream:
>
> - has all the SHA-1 object names involved in the stream in its main
>   object name table---most importantly, names of objects that the
>   thin optimization omits from the pack data body are included;
>
> - uses the SHA-1 object name table offset to refer to other objects,
>   even to ones that the thin stream will not transfer in the pack
>   data body;
>
> - is completed at the receiving end by appending the data for the
>   objects that were not transferred due to the thin optimization.
>
> So the invariant "all objects contained in the pack" in:
>
> - A table of sorted SHA-1 object names for all objects contained in
>   the pack.
> that appears in Documentation/technical/pack-format.txt is still kept
> at the end, and more importantly, any object that is mentioned in
> this table can be reconstructed by using pack data in the same
> packfile without referencing anything else. Most importantly, if we
> were to build a v2 .idx file for the resulting .pack, the list of
> object names in the .idx file would be identical to the object names
> in this table in the .pack file.

That is right.

> If that is the case, I too like this. I briefly wondered if it makes
> sense to mention objects that are often referred to but do not exist
> in the pack in this table (e.g. new commits included in this pack
> refer to a tree object that has not changed for ages---their trees
> mention this subtree using a SHA-1 reference encoding, and being able
> to name the old, unchanging tree with an index to the object table
> may save space), but that would break the above invariant in a big
> way---some objects mentioned in the table may not exist in the
> packfile itself---and it probably is not a good idea.

Yet, if an old tree that doesn't change is often referred to, it should be possible to have only one such reference in the whole pack, and all the other trees can use a delta copy sequence to refer to it. At this point, whether or not the tree being referred to is listed inline or in the SHA1 table doesn't make a big difference.

> Unlike that broken idea, the "include names of the objects that will
> be appended anyway" approach to help fattening a thin pack makes very
> good sense to me.
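The space accounting Nicolas mentions, where a larger object-name table pays for itself through smaller sha1ref encodings, is easier to see with the lookup rule written out. The sketch below assumes the convention discussed in this series (a reference value of 0 means a literal name is supplied inline, and N > 0 means entry N-1 of the sorted table); treat it as an illustration of the idea, not a statement of the final on-disk format:

```shell
# Toy sha1ref resolution against a sorted object-name table.
# resolve_sha1ref REF INLINE TABLE...: REF=0 falls back to the inline
# name; REF=N picks entry N-1 of the table (assumed convention).
resolve_sha1ref() {
	ref=$1; inline=$2; shift 2
	if [ "$ref" -eq 0 ]; then
		echo "$inline"
	else
		shift $((ref - 1))
		echo "$1"
	fi
}

resolve_sha1ref 2 "" aaaa bbbb cccc    # prints bbbb
resolve_sha1ref 0 dddd aaaa bbbb       # prints dddd (inline fallback)
```

A thin pack that lists the missing objects' names in the table up front lets every referring object use the short index form instead of a full 20-byte inline name, which is where the size win comes from.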
Re: RE: [PATCH] git-gui: Modify push dialog to support Gerrit review
On Thu, Sep 05, 2013 at 09:18:25AM +0000, Jørgen Edelbo wrote:

> -----Original Message-----
> From: Johannes Sixt [mailto:j.s...@viscovery.net]
> Sent: 5. september 2013 10:57
>
> Please do not top-post.
>
> Am 9/5/2013 10:29, schrieb Jørgen Edelbo:
>
> -----Original Message-----
> From: Johannes Sixt
>
> Am 9/2/2013 10:54, schrieb Joergen Edelbo:
>
> Changes done: Remove selection of branches to push - push always
> HEAD. This can be justified by the fact that this is by far the most
> common thing to do.
>
> What are your plans to support a topic-based workflow?
>
> By far the most common thing to happen is that someone forgets to
> push completed topics.
>
> With this change, aren't those people forced to relinquish their
> current work because they have to checkout the completed topics to
> push them?
>
> I am not quite sure what your concern is.
>
> When I have completed topics A and B, but forgot to push them, and
> now I am working on topic C, how do I push topics A and B? You say I
> can only push HEAD. I understand this to mean that I have to stop
> work on C (perhaps commit or stash any unfinished work), then
> checkout A, push it, checkout B, push it, checkout C and unstash the
> unfinished work. If my understanding is correct, the new restriction
> is a no-go.
>
> Ok, this way of working is not supported. It just never occurred to
> me that you would work this way. Forgetting to push something that
> you have just completed is very far from what I am used to. I think
> it comes most natural to push what you have done before changing
> topic. The reason I make a commit is to get it out of the door.

FWIW, I also think that we should keep the box which allows you to select the branch to push. I did not realize that you were removing it when I first glanced at your patch. Even if your reasoning that pushing the currently checked out branch is correct: this box has been around for too long, so it will annoy people who got used to the fact that they can select the branch to push.

Another problem: it is not very intuitive to only select the branch to push to.
You can do that on the command line, but IMO using

	git push origin HEAD:refs/heads/branchname

is way less common than

	git push origin branchname

and I think that should also be reflected in the gui. It might be more common for a gerrit user, but for the typical git user without gerrit it is not.

So to make it easy for the user to transition from gui to command line and back, with your patch I would expect: the user selects a branch to push. The new "Destination Branches" section automatically selects/shows the same name for the default case as destination (like the cli). So if I only select the branch to push, it behaves the same as before.

If you detect (I assume that is possible somehow) that the remote is a gerrit remote: "Push for Gerrit review" would automatically be ticked, and the branch a git pull would merge (e.g. the one from branch.<name>.merge) is selected as the destination branch under refs/for/... . If there is no config for that, fall back to master.

This is what I would expect with no further extension of the current git command line behavior and config options. So that way your patch will be an *extension* and not a change of behavior.

Another unrelated thing that is currently left out: you can transport the local branch name when pushing to the magical gerrit refs/for/... . I would like to see that appended as well. But as opposed to the branch selection, that is not a show stopper for the patch, more a side note.

Cheers,
Heiko
Re: [PATCH v4 7/8] update-ref: support multiple simultaneous updates
Brad King <brad.k...@kitware.com> writes:

> On 09/04/2013 05:27 PM, Junio C Hamano wrote:
> > I am not saying the above is the best format, but the point is that
> > the mode of the operation defines the structure
>
> Great, thanks for your comments. Based on that I've prototyped a new
> format. Rather than jumping straight to the patch, here is my
> proposed format documentation for review:
>
> With `--stdin`, update-ref reads instructions from standard input and
> performs all modifications together. Specify commands of the form:
>
>	create SP <ref> SP <newvalue> LF
>	update SP <ref> SP <newvalue> [SP <oldvalue>] LF
>	delete SP <ref> [SP <oldvalue>] LF
>	verify SP <ref> [SP <oldvalue>] LF
>	option SP <opt> LF
>
> Quote fields containing whitespace as if they were strings in C
> source code. Alternatively, use `-z` to specify commands without
> quoting:
>
>	create SP <ref> NUL <newvalue> NUL
>	update SP <ref> NUL <newvalue> NUL [<oldvalue>] NUL
>	delete SP <ref> NUL [<oldvalue>] NUL
>	verify SP <ref> NUL [<oldvalue>] NUL
>	option SP <opt> NUL

That SP in the '-z' format looks strange. Was there a reason why NUL was inappropriate?

> Lines of any other format or a repeated <ref> produce an error.
> Command meanings are:
>
> create::
>	Create <ref> with <newvalue> only if it does not exist.
>
> update::
>	Update <ref> to be <newvalue>, verifying <oldvalue> if given.
>	Specify a zero <newvalue> to delete a ref and/or a zero
>	<oldvalue> to make sure that a ref does not exist.
>
> delete::
>	Delete <ref>, verifying <oldvalue> if given.
>
> verify::
>	Verify <ref> against <oldvalue> but do not change it. If
>	<oldvalue> is zero or missing, the ref must not exist.
>
> option::
>	Specify an option to take effect for following commands. Valid
>	options are `deref` and `no-deref` to specify whether to
>	dereference symbolic refs.

This last one is somewhat peculiar, especially because it says following command*s*. How would I request to update refs HEAD and next in an all-or-none fashion, while applying 'no-deref' only to HEAD but not next?

	update refs/heads/next newvalue
	option --no-deref
	update HEAD newvalue

sounds somewhat convoluted.
When I said "create or update" in the message you are responding to, I did not mean that we should have two separate commands. The regular command line does create-or-update; if it exists already, it is an update, and if it does not, it is a create. If we were to have two, I would say we should have:

	create-or-update ref newvalue [oldvalue]
	create-or-update-no-deref ref newvalue [oldvalue]

An old value of 0{40} would mean "I should be the one creating; if somebody else created it while I was preparing this request, please fail". Similarly for delete:

	delete ref [oldvalue]
	delete-no-deref ref [oldvalue]

Also, how would one set the reflog message for the proposed update?
Re: [PATCH 0/3] Reject non-ff pulls by default
From: "Junio C Hamano" <gits...@pobox.com>

> Philip Oakley <philipoak...@iee.org> writes:
>
> > From: "Junio C Hamano" <gits...@pobox.com>
> >
> > > John Keeping <j...@keeping.me.uk> writes:
> > >
> > > > I think there are two distinct uses for pull, which boil down
> > > > to: (1) git pull ...
> > >
> > > Peff already covered (1)---it is highly doubtful that a merge is
> > > almost always wrong. In fact, if that _were_ the case, we should
> > > simply be defaulting to rebase, not failing the command and
> > > asking between merge and rebase like the jc/pull-training-wheel
> > > topic did. We simply do not know what the user wants, as it
> > > heavily depends on the project, so we ask the user to choose one
> > > (and stick to it).
> >
> > We only offer a limited list. It won't be sufficient for all use
> > cases. It wasn't for me.
>
> Very interesting. Tell us more.

What I do now is avoid Pull because of the hassle of fixing anything that may have gone wrong. Instead I now use a 'git fetch', followed by a 'push . (+etc:etc)' once I understand what I've got, or what I need to do differently if it wasn't a simple fast-forward 'pull'.

> When git pull stops because what was fetched in FETCH_HEAD does not
> fast-forward, then what did _you_ do (and with the knowledge you
> currently have, what would you do)? In a single project, would you
> choose to sometimes rebase and sometimes merge, and if so, what does
> the choice depend on? When I am on these selected branches, I want to
> merge, but on other branches I want to rebase?

In my case I have two home machines (main Windows machine and an occasional Linux laptop, though not directly networked together) and github as my level group, and have MSysGit and git/git (on github) as true upstreams, though they haven't been named that way. [Aside: we are short of a good name for one's 'across-stream server' that one uses for backup/transfer, such as github.]
In general I now use a forced update to bring my local machine up to date relative to whatever is upstream or on my across-stream server, such as when transferring development from one machine to the other (where overwrite is the desired action) - e.g. when testing on the Linux laptop and a few corrections, before patch preparation on the Windows machine (different levels of familiarity).

I occasionally will need to rebase my topic onto an updated git/master or git/pu if it is to be submitted upstream (patches to the list) or if upstream has moved, though I want to choose where I will rebase the topic onto. I don't need merging in that scenario, as I see those via your git repo ;-)

It's not clear to me that a single default that uses a merge or rebase, without a 'stop if' criterion, would be of any help in my situation. My thoughts are that the options on a fetch-pull are for the branch to be:

* Overwritten (--force) (i.e. a conflict scenario)
* Stopped if not-ff (conflict scenario, this patch series)
* rebase existing onto tracked [not a conflict in terms of initiation]
* merge existing into tracked [not a conflict in terms of initiation]
* fast-forward (bring tracked onto existing) [desired]

Philip
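Philip's fetch-first-then-integrate habit can be scripted with stock git. This is a minimal sketch under assumed throwaway paths and a placeholder identity (`t <t@example.com>`); `--ff-only` gives the "stop if not-ff" behaviour from his list, leaving the rebase-or-merge decision to the user:

```shell
# Build a tiny upstream, clone it, let upstream advance, then integrate
# with fast-forward-only so anything non-trivial stops for inspection.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q upstream
git -C upstream -c user.name=t -c user.email=t@example.com \
	commit -q --allow-empty -m one
git clone -q upstream work
git -C upstream -c user.name=t -c user.email=t@example.com \
	commit -q --allow-empty -m two
cd work
git fetch -q origin
git merge --ff-only FETCH_HEAD >/dev/null  # succeeds: plain fast-forward
echo "now at: $(git log -1 --format=%s)"   # prints "now at: two"
```

Had the local branch diverged, `--ff-only` would refuse the merge, at which point one chooses `git rebase` or `git merge` explicitly.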
Problem setting up a shared git repository
I hope it's not too inappropriate to send a random question your way, but I've exhausted all other means and am quite lost at the moment.

I'm trying to set up a distributed development repository with a central repository acting as the production copy. I'm doing so on a Windows file share with no SSH / HTTP accessibility. Basically each developer will have their own copy of the project, and the shared drive should have a copy of the master copy (prod/master branch), along with the work-tree files. The idea is that any developer should be able to do independent development, staged commits, etc., then push to the central (origin) repository, and production scripts will reflect these changes upon a push.

I got pretty close to this setup by creating a bare repository on the file share server (f:\GitDBs\foo.git), then cloning the bare repository onto the production path like so:

	git clone f:\GitDBs\foo.git foo

I cloned the bare repository just the same onto my local dev path and proceeded with development. This worked fine, and I was able to push / pull changes into origin (bare repo), and then I would go to my prod (f:\foo) repository (clone of bare f:\GitDBs\foo.git), then pull the changes.

The problem I faced later on was in parallel development, when changes were made to a file in one repository and at the same time other changes were made to the same file in another repository: I couldn't push changes from dev\foo to prod\foo or to origin.

I'm completely lost at the moment. I try to set --git-dir or --work-tree and I get mixed results: either the setting is not allowed when working in bare repositories, or I can't run certain operations from the work-tree and git-dir, like git status, which in a split git-dir / work-tree environment (on Windows) returns an error which specifies the operation is invalid in a bare repository, or specifies that the work-tree is not recognized as a repository. :(

Please help me. I'm new to Git, but love it already.
I would hate it if I had to work without it. Thank you in advance for all your help!

- Eyal
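For the record, the central-bare-repository layout Eyal describes does work with plain file paths; below is a minimal reproduction (temporary directories and a placeholder identity stand in for the `f:\GitDBs\foo.git` share):

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q --bare central.git         # the shared "origin"
git clone -q central.git dev           # one developer clone
git -C dev -c user.name=t -c user.email=t@example.com \
	commit -q --allow-empty -m work
git -C dev push -q origin HEAD         # publish to the shared repository
git clone -q central.git prod          # the production checkout
git -C prod log --oneline              # shows the pushed commit
```

The failure described above is the expected non-fast-forward rejection: when two clones each commit to the same branch, the second `git push` is refused until that clone integrates the other's work first (`git pull`, or `git fetch` followed by a merge or rebase). Pushing into the checked-out branch of a non-bare repository such as prod is also refused by default, which is why the bare central.git plus a pulling prod clone is the right shape.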
Re: [PATCH 37/38] pack v4: introduce escape hatches in the name and path indexes
On Thu, 5 Sep 2013, Nicolas Pitre wrote:

> On Thu, 5 Sep 2013, Nicolas Pitre wrote:
>
> > If the path or name index is zero, this means the entry data is to
> > be found inline rather than being located in the dictionary table.
> > This is there to allow easy completion of thin packs without having
> > to add new table entries which would have required a full rewrite
> > of the pack data.
> >
> > Signed-off-by: Nicolas Pitre <n...@fluxnic.net>
>
> I'm now dropping this patch.

Please also remove this from your documentation patch.

Well... I couldn't resist another little change that has been nagging me for a while. Both the author and committer time stamps are very closely related most of the time. So the committer time stamp is now encoded as a difference against the author time stamp, with the LSB indicating a negative difference. On git.git this saves 0.3% on the pack size. Not much, but still impressive for only a time stamp.

Nicolas
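The sign-in-the-LSB trick Nicolas describes is easy to model. Here is a sketch (the function names are mine; the actual encoder in packv4-create.c may differ in detail, e.g. in how the resulting number is then emitted as a varint):

```shell
# Encode the committer timestamp as a shifted delta against the author
# timestamp, with the low bit flagging a negative difference.
encode_ts() {
	d=$(( $2 - $1 ))                  # committer minus author
	if [ "$d" -lt 0 ]; then
		echo $(( ((0 - $d) << 1) | 1 ))
	else
		echo $(( $d << 1 ))
	fi
}

decode_ts() {
	if [ $(( $2 & 1 )) -eq 1 ]; then
		echo $(( $1 - ($2 >> 1) ))
	else
		echo $(( $1 + ($2 >> 1) ))
	fi
}

enc=$(encode_ts 1378400000 1378400300)
echo "$enc"                  # 600
decode_ts 1378400000 "$enc"  # 1378400300
```

Since the two timestamps usually sit seconds or minutes apart, the encoded value is tiny and stores in very few bytes, which is where the 0.3% comes from.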
Re: [PATCH v3 07/11] Documentation/replace: tell that -f option bypasses the type check
From: "Philip Oakley" <philipoak...@iee.org>
Sent: Saturday, August 31, 2013 11:16 PM

> From: "Christian Couder" <chrisc...@tuxfamily.org>
>
> Signed-off-by: Christian Couder <chrisc...@tuxfamily.org>
> ---
>  Documentation/git-replace.txt | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/git-replace.txt b/Documentation/git-replace.txt
> index 736b48c..a2bd2ee 100644
> --- a/Documentation/git-replace.txt
> +++ b/Documentation/git-replace.txt
> @@ -21,10 +21,12 @@ replaced. The content of the 'replace' reference
>  is the SHA-1 of the replacement object.
>
>  The replaced object and the replacement object must be of the same type.
> -There is no other restriction on them.
> +This restriction can be bypassed using `-f`.
>
>  Unless `-f` is given, the 'replace' reference must not yet exist.
> +There is no other restriction on the replaced and replacement objects.

Is this trying to allude to the fact that merge commits may be exchanged with non-merge commits?

I strongly believe that this ability to exchange merge and non-merge commits should be stated _explicitly_ to counteract the false beliefs that are listed out on the internet. It's probably better stated in a separate patch for that explicit purpose, to avoid mixed messages within this commit. Not sure how this method of preparing a comment patch will pan out..

---8<---
From a0c0e765cfd969c9c8a6ff3a2cb6b2f1391d2e7d Mon Sep 17 00:00:00 2001
From: Philip Oakley <philipoak...@iee.org>
Date: Thu, 5 Sep 2013 22:54:04 +0100
Subject: [PATCH] Doc: 'replace' merge and non-merge commits

Signed-off-by: Philip Oakley <philipoak...@iee.org>
---
This is supplemental to Christian Couder's 'replace' patch series (2013-09-03 69dada4 (Christian Couder): t6050-replace: use some long option names). It adds the clarification that merge and non-merge commits are replaceable. Merges are often treated as special-case objects, so tell users that they are not special here.
---
 Documentation/git-replace.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/git-replace.txt b/Documentation/git-replace.txt
index 414000e..f373ab4 100644
--- a/Documentation/git-replace.txt
+++ b/Documentation/git-replace.txt
@@ -26,6 +26,7 @@ This restriction can be bypassed using `-f`.
 
 Unless `-f` is given, the 'replace' reference must not yet exist.
 There is no other restriction on the replaced and replacement objects.
+Merge commits can be replaced by non-merge commits and vice versa.
 
 Replacement references will be used by default by all Git commands
 except those doing reachability traversal (prune, pack transfer and
-- 
1.8.1.msysgit.1
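The documented behaviour is easy to verify experimentally. The snippet below shows the replacement machinery in action on a throwaway repository (placeholder identity); it swaps one commit for another built with `git commit-tree`, and since only the object *type* has to match, the same mechanics cover swapping merge and non-merge commits:

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q
g() { git -c user.name=t -c user.email=t@example.com "$@"; }
g commit -q --allow-empty -m original
orig=$(git rev-parse HEAD)
tree=$(git rev-parse "HEAD^{tree}")
repl=$(g commit-tree -m replacement "$tree")  # another commit, same tree
git replace "$orig" "$repl"
git log -1 --format=%s    # prints "replacement"
```

The two objects need only share the type "commit"; their parent counts may differ freely, which is precisely the merge/non-merge freedom the new sentence documents.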
Re: [PATCH V3] check-ignore: Add option to ignore index contents
Dave Williams <d...@opensourcesolutions.co.uk> writes:

> check-ignore currently shows how .gitignore rules would treat
> untracked paths. Tracked paths do not generate useful output. This
> prevents debugging of why a path became tracked unexpectedly unless
> that path is first removed from the index with `git rm --cached
> <path>`.
>
> This option (-i, --no-index) simply bypasses the check for the path
> being in the index and hence allows tracked paths to be checked too.

Now that the long option name is --no-index, it makes me wonder if -i is a good synonym for it, and the longer I stare at it, the more convinced I become that it is a bad choice. Do we even need a short-and-sweet one-letter option for this? I'd prefer starting with only the long option. I came up with a squashable tweak to remove -i on top of this patch; will tentatively queue this patch with it to 'pu'.

> In particular, Junio queried my approach in
> builtin/git-check-ignores.c that bypassed the functions that check a
> path is in the index, as well as avoiding reading the index in the
> first place.

Thanks. I think this version is cleaner without those if (!no_index) used for special-casing in the codeflow. An empty index is a valid state, in which the in-core index starts when any git program begins. Not reading the index should be the only thing necessary to mimic the state in which nothing has been added to the index, and if that is not the case, we have found a bug in the existing code.

> Regarding the test script, I have tidied up the variables containing
> the separate option switches so they don't contain leading spaces;
> instead I have added spaces as needed when formatting the command
> line. This not only improves my patch but also the existing code,
> which was a little inconsistent in this respect.

Yeah, that's very much appreciated.

> Finally, I have rebased from the latest commit on master to pick up
> unrelated recent changes made to builtin/check-ignores.c and updated
> my code to be consistent with this.
> Hopefully I have put these patch notes in the right place now! Let me
> know if not.
>
> Dave

Nicely done.
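The behaviour the option enables can be seen in a scratch repository (paths and ignore pattern here are placeholders):

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q
echo '*.log' >.gitignore
echo data >build.log
git add -f build.log                    # force-add a path the rules ignore
git check-ignore build.log \
	|| echo "tracked: not reported" # the index check suppresses the match
git check-ignore --no-index build.log   # prints "build.log"
```

This is exactly the "why did this path become tracked?" debugging case from the commit message, without having to `git rm --cached` first.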
Re: [PATCH v2] cherry-pick: allow - as abbreviation of '@{-1}'
Hiroshige Umino <hiroshig...@gmail.com> writes:

> The "-" abbreviation is handy for cherry-pick like checkout and
> merge. It's also good for uniformity that a "-" stands as the name of
> the previous branch where a branch name is accepted and it could not
> mean any other things like stdin.
>
> Signed-off-by: Hiroshige Umino <hiroshig...@gmail.com>
> ---

This makes sense to me. The test t3500 is about the "git cherry" command, so I came up with a tweak to move it to t3501, which is about cherry-pick, on top of this patch. Will tentatively queue this patch with that tweak to 'pu'. Thanks.

>  builtin/revert.c  |  2 ++
>  t/t3500-cherry.sh | 15 +++
>  2 files changed, 17 insertions(+)
>
> diff --git a/builtin/revert.c b/builtin/revert.c
> index 8e87acd..52c35e7 100644
> --- a/builtin/revert.c
> +++ b/builtin/revert.c
> @@ -202,6 +202,8 @@ int cmd_cherry_pick(int argc, const char **argv, const char *prefix)
>  	memset(&opts, 0, sizeof(opts));
>  	opts.action = REPLAY_PICK;
>  	git_config(git_default_config, NULL);
> +	if (!strcmp(argv[1], "-"))
> +		argv[1] = "@{-1}";
>  	parse_args(argc, argv, &opts);
>  	res = sequencer_pick_revisions(&opts);
>  	if (res < 0)
>
> diff --git a/t/t3500-cherry.sh b/t/t3500-cherry.sh
> index f038f34..547dbf8 100755
> --- a/t/t3500-cherry.sh
> +++ b/t/t3500-cherry.sh
> @@ -55,4 +55,19 @@ test_expect_success \
>  	expr "$(echo $(git cherry master my-topic-branch) )" : "+ [^ ]* - .*"
>  '
>  
> +test_expect_success \
> +	'cherry-pick - does not work initially' \
> +	'test_must_fail git cherry-pick -
> +'
> +
> +test_expect_success \
> +	'cherry-pick the commit in the previous branch' \
> +	'git branch other
> +	test_commit commit-to-pick newfile content
> +	echo content >expected
> +	git checkout other
> +	git cherry-pick -
> +	test_cmp expected newfile
> +'
> +
>  test_done
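With the patch applied (the shorthand did ship in later releases), the new test's scenario looks like this from the command line (throwaway repository, placeholder identity):

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q
g() { git -c user.name=t -c user.email=t@example.com "$@"; }
g commit -q --allow-empty -m base
git checkout -q -b topic
echo content >newfile
git add newfile
g commit -q -m commit-to-pick
git checkout -q -              # back to the previous branch
g cherry-pick - >/dev/null     # equivalent to: git cherry-pick @{-1}
git log -1 --format=%s         # prints "commit-to-pick"
```

The `-` resolves through the same `@{-1}` machinery checkout and merge already use, so it always means "the previous branch" and never stdin.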
Re: [PATCH v3 07/11] Documentation/replace: tell that -f option bypasses the type check
Philip Oakley <philipoak...@iee.org> writes:

> From: "Philip Oakley" <philipoak...@iee.org>
> Sent: Saturday, August 31, 2013 11:16 PM
>
> > From: "Christian Couder" <chrisc...@tuxfamily.org>
> >
> > Signed-off-by: Christian Couder <chrisc...@tuxfamily.org>
> > ---
> >  Documentation/git-replace.txt | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/git-replace.txt b/Documentation/git-replace.txt
> > index 736b48c..a2bd2ee 100644
> > --- a/Documentation/git-replace.txt
> > +++ b/Documentation/git-replace.txt
> > @@ -21,10 +21,12 @@ replaced. The content of the 'replace' reference
> >  is the SHA-1 of the replacement object.
> >
> >  The replaced object and the replacement object must be of the same type.
> > -There is no other restriction on them.
> > +This restriction can be bypassed using `-f`.
> >
> >  Unless `-f` is given, the 'replace' reference must not yet exist.
> > +There is no other restriction on the replaced and replacement objects.
>
> Is this trying to allude to the fact that merge commits may be
> exchanged with non-merge commits?
>
> I strongly believe that this ability to exchange merge and non-merge
> commits should be stated _explicitly_ to counteract the false beliefs
> that are listed out on the internet. It's probably better stated in a
> separate patch for that explicit purpose to avoid mixed messages
> within this commit. Not sure how this method of preparing a comment
> patch will pan out..
>
> ---8<---

Make it -- >8 -- perhaps, to balance out the perforation on both sides.

> From a0c0e765cfd969c9c8a6ff3a2cb6b2f1391d2e7d Mon Sep 17 00:00:00 2001

Not needed nor wanted.

> From: Philip Oakley <philipoak...@iee.org>

Not needed but does not hurt.

> Date: Thu, 5 Sep 2013 22:54:04 +0100

Is OK but redundant, given that your message has a timestamp when we saw your patch for the first time anyway.
> Subject: [PATCH] Doc: 'replace' merge and non-merge commits
>
> Signed-off-by: Philip Oakley <philipoak...@iee.org>
> ---
> This is supplemental to Christian Couder's 'replace' patch series
> (2013-09-03 69dada4 (Christian Couder): t6050-replace: use some long
> option names). It adds the clarification that merge and non-merge
> commits are replaceable. Merges are often treated as special-case
> objects so tell users that they are not special here.

I think the last paragraph deserves to be in the proposed commit log message proper. It explains very well why it is a good idea to have the added line in the documentation.

> ---
>  Documentation/git-replace.txt | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/Documentation/git-replace.txt b/Documentation/git-replace.txt
> index 414000e..f373ab4 100644
> --- a/Documentation/git-replace.txt
> +++ b/Documentation/git-replace.txt
> @@ -26,6 +26,7 @@ This restriction can be bypassed using `-f`.
>
>  Unless `-f` is given, the 'replace' reference must not yet exist.
>  There is no other restriction on the replaced and replacement objects.
> +Merge commits can be replaced by non-merge commits and vice versa.
>
>  Replacement references will be used by default by all Git commands
>  except those doing reachability traversal (prune, pack transfer and
Re: [PATCH] git-svn: Fix termination issues for remote svn connections
Eric Wong <normalper...@yhbt.net> writes:

> Junio C Hamano <gits...@pobox.com> wrote:
>
> > Uli Heller <uli.hel...@daemons-point.com> writes:
> >
> > > Nevertheless, I think it makes sense to fix the issue within the
> > > git perl module Ra.pm, too. The change frees the private copy of
> > > the remote access object on termination, which prevents the error
> > > from happening.
> >
> > Thanks. Please sign off your patch.
> >
> > I am Cc'ing Kyle McKay, who apparently had some experience working
> > with git-svn with newer svn that can only use serf, hoping that we
> > can get an independent opinion/test just to be sure.
> >
> > Also Cc'ed is Eric Wong, who has been the official git-svn area
> > expert, but I understand that Eric hasn't needed to use git-svn for
> > quite a while, so it is perfectly fine if he does not have any
> > comment on this one. We may want to find a volunteer to move "git
> > svn" forward as a new area expert (aka subsystem maintainer), by
> > the way.
>
> Correct, git-svn has the effect of being self-obsoleting.
>
> I agree with adding a workaround for broken things; however, I
> suggest a code comment explaining why it is necessary. The commit
> message is important, too, but might get harder to track down if
> there's code movement/refactoring in the future.

Thanks for a good suggestion. I agree that this addition is a good example where an in-code comment would really help future readers.

	+END {
	+	$RA = undef;
	+	$ra_invalid = 1;
	+}
Re: [PATCH v4 0/5] Disable git status comment prefix
Matthieu Moy <matthieu@grenoble-inp.fr> writes:

> Junio C Hamano <gits...@pobox.com> writes:
>
> > One caveat, though. The name oldStyle will become problematic when
> > we want to remove some wart in the output format long after this
> > "no comment prefix by default" series lands. Some people may expect
> > setting oldStyle=true would give output from the 1.8.4 era, while
> > some others would expect that it would give output from the 1.8.5
> > era that does not have the comment prefix but still has that wart
> > we will remove down the line.
>
> I'm fine with any name actually (since it is enabled by default,
> people don't need to know the name to benefit from the new output).
> Maybe status.displayCommentPrefix was the best name after all.

Yeah, I think so. It makes it clear what kind of old behaviour the user is asking for.
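Assuming the series keeps the name settled on here, restoring the old commented output would be a one-line configuration (demonstrated in a scratch repository; `--global` would make it user-wide):

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git config status.displayCommentPrefix true
git config status.displayCommentPrefix   # prints "true"
```

Because the new uncommented style is the default, only users who want the pre-series look ever need to learn this knob, which is Matthieu's point about the name mattering little to everyone else.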
Re: [PATCH v4 0/5] Disable git status comment prefix
On Thu, Sep 05, 2013 at 09:36:47PM +0200, Matthieu Moy wrote:

> Junio C Hamano <gits...@pobox.com> writes:
>
> > One caveat, though. The name oldStyle will become problematic when
> > we want to remove some wart in the output format long after this
> > "no comment prefix by default" series lands. Some people may expect
> > setting oldStyle=true would give output from the 1.8.4 era, while
> > some others would expect that it would give output from the 1.8.5
> > era that does not have the comment prefix but still has that wart
> > we will remove down the line.
>
> I'm fine with any name actually (since it is enabled by default,
> people don't need to know the name to benefit from the new output).
> Maybe status.displayCommentPrefix was the best name after all.

FWIW, I had the same thought as Junio. I much prefer something like status.displayCommentPrefix for clarity and future-proofing.

As for the feature itself, I am still undecided whether I like it. I've tried looking at the output of the series, and it feels weird to me. Part of it is undoubtedly that my brain is simply used to the other way. But it also seems to drop some of the vertical whitespace, which makes things feel very crowded. E.g., before:

	# On branch private
	# Your branch and 'origin/next' have diverged,
	# and have 472 and 59 different commits each, respectively.
	#
	# Untracked files:
	#	t/foo
	#	test-obj-pool
	#	test-string-pool
	#	test-treap
	#	test-url-normalize
	nothing added to commit but untracked files present

after:

	On branch private
	Your branch and 'origin/next' have diverged,
	and have 472 and 59 different commits each, respectively.
	Untracked files:
		t/foo
		test-obj-pool
		test-string-pool
		test-treap
		test-url-normalize
	nothing added to commit but untracked files present

The blank line before "Untracked files" was dropped (was this intentional? I didn't see any note of it in the discussion), and the bottom "nothing added" line butts against the untracked list more obviously, because they now all have the same comment indentation.
I wonder if it would look a little nicer as:

	On branch private
	Your branch and 'origin/next' have diverged,
	and have 472 and 59 different commits each, respectively.

	Untracked files:
		t/foo
		test-obj-pool
		test-string-pool
		test-treap
		test-url-normalize

	nothing added to commit but untracked files present

As an orthogonal thing to your patch, I feel like the first two items (branch and ahead/behind) are kind of squished and oddly formatted (in both the original and yours). Could we do something like:

	Your branch 'private' and its upstream 'origin/next' have diverged,
	and have 472 and 59 different commits each, respectively.

when we are going to print both? That's 69 characters, which might overrun 80 if you have long branch names, but we could also line-break it differently. That doesn't need to be part of your topic, but while we are talking about the format of the message, maybe it is worth thinking about.

-Peff