[PATCH 00/38] pack version 4 basic functionalities

2013-09-05 Thread Nicolas Pitre
After the initial posting here:

  http://news.gmane.org/group/gmane.comp.version-control.git/thread=233061

This is a repost plus the basic read side working, at least to validate
the write side and the pack format itself.  And many many bug fixes.

This can also be fetched here:

  git://git.linaro.org/people/nico/git

I consider the actual pack format definition final as implemented
by this code.

TODO:

- index-pack support

- native tree walk support

- native commit graph walk support

- better heuristics when creating tree delta encoding

- integration with pack-objects

- transfer protocol backward compatibility

- thin pack completion

- figure out unexplained runtime performance issues

However, as I mentioned already, I've put more time on this project lately
than I actually had available.  I really wanted to bring this project far
enough to be able to kick it out the door for others to take over, and
there we are.

I'm always available for design discussions and code review.  But don't
expect much additional code from me at this point.

@junio: I'm hoping you can take this branch as is, and apply any further
patches on top.

The diffstat goes like this:

 Makefile        |    3 +
 cache.h         |   11 +
 hex.c           |   11 +
 pack-check.c    |    4 +-
 pack-revindex.c |    7 +-
 pack-write.c    |    6 +-
 packv4-create.c | 1105 +
 packv4-parse.c  |  408 ++
 packv4-parse.h  |    9 +
 sha1_file.c     |  110 -
 10 files changed, 1648 insertions(+), 26 deletions(-)

Enjoy !

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 34/38] pack v4: code to retrieve a path component

2013-09-05 Thread Nicolas Pitre
Because the path dictionary table is located right after the name
dictionary table, we currently need to load the latter to find the
former.

Signed-off-by: Nicolas Pitre n...@fluxnic.net
---
 cache.h|  2 ++
 packv4-parse.c | 36 
 2 files changed, 38 insertions(+)

diff --git a/cache.h b/cache.h
index 6ce327e..5f2147a 100644
--- a/cache.h
+++ b/cache.h
@@ -1030,6 +1030,8 @@ extern struct packed_git {
int version;
int index_version;
struct packv4_dict *name_dict;
+   off_t name_dict_end;
+   struct packv4_dict *path_dict;
time_t mtime;
int pack_fd;
unsigned pack_local:1,
diff --git a/packv4-parse.c b/packv4-parse.c
index 431f47e..b80b73e 100644
--- a/packv4-parse.c
+++ b/packv4-parse.c
@@ -114,6 +114,7 @@ static void load_name_dict(struct packed_git *p)
 	if (!names)
 		die("bad pack name dictionary in %s", p->pack_name);
 	p->name_dict = names;
+	p->name_dict_end = offset;
 }
 
 const unsigned char *get_nameref(struct packed_git *p, const unsigned char **srcp)
@@ -131,6 +132,41 @@ const unsigned char *get_nameref(struct packed_git *p, const unsigned char **src
 	return p->name_dict->data + p->name_dict->offsets[index];
 }
 
+static void load_path_dict(struct packed_git *p)
+{
+	off_t offset;
+	struct packv4_dict *paths;
+
+	/*
+	 * For now we need to load the name dictionary to know where
+	 * it ends and therefore where the path dictionary starts.
+	 */
+	if (!p->name_dict)
+		load_name_dict(p);
+
+	offset = p->name_dict_end;
+	paths = load_dict(p, &offset);
+	if (!paths)
+		die("bad pack path dictionary in %s", p->pack_name);
+	p->path_dict = paths;
+}
+
+const unsigned char *get_pathref(struct packed_git *p, const unsigned char **srcp)
+{
+	unsigned int index;
+
+	if (!p->path_dict)
+		load_path_dict(p);
+
+	index = decode_varint(srcp);
+	if (index < 1 || index - 1 >= p->path_dict->nb_entries) {
+		error("%s: index overflow", __func__);
+		return NULL;
+	}
+	index -= 1;
+	return p->path_dict->data + p->path_dict->offsets[index];
+}
+
+
 void *pv4_get_commit(struct packed_git *p, struct pack_window **w_curs,
 off_t offset, unsigned long size)
 {
-- 
1.8.4.38.g317e65b



[PATCH 28/38] pack v4: code to load and prepare a pack dictionary table for use

2013-09-05 Thread Nicolas Pitre
Signed-off-by: Nicolas Pitre n...@fluxnic.net
---
 packv4-parse.c | 77 ++
 1 file changed, 77 insertions(+)

diff --git a/packv4-parse.c b/packv4-parse.c
index 299fc48..26894bc 100644
--- a/packv4-parse.c
+++ b/packv4-parse.c
@@ -28,3 +28,80 @@ const unsigned char *get_sha1ref(struct packed_git *p,
 
return sha1;
 }
+
+struct packv4_dict {
+   const unsigned char *data;
+   unsigned int nb_entries;
+   unsigned int offsets[FLEX_ARRAY];
+};
+
+static struct packv4_dict *load_dict(struct packed_git *p, off_t *offset)
+{
+	struct pack_window *w_curs = NULL;
+	off_t curpos = *offset;
+	unsigned long dict_size, avail;
+	unsigned char *src, *data;
+	const unsigned char *cp;
+	git_zstream stream;
+	struct packv4_dict *dict;
+	int nb_entries, i, st;
+
+	/* get uncompressed dictionary data size */
+	src = use_pack(p, &w_curs, curpos, &avail);
+	cp = src;
+	dict_size = decode_varint(&cp);
+	if (dict_size < 3) {
+		error("bad dict size");
+		return NULL;
+	}
+	curpos += cp - src;
+
+	data = xmallocz(dict_size);
+	memset(&stream, 0, sizeof(stream));
+	stream.next_out = data;
+	stream.avail_out = dict_size + 1;
+
+	git_inflate_init(&stream);
+	do {
+		src = use_pack(p, &w_curs, curpos, &stream.avail_in);
+		stream.next_in = src;
+		st = git_inflate(&stream, Z_FINISH);
+		curpos += stream.next_in - src;
+	} while ((st == Z_OK || st == Z_BUF_ERROR) && stream.avail_out);
+	git_inflate_end(&stream);
+	unuse_pack(&w_curs);
+	if (st != Z_STREAM_END || stream.total_out != dict_size) {
+		error("pack dictionary bad");
+		free(data);
+		return NULL;
+	}
+
+	/* count number of entries */
+	nb_entries = 0;
+	cp = data;
+	while (cp < data + dict_size - 3) {
+		cp += 2;  /* prefix bytes */
+		cp += strlen((const char *)cp);  /* entry string */
+		cp += 1;  /* terminating NUL */
+		nb_entries++;
+	}
+	if (cp - data != dict_size) {
+		error("dict size mismatch");
+		free(data);
+		return NULL;
+	}
+
+	dict = xmalloc(sizeof(*dict) + nb_entries * sizeof(dict->offsets[0]));
+	dict->data = data;
+	dict->nb_entries = nb_entries;
+
+	cp = data;
+	for (i = 0; i < nb_entries; i++) {
+		dict->offsets[i] = cp - data;
+		cp += 2;
+		cp += strlen((const char *)cp) + 1;
+	}
+
+	*offset = curpos;
+	return dict;
+}
-- 
1.8.4.38.g317e65b



[PATCH 32/38] pack v4: parse delta base reference

2013-09-05 Thread Nicolas Pitre
There is only one type of delta with pack v4.  The base reference
encoding already handles either an offset (via the pack index) or a
literal SHA1.

We assume in the literal SHA1 case that the object lives in the same
pack, just like with previous pack versions.
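
The two cases can be told apart by the first byte alone. A standalone sketch of that dispatch (illustrative names, with the varint decoder inlined; not the in-tree code):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of the delta-base reference decoding described above: a leading
 * nonzero byte starts a varint holding a 1-based index into the pack index;
 * a leading zero byte is followed by a literal 20-byte SHA1.  Returns the
 * index (> 0), or 0 for the literal-SHA1 case, advancing *srcp past the
 * reference either way. */
static unsigned decode_base_ref(const unsigned char **srcp)
{
	const unsigned char *src = *srcp;

	if (src[0]) {
		/* offset-varint, as used throughout pack v4 */
		unsigned char c = *src++;
		uintmax_t val = c & 127;
		while (c & 128) {
			val += 1;
			c = *src++;
			val = (val << 7) + (c & 127);
		}
		*srcp = src;
		return (unsigned)val;
	}
	*srcp = src + 21;	/* zero marker + 20-byte SHA1 */
	return 0;
}
```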

Signed-off-by: Nicolas Pitre n...@fluxnic.net
---
 sha1_file.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/sha1_file.c b/sha1_file.c
index 67eb903..f3bfa28 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1686,7 +1686,19 @@ static off_t get_delta_base(struct packed_git *p,
 * that is assured.  An OFS_DELTA longer than the hash size
 * is stupid, as then a REF_DELTA would be smaller to store.
 */
-   if (type == OBJ_OFS_DELTA) {
+	if (p->version >= 4) {
+		if (base_info[0] != 0) {
+			const unsigned char *cp = base_info;
+			unsigned int base_index = decode_varint(&cp);
+			if (!base_index || base_index - 1 >= p->num_objects)
+				return 0;  /* out of bounds */
+			*curpos += cp - base_info;
+			base_offset = nth_packed_object_offset(p, base_index - 1);
+		} else {
+			base_offset = find_pack_entry_one(base_info + 1, p);
+			*curpos += 21;
+		}
+	} else if (type == OBJ_OFS_DELTA) {
const unsigned char *cp = base_info;
base_offset = decode_varint(cp);
base_offset = delta_obj_offset - base_offset;
-- 
1.8.4.38.g317e65b



[PATCH 33/38] pack v4: we can read commit objects now

2013-09-05 Thread Nicolas Pitre
Signed-off-by: Nicolas Pitre n...@fluxnic.net
---
 Makefile   |  1 +
 packv4-parse.c |  1 +
 packv4-parse.h |  7 +++
 sha1_file.c| 10 ++
 4 files changed, 19 insertions(+)
 create mode 100644 packv4-parse.h

diff --git a/Makefile b/Makefile
index ba6cafc..22fc276 100644
--- a/Makefile
+++ b/Makefile
@@ -702,6 +702,7 @@ LIB_H += notes.h
 LIB_H += object.h
 LIB_H += pack-revindex.h
 LIB_H += pack.h
+LIB_H += packv4-parse.h
 LIB_H += parse-options.h
 LIB_H += patch-ids.h
 LIB_H += pathspec.h
diff --git a/packv4-parse.c b/packv4-parse.c
index bca1a97..431f47e 100644
--- a/packv4-parse.c
+++ b/packv4-parse.c
@@ -9,6 +9,7 @@
  */
 
 #include "cache.h"
+#include "packv4-parse.h"
 #include "varint.h"
 
 const unsigned char *get_sha1ref(struct packed_git *p,
diff --git a/packv4-parse.h b/packv4-parse.h
new file mode 100644
index 000..40aa75a
--- /dev/null
+++ b/packv4-parse.h
@@ -0,0 +1,7 @@
+#ifndef PACKV4_PARSE_H
+#define PACKV4_PARSE_H
+
+void *pv4_get_commit(struct packed_git *p, struct pack_window **w_curs,
+off_t offset, unsigned long size);
+
+#endif
diff --git a/sha1_file.c b/sha1_file.c
index f3bfa28..b57d9f8 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -19,6 +19,7 @@
 #include "tree-walk.h"
 #include "refs.h"
 #include "pack-revindex.h"
+#include "packv4-parse.h"
 #include "sha1-lookup.h"
 #include "bulk-checkin.h"
 #include "streaming.h"
@@ -2172,6 +2173,15 @@ void *unpack_entry(struct packed_git *p, off_t 
obj_offset,
break;
case OBJ_COMMIT:
case OBJ_TREE:
+		if (p->version >= 4 && !base_from_cache) {
+			if (type == OBJ_COMMIT) {
+				data = pv4_get_commit(p, &w_curs, curpos, size);
+			} else {
+				die("no pack v4 tree parsing yet");
+			}
+			break;
+		}
+   /* fall through */
case OBJ_BLOB:
case OBJ_TAG:
if (!base_from_cache)
-- 
1.8.4.38.g317e65b



[PATCH 38/38] packv4-create: add a command line argument to limit tree copy sequences

2013-09-05 Thread Nicolas Pitre
Because there is no delta object cache for tree objects yet, walking
tree entries may result in a lot of recursion.

Let's add --min-tree-copy=N where N is the minimum number of copied
entries in a single copy sequence allowed for encoding tree deltas.
The default is 1. Specifying 0 disables tree deltas entirely.

This allows experimenting with the copy-sequence threshold to see its
influence on pack size vs. runtime access cost.

Signed-off-by: Nicolas Pitre n...@fluxnic.net
---
 packv4-create.c | 27 ---
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/packv4-create.c b/packv4-create.c
index 9d6ffc0..34dcebf 100644
--- a/packv4-create.c
+++ b/packv4-create.c
@@ -19,6 +19,7 @@
 
 static int pack_compression_seen;
 static int pack_compression_level = Z_DEFAULT_COMPRESSION;
+static int min_tree_copy = 1;
 
 struct data_entry {
unsigned offset;
@@ -441,7 +442,7 @@ void *pv4_encode_tree(void *_buffer, unsigned long *sizep,
if (!size)
return NULL;
 
-   if (!delta_size)
+   if (!delta_size || !min_tree_copy)
delta = NULL;
 
/*
@@ -530,7 +531,6 @@ void *pv4_encode_tree(void *_buffer, unsigned long *sizep,
cp += encode_varint(copy_count, cp);
if (first_delta)
cp += encode_sha1ref(delta_sha1, cp);
-   copy_count = 0;
 
/*
 * Now let's make sure this is going to take less
@@ -538,12 +538,14 @@ void *pv4_encode_tree(void *_buffer, unsigned long *sizep,
 * created in parallel.  If so we dump the copy
 * sequence over those entries in the output buffer.
 */
-		if (cp - copy_buf < out - &buffer[copy_pos]) {
+		if (copy_count >= min_tree_copy &&
+		    cp - copy_buf < out - &buffer[copy_pos]) {
out = buffer + copy_pos;
memcpy(out, copy_buf, cp - copy_buf);
out += cp - copy_buf;
first_delta = 0;
}
+   copy_count = 0;
}
 
 		if (end - out < 48) {
@@ -574,7 +576,8 @@ void *pv4_encode_tree(void *_buffer, unsigned long *sizep,
cp += encode_varint(copy_count, cp);
if (first_delta)
cp += encode_sha1ref(delta_sha1, cp);
-	if (cp - copy_buf < out - &buffer[copy_pos]) {
+	if (copy_count >= min_tree_copy &&
+	    cp - copy_buf < out - &buffer[copy_pos]) {
out = buffer + copy_pos;
memcpy(out, copy_buf, cp - copy_buf);
out += cp - copy_buf;
@@ -1078,14 +1081,24 @@ static int git_pack_config(const char *k, const char 
*v, void *cb)
 
 int main(int argc, char *argv[])
 {
-	if (argc != 3) {
-		fprintf(stderr, "Usage: %s src_packfile dst_packfile\n", argv[0]);
+	char *src_pack, *dst_pack;
+
+	if (argc == 3) {
+		src_pack = argv[1];
+		dst_pack = argv[2];
+	} else if (argc == 4 && !prefixcmp(argv[1], "--min-tree-copy=")) {
+		min_tree_copy = atoi(argv[1] + strlen("--min-tree-copy="));
+		src_pack = argv[2];
+		dst_pack = argv[3];
+	} else {
+		fprintf(stderr, "Usage: %s [--min-tree-copy=n] src_packfile dst_packfile\n",
+			argv[0]);
 		exit(1);
 	}
+
git_config(git_pack_config, NULL);
 	if (!pack_compression_seen && core_compression_seen)
pack_compression_level = core_compression_level;
-   process_one_pack(argv[1], argv[2]);
+   process_one_pack(src_pack, dst_pack);
if (0)
dict_dump();
return 0;
-- 
1.8.4.38.g317e65b



[PATCH 35/38] pack v4: decode tree objects

2013-09-05 Thread Nicolas Pitre
For now we recreate the whole tree object in its canonical form.

Eventually, the core code should grow some ability to walk packv4 tree
entries directly which would be way more efficient.  Not only would that
avoid double tree entry parsing, but the pack v4 encoding allows for
getting at child objects without going through the SHA1 search.
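
The canonical form being recreated is "<octal mode> <path>\0" followed by the raw 20-byte hash, with the mode held in the two prefix bytes of the path dictionary entry. A self-contained sketch of one entry's reconstruction (helper name and signature are illustrative, not the in-tree code):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Recreate one canonical tree entry from a pack v4 path dictionary entry
 * (2 big-endian mode bytes + NUL-terminated path) and a raw 20-byte hash.
 * Returns bytes written, or -1 if the output buffer is too small. */
static int emit_tree_entry(unsigned char *dst, unsigned long avail,
			   const unsigned char *path_entry,
			   const unsigned char *sha1)
{
	unsigned mode = (path_entry[0] << 8) | path_entry[1];
	int len = snprintf((char *)dst, avail, "%o %s%c",
			   mode, (const char *)(path_entry + 2), '\0');

	if (len + 20 > avail)
		return -1;
	memcpy(dst + len, sha1, 20);	/* raw hash follows the NUL */
	return len + 20;
}
```

Note the `%c` with `'\0'` makes `snprintf` count the embedded NUL, matching the length bookkeeping in the patch.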

Signed-off-by: Nicolas Pitre n...@fluxnic.net
---
 packv4-parse.c | 137 ++---
 1 file changed, 131 insertions(+), 6 deletions(-)

diff --git a/packv4-parse.c b/packv4-parse.c
index b80b73e..04eab46 100644
--- a/packv4-parse.c
+++ b/packv4-parse.c
@@ -151,19 +151,15 @@ static void load_path_dict(struct packed_git *p)
 	p->path_dict = paths;
 }
 
-const unsigned char *get_pathref(struct packed_git *p, const unsigned char **srcp)
+const unsigned char *get_pathref(struct packed_git *p, unsigned int index)
 {
-	unsigned int index;
-
 	if (!p->path_dict)
 		load_path_dict(p);
 
-	index = decode_varint(srcp);
-	if (index < 1 || index - 1 >= p->path_dict->nb_entries) {
+	if (index >= p->path_dict->nb_entries) {
 		error("%s: index overflow", __func__);
 		return NULL;
 	}
-	index -= 1;
 	return p->path_dict->data + p->path_dict->offsets[index];
 }
 
@@ -240,3 +236,132 @@ void *pv4_get_commit(struct packed_git *p, struct pack_window **w_curs,
 
return dst;
 }
+
+static int decode_entries(struct packed_git *p, struct pack_window **w_curs,
+ off_t offset, unsigned int start, unsigned int count,
+ unsigned char **dstp, unsigned long *sizep, int hdr)
+{
+	unsigned long avail;
+	unsigned int nb_entries;
+	const unsigned char *src, *scp;
+	off_t copy_objoffset = 0;
+
+	src = use_pack(p, w_curs, offset, &avail);
+	scp = src;
+
+	if (hdr) {
+		/* we need to skip over the object header */
+		while (*scp & 128)
+			if (++scp - src >= avail - 20)
+				return -1;
+		/* let's still make sure this is actually a tree */
+		if ((*scp++ & 0xf) != OBJ_TREE)
+			return -1;
+	}
+
+	nb_entries = decode_varint(&scp);
+	if (scp == src || start > nb_entries || count > nb_entries - start)
+		return -1;
+	offset += scp - src;
+	avail -= scp - src;
+	src = scp;
+
+	while (count) {
+		unsigned int what;
+
+		if (avail < 20) {
+			src = use_pack(p, w_curs, offset, &avail);
+			if (avail < 20)
+				return -1;
+		}
+		scp = src;
+
+		what = decode_varint(&scp);
+		if (scp == src)
+			return -1;
+
+		if (!(what & 1) && start != 0) {
+			/*
+			 * This is a single entry and we have to skip it.
+			 * The path index was parsed and is in 'what'.
+			 * Skip over the SHA1 index.
+			 */
+			while (*scp++ & 128);
+			start--;
+		} else if (!(what & 1) && start == 0) {
+			/*
+			 * This is an actual tree entry to recreate.
+			 */
+			const unsigned char *path, *sha1;
+			unsigned mode;
+			int len;
+
+			path = get_pathref(p, what >> 1);
+			sha1 = get_sha1ref(p, &scp);
+			if (!path || !sha1)
+				return -1;
+			mode = (path[0] << 8) | path[1];
+			len = snprintf((char *)*dstp, *sizep, "%o %s%c",
+				       mode, path + 2, '\0');
+			if (len + 20 > *sizep)
+				return -1;
+			hashcpy(*dstp + len, sha1);
+			*dstp += len + 20;
+			*sizep -= len + 20;
+			count--;
+		} else if (what & 1) {
+			/*
+			 * Copy from another tree object.
+			 */
+			unsigned int copy_start, copy_count;
+
+			copy_start = what >> 1;
+			copy_count = decode_varint(&scp);
+			if (!copy_count)
+				return -1;
+
+			/*
+			 * The LSB of copy_count is a flag indicating if
+			 * a third value is provided to specify the source
+			 * object.  This may be omitted when it doesn't
+			 * change, but has to be specified at least for the
+* first copy 

[PATCH 31/38] sha1_file.c: make use of decode_varint()

2013-09-05 Thread Nicolas Pitre
... replacing the equivalent open coded loop.

Signed-off-by: Nicolas Pitre n...@fluxnic.net
---
 sha1_file.c | 14 +++---
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index a298933..67eb903 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1687,20 +1687,12 @@ static off_t get_delta_base(struct packed_git *p,
 * is stupid, as then a REF_DELTA would be smaller to store.
 */
if (type == OBJ_OFS_DELTA) {
-   unsigned used = 0;
-   unsigned char c = base_info[used++];
-		base_offset = c & 127;
-		while (c & 128) {
-			base_offset += 1;
-			if (!base_offset || MSB(base_offset, 7))
-				return 0;  /* overflow */
-			c = base_info[used++];
-			base_offset = (base_offset << 7) + (c & 127);
-		}
+		const unsigned char *cp = base_info;
+		base_offset = decode_varint(&cp);
 		base_offset = delta_obj_offset - base_offset;
 		if (base_offset <= 0 || base_offset >= delta_obj_offset)
return 0;  /* out of bound */
-   *curpos += used;
+   *curpos += cp - base_info;
} else if (type == OBJ_REF_DELTA) {
/* The base entry _must_ be in the same pack */
base_offset = find_pack_entry_one(base_info, p);
-- 
1.8.4.38.g317e65b



[PATCH 29/38] pack v4: code to retrieve a name

2013-09-05 Thread Nicolas Pitre
The name dictionary is loaded if not already done.  We know it is
located right after the SHA1 table (20 bytes per object) which is
itself right after the 12-byte header.

Then the index is parsed from the input buffer and a pointer to the
corresponding entry is returned.
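
The fixed layout described above makes the dictionary's location pure arithmetic. A throwaway helper (not in the patch) just to make it concrete:

```c
#include <assert.h>
#include <stdint.h>

/* Offset of the name dictionary in a pack v4 file: the 12-byte pack
 * header, then the SHA1 table at 20 bytes per object. */
static uint64_t name_dict_offset(uint32_t num_objects)
{
	return 12 + (uint64_t)num_objects * 20;
}
```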

Signed-off-by: Nicolas Pitre n...@fluxnic.net
---
 cache.h|  3 +++
 packv4-parse.c | 24 
 2 files changed, 27 insertions(+)

diff --git a/cache.h b/cache.h
index 59d9ba7..6ce327e 100644
--- a/cache.h
+++ b/cache.h
@@ -1015,6 +1015,8 @@ struct pack_window {
unsigned int inuse_cnt;
 };
 
+struct packv4_dict;
+
 extern struct packed_git {
struct packed_git *next;
struct pack_window *windows;
@@ -1027,6 +1029,7 @@ extern struct packed_git {
unsigned char *bad_object_sha1;
int version;
int index_version;
+   struct packv4_dict *name_dict;
time_t mtime;
int pack_fd;
unsigned pack_local:1,
diff --git a/packv4-parse.c b/packv4-parse.c
index 26894bc..074e107 100644
--- a/packv4-parse.c
+++ b/packv4-parse.c
@@ -105,3 +105,27 @@ static struct packv4_dict *load_dict(struct packed_git *p, off_t *offset)
 	*offset = curpos;
 	return dict;
 }
+
+static void load_name_dict(struct packed_git *p)
+{
+	off_t offset = 12 + p->num_objects * 20;
+	struct packv4_dict *names = load_dict(p, &offset);
+	if (!names)
+		die("bad pack name dictionary in %s", p->pack_name);
+	p->name_dict = names;
+}
+
+const unsigned char *get_nameref(struct packed_git *p, const unsigned char **srcp)
+{
+	unsigned int index;
+
+	if (!p->name_dict)
+		load_name_dict(p);
+
+	index = decode_varint(srcp);
+	if (index >= p->name_dict->nb_entries) {
+		error("%s: index overflow", __func__);
+		return NULL;
+	}
+	return p->name_dict->data + p->name_dict->offsets[index];
+}
-- 
1.8.4.38.g317e65b



[PATCH 37/38] pack v4: introduce escape hatches in the name and path indexes

2013-09-05 Thread Nicolas Pitre
If the path or name index is zero, the entry data is to be found inline
rather than in the dictionary table.  This allows easy completion of
thin packs without adding new table entries, which would require a full
rewrite of the pack data.
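
The escape hatch can be sketched standalone (names and signature are illustrative, not the actual pack v4 API): a decoded index of 0 means the entry follows inline in the stream, two prefix bytes plus a NUL-terminated string, while nonzero indexes are off by one into the dictionary table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const unsigned char *resolve_entry(const unsigned char *dict_data,
					  const unsigned *offsets,
					  unsigned nb_entries,
					  unsigned index,
					  const unsigned char **srcp)
{
	if (!index) {
		/* inline: skip 2 prefix bytes + string + NUL in the source */
		const unsigned char *data = *srcp;
		*srcp += 2 + strlen((const char *)*srcp + 2) + 1;
		return data;
	}
	if (index - 1 >= nb_entries)
		return NULL;	/* index overflow */
	return dict_data + offsets[index - 1];
}
```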

Signed-off-by: Nicolas Pitre n...@fluxnic.net
---
 packv4-create.c |  6 +++---
 packv4-parse.c  | 28 ++--
 2 files changed, 25 insertions(+), 9 deletions(-)

diff --git a/packv4-create.c b/packv4-create.c
index fd16222..9d6ffc0 100644
--- a/packv4-create.c
+++ b/packv4-create.c
@@ -343,7 +343,7 @@ void *pv4_encode_commit(void *buffer, unsigned long *sizep)
index = dict_add_entry(commit_name_table, tz_val, in, end - in);
if (index  0)
goto bad_dict;
-   out += encode_varint(index, out);
+   out += encode_varint(index + 1, out);
 	time = strtoul(end, &end, 10);
if (!end || end[0] != ' ' || end[6] != '\n')
goto bad_data;
@@ -361,7 +361,7 @@ void *pv4_encode_commit(void *buffer, unsigned long *sizep)
index = dict_add_entry(commit_name_table, tz_val, in, end - in);
if (index  0)
goto bad_dict;
-   out += encode_varint(index, out);
+   out += encode_varint(index + 1, out);
 	time = strtoul(end, &end, 10);
if (!end || end[0] != ' ' || end[6] != '\n')
goto bad_data;
@@ -561,7 +561,7 @@ void *pv4_encode_tree(void *_buffer, unsigned long *sizep,
free(buffer);
return NULL;
}
-	out += encode_varint(index << 1, out);
+	out += encode_varint((index + 1) << 1, out);
out += encode_sha1ref(name_entry.sha1, out);
}
 
diff --git a/packv4-parse.c b/packv4-parse.c
index 4c218d2..6db4ed3 100644
--- a/packv4-parse.c
+++ b/packv4-parse.c
@@ -125,11 +125,19 @@ const unsigned char *get_nameref(struct packed_git *p, const unsigned char **src
 		load_name_dict(p);
 
 	index = decode_varint(srcp);
-	if (index >= p->name_dict->nb_entries) {
+
+	if (!index) {
+		/* the entry data is inline */
+		const unsigned char *data = *srcp;
+		*srcp += 2 + strlen((const char *)*srcp + 2) + 1;
+		return data;
+	}
+
+	if (index - 1 >= p->name_dict->nb_entries) {
 		error("%s: index overflow", __func__);
 		return NULL;
 	}
-	return p->name_dict->data + p->name_dict->offsets[index];
+	return p->name_dict->data + p->name_dict->offsets[index - 1];
 }
 
 static void load_path_dict(struct packed_git *p)
@@ -151,16 +159,24 @@ static void load_path_dict(struct packed_git *p)
p-path_dict = paths;
 }
 
-const unsigned char *get_pathref(struct packed_git *p, unsigned int index)
+const unsigned char *get_pathref(struct packed_git *p, unsigned int index,
+				 const unsigned char **srcp)
 {
 	if (!p->path_dict)
 		load_path_dict(p);
 
-	if (index >= p->path_dict->nb_entries) {
+	if (!index) {
+		/* the entry data is inline */
+		const unsigned char *data = *srcp;
+		*srcp += 2 + strlen((const char *)*srcp + 2) + 1;
+		return data;
+	}
+
+	if (index - 1 >= p->path_dict->nb_entries) {
 		error("%s: index overflow", __func__);
 		return NULL;
 	}
-	return p->path_dict->data + p->path_dict->offsets[index];
+	return p->path_dict->data + p->path_dict->offsets[index - 1];
 }
 
 void *pv4_get_commit(struct packed_git *p, struct pack_window **w_curs,
@@ -296,7 +312,7 @@ static int decode_entries(struct packed_git *p, struct pack_window **w_curs,
 			unsigned mode;
 			int len;
 
-			path = get_pathref(p, what >> 1);
+			path = get_pathref(p, what >> 1, &scp);
 			sha1 = get_sha1ref(p, &scp);
if (!path || !sha1)
return -1;
-- 
1.8.4.38.g317e65b



[PATCH 36/38] pack v4: get tree objects

2013-09-05 Thread Nicolas Pitre
Signed-off-by: Nicolas Pitre n...@fluxnic.net
---
 packv4-parse.c | 25 +
 packv4-parse.h |  2 ++
 sha1_file.c|  2 +-
 3 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/packv4-parse.c b/packv4-parse.c
index 04eab46..4c218d2 100644
--- a/packv4-parse.c
+++ b/packv4-parse.c
@@ -365,3 +365,28 @@ static int decode_entries(struct packed_git *p, struct pack_window **w_curs,
 
return 0;
 }
+
+void *pv4_get_tree(struct packed_git *p, struct pack_window **w_curs,
+  off_t offset, unsigned long size)
+{
+   unsigned long avail;
+   unsigned int nb_entries;
+   unsigned char *dst, *dcp;
+   const unsigned char *src, *scp;
+   int ret;
+
+	src = use_pack(p, w_curs, offset, &avail);
+	scp = src;
+	nb_entries = decode_varint(&scp);
+	if (scp == src)
+		return NULL;
+
+	dst = xmallocz(size);
+	dcp = dst;
+	ret = decode_entries(p, w_curs, offset, 0, nb_entries, &dcp, &size, 0);
+	if (ret < 0 || size != 0) {
+   free(dst);
+   return NULL;
+   }
+   return dst;
+}
diff --git a/packv4-parse.h b/packv4-parse.h
index 40aa75a..5f9d809 100644
--- a/packv4-parse.h
+++ b/packv4-parse.h
@@ -3,5 +3,7 @@
 
 void *pv4_get_commit(struct packed_git *p, struct pack_window **w_curs,
 off_t offset, unsigned long size);
+void *pv4_get_tree(struct packed_git *p, struct pack_window **w_curs,
+  off_t offset, unsigned long size);
 
 #endif
diff --git a/sha1_file.c b/sha1_file.c
index b57d9f8..79e1293 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2177,7 +2177,7 @@ void *unpack_entry(struct packed_git *p, off_t obj_offset,
 			if (type == OBJ_COMMIT) {
 				data = pv4_get_commit(p, &w_curs, curpos, size);
 			} else {
-				die("no pack v4 tree parsing yet");
+				data = pv4_get_tree(p, &w_curs, curpos, size);
}
break;
}
-- 
1.8.4.38.g317e65b



[PATCH 10/38] pack v4: commit object encoding

2013-09-05 Thread Nicolas Pitre
This goes as follows:

- Tree reference: either variable length encoding of the index
  into the SHA1 table or the literal SHA1 prefixed by 0 (see
  encode_sha1ref()).

- Parent count: variable length encoding of the number of parents.
  This is normally going to occupy a single byte but doesn't have to.

- List of parent references: a list of encode_sha1ref() encoded
  references, or nothing if the parent count was zero.

- Author reference: variable length encoding of an index into the author
  identifier dictionary table which also covers the time zone.  To make
  the overall encoding efficient, the author table is sorted by usage
  frequency so the most used names are first and require the shortest
  index encoding.

- Author time stamp: variable length encoded.  Year 2038 ready!

- Committer reference: same as author reference.

- Committer time stamp: same as author time stamp.

The remainder of the canonical commit object content is then zlib
compressed and appended to the above.

Rationale: The most important commit object data is densely encoded while
requiring no zlib inflate processing on access, and all SHA1 references
are most likely direct indices into the pack index file, requiring no
SHA1 search.
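
The variable length encoding used throughout is Git's offset-varint scheme: 7 payload bits per byte, MSB set on every byte but the last, with an implicit +1 per continuation byte so each encoded length covers a distinct value range. A minimal sketch modeled on varint.c (overflow checks omitted for brevity):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encode 'value' into 'buf'; returns the number of bytes written. */
static int encode_varint(uintmax_t value, unsigned char *buf)
{
	unsigned char tmp[16];
	unsigned pos = sizeof(tmp) - 1;

	tmp[pos] = value & 127;
	while (value >>= 7)
		tmp[--pos] = 128 | (--value & 127);	/* implicit +1 per byte */
	memcpy(buf, tmp + pos, sizeof(tmp) - pos);
	return sizeof(tmp) - pos;
}

/* Decode a varint from *bufp, advancing the pointer past it. */
static uintmax_t decode_varint(const unsigned char **bufp)
{
	const unsigned char *buf = *bufp;
	unsigned char c = *buf++;
	uintmax_t val = c & 127;

	while (c & 128) {
		val += 1;
		c = *buf++;
		val = (val << 7) + (c & 127);
	}
	*bufp = buf;
	return val;
}
```

Values below 128 take a single byte, which is why sorting the dictionary by usage frequency keeps the common references short.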

Signed-off-by: Nicolas Pitre n...@fluxnic.net
---
 packv4-create.c | 119 
 1 file changed, 119 insertions(+)

diff --git a/packv4-create.c b/packv4-create.c
index 12527c0..d4a79f4 100644
--- a/packv4-create.c
+++ b/packv4-create.c
@@ -14,6 +14,9 @@
 #include "pack.h"
 #include "varint.h"
 
+
+static int pack_compression_level = Z_DEFAULT_COMPRESSION;
+
 struct data_entry {
unsigned offset;
unsigned size;
@@ -274,6 +277,122 @@ static int encode_sha1ref(const unsigned char *sha1, unsigned char *buf)
return 1 + 20;
 }
 
+/*
+ * This converts a canonical commit object buffer into its
+ * tightly packed representation using the already populated
+ * and sorted commit_name_table dictionary.  The parsing is
+ * strict so to ensure the canonical version may always be
+ * regenerated and produce the same hash.
+ */
+void *pv4_encode_commit(void *buffer, unsigned long *sizep)
+{
+   unsigned long size = *sizep;
+   char *in, *tail, *end;
+   unsigned char *out;
+   unsigned char sha1[20];
+   int nb_parents, index, tz_val;
+   unsigned long time;
+   z_stream stream;
+   int status;
+
+   /*
+* It is guaranteed that the output is always going to be smaller
+* than the input.  We could even do this conversion in place.
+*/
+   in = buffer;
+   tail = in + size;
+   buffer = xmalloc(size);
+   out = buffer;
+
+   /* parse the tree line */
+	if (in + 46 >= tail || memcmp(in, "tree ", 5) || in[45] != '\n')
+		goto bad_data;
+	if (get_sha1_lowhex(in + 5, sha1) < 0)
+		goto bad_data;
+	in += 46;
+	out += encode_sha1ref(sha1, out);
+
+	/* count how many parent lines */
+	nb_parents = 0;
+	while (in + 48 < tail && !memcmp(in, "parent ", 7) && in[47] == '\n') {
+		nb_parents++;
+		in += 48;
+	}
+	out += encode_varint(nb_parents, out);
+
+	/* rewind and parse the parent lines */
+	in -= 48 * nb_parents;
+	while (nb_parents--) {
+		if (get_sha1_lowhex(in + 7, sha1))
+			goto bad_data;
+		out += encode_sha1ref(sha1, out);
+		in += 48;
+	}
+
+	/* parse the author line */
+	/* it must be at least "author x <x> 0 +0000\n" i.e. 21 chars */
+	if (in + 21 >= tail || memcmp(in, "author ", 7))
+		goto bad_data;
+	in += 7;
+	end = get_nameend_and_tz(in, &tz_val);
+	if (!end)
+		goto bad_data;
+	index = dict_add_entry(commit_name_table, tz_val, in, end - in);
+	if (index < 0)
+		goto bad_dict;
+	out += encode_varint(index, out);
+	time = strtoul(end, &end, 10);
+	if (!end || end[0] != ' ' || end[6] != '\n')
+		goto bad_data;
+	out += encode_varint(time, out);
+	in = end + 7;
+
+	/* parse the committer line */
+	/* it must be at least "committer x <x> 0 +0000\n" i.e. 24 chars */
+	if (in + 24 >= tail || memcmp(in, "committer ", 10))
+		goto bad_data;
+	in += 10;
+	end = get_nameend_and_tz(in, &tz_val);
+	if (!end)
+		goto bad_data;
+	index = dict_add_entry(commit_name_table, tz_val, in, end - in);
+	if (index < 0)
+		goto bad_dict;
+	out += encode_varint(index, out);
+	time = strtoul(end, &end, 10);
+	if (!end || end[0] != ' ' || end[6] != '\n')
+		goto bad_data;
+	out += encode_varint(time, out);
+	in = end + 7;
+
+	/* finally, deflate the remaining data */
+	memset(&stream, 0, sizeof(stream));
+   deflateInit(stream, 

[PATCH 01/38] pack v4: initial pack dictionary structure and code

2013-09-05 Thread Nicolas Pitre
Signed-off-by: Nicolas Pitre n...@fluxnic.net
---
 packv4-create.c | 137 
 1 file changed, 137 insertions(+)
 create mode 100644 packv4-create.c

diff --git a/packv4-create.c b/packv4-create.c
new file mode 100644
index 000..2de6d41
--- /dev/null
+++ b/packv4-create.c
@@ -0,0 +1,137 @@
+/*
+ * packv4-create.c: management of dictionary tables used in pack v4
+ *
+ * (C) Nicolas Pitre n...@fluxnic.net
+ *
+ * This code is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include "cache.h"
+
+struct data_entry {
+   unsigned offset;
+   unsigned hits;
+};
+
+struct dict_table {
+   char *data;
+   unsigned ptr;
+   unsigned size;
+   struct data_entry *entry;
+   unsigned nb_entries;
+   unsigned max_entries;
+   unsigned *hash;
+   unsigned hash_size;
+};
+
+struct dict_table *create_dict_table(void)
+{
+   return xcalloc(sizeof(struct dict_table), 1);
+}
+
+void destroy_dict_table(struct dict_table *t)
+{
+	free(t->data);
+	free(t->entry);
+	free(t->hash);
+	free(t);
+}
+
+static int locate_entry(struct dict_table *t, const char *str)
+{
+	int i = 0;
+	const unsigned char *s = (const unsigned char *) str;
+
+	while (*s)
+		i = i * 111 + *s++;
+	i = (unsigned)i % t->hash_size;
+
+	while (t->hash[i]) {
+		unsigned n = t->hash[i] - 1;
+		if (!strcmp(str, t->data + t->entry[n].offset))
+			return n;
+		if (++i >= t->hash_size)
+			i = 0;
+	}
+	return -1 - i;
+}
+
+static void rehash_entries(struct dict_table *t)
+{
+	unsigned n;
+
+	t->hash_size *= 2;
+	if (t->hash_size < 1024)
+		t->hash_size = 1024;
+	t->hash = xrealloc(t->hash, t->hash_size * sizeof(*t->hash));
+	memset(t->hash, 0, t->hash_size * sizeof(*t->hash));
+
+	for (n = 0; n < t->nb_entries; n++) {
+		int i = locate_entry(t, t->data + t->entry[n].offset);
+		if (i < 0)
+			t->hash[-1 - i] = n + 1;
+	}
+}
+
+int dict_add_entry(struct dict_table *t, const char *str)
+{
+	int i, len = strlen(str) + 1;
+
+	if (t->ptr + len >= t->size) {
+		t->size = (t->size + len + 1024) * 3 / 2;
+		t->data = xrealloc(t->data, t->size);
+	}
+	memcpy(t->data + t->ptr, str, len);
+
+	i = (t->nb_entries) ? locate_entry(t, t->data + t->ptr) : -1;
+	if (i >= 0) {
+		t->entry[i].hits++;
+		return i;
+	}
+
+	if (t->nb_entries >= t->max_entries) {
+		t->max_entries = (t->max_entries + 1024) * 3 / 2;
+		t->entry = xrealloc(t->entry, t->max_entries * sizeof(*t->entry));
+	}
+	t->entry[t->nb_entries].offset = t->ptr;
+	t->entry[t->nb_entries].hits = 1;
+	t->ptr += len + 1;
+	t->nb_entries++;
+
+	if (t->hash_size * 3 <= t->nb_entries * 4)
+		rehash_entries(t);
+	else
+		t->hash[-1 - i] = t->nb_entries;
+
+	return t->nb_entries - 1;
+}
+
+static int cmp_dict_entries(const void *a_, const void *b_)
+{
+	const struct data_entry *a = a_;
+	const struct data_entry *b = b_;
+	int diff = b->hits - a->hits;
+	if (!diff)
+		diff = a->offset - b->offset;
+	return diff;
+}
+
+static void sort_dict_entries_by_hits(struct dict_table *t)
+{
+	qsort(t->entry, t->nb_entries, sizeof(*t->entry), cmp_dict_entries);
+	t->hash_size = (t->nb_entries * 4 / 3) / 2;
+	rehash_entries(t);
+}
+
+void dict_dump(struct dict_table *t)
+{
+	int i;
+
+	sort_dict_entries_by_hits(t);
+	for (i = 0; i < t->nb_entries; i++)
+		printf("%d\t%s\n",
+		       t->entry[i].hits,
+		       t->data + t->entry[i].offset);
+}
-- 
1.8.4.38.g317e65b

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 22/38] pack index v3

2013-09-05 Thread Nicolas Pitre
This is a minor change over pack index v2.  Since pack v4 already contains
the sorted SHA1 table, it is omitted from the index file.

Signed-off-by: Nicolas Pitre n...@fluxnic.net
---
 pack-write.c|  6 +-
 packv4-create.c | 10 +-
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/pack-write.c b/pack-write.c
index ca9e63b..631007e 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -87,6 +87,8 @@ const char *write_idx_file(const char *index_name, struct pack_idx_entry **objec
 
 	/* if last object's offset is >= 2^31 we should use index V2 */
 	index_version = need_large_offset(last_obj_offset, opts) ? 2 : opts->version;
+	if (index_version < opts->version)
+		index_version = opts->version;
 
 	/* index versions 2 and above need a header */
 	if (index_version >= 2) {
@@ -127,7 +129,9 @@ const char *write_idx_file(const char *index_name, struct pack_idx_entry **objec
 			uint32_t offset = htonl(obj->offset);
 			sha1write(f, &offset, 4);
 		}
-		sha1write(f, obj->sha1, 20);
+		/* Pack v4 (using index v3) carries the SHA1 table already */
+		if (index_version < 3)
+			sha1write(f, obj->sha1, 20);
 		git_SHA1_Update(&ctx, obj->sha1, 20);
 		if ((opts->flags & WRITE_IDX_STRICT) &&
 		    (i && !hashcmp(list[-2]->sha1, obj->sha1)))
diff --git a/packv4-create.c b/packv4-create.c
index a9e9002..22cdf8e 100644
--- a/packv4-create.c
+++ b/packv4-create.c
@@ -1014,8 +1014,10 @@ static void process_one_pack(char *src_pack, char *dst_pack)
struct packed_git *p;
struct sha1file *f;
struct pack_idx_entry *objs, **p_objs;
+   struct pack_idx_option idx_opts;
unsigned i, nr_objects;
off_t written = 0;
+   unsigned char pack_sha1[20];
 
p = open_pack(src_pack);
if (!p)
@@ -1041,11 +1043,17 @@ static void process_one_pack(char *src_pack, char *dst_pack)
 	for (i = 0; i < nr_objects; i++) {
 		off_t obj_pos = written;
 		struct pack_idx_entry *obj = p_objs[i];
+		crc32_begin(f);
 		written += packv4_write_object(f, p, obj);
 		obj->offset = obj_pos;
+		obj->crc32 = crc32_end(f);
}
 
-   sha1close(f, NULL, CSUM_CLOSE | CSUM_FSYNC);
+   sha1close(f, pack_sha1, CSUM_CLOSE | CSUM_FSYNC);
+
+	reset_pack_idx_option(&idx_opts);
+	idx_opts.version = 3;
+	write_idx_file(dst_pack, p_objs, nr_objects, &idx_opts, pack_sha1);
 }
 
 static int git_pack_config(const char *k, const char *v, void *cb)
-- 
1.8.4.38.g317e65b



[PATCH 20/38] pack v4: honor pack.compression config option

2013-09-05 Thread Nicolas Pitre
Signed-off-by: Nicolas Pitre n...@fluxnic.net
---
 packv4-create.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/packv4-create.c b/packv4-create.c
index c8d3053..45f8427 100644
--- a/packv4-create.c
+++ b/packv4-create.c
@@ -16,6 +16,7 @@
 #include "varint.h"
 
 
+static int pack_compression_seen;
 static int pack_compression_level = Z_DEFAULT_COMPRESSION;
 
 struct data_entry {
@@ -1047,12 +1048,30 @@ static void process_one_pack(char *src_pack, char *dst_pack)
sha1close(f, NULL, CSUM_CLOSE | CSUM_FSYNC);
 }
 
+static int git_pack_config(const char *k, const char *v, void *cb)
+{
+	if (!strcmp(k, "pack.compression")) {
+		int level = git_config_int(k, v);
+		if (level == -1)
+			level = Z_DEFAULT_COMPRESSION;
+		else if (level < 0 || level > Z_BEST_COMPRESSION)
+			die("bad pack compression level %d", level);
+   pack_compression_level = level;
+   pack_compression_seen = 1;
+   return 0;
+   }
+   return git_default_config(k, v, cb);
+}
+
 int main(int argc, char *argv[])
 {
if (argc != 3) {
 		fprintf(stderr, "Usage: %s src_packfile dst_packfile\n", argv[0]);
 		exit(1);
 	}
+	git_config(git_pack_config, NULL);
+	if (!pack_compression_seen && core_compression_seen)
+		pack_compression_level = core_compression_level;
process_one_pack(argv[1], argv[2]);
if (0)
dict_dump();
-- 
1.8.4.38.g317e65b



[PATCH 21/38] pack v4: relax commit parsing a bit

2013-09-05 Thread Nicolas Pitre
At least commit af25e94d4dcfb9608846242fabdd4e6014e5c9f0 in the Linux
kernel repository has an author line with no name before the email address,
of the form "author <email> 1120285620 -0700", which the previous
minimum-length checks rejected.

Signed-off-by: Nicolas Pitre n...@fluxnic.net
---
 packv4-create.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/packv4-create.c b/packv4-create.c
index 45f8427..a9e9002 100644
--- a/packv4-create.c
+++ b/packv4-create.c
@@ -158,12 +158,12 @@ static char *get_nameend_and_tz(char *from, int *tz_val)
char *end, *tz;
 
tz = strchr(from, '\n');
-	/* let's assume the smallest possible string to be "x <x> 0 +0000\n" */
-	if (!tz || tz - from < 13)
+	/* let's assume the smallest possible string to be "<x> 0 +0000\n" */
+	if (!tz || tz - from < 11)
 		return NULL;
 	tz -= 4;
 	end = tz - 4;
-	while (end - from > 5 && *end != ' ')
+	while (end - from > 3 && *end != ' ')
 		end--;
 	if (end[-1] != '>' || end[0] != ' ' || tz[-2] != ' ')
return NULL;
-- 
1.8.4.38.g317e65b



[PATCH 14/38] pack v4: object headers

2013-09-05 Thread Nicolas Pitre
In pack v4 the object size and type are encoded differently from pack v3.
The object size uses the same efficient variable length number encoding
already used elsewhere.

The object type has 4 bits allocated to it compared to 3 bits in pack v3.
This should be quite sufficient for the foreseeable future, especially
since pack v4 has only one type of delta object instead of two.

Signed-off-by: Nicolas Pitre n...@fluxnic.net
---
 packv4-create.c | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/packv4-create.c b/packv4-create.c
index 61b70c8..6098062 100644
--- a/packv4-create.c
+++ b/packv4-create.c
@@ -635,6 +635,33 @@ static unsigned long packv4_write_tables(struct sha1file *f, unsigned nr_objects
return written;
 }
 
+static int write_object_header(struct sha1file *f, enum object_type type, unsigned long size)
+{
+	unsigned char buf[16];
+	uint64_t val;
+	int len;
+
+	/*
+	 * We really have only one kind of delta object.
+	 */
+	if (type == OBJ_OFS_DELTA)
+		type = OBJ_REF_DELTA;
+
+	/*
+	 * We allocate 4 bits in the LSB for the object type which should
+	 * be good for quite a while, given that we effectively encode
+	 * only 5 object types: commit, tree, blob, delta, tag.
+	 */
+	val = size;
+	if (MSB(val, 4))
+		die("fixme: the code doesn't currently cope with big sizes");
+	val <<= 4;
+	val |= type;
+   len = encode_varint(val, buf);
+   sha1write(f, buf, len);
+   return len;
+}
+
 static struct packed_git *open_pack(const char *path)
 {
char arg[PATH_MAX];
-- 
1.8.4.38.g317e65b



[PATCH 03/38] pack v4: scan tree objects

2013-09-05 Thread Nicolas Pitre
Let's read a pack to feed our dictionary with all the path strings
contained in all the tree objects.

Dump the resulting dictionary sorted by frequency to stdout.

Signed-off-by: Nicolas Pitre n...@fluxnic.net
---
 Makefile|   1 +
 packv4-create.c | 137 
 2 files changed, 138 insertions(+)

diff --git a/Makefile b/Makefile
index 3588ca1..4716113 100644
--- a/Makefile
+++ b/Makefile
@@ -550,6 +550,7 @@ PROGRAM_OBJS += shell.o
 PROGRAM_OBJS += show-index.o
 PROGRAM_OBJS += upload-pack.o
 PROGRAM_OBJS += remote-testsvn.o
+PROGRAM_OBJS += packv4-create.o
 
 # Binary suffix, set to .exe for Windows builds
 X =
diff --git a/packv4-create.c b/packv4-create.c
index 2de6d41..00762a5 100644
--- a/packv4-create.c
+++ b/packv4-create.c
@@ -9,6 +9,8 @@
  */
 
 #include cache.h
+#include object.h
+#include tree-walk.h
 
 struct data_entry {
unsigned offset;
@@ -125,6 +127,22 @@ static void sort_dict_entries_by_hits(struct dict_table *t)
rehash_entries(t);
 }
 
+static struct dict_table *tree_path_table;
+
+static int add_tree_dict_entries(void *buf, unsigned long size)
+{
+   struct tree_desc desc;
+   struct name_entry name_entry;
+
+   if (!tree_path_table)
+   tree_path_table = create_dict_table();
+
+	init_tree_desc(&desc, buf, size);
+	while (tree_entry(&desc, &name_entry))
+		dict_add_entry(tree_path_table, name_entry.path);
+   return 0;
+}
+
 void dict_dump(struct dict_table *t)
 {
int i;
@@ -135,3 +153,122 @@ void dict_dump(struct dict_table *t)
 			t->entry[i].hits,
 			t->data + t->entry[i].offset);
 }
+
+struct idx_entry
+{
+	off_t offset;
+   const unsigned char *sha1;
+};
+
+static int sort_by_offset(const void *e1, const void *e2)
+{
+   const struct idx_entry *entry1 = e1;
+   const struct idx_entry *entry2 = e2;
+	if (entry1->offset < entry2->offset)
+		return -1;
+	if (entry1->offset > entry2->offset)
+		return 1;
+   return 0;
+}
+static int create_pack_dictionaries(struct packed_git *p)
+{
+   uint32_t nr_objects, i;
+   struct idx_entry *objects;
+
+	nr_objects = p->num_objects;
+	objects = xmalloc((nr_objects + 1) * sizeof(*objects));
+	objects[nr_objects].offset = p->index_size - 40;
+	for (i = 0; i < nr_objects; i++) {
+		objects[i].sha1 = nth_packed_object_sha1(p, i);
+		objects[i].offset = nth_packed_object_offset(p, i);
+	}
+	qsort(objects, nr_objects, sizeof(*objects), sort_by_offset);
+
+	for (i = 0; i < nr_objects; i++) {
+		void *data;
+		enum object_type type;
+		unsigned long size;
+		struct object_info oi = {};
+
+		oi.typep = &type;
+		oi.sizep = &size;
+		if (packed_object_info(p, objects[i].offset, &oi) < 0)
+			die("cannot get type of %s from %s",
+			    sha1_to_hex(objects[i].sha1), p->pack_name);
+
+		switch (type) {
+		case OBJ_TREE:
+			break;
+		default:
+			continue;
+		}
+		data = unpack_entry(p, objects[i].offset, &type, &size);
+		if (!data)
+			die("cannot unpack %s from %s",
+			    sha1_to_hex(objects[i].sha1), p->pack_name);
+		if (check_sha1_signature(objects[i].sha1, data, size, typename(type)))
+			die("packed %s from %s is corrupt",
+			    sha1_to_hex(objects[i].sha1), p->pack_name);
+		if (add_tree_dict_entries(data, size) < 0)
+			die("can't process %s object %s",
+			    typename(type), sha1_to_hex(objects[i].sha1));
+		free(data);
+	}
+   free(objects);
+
+   return 0;
+}
+
+static int process_one_pack(const char *path)
+{
+	char arg[PATH_MAX];
+	int len;
+	struct packed_git *p;
+
+	len = strlcpy(arg, path, PATH_MAX);
+	if (len >= PATH_MAX)
+		return error("name too long: %s", path);
+
+	/*
+	 * In addition to "foo.idx" we accept "foo.pack" and "foo";
+	 * normalize these forms to "foo.idx" for add_packed_git().
+	 */
+	if (has_extension(arg, ".pack")) {
+		strcpy(arg + len - 5, ".idx");
+		len--;
+	} else if (!has_extension(arg, ".idx")) {
+		if (len + 4 >= PATH_MAX)
+			return error("name too long: %s.idx", arg);
+		strcpy(arg + len, ".idx");
+		len += 4;
+	}
+
+	/*
+	 * add_packed_git() uses our buffer (containing "foo.idx") to
+	 * build the pack filename ("foo.pack").  Make sure it fits.
+	 */
+	if (len + 1 >= PATH_MAX) {
+		arg[len - 4] = '\0';
+  

[PATCH 16/38] pack v4: object writing

2013-09-05 Thread Nicolas Pitre
This adds the missing code to finally be able to produce a complete
pack file version 4.  We trap commit and tree objects as those have
a completely new encoding.  Other object types are copied almost
unchanged.

As we go, the pack index entries are updated in place to store the new
object offsets once they're written to the destination file.  This will
be needed later for writing the pack index file.

Signed-off-by: Nicolas Pitre n...@fluxnic.net
---
 packv4-create.c | 74 ++---
 1 file changed, 71 insertions(+), 3 deletions(-)

diff --git a/packv4-create.c b/packv4-create.c
index b0e344f..5d76234 100644
--- a/packv4-create.c
+++ b/packv4-create.c
@@ -722,6 +722,59 @@ static unsigned long copy_object_data(struct sha1file *f, struct packed_git *p,
return written;
 }
 
+static off_t packv4_write_object(struct sha1file *f, struct packed_git *p,
+				 struct pack_idx_entry *obj)
+{
+	void *src, *result;
+	struct object_info oi = {};
+	enum object_type type;
+	unsigned long size;
+	unsigned int hdrlen;
+
+	oi.typep = &type;
+	oi.sizep = &size;
+	if (packed_object_info(p, obj->offset, &oi) < 0)
+		die("cannot get type of %s from %s",
+		    sha1_to_hex(obj->sha1), p->pack_name);
+
+	/* Some objects are copied without decompression */
+	switch (type) {
+	case OBJ_COMMIT:
+	case OBJ_TREE:
+		break;
+	default:
+		return copy_object_data(f, p, obj->offset);
+	}
+
+	/* The rest is converted into their new format */
+	src = unpack_entry(p, obj->offset, &type, &size);
+	if (!src)
+		die("cannot unpack %s from %s",
+		    sha1_to_hex(obj->sha1), p->pack_name);
+	if (check_sha1_signature(obj->sha1, src, size, typename(type)))
+		die("packed %s from %s is corrupt",
+		    sha1_to_hex(obj->sha1), p->pack_name);
+
+	hdrlen = write_object_header(f, type, size);
+	switch (type) {
+	case OBJ_COMMIT:
+		result = pv4_encode_commit(src, &size);
+		break;
+	case OBJ_TREE:
+		result = pv4_encode_tree(src, &size);
+		break;
+	default:
+		die("unexpected object type %d", type);
+	}
+	free(src);
+	if (!result)
+		die("can't convert %s object %s",
+		    typename(type), sha1_to_hex(obj->sha1));
+	sha1write(f, result, size);
+	free(result);
+	return hdrlen + size;
+}
+
 static struct packed_git *open_pack(const char *path)
 {
char arg[PATH_MAX];
@@ -780,7 +833,8 @@ static void process_one_pack(char *src_pack, char *dst_pack)
struct packed_git *p;
struct sha1file *f;
struct pack_idx_entry *objs, **p_objs;
-   unsigned nr_objects;
+   unsigned i, nr_objects;
+   off_t written = 0;
 
p = open_pack(src_pack);
if (!p)
@@ -791,12 +845,26 @@ static void process_one_pack(char *src_pack, char *dst_pack)
p_objs = sort_objs_by_offset(objs, nr_objects);
 
create_pack_dictionaries(p, p_objs);
+   sort_dict_entries_by_hits(commit_name_table);
+   sort_dict_entries_by_hits(tree_path_table);
 
 	f = packv4_open(dst_pack);
 	if (!f)
 		die("unable to open destination pack");
-	packv4_write_header(f, nr_objects);
-	packv4_write_tables(f, nr_objects, objs);
+	written += packv4_write_header(f, nr_objects);
+	written += packv4_write_tables(f, nr_objects, objs);
+
+	/* Let's write objects out, updating the object index list in place */
+	all_objs = objs;
+	all_objs_nr = nr_objects;
+	for (i = 0; i < nr_objects; i++) {
+		off_t obj_pos = written;
+		struct pack_idx_entry *obj = p_objs[i];
+		written += packv4_write_object(f, p, obj);
+		obj->offset = obj_pos;
+   }
+
+   sha1close(f, NULL, CSUM_CLOSE | CSUM_FSYNC);
 }
 
 int main(int argc, char *argv[])
-- 
1.8.4.38.g317e65b



[PATCH 07/38] pack v4: move to struct pack_idx_entry and get rid of our own struct idx_entry

2013-09-05 Thread Nicolas Pitre
Let's create a struct pack_idx_entry list with sorted sha1 which will
be useful later.  The offset sorted list is now a separate indirect
list.

Signed-off-by: Nicolas Pitre n...@fluxnic.net
---
 packv4-create.c | 72 +
 1 file changed, 42 insertions(+), 30 deletions(-)

diff --git a/packv4-create.c b/packv4-create.c
index 20d97a4..012129b 100644
--- a/packv4-create.c
+++ b/packv4-create.c
@@ -11,6 +11,7 @@
 #include "cache.h"
 #include "object.h"
 #include "tree-walk.h"
+#include "pack.h"
 
 struct data_entry {
unsigned offset;
@@ -244,46 +245,53 @@ static void dict_dump(void)
dump_dict_table(tree_path_table);
 }
 
-struct idx_entry
+static struct pack_idx_entry *get_packed_object_list(struct packed_git *p)
 {
-	off_t offset;
-   const unsigned char *sha1;
-};
+	unsigned i, nr_objects = p->num_objects;
+	struct pack_idx_entry *objects;
+
+	objects = xmalloc((nr_objects + 1) * sizeof(*objects));
+	objects[nr_objects].offset = p->pack_size - 20;
+	for (i = 0; i < nr_objects; i++) {
+   hashcpy(objects[i].sha1, nth_packed_object_sha1(p, i));
+   objects[i].offset = nth_packed_object_offset(p, i);
+   }
+
+   return objects;
+}
 
 static int sort_by_offset(const void *e1, const void *e2)
 {
-	const struct idx_entry *entry1 = e1;
-	const struct idx_entry *entry2 = e2;
-	if (entry1->offset < entry2->offset)
+	const struct pack_idx_entry * const *entry1 = e1;
+	const struct pack_idx_entry * const *entry2 = e2;
+	if ((*entry1)->offset < (*entry2)->offset)
 		return -1;
-	if (entry1->offset > entry2->offset)
+	if ((*entry1)->offset > (*entry2)->offset)
return 1;
return 0;
 }
 
-static struct idx_entry *get_packed_object_list(struct packed_git *p)
+static struct pack_idx_entry **sort_objs_by_offset(struct pack_idx_entry *list,
+   unsigned nr_objects)
 {
-   uint32_t nr_objects, i;
-   struct idx_entry *objects;
+   unsigned i;
+   struct pack_idx_entry **sorted;
 
-	nr_objects = p->num_objects;
-	objects = xmalloc((nr_objects + 1) * sizeof(*objects));
-	objects[nr_objects].offset = p->index_size - 40;
-	for (i = 0; i < nr_objects; i++) {
-		objects[i].sha1 = nth_packed_object_sha1(p, i);
-		objects[i].offset = nth_packed_object_offset(p, i);
-	}
-	qsort(objects, nr_objects, sizeof(*objects), sort_by_offset);
+	sorted = xmalloc((nr_objects + 1) * sizeof(*sorted));
+	for (i = 0; i < nr_objects + 1; i++)
+		sorted[i] = &list[i];
+	qsort(sorted, nr_objects + 1, sizeof(*sorted), sort_by_offset);
 
-   return objects;
+   return sorted;
 }
 
 static int create_pack_dictionaries(struct packed_git *p,
-   struct idx_entry *objects)
+   struct pack_idx_entry **obj_list)
 {
unsigned int i;
 
 	for (i = 0; i < p->num_objects; i++) {
+		struct pack_idx_entry *obj = obj_list[i];
 		void *data;
 		enum object_type type;
 		unsigned long size;
@@ -292,9 +300,9 @@ static int create_pack_dictionaries(struct packed_git *p,
 
 		oi.typep = &type;
 		oi.sizep = &size;
-		if (packed_object_info(p, objects[i].offset, &oi) < 0)
+		if (packed_object_info(p, obj->offset, &oi) < 0)
 			die("cannot get type of %s from %s",
-			    sha1_to_hex(objects[i].sha1), p->pack_name);
+			    sha1_to_hex(obj->sha1), p->pack_name);
 
 		switch (type) {
 		case OBJ_COMMIT:
@@ -306,16 +314,16 @@ static int create_pack_dictionaries(struct packed_git *p,
 		default:
 			continue;
 		}
-		data = unpack_entry(p, objects[i].offset, &type, &size);
+		data = unpack_entry(p, obj->offset, &type, &size);
 		if (!data)
 			die("cannot unpack %s from %s",
-			    sha1_to_hex(objects[i].sha1), p->pack_name);
-		if (check_sha1_signature(objects[i].sha1, data, size, typename(type)))
+			    sha1_to_hex(obj->sha1), p->pack_name);
+		if (check_sha1_signature(obj->sha1, data, size, typename(type)))
 			die("packed %s from %s is corrupt",
-			    sha1_to_hex(objects[i].sha1), p->pack_name);
+			    sha1_to_hex(obj->sha1), p->pack_name);
 		if (add_dict_entries(data, size) < 0)
 			die("can't process %s object %s",
-			    typename(type), sha1_to_hex(objects[i].sha1));
+			    typename(type), sha1_to_hex(obj->sha1));
free(data);
}
 
@@ 

[PATCH 04/38] pack v4: add tree entry mode support to dictionary entries

2013-09-05 Thread Nicolas Pitre
Augment dict entries with a 16-bit prefix in order to store the file
mode value of tree entries.

Signed-off-by: Nicolas Pitre n...@fluxnic.net
---
 packv4-create.c | 56 
 1 file changed, 36 insertions(+), 20 deletions(-)

diff --git a/packv4-create.c b/packv4-create.c
index 00762a5..eccd9fc 100644
--- a/packv4-create.c
+++ b/packv4-create.c
@@ -14,11 +14,12 @@
 
 struct data_entry {
unsigned offset;
+   unsigned size;
unsigned hits;
 };
 
 struct dict_table {
-   char *data;
+   unsigned char *data;
unsigned ptr;
unsigned size;
struct data_entry *entry;
@@ -41,18 +42,19 @@ void destroy_dict_table(struct dict_table *t)
free(t);
 }
 
-static int locate_entry(struct dict_table *t, const char *str)
+static int locate_entry(struct dict_table *t, const void *data, int size)
 {
-	int i = 0;
-	const unsigned char *s = (const unsigned char *) str;
+	int i = 0, len = size;
+	const unsigned char *p = data;
 
-	while (*s)
-		i = i * 111 + *s++;
+	while (len--)
+		i = i * 111 + *p++;
 	i = (unsigned)i % t->hash_size;
 
 	while (t->hash[i]) {
 		unsigned n = t->hash[i] - 1;
-		if (!strcmp(str, t->data + t->entry[n].offset))
+		if (t->entry[n].size == size &&
+		    memcmp(t->data + t->entry[n].offset, data, size) == 0)
 			return n;
 		if (++i >= t->hash_size)
 			i = 0;
@@ -71,23 +73,28 @@ static void rehash_entries(struct dict_table *t)
memset(t-hash, 0, t-hash_size * sizeof(*t-hash));
 
for (n = 0; n  t-nb_entries; n++) {
-		int i = locate_entry(t, t->data + t->entry[n].offset);
+		int i = locate_entry(t, t->data + t->entry[n].offset,
+				     t->entry[n].size);
 		if (i < 0)
 			t->hash[-1 - i] = n + 1;
}
 }
 
-int dict_add_entry(struct dict_table *t, const char *str)
+int dict_add_entry(struct dict_table *t, int val, const char *str)
 {
-	int i, len = strlen(str) + 1;
+	int i, val_len = 2, str_len = strlen(str) + 1;
 
-	if (t->ptr + len >= t->size) {
-		t->size = (t->size + len + 1024) * 3 / 2;
+	if (t->ptr + val_len + str_len > t->size) {
+		t->size = (t->size + val_len + str_len + 1024) * 3 / 2;
 		t->data = xrealloc(t->data, t->size);
 	}
-	memcpy(t->data + t->ptr, str, len);
 
-	i = (t->nb_entries) ? locate_entry(t, t->data + t->ptr) : -1;
+	t->data[t->ptr] = val >> 8;
+	t->data[t->ptr + 1] = val;
+	memcpy(t->data + t->ptr + val_len, str, str_len);
+
+	i = (t->nb_entries) ?
+		locate_entry(t, t->data + t->ptr, val_len + str_len) : -1;
 	if (i >= 0) {
 		t->entry[i].hits++;
 		return i;
if (i = 0) {
t-entry[i].hits++;
return i;
@@ -98,8 +105,9 @@ int dict_add_entry(struct dict_table *t, const char *str)
 		t->entry = xrealloc(t->entry, t->max_entries * sizeof(*t->entry));
 	}
 	t->entry[t->nb_entries].offset = t->ptr;
+	t->entry[t->nb_entries].size = val_len + str_len;
 	t->entry[t->nb_entries].hits = 1;
-	t->ptr += len + 1;
+	t->ptr += val_len + str_len;
 	t->nb_entries++;
 
 	if (t->hash_size * 3 <= t->nb_entries * 4)
@@ -139,7 +147,8 @@ static int add_tree_dict_entries(void *buf, unsigned long size)
 
 	init_tree_desc(&desc, buf, size);
 	while (tree_entry(&desc, &name_entry))
-		dict_add_entry(tree_path_table, name_entry.path);
+		dict_add_entry(tree_path_table, name_entry.mode,
+			       name_entry.path);
return 0;
 }
 
@@ -148,10 +157,16 @@ void dict_dump(struct dict_table *t)
int i;
 
sort_dict_entries_by_hits(t);
-	for (i = 0; i < t->nb_entries; i++)
-		printf("%d\t%s\n",
-		       t->entry[i].hits,
-		       t->data + t->entry[i].offset);
+	for (i = 0; i < t->nb_entries; i++) {
+		int16_t val;
+		uint16_t uval;
+		val = t->data[t->entry[i].offset] << 8;
+		val |= t->data[t->entry[i].offset + 1];
+		uval = val;
+		printf("%d\t%d\t%o\t%s\n",
+		       t->entry[i].hits, val, uval,
+		       t->data + t->entry[i].offset + 2);
+	}
 }
 
 struct idx_entry
@@ -170,6 +185,7 @@ static int sort_by_offset(const void *e1, const void *e2)
return 1;
return 0;
 }
+
 static int create_pack_dictionaries(struct packed_git *p)
 {
uint32_t nr_objects, i;
-- 
1.8.4.38.g317e65b



[PATCH 26/38] pack v4: object header decode

2013-09-05 Thread Nicolas Pitre
For this we need the pack version.  However, only open_packed_git_1() has
been audited for pack v4 so far, hence the version validation is not
added to pack_version_ok() just yet.

Signed-off-by: Nicolas Pitre n...@fluxnic.net
---
 cache.h |  1 +
 sha1_file.c | 14 --
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/cache.h b/cache.h
index c939b60..59d9ba7 100644
--- a/cache.h
+++ b/cache.h
@@ -1025,6 +1025,7 @@ extern struct packed_git {
uint32_t num_objects;
uint32_t num_bad_objects;
unsigned char *bad_object_sha1;
+   int version;
int index_version;
time_t mtime;
int pack_fd;
diff --git a/sha1_file.c b/sha1_file.c
index 5c63781..a298933 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -10,6 +10,7 @@
 #include "string-list.h"
 #include "delta.h"
 #include "pack.h"
+#include "varint.h"
 #include "blob.h"
 #include "commit.h"
 #include "run-command.h"
@@ -845,10 +846,11 @@ static int open_packed_git_1(struct packed_git *p)
 		return error("file %s is far too short to be a packfile", p->pack_name);
 	if (hdr.hdr_signature != htonl(PACK_SIGNATURE))
 		return error("file %s is not a GIT packfile", p->pack_name);
-	if (!pack_version_ok(hdr.hdr_version))
+	if (!pack_version_ok(hdr.hdr_version) && hdr.hdr_version != htonl(4))
 		return error("packfile %s is version %"PRIu32" and not"
 			     " supported (try upgrading GIT to a newer version)",
 			     p->pack_name, ntohl(hdr.hdr_version));
+	p->version = ntohl(hdr.hdr_version);
 
/* Verify the pack matches its index. */
 	if (p->num_objects != ntohl(hdr.hdr_entries))
@@ -1725,7 +1727,15 @@ int unpack_object_header(struct packed_git *p,
 * insane, so we know won't exceed what we have been given.
 */
 	base = use_pack(p, w_curs, *curpos, &left);
-	used = unpack_object_header_buffer(base, left, &type, sizep);
+	if (p->version < 4) {
+		used = unpack_object_header_buffer(base, left, &type, sizep);
+	} else {
+		const unsigned char *cp = base;
+		uintmax_t val = decode_varint(&cp);
+		used = cp - base;
+		type = val & 0xf;
+		*sizep = val >> 4;
+	}
if (!used) {
type = OBJ_BAD;
} else
-- 
1.8.4.38.g317e65b



[PATCH 23/38] packv4-create: normalize pack name to properly generate the pack index file name

2013-09-05 Thread Nicolas Pitre
Signed-off-by: Nicolas Pitre n...@fluxnic.net
---
 packv4-create.c | 73 +++--
 1 file changed, 34 insertions(+), 39 deletions(-)

diff --git a/packv4-create.c b/packv4-create.c
index 22cdf8e..c23c791 100644
--- a/packv4-create.c
+++ b/packv4-create.c
@@ -956,56 +956,46 @@ static off_t packv4_write_object(struct sha1file *f, struct packed_git *p,
return hdrlen + size;
 }
 
-static struct packed_git *open_pack(const char *path)
+static char *normalize_pack_name(const char *path)
 {
-   char arg[PATH_MAX];
+   char buf[PATH_MAX];
int len;
-   struct packed_git *p;
 
-	len = strlcpy(arg, path, PATH_MAX);
-	if (len >= PATH_MAX) {
-		error("name too long: %s", path);
-		return NULL;
-	}
+	len = strlcpy(buf, path, PATH_MAX);
+	if (len >= PATH_MAX - 6)
+		die("name too long: %s", path);
 
/*
 	 * In addition to "foo.idx" we accept "foo.pack" and "foo";
-	 * normalize these forms to "foo.idx" for add_packed_git().
+	 * normalize these forms to "foo.pack".
 	 */
-	if (has_extension(arg, ".pack")) {
-		strcpy(arg + len - 5, ".idx");
-		len--;
-	} else if (!has_extension(arg, ".idx")) {
-		if (len + 4 >= PATH_MAX) {
-			error("name too long: %s.idx", arg);
-			return NULL;
-		}
-		strcpy(arg + len, ".idx");
-		len += 4;
+	if (has_extension(buf, ".idx")) {
+		strcpy(buf + len - 4, ".pack");
+		len++;
+	} else if (!has_extension(buf, ".pack")) {
+		strcpy(buf + len, ".pack");
+		len += 5;
}
 
-	/*
-	 * add_packed_git() uses our buffer (containing "foo.idx") to
-	 * build the pack filename ("foo.pack").  Make sure it fits.
-	 */
-	if (len + 1 >= PATH_MAX) {
-		arg[len - 4] = '\0';
-		error("name too long: %s.pack", arg);
-		return NULL;
-	}
+	return xstrdup(buf);
+}
 
-	p = add_packed_git(arg, len, 1);
-	if (!p) {
-		error("packfile %s not found.", arg);
-		return NULL;
-	}
+static struct packed_git *open_pack(const char *path)
+{
+	char *packname = normalize_pack_name(path);
+	int len = strlen(packname);
+	struct packed_git *p;
+
+	strcpy(packname + len - 5, ".idx");
+	p = add_packed_git(packname, len - 1, 1);
+	if (!p)
+		die("packfile %s not found.", packname);
 
 	install_packed_git(p);
-	if (open_pack_index(p)) {
-		error("packfile %s index not opened", p->pack_name);
-		return NULL;
-	}
+	if (open_pack_index(p))
+		die("packfile %s index not opened", p->pack_name);
 
+	free(packname);
return p;
 }
 
@@ -1017,6 +1007,7 @@ static void process_one_pack(char *src_pack, char *dst_pack)
struct pack_idx_option idx_opts;
unsigned i, nr_objects;
off_t written = 0;
+   char *packname;
unsigned char pack_sha1[20];
 
p = open_pack(src_pack);
@@ -1031,7 +1022,8 @@ static void process_one_pack(char *src_pack, char *dst_pack)
sort_dict_entries_by_hits(commit_name_table);
sort_dict_entries_by_hits(tree_path_table);
 
-	f = packv4_open(dst_pack);
+	packname = normalize_pack_name(dst_pack);
+	f = packv4_open(packname);
 	if (!f)
 		die("unable to open destination pack");
 	written += packv4_write_header(f, nr_objects);
@@ -1053,7 +1045,10 @@ static void process_one_pack(char *src_pack, char *dst_pack)
 
 	reset_pack_idx_option(&idx_opts);
 	idx_opts.version = 3;
-	write_idx_file(dst_pack, p_objs, nr_objects, &idx_opts, pack_sha1);
+	strcpy(packname + strlen(packname) - 5, ".idx");
+	write_idx_file(packname, p_objs, nr_objects, &idx_opts, pack_sha1);
+
+   free(packname);
 }
 
 static int git_pack_config(const char *k, const char *v, void *cb)
-- 
1.8.4.38.g317e65b



[PATCH 08/38] pack v4: basic SHA1 reference encoding

2013-09-05 Thread Nicolas Pitre
The SHA1 reference is either an index into the pack's SHA1 table, encoded
as a variable length number, or the literal 20-byte SHA1 prefixed with a
0 byte.

The value 0 discriminates between an actual index and a literal SHA1.
Therefore, when an index is used, its value must be increased by 1.

Signed-off-by: Nicolas Pitre n...@fluxnic.net
---
 packv4-create.c | 29 +
 1 file changed, 29 insertions(+)

diff --git a/packv4-create.c b/packv4-create.c
index 012129b..12527c0 100644
--- a/packv4-create.c
+++ b/packv4-create.c
@@ -12,6 +12,7 @@
 #include object.h
 #include tree-walk.h
 #include pack.h
+#include varint.h
 
 struct data_entry {
unsigned offset;
@@ -245,6 +246,34 @@ static void dict_dump(void)
dump_dict_table(tree_path_table);
 }
 
+/*
+ * Encode an object SHA1 reference with either an object index into the
+ * pack SHA1 table incremented by 1, or the literal SHA1 value prefixed
+ * with a zero byte if the needed SHA1 is not available in the table.
+ */
+static struct pack_idx_entry *all_objs;
+static unsigned all_objs_nr;
+static int encode_sha1ref(const unsigned char *sha1, unsigned char *buf)
+{
+   unsigned lo = 0, hi = all_objs_nr;
+
+   do {
+   unsigned mi = (lo + hi) / 2;
+   int cmp = hashcmp(all_objs[mi].sha1, sha1);
+
+   if (cmp == 0)
+   return encode_varint(mi + 1, buf);
+   if (cmp > 0)
+   hi = mi;
+   else
+   lo = mi+1;
+   } while (lo < hi);
+
+   *buf++ = 0;
+   hashcpy(buf, sha1);
+   return 1 + 20;
+}
+
 static struct pack_idx_entry *get_packed_object_list(struct packed_git *p)
 {
unsigned i, nr_objects = p->num_objects;
-- 
1.8.4.38.g317e65b



[PATCH 17/38] pack v4: tree object delta encoding

2013-09-05 Thread Nicolas Pitre
In order to be able to quickly walk tree objects, let's encode their
delta as a range of entries into another tree object.

In order to discriminate a copy sequence from a regular entry, the entry
index LSB is reserved to indicate a copy sequence.  Therefore the actual
index of a path component is shifted left by one bit.

The encoding allows for the base object to change so multiple base
objects can be borrowed from.  The code doesn't try to exploit this
possibility at the moment though.

The code isn't optimal at the moment as it doesn't consider the case
where a copy sequence could be larger than the local sequence it
means to replace.

Signed-off-by: Nicolas Pitre n...@fluxnic.net
---
 packv4-create.c | 108 +---
 1 file changed, 103 insertions(+), 5 deletions(-)

diff --git a/packv4-create.c b/packv4-create.c
index 5d76234..6830a0a 100644
--- a/packv4-create.c
+++ b/packv4-create.c
@@ -394,24 +394,53 @@ bad:
return NULL;
 }
 
+static int compare_tree_entries(struct name_entry *e1, struct name_entry *e2)
+{
+   int len1 = tree_entry_len(e1);
+   int len2 = tree_entry_len(e2);
+   int len = len1 < len2 ? len1 : len2;
+   unsigned char c1, c2;
+   int cmp;
+
+   cmp = memcmp(e1->path, e2->path, len);
+   if (cmp)
+           return cmp;
+   c1 = e1->path[len];
+   c2 = e2->path[len];
+   if (!c1 && S_ISDIR(e1->mode))
+           c1 = '/';
+   if (!c2 && S_ISDIR(e2->mode))
+           c2 = '/';
+   return c1 - c2;
+}
+
 /*
  * This converts a canonical tree object buffer into its
  * tightly packed representation using the already populated
  * and sorted tree_path_table dictionary.  The parsing is
  * strict so to ensure the canonical version may always be
  * regenerated and produce the same hash.
+ *
+ * If a delta buffer is provided, we may encode multiple ranges of tree
+ * entries against that buffer.
  */
-void *pv4_encode_tree(void *_buffer, unsigned long *sizep)
+void *pv4_encode_tree(void *_buffer, unsigned long *sizep,
+ void *delta, unsigned long delta_size,
+ const unsigned char *delta_sha1)
 {
unsigned long size = *sizep;
unsigned char *in, *out, *end, *buffer = _buffer;
-   struct tree_desc desc;
-   struct name_entry name_entry;
+   struct tree_desc desc, delta_desc;
+   struct name_entry name_entry, delta_entry;
int nb_entries;
+   unsigned int copy_start, copy_count = 0, delta_pos = 0, first_delta = 1;
 
if (!size)
return NULL;
 
+   if (!delta_size)
+   delta = NULL;
+
/*
 * We can't make sure the result will always be smaller than the
* input. The smallest possible entry is "0 x\0<20-byte SHA1>"
@@ -434,9 +463,42 @@ void *pv4_encode_tree(void *_buffer, unsigned long *sizep)
out += encode_varint(nb_entries, out);
 
init_tree_desc(&desc, in, size);
+   if (delta) {
+           init_tree_desc(&delta_desc, delta, delta_size);
+           if (!tree_entry(&delta_desc, &delta_entry))
+                   delta = NULL;
+   }
+
while (tree_entry(&desc, &name_entry)) {
int pathlen, index;
 
+   /*
+* Try to match entries against our delta object.
+*/
+   if (delta) {
+   int ret;
+
+   do {
+                   ret = compare_tree_entries(&name_entry, &delta_entry);
+                   if (ret <= 0 || copy_count != 0)
+   break;
+   delta_pos++;
+                   if (!tree_entry(&delta_desc, &delta_entry))
+   delta = NULL;
+   } while (delta);
+
+           if (ret == 0 && name_entry.mode == delta_entry.mode &&
+   hashcmp(name_entry.sha1, delta_entry.sha1) == 0) {
+   if (!copy_count)
+   copy_start = delta_pos;
+   copy_count++;
+   delta_pos++;
+                   if (!tree_entry(&delta_desc, &delta_entry))
+   delta = NULL;
+   continue;
+   }
+   }
+
if (end - out < 48) {
unsigned long sofar = out - buffer;
buffer = xrealloc(buffer, (sofar + 48)*2);
@@ -444,6 +506,32 @@ void *pv4_encode_tree(void *_buffer, unsigned long *sizep)
out = buffer + sofar;
}
 
+   if (copy_count) {
+   /*
+* Let's write a sequence indicating we're copying
+* entries from another object:
+*
+  

[PATCH 13/38] pack v4: creation code

2013-09-05 Thread Nicolas Pitre
Let's actually open the destination pack file and write the header and
the tables.

The header isn't much different from pack v3, except for the pack version
number of course.

The first table is the sorted SHA1 table normally found in the pack index
file.  With pack v4 we write this table in the main pack file instead, as
subsequent objects in the pack reference it by index.  Doing so has
many advantages:

- The SHA1 references used to be duplicated on disk: once in the pack
  index file, and then at least once or more within commit and tree
  objects referencing them.  The only SHA1 which is not being listed more
  than once this way is the one for a branch tip commit object and those
  are normally very few.  Now all that SHA1 data is represented only once.

- The SHA1 references found in commit and tree objects can be obtained
  on disk directly without having to deflate those objects first.

The SHA1 table size is obtained by multiplying the number of objects by 20.

And then the commit and path dictionary tables are written right after
the SHA1 table.

Signed-off-by: Nicolas Pitre n...@fluxnic.net
---
 packv4-create.c | 60 -
 1 file changed, 55 insertions(+), 5 deletions(-)

diff --git a/packv4-create.c b/packv4-create.c
index 92d3662..61b70c8 100644
--- a/packv4-create.c
+++ b/packv4-create.c
@@ -593,6 +593,48 @@ static unsigned long write_dict_table(struct sha1file *f, struct dict_table *t)
return hdrlen + datalen;
 }
 
+static struct sha1file *packv4_open(char *path)
+{
+   int fd;
+
+   fd = open(path, O_CREAT|O_EXCL|O_WRONLY, 0600);
+   if (fd < 0)
+           die_errno("unable to create '%s'", path);
+   return sha1fd(fd, path);
+}
+
+static unsigned int packv4_write_header(struct sha1file *f, unsigned nr_objects)
+{
+   struct pack_header hdr;
+
+   hdr.hdr_signature = htonl(PACK_SIGNATURE);
+   hdr.hdr_version = htonl(4);
+   hdr.hdr_entries = htonl(nr_objects);
+   sha1write(f, hdr, sizeof(hdr));
+
+   return sizeof(hdr);
+}
+
+static unsigned long packv4_write_tables(struct sha1file *f, unsigned nr_objects,
+                                        struct pack_idx_entry *objs)
+{
+   unsigned i;
+   unsigned long written = 0;
+
+   /* The sorted list of object SHA1's is always first */
+   for (i = 0; i < nr_objects; i++)
+   sha1write(f, objs[i].sha1, 20);
+   written = 20 * nr_objects;
+
+   /* Then the commit dictionary table */
+   written += write_dict_table(f, commit_name_table);
+
+   /* Followed by the path component dictionary table */
+   written += write_dict_table(f, tree_path_table);
+
+   return written;
+}
+
 static struct packed_git *open_pack(const char *path)
 {
char arg[PATH_MAX];
@@ -646,9 +688,10 @@ static struct packed_git *open_pack(const char *path)
return p;
 }
 
-static void process_one_pack(char *src_pack)
+static void process_one_pack(char *src_pack, char *dst_pack)
 {
struct packed_git *p;
+   struct sha1file *f;
struct pack_idx_entry *objs, **p_objs;
unsigned nr_objects;
 
@@ -661,15 +704,22 @@ static void process_one_pack(char *src_pack)
p_objs = sort_objs_by_offset(objs, nr_objects);
 
create_pack_dictionaries(p, p_objs);
+
+   f = packv4_open(dst_pack);
+   if (!f)
+           die("unable to open destination pack");
+   packv4_write_header(f, nr_objects);
+   packv4_write_tables(f, nr_objects, objs);
 }
 
 int main(int argc, char *argv[])
 {
-   if (argc != 2) {
-           fprintf(stderr, "Usage: %s packfile\n", argv[0]);
+   if (argc != 3) {
+           fprintf(stderr, "Usage: %s src_packfile dst_packfile\n",
+                   argv[0]);
exit(1);
}
-   process_one_pack(argv[1]);
-   dict_dump();
+   process_one_pack(argv[1], argv[2]);
+   if (0)
+   dict_dump();
return 0;
 }
-- 
1.8.4.38.g317e65b



[PATCH 02/38] export packed_object_info()

2013-09-05 Thread Nicolas Pitre
Signed-off-by: Nicolas Pitre n...@fluxnic.net
---
 cache.h | 1 +
 sha1_file.c | 4 ++--
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/cache.h b/cache.h
index 85b544f..b6634c4 100644
--- a/cache.h
+++ b/cache.h
@@ -1160,6 +1160,7 @@ struct object_info {
} u;
 };
 extern int sha1_object_info_extended(const unsigned char *, struct object_info *);
+extern int packed_object_info(struct packed_git *, off_t, struct object_info *);
 
 /* Dumb servers support */
 extern int update_server_info(int);
diff --git a/sha1_file.c b/sha1_file.c
index 8e27db1..c2020d0 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1782,8 +1782,8 @@ unwind:
goto out;
 }
 
-static int packed_object_info(struct packed_git *p, off_t obj_offset,
- struct object_info *oi)
+int packed_object_info(struct packed_git *p, off_t obj_offset,
+  struct object_info *oi)
 {
struct pack_window *w_curs = NULL;
unsigned long size;
-- 
1.8.4.38.g317e65b



[PATCH 09/38] introduce get_sha1_lowhex()

2013-09-05 Thread Nicolas Pitre
This is like get_sha1_hex() but stricter in accepting lowercase letters
only.

Signed-off-by: Nicolas Pitre n...@fluxnic.net
---
 cache.h |  3 +++
 hex.c   | 11 +++
 2 files changed, 14 insertions(+)

diff --git a/cache.h b/cache.h
index b6634c4..4231dfa 100644
--- a/cache.h
+++ b/cache.h
@@ -850,8 +850,11 @@ extern int for_each_abbrev(const char *prefix, each_abbrev_fn, void *);
  * Return 0 on success.  Reading stops if a NUL is encountered in the
  * input, so it is safe to pass this function an arbitrary
  * null-terminated string.
+ *
+ * The low version accepts numbers and lowercase letters only.
  */
 extern int get_sha1_hex(const char *hex, unsigned char *sha1);
+extern int get_sha1_lowhex(const char *hex, unsigned char *sha1);
 
 extern char *sha1_to_hex(const unsigned char *sha1);   /* static buffer result! */
 extern int read_ref_full(const char *refname, unsigned char *sha1,
diff --git a/hex.c b/hex.c
index 9ebc050..1d7eae1 100644
--- a/hex.c
+++ b/hex.c
@@ -56,6 +56,17 @@ int get_sha1_hex(const char *hex, unsigned char *sha1)
return 0;
 }
 
+int get_sha1_lowhex(const char *hex, unsigned char *sha1)
+{
+   int i;
+
+   /* uppercase letters (as well as '\0') have bit 5 clear */
+   for (i = 0; i < 20; i++)
+           if (!(hex[i] & 0x20))
+   return -1;
+   return get_sha1_hex(hex, sha1);
+}
+
 char *sha1_to_hex(const unsigned char *sha1)
 {
static int bufno;
-- 
1.8.4.38.g317e65b



[PATCH 27/38] pack v4: code to obtain a SHA1 from a sha1ref

2013-09-05 Thread Nicolas Pitre
Let's start actually parsing pack v4 data.  Here's the first item.

Signed-off-by: Nicolas Pitre n...@fluxnic.net
---
 Makefile   |  1 +
 packv4-parse.c | 30 ++
 2 files changed, 31 insertions(+)
 create mode 100644 packv4-parse.c

diff --git a/Makefile b/Makefile
index 4716113..ba6cafc 100644
--- a/Makefile
+++ b/Makefile
@@ -838,6 +838,7 @@ LIB_OBJS += object.o
 LIB_OBJS += pack-check.o
 LIB_OBJS += pack-revindex.o
 LIB_OBJS += pack-write.o
+LIB_OBJS += packv4-parse.o
 LIB_OBJS += pager.o
 LIB_OBJS += parse-options.o
 LIB_OBJS += parse-options-cb.o
diff --git a/packv4-parse.c b/packv4-parse.c
new file mode 100644
index 000..299fc48
--- /dev/null
+++ b/packv4-parse.c
@@ -0,0 +1,30 @@
+/*
+ * Code to parse pack v4 object encoding
+ *
+ * (C) Nicolas Pitre n...@fluxnic.net
+ *
+ * This code is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include cache.h
+#include varint.h
+
+const unsigned char *get_sha1ref(struct packed_git *p,
+const unsigned char **bufp)
+{
+   const unsigned char *sha1;
+
+   if (!**bufp) {
+           sha1 = *bufp + 1;
+           *bufp += 21;
+   } else {
+           unsigned int index = decode_varint(bufp);
+           if (index < 1 || index - 1 >= p->num_objects)
+                   die("bad index in %s", __func__);
+           sha1 = p->sha1_table + (index - 1) * 20;
+   }
+
+   return sha1;
+}
-- 
1.8.4.38.g317e65b



[PATCH 18/38] pack v4: load delta candidate for encoding tree objects

2013-09-05 Thread Nicolas Pitre
The SHA1 of the base object is retrieved and the corresponding object
is loaded in memory for pv4_encode_tree() to look at.  Simple but
effective.  Obviously this relies on the delta matching already performed
during the pack v3 delta search.  Some native delta search for pack v4
could be investigated eventually.

Signed-off-by: Nicolas Pitre n...@fluxnic.net
---
 packv4-create.c | 63 ++---
 1 file changed, 60 insertions(+), 3 deletions(-)

diff --git a/packv4-create.c b/packv4-create.c
index 6830a0a..15c5959 100644
--- a/packv4-create.c
+++ b/packv4-create.c
@@ -820,18 +820,56 @@ static unsigned long copy_object_data(struct sha1file *f, struct packed_git *p,
return written;
 }
 
+static unsigned char *get_delta_base(struct packed_git *p, off_t offset,
+unsigned char *sha1_buf)
+{
+   struct pack_window *w_curs = NULL;
+   enum object_type type;
+   unsigned long avail, size;
+   int hdrlen;
+   unsigned char *src;
+   const unsigned char *base_sha1 = NULL;
+
+   src = use_pack(p, &w_curs, offset, &avail);
+   hdrlen = unpack_object_header_buffer(src, avail, &type, &size);
+
+   if (type == OBJ_OFS_DELTA) {
+           const unsigned char *cp = src + hdrlen;
+           off_t base_offset = decode_varint(&cp);
+           base_offset = offset - base_offset;
+           if (base_offset <= 0 || base_offset >= offset) {
+                   error("delta offset out of bound");
+           } else {
+                   struct revindex_entry *revidx;
+                   revidx = find_pack_revindex(p, base_offset);
+                   base_sha1 = nth_packed_object_sha1(p, revidx->nr);
+           }
+   } else if (type == OBJ_REF_DELTA) {
+           base_sha1 = src + hdrlen;
+   } else
+           error("expected to get a delta but got a %s", typename(type));
+
+   unuse_pack(&w_curs);
+
+   if (!base_sha1)
+   return NULL;
+   hashcpy(sha1_buf, base_sha1);
+   return sha1_buf;
+}
+
 static off_t packv4_write_object(struct sha1file *f, struct packed_git *p,
 struct pack_idx_entry *obj)
 {
void *src, *result;
struct object_info oi = {};
-   enum object_type type;
+   enum object_type type, packed_type;
unsigned long size;
unsigned int hdrlen;
 
oi.typep = &type;
oi.sizep = &size;
-   packed_type = packed_object_info(p, obj->offset, &oi);
-   if (packed_object_info(p, obj->offset, &oi) < 0)
+   packed_type = packed_object_info(p, obj->offset, &oi);
+   if (packed_type < 0)
die("cannot get type of %s from %s",
    sha1_to_hex(obj->sha1), p->pack_name);
 
@@ -859,7 +897,26 @@ static off_t packv4_write_object(struct sha1file *f, struct packed_git *p,
result = pv4_encode_commit(src, &size);
break;
case OBJ_TREE:
-   result = pv4_encode_tree(src, &size, NULL, 0, NULL);
+   if (packed_type != OBJ_TREE) {
+           unsigned char sha1_buf[20], *ref_sha1;
+           void *ref;
+           enum object_type ref_type;
+           unsigned long ref_size;
+
+           ref_sha1 = get_delta_base(p, obj->offset, sha1_buf);
+           if (!ref_sha1)
+                   die("unable to get delta base sha1 for %s",
+                       sha1_to_hex(obj->sha1));
+           ref = read_sha1_file(ref_sha1, &ref_type, &ref_size);
+           if (!ref || ref_type != OBJ_TREE)
+                   die("cannot obtain delta base for %s",
+                       sha1_to_hex(obj->sha1));
+           result = pv4_encode_tree(src, &size,
+                                    ref, ref_size, ref_sha1);
+           free(ref);
+   } else {
+           result = pv4_encode_tree(src, &size, NULL, 0, NULL);
+   }
break;
default:
die(unexpected object type %d, type);
-- 
1.8.4.38.g317e65b



[PATCH 05/38] pack v4: add commit object parsing

2013-09-05 Thread Nicolas Pitre
Let's create another dictionary table to hold the author and committer
entries.  We use the same table format used for tree entries where the
16 bit data prefix is conveniently used to store the timezone value.

In order to copy straight from a commit object buffer, dict_add_entry()
is modified to take the string length, as the provided string pointer is
not always null-terminated.

Signed-off-by: Nicolas Pitre n...@fluxnic.net
---
 packv4-create.c | 98 +++--
 1 file changed, 89 insertions(+), 9 deletions(-)

diff --git a/packv4-create.c b/packv4-create.c
index eccd9fc..5c08871 100644
--- a/packv4-create.c
+++ b/packv4-create.c
@@ -1,5 +1,5 @@
 /*
- * packv4-create.c: management of dictionary tables used in pack v4
+ * packv4-create.c: creation of dictionary tables and objects used in pack v4
  *
  * (C) Nicolas Pitre n...@fluxnic.net
  *
@@ -80,9 +80,9 @@ static void rehash_entries(struct dict_table *t)
}
 }
 
-int dict_add_entry(struct dict_table *t, int val, const char *str)
+int dict_add_entry(struct dict_table *t, int val, const char *str, int str_len)
 {
-   int i, val_len = 2, str_len = strlen(str) + 1;
+   int i, val_len = 2;
 
if (t->ptr + val_len + str_len > t->size) {
t->size = (t->size + val_len + str_len + 1024) * 3 / 2;
@@ -92,6 +92,7 @@ int dict_add_entry(struct dict_table *t, int val, const char *str)
t->data[t->ptr] = val >> 8;
t->data[t->ptr + 1] = val;
memcpy(t->data + t->ptr + val_len, str, str_len);
+   t->data[t->ptr + val_len + str_len] = 0;
 
i = (t->nb_entries) ?
locate_entry(t, t->data + t->ptr, val_len + str_len) : -1;
@@ -107,7 +108,7 @@ int dict_add_entry(struct dict_table *t, int val, const char *str)
t->entry[t->nb_entries].offset = t->ptr;
t->entry[t->nb_entries].size = val_len + str_len;
t->entry[t->nb_entries].hits = 1;
-   t->ptr += val_len + str_len;
+   t->ptr += val_len + str_len + 1;
t->nb_entries++;

if (t->hash_size * 3 <= t->nb_entries * 4)
@@ -135,8 +136,73 @@ static void sort_dict_entries_by_hits(struct dict_table *t)
rehash_entries(t);
 }
 
+static struct dict_table *commit_name_table;
 static struct dict_table *tree_path_table;
 
+/*
+ * Parse the author/committer line from a canonical commit object.
+ * The 'from' argument points right after the "author " or "committer "
+ * string.  The time zone is parsed and stored in *tz_val.  The returned
+ * pointer is right after the end of the email address which is also just
+ * before the time value, or NULL if a parsing error is encountered.
+ */
+static char *get_nameend_and_tz(char *from, int *tz_val)
+{
+   char *end, *tz;
+
+   tz = strchr(from, '\n');
+   /* let's assume the smallest possible string to be "x <x> 0 +0000\n" */
+   if (!tz || tz - from < 13)
+           return NULL;
+   tz -= 4;
+   end = tz - 4;
+   while (end - from > 5 && *end != ' ')
+           end--;
+   if (end[-1] != '>' || end[0] != ' ' || tz[-2] != ' ')
+   return NULL;
+   *tz_val = (tz[0] - '0') * 1000 +
+ (tz[1] - '0') * 100 +
+ (tz[2] - '0') * 10 +
+ (tz[3] - '0');
+   switch (tz[-1]) {
+   default:return NULL;
+   case '+':   break;
+   case '-':   *tz_val = -*tz_val;
+   }
+   return end;
+}
+
+static int add_commit_dict_entries(void *buf, unsigned long size)
+{
+   char *name, *end = NULL;
+   int tz_val;
+
+   if (!commit_name_table)
+   commit_name_table = create_dict_table();
+
+   /* parse and add author info */
+   name = strstr(buf, "\nauthor ");
+   if (name) {
+           name += 8;
+           end = get_nameend_and_tz(name, &tz_val);
+   }
+   if (!name || !end)
+   return -1;
+   dict_add_entry(commit_name_table, tz_val, name, end - name);
+
+   /* parse and add committer info */
+   name = strstr(end, "\ncommitter ");
+   if (name) {
+           name += 11;
+           end = get_nameend_and_tz(name, &tz_val);
+   }
+   if (!name || !end)
+   return -1;
+   dict_add_entry(commit_name_table, tz_val, name, end - name);
+
+   return 0;
+}
+
 static int add_tree_dict_entries(void *buf, unsigned long size)
 {
struct tree_desc desc;
@@ -146,13 +212,16 @@ static int add_tree_dict_entries(void *buf, unsigned long size)
tree_path_table = create_dict_table();
 
init_tree_desc(&desc, buf, size);
-   while (tree_entry(&desc, &name_entry))
+   while (tree_entry(&desc, &name_entry)) {
+           int pathlen = tree_entry_len(&name_entry);
dict_add_entry(tree_path_table, name_entry.mode,
-              name_entry.path);
+              name_entry.path, pathlen);
+   }
+
return 0;
 }
 
-void dict_dump(struct dict_table *t)
+void 

[PATCH 15/38] pack v4: object data copy

2013-09-05 Thread Nicolas Pitre
Blob and tag objects have no particular changes except for their object
header.

Delta objects are also copied as is, except for their delta base reference,
which is converted to the representation used elsewhere in the pack v4
encoding, i.e. an index into the SHA1 table, or a literal SHA1 prefixed by 0
if not found in the table (see encode_sha1ref).  This is true for both
REF_DELTA and OFS_DELTA.

Object payload is validated against the recorded CRC32 in the source
pack index file when possible before being copied.

Signed-off-by: Nicolas Pitre n...@fluxnic.net
---
 packv4-create.c | 60 +
 1 file changed, 60 insertions(+)

diff --git a/packv4-create.c b/packv4-create.c
index 6098062..b0e344f 100644
--- a/packv4-create.c
+++ b/packv4-create.c
@@ -12,6 +12,7 @@
 #include object.h
 #include tree-walk.h
 #include pack.h
+#include pack-revindex.h
 #include varint.h
 
 
@@ -662,6 +663,65 @@ static int write_object_header(struct sha1file *f, enum 
object_type type, unsign
return len;
 }
 
+static unsigned long copy_object_data(struct sha1file *f, struct packed_git *p,
+ off_t offset)
+{
+   struct pack_window *w_curs = NULL;
+   struct revindex_entry *revidx;
+   enum object_type type;
+   unsigned long avail, size, datalen, written;
+   int hdrlen, reflen, idx_nr;
+   unsigned char *src, buf[24];
+
+   revidx = find_pack_revindex(p, offset);
+   idx_nr = revidx->nr;
+   datalen = revidx[1].offset - offset;
+
+   src = use_pack(p, &w_curs, offset, &avail);
+   hdrlen = unpack_object_header_buffer(src, avail, &type, &size);
+
+   written = write_object_header(f, type, size);
+
+   if (type == OBJ_OFS_DELTA) {
+           const unsigned char *cp = src + hdrlen;
+           off_t base_offset = decode_varint(&cp);
+           hdrlen = cp - src;
+           base_offset = offset - base_offset;
+           if (base_offset <= 0 || base_offset >= offset)
+                   die("delta offset out of bound");
+           revidx = find_pack_revindex(p, base_offset);
+           reflen = encode_sha1ref(nth_packed_object_sha1(p, revidx->nr), buf);
+           sha1write(f, buf, reflen);
+           written += reflen;
+   } else if (type == OBJ_REF_DELTA) {
+           reflen = encode_sha1ref(src + hdrlen, buf);
+           hdrlen += 20;
+           sha1write(f, buf, reflen);
+           written += reflen;
+   }
+
+   if (p->index_version > 1 &&
+       check_pack_crc(p, &w_curs, offset, datalen, idx_nr))
+           die("bad CRC for object at offset %"PRIuMAX" in %s",
+               (uintmax_t)offset, p->pack_name);
+
+   offset += hdrlen;
+   datalen -= hdrlen;
+
+   while (datalen) {
+           src = use_pack(p, &w_curs, offset, &avail);
+           if (avail > datalen)
+   avail = datalen;
+   sha1write(f, src, avail);
+   written += avail;
+   offset += avail;
+   datalen -= avail;
+   }
+   unuse_pack(&w_curs);
+
+   return written;
+}
+
 static struct packed_git *open_pack(const char *path)
 {
char arg[PATH_MAX];
-- 
1.8.4.38.g317e65b



[PATCH 25/38] pack v4: initial pack index v3 support on the read side

2013-09-05 Thread Nicolas Pitre
A bit crude but good enough for now.

Signed-off-by: Nicolas Pitre n...@fluxnic.net
---
 cache.h |  1 +
 pack-check.c|  4 +++-
 pack-revindex.c |  7 ---
 sha1_file.c | 56 ++--
 4 files changed, 58 insertions(+), 10 deletions(-)

diff --git a/cache.h b/cache.h
index 4231dfa..c939b60 100644
--- a/cache.h
+++ b/cache.h
@@ -1021,6 +1021,7 @@ extern struct packed_git {
off_t pack_size;
const void *index_data;
size_t index_size;
+   const unsigned char *sha1_table;
uint32_t num_objects;
uint32_t num_bad_objects;
unsigned char *bad_object_sha1;
diff --git a/pack-check.c b/pack-check.c
index 63a595c..8200f24 100644
--- a/pack-check.c
+++ b/pack-check.c
@@ -25,6 +25,7 @@ int check_pack_crc(struct packed_git *p, struct pack_window **w_curs,
 {
const uint32_t *index_crc;
uint32_t data_crc = crc32(0, NULL, 0);
+   unsigned sha1_table;
 
do {
unsigned long avail;
@@ -36,8 +37,9 @@ int check_pack_crc(struct packed_git *p, struct pack_window 
**w_curs,
len -= avail;
} while (len);
 
+   sha1_table = p->index_version < 3 ? (p->num_objects * (20/4)) : 0;
index_crc = p-index_data;
-   index_crc += 2 + 256 + p->num_objects * (20/4) + nr;
+   index_crc += 2 + 256 + sha1_table + nr;
 
return data_crc != ntohl(*index_crc);
 }
diff --git a/pack-revindex.c b/pack-revindex.c
index b4d2b35..739a568 100644
--- a/pack-revindex.c
+++ b/pack-revindex.c
@@ -170,9 +170,10 @@ static void create_pack_revindex(struct pack_revindex *rix)
index += 4 * 256;
 
if (p->index_version > 1) {
-           const uint32_t *off_32 =
-                   (uint32_t *)(index + 8 + p->num_objects * (20 + 4));
-           const uint32_t *off_64 = off_32 + p->num_objects;
+           const uint32_t *off_32, *off_64;
+           unsigned sha1 = p->index_version < 3 ? 20 : 0;
+           off_32 = (uint32_t *)(index + 8 + p->num_objects * (sha1 + 4));
+           off_64 = off_32 + p->num_objects;
for (i = 0; i  num_ent; i++) {
uint32_t off = ntohl(*off_32++);
if (!(off & 0x80000000)) {
diff --git a/sha1_file.c b/sha1_file.c
index c2020d0..5c63781 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -504,7 +504,7 @@ static int check_packed_git_idx(const char *path, struct packed_git *p)
hdr = idx_map;
if (hdr->idx_signature == htonl(PACK_IDX_SIGNATURE)) {
version = ntohl(hdr->idx_version);
-   if (version < 2 || version > 2) {
+   if (version < 2 || version > 3) {
munmap(idx_map, idx_size);
return error("index file %s is version %"PRIu32
             " and is not supported by this binary"
@@ -539,12 +539,13 @@ static int check_packed_git_idx(const char *path, struct packed_git *p)
munmap(idx_map, idx_size);
return error("wrong index v1 file size in %s", path);
}
-   } else if (version == 2) {
+   } else if (version == 2 || version == 3) {
+   unsigned long min_size, max_size;
/*
 * Minimum size:
 *  - 8 bytes of header
 *  - 256 index entries 4 bytes each
-*  - 20-byte sha1 entry * nr
+*  - 20-byte sha1 entry * nr (version 2 only)
 *  - 4-byte crc entry * nr
 *  - 4-byte offset entry * nr
 *  - 20-byte SHA1 of the packfile
@@ -553,8 +554,10 @@ static int check_packed_git_idx(const char *path, struct packed_git *p)
 * variable sized table containing 8-byte entries
 * for offsets larger than 2^31.
 */
-   unsigned long min_size = 8 + 4*256 + nr*(20 + 4 + 4) + 20 + 20;
-   unsigned long max_size = min_size;
+   min_size = 8 + 4*256 + nr*(20 + 4 + 4) + 20 + 20;
+   if (version != 2)
+   min_size -= nr*20;
+   max_size = min_size;
if (nr)
max_size += (nr - 1)*8;
if (idx_size < min_size || idx_size > max_size) {
@@ -573,6 +576,36 @@ static int check_packed_git_idx(const char *path, struct packed_git *p)
}
}
 
+   if (version = 3) {
+   /* the SHA1 table is located in the main pack file */
+   void *pack_map;
+   struct pack_header *pack_hdr;
+
+           fd = git_open_noatime(p->pack_name);
+           if (fd < 0) {
+                   munmap(idx_map, idx_size);
+                   return error("unable to open %s", p->pack_name);
+           }
+           if (fstat(fd, &st) != 0 || xsize_t(st.st_size) < 12 + nr*20) {
+   close(fd);
+   

[PATCH 19/38] packv4-create: optimize delta encoding

2013-09-05 Thread Nicolas Pitre
Make sure the copy sequence is smaller than the list of tree entries it
is meant to replace.  We do so by encoding tree entries in parallel with
the delta entry comparison, and then comparing the length of both
sequences.

Signed-off-by: Nicolas Pitre n...@fluxnic.net
---
 packv4-create.c | 65 +++--
 1 file changed, 45 insertions(+), 20 deletions(-)

diff --git a/packv4-create.c b/packv4-create.c
index 15c5959..c8d3053 100644
--- a/packv4-create.c
+++ b/packv4-create.c
@@ -433,7 +433,8 @@ void *pv4_encode_tree(void *_buffer, unsigned long *sizep,
struct tree_desc desc, delta_desc;
struct name_entry name_entry, delta_entry;
int nb_entries;
-   unsigned int copy_start, copy_count = 0, delta_pos = 0, first_delta = 1;
+   unsigned int copy_start = 0, copy_count = 0, copy_pos = 0, copy_end = 0;
+   unsigned int delta_pos = 0, first_delta = 1;
 
if (!size)
return NULL;
@@ -489,24 +490,23 @@ void *pv4_encode_tree(void *_buffer, unsigned long *sizep,
 
if (ret == 0 && name_entry.mode == delta_entry.mode &&
hashcmp(name_entry.sha1, delta_entry.sha1) == 0) {
-   if (!copy_count)
+   if (!copy_count) {
copy_start = delta_pos;
+   copy_pos = out - buffer;
+   copy_end = 0;
+   }
copy_count++;
delta_pos++;
if (!tree_entry(&delta_desc, &delta_entry))
delta = NULL;
-   continue;
-   }
-   }
+   } else
+   copy_end = 1;
+   } else
+   copy_end = 1;
 
-   if (end - out < 48) {
-   unsigned long sofar = out - buffer;
-   buffer = xrealloc(buffer, (sofar + 48)*2);
-   end = buffer + (sofar + 48)*2;
-   out = buffer + sofar;
-   }
+   if (copy_count && copy_end) {
+   unsigned char copy_buf[48], *cp = copy_buf;
 
-   if (copy_count) {
/*
 * Let's write a sequence indicating we're copying
 * entries from another object:
@@ -524,12 +524,31 @@ void *pv4_encode_tree(void *_buffer, unsigned long *sizep,
 */
copy_start = (copy_start  1) | 1;
copy_count = (copy_count  1) | first_delta;
-   out += encode_varint(copy_start, out);
-   out += encode_varint(copy_count, out);
+   cp += encode_varint(copy_start, cp);
+   cp += encode_varint(copy_count, cp);
if (first_delta)
-   out += encode_sha1ref(delta_sha1, out);
+   cp += encode_sha1ref(delta_sha1, cp);
copy_count = 0;
-   first_delta = 0;
+
+   /*
+* Now let's make sure this is going to take less
+* space than the corresponding direct entries we've
+* created in parallel.  If so we dump the copy
+* sequence over those entries in the output buffer.
+*/
+           if (cp - copy_buf < out - &buffer[copy_pos]) {
+   out = buffer + copy_pos;
+   memcpy(out, copy_buf, cp - copy_buf);
+   out += cp - copy_buf;
+   first_delta = 0;
+   }
+   }
+
+   if (end - out < 48) {
+   unsigned long sofar = out - buffer;
+   buffer = xrealloc(buffer, (sofar + 48)*2);
+   end = buffer + (sofar + 48)*2;
+   out = buffer + sofar;
}
 
pathlen = tree_entry_len(name_entry);
@@ -545,13 +564,19 @@ void *pv4_encode_tree(void *_buffer, unsigned long *sizep,
}
 
if (copy_count) {
-   /* flush the trailing copy */
+   /* process the trailing copy */
+   unsigned char copy_buf[48], *cp = copy_buf;
copy_start = (copy_start << 1) | 1;
copy_count = (copy_count << 1) | first_delta;
-   out += encode_varint(copy_start, out);
-   out += encode_varint(copy_count, out);
+   cp += encode_varint(copy_start, cp);
+   cp += encode_varint(copy_count, cp);
if (first_delta)
-  

[PATCH 30/38] pack v4: code to recreate a canonical commit object

2013-09-05 Thread Nicolas Pitre
Usage of snprintf() is possibly not the most efficient approach.
For example we could simply copy the needed strings and generate
the SHA1 hex strings directly into the destination buffer.  But
such optimizations may come later.

Signed-off-by: Nicolas Pitre n...@fluxnic.net
---
 packv4-parse.c | 74 ++
 1 file changed, 74 insertions(+)

diff --git a/packv4-parse.c b/packv4-parse.c
index 074e107..bca1a97 100644
--- a/packv4-parse.c
+++ b/packv4-parse.c
@@ -129,3 +129,77 @@ const unsigned char *get_nameref(struct packed_git *p, const unsigned char **srcp)
}
return p->name_dict->data + p->name_dict->offsets[index];
 }
+
+void *pv4_get_commit(struct packed_git *p, struct pack_window **w_curs,
+off_t offset, unsigned long size)
+{
+   unsigned long avail;
+   git_zstream stream;
+   int len, st;
+   unsigned int nb_parents;
+   unsigned char *dst, *dcp;
+   const unsigned char *src, *scp, *sha1, *name;
+   unsigned long time;
+   int16_t tz;
+
+   dst = xmallocz(size);
+   dcp = dst;
+
+   src = use_pack(p, w_curs, offset, avail);
+   scp = src;
+
+   sha1 = get_sha1ref(p, &scp);
+   len = snprintf((char *)dcp, size, "tree %s\n", sha1_to_hex(sha1));
+   dcp += len;
+   size -= len;
+
+   nb_parents = decode_varint(&scp);
+   while (nb_parents--) {
+   sha1 = get_sha1ref(p, &scp);
+   len = snprintf((char *)dcp, size, "parent %s\n", sha1_to_hex(sha1));
+   if (len >= size)
+   die("overflow in %s", __func__);
+   dcp += len;
+   size -= len;
+   }
+
+   name = get_nameref(p, &scp);
+   tz = (name[0] << 8) | name[1];
+   time = decode_varint(&scp);
+   len = snprintf((char *)dcp, size, "author %s %lu %+05d\n", name+2, time, tz);
+   if (len >= size)
+   die("overflow in %s", __func__);
+   dcp += len;
+   size -= len;
+
+   name = get_nameref(p, &scp);
+   tz = (name[0] << 8) | name[1];
+   time = decode_varint(&scp);
+   len = snprintf((char *)dcp, size, "committer %s %lu %+05d\n", name+2, time, tz);
+   if (len >= size)
+   die("overflow in %s", __func__);
+   dcp += len;
+   size -= len;
+
+   if (scp - src > avail)
+   die("overflow in %s", __func__);
+   offset += scp - src;
+
+   memset(&stream, 0, sizeof(stream));
+   stream.next_out = dcp;
+   stream.avail_out = size + 1;
+   git_inflate_init(&stream);
+   do {
+   src = use_pack(p, w_curs, offset, &stream.avail_in);
+   stream.next_in = (unsigned char *)src;
+   st = git_inflate(&stream, Z_FINISH);
+   offset += stream.next_in - src;
+   } while ((st == Z_OK || st == Z_BUF_ERROR) && stream.avail_out);
+   git_inflate_end(&stream);
+   if (st != Z_STREAM_END || stream.total_out != size) {
+   free(dst);
+   return NULL;
+   }
+
+   return dst;
+}
-- 
1.8.4.38.g317e65b

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 11/38] pack v4: tree object encoding

2013-09-05 Thread Nicolas Pitre
This goes as follows:

- Number of tree entries: variable length encoded.

Then for each tree entry:

- Path component reference: variable length encoded index into the path
  dictionary table which also covers the entry mode. To make the overall
  encoding efficient, the path table is already sorted by usage frequency
  so the most used path names are first and require the shortest index
  encoding.

- SHA1 reference: either variable length encoding of the index into the
  SHA1 table or the literal SHA1 prefixed by 0 (see encode_sha1ref()).

Rationale: all the tree object data is densely encoded while requiring
no zlib inflate processing on access, and all SHA1 references are most
likely to be direct indices into the pack index file requiring no SHA1
search.  Path filtering can be accomplished on the path index directly
without any string comparison during the tree traversal.

Still lacking is some kind of delta encoding for multiple tree objects
with only small differences between them.  But that'll come later.

Signed-off-by: Nicolas Pitre n...@fluxnic.net
---
 packv4-create.c | 66 +
 1 file changed, 66 insertions(+)

diff --git a/packv4-create.c b/packv4-create.c
index d4a79f4..b91ee0b 100644
--- a/packv4-create.c
+++ b/packv4-create.c
@@ -393,6 +393,72 @@ bad:
return NULL;
 }
 
+/*
+ * This converts a canonical tree object buffer into its
+ * tightly packed representation using the already populated
+ * and sorted tree_path_table dictionary.  The parsing is
+ * strict so to ensure the canonical version may always be
+ * regenerated and produce the same hash.
+ */
+void *pv4_encode_tree(void *_buffer, unsigned long *sizep)
+{
+   unsigned long size = *sizep;
+   unsigned char *in, *out, *end, *buffer = _buffer;
+   struct tree_desc desc;
+   struct name_entry name_entry;
+   int nb_entries;
+
+   if (!size)
+   return NULL;
+
+   /*
+* We can't make sure the result will always be smaller than the
+* input. The smallest possible entry is "0 x\0<20 byte SHA1>"
+* or 44 bytes.  The output entry may have a realistic path index
+* encoding using up to 3 bytes, and a non indexable SHA1 meaning
+* 41 bytes.  And the output data already has the nb_entries
+* headers.  In practice the output size will be significantly
+* smaller but for now let's make it simple.
+*/
+   in = buffer;
+   out = xmalloc(size + 48);
+   end = out + size + 48;
+   buffer = out;
+
+   /* let's count how many entries there are */
+   init_tree_desc(&desc, in, size);
+   nb_entries = 0;
+   while (tree_entry(&desc, &name_entry))
+   nb_entries++;
+   out += encode_varint(nb_entries, out);
+
+   init_tree_desc(&desc, in, size);
+   while (tree_entry(&desc, &name_entry)) {
+   int pathlen, index;
+
+   if (end - out < 48) {
+   unsigned long sofar = out - buffer;
+   buffer = xrealloc(buffer, (sofar + 48)*2);
+   end = buffer + (sofar + 48)*2;
+   out = buffer + sofar;
+   }
+
+   pathlen = tree_entry_len(&name_entry);
+   index = dict_add_entry(tree_path_table, name_entry.mode,
+  name_entry.path, pathlen);
+   if (index < 0) {
+   error("missing tree dict entry");
+   free(buffer);
+   return NULL;
+   }
+   out += encode_varint(index, out);
+   out += encode_sha1ref(name_entry.sha1, out);
+   }
+
+   *sizep = out - buffer;
+   return buffer;
+}
+
 static struct pack_idx_entry *get_packed_object_list(struct packed_git *p)
 {
unsigned i, nr_objects = p-num_objects;
-- 
1.8.4.38.g317e65b



[PATCH 24/38] packv4-create: add progress display

2013-09-05 Thread Nicolas Pitre
Signed-off-by: Nicolas Pitre n...@fluxnic.net
---
 packv4-create.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/packv4-create.c b/packv4-create.c
index c23c791..fd16222 100644
--- a/packv4-create.c
+++ b/packv4-create.c
@@ -13,6 +13,7 @@
 #include "tree-walk.h"
 #include "pack.h"
 #include "pack-revindex.h"
+#include "progress.h"
 #include "varint.h"
 
 
@@ -627,8 +628,10 @@ static struct pack_idx_entry **sort_objs_by_offset(struct pack_idx_entry *list,
 static int create_pack_dictionaries(struct packed_git *p,
struct pack_idx_entry **obj_list)
 {
+   struct progress *progress_state;
unsigned int i;
 
+   progress_state = start_progress("Scanning objects", p->num_objects);
 for (i = 0; i < p->num_objects; i++) {
struct pack_idx_entry *obj = obj_list[i];
void *data;
@@ -637,6 +640,8 @@ static int create_pack_dictionaries(struct packed_git *p,
struct object_info oi = {};
int (*add_dict_entries)(void *, unsigned long);
 
+   display_progress(progress_state, i+1);
+
oi.typep = type;
oi.sizep = size;
if (packed_object_info(p, obj->offset, &oi) < 0)
@@ -666,6 +671,7 @@ static int create_pack_dictionaries(struct packed_git *p,
free(data);
}
 
+   stop_progress(&progress_state);
return 0;
 }
 
@@ -1009,6 +1015,7 @@ static void process_one_pack(char *src_pack, char *dst_pack)
off_t written = 0;
char *packname;
unsigned char pack_sha1[20];
+   struct progress *progress_state;
 
p = open_pack(src_pack);
if (!p)
@@ -1030,6 +1037,7 @@ static void process_one_pack(char *src_pack, char *dst_pack)
written += packv4_write_tables(f, nr_objects, objs);
 
/* Let's write objects out, updating the object index list in place */
+   progress_state = start_progress("Writing objects", nr_objects);
 all_objs = objs;
 all_objs_nr = nr_objects;
 for (i = 0; i < nr_objects; i++) {
@@ -1039,7 +1047,9 @@ static void process_one_pack(char *src_pack, char *dst_pack)
written += packv4_write_object(f, p, obj);
obj->offset = obj_pos;
obj->crc32 = crc32_end(f);
+   display_progress(progress_state, i+1);
}
+   stop_progress(&progress_state);
 
sha1close(f, pack_sha1, CSUM_CLOSE | CSUM_FSYNC);
 
-- 
1.8.4.38.g317e65b



[PATCH 06/38] pack v4: split the object list and dictionary creation

2013-09-05 Thread Nicolas Pitre
Signed-off-by: Nicolas Pitre n...@fluxnic.net
---
 packv4-create.c | 58 +++--
 1 file changed, 44 insertions(+), 14 deletions(-)

diff --git a/packv4-create.c b/packv4-create.c
index 5c08871..20d97a4 100644
--- a/packv4-create.c
+++ b/packv4-create.c
@@ -261,7 +261,7 @@ static int sort_by_offset(const void *e1, const void *e2)
return 0;
 }
 
-static int create_pack_dictionaries(struct packed_git *p)
+static struct idx_entry *get_packed_object_list(struct packed_git *p)
 {
uint32_t nr_objects, i;
struct idx_entry *objects;
@@ -275,7 +275,15 @@ static int create_pack_dictionaries(struct packed_git *p)
}
qsort(objects, nr_objects, sizeof(*objects), sort_by_offset);
 
-   for (i = 0; i < nr_objects; i++) {
+   return objects;
+}
+
+static int create_pack_dictionaries(struct packed_git *p,
+   struct idx_entry *objects)
+{
+   unsigned int i;
+
+   for (i = 0; i < p->num_objects; i++) {
void *data;
enum object_type type;
unsigned long size;
@@ -310,20 +318,21 @@ static int create_pack_dictionaries(struct packed_git *p)
typename(type), sha1_to_hex(objects[i].sha1));
free(data);
}
-   free(objects);
 
return 0;
 }
 
-static int process_one_pack(const char *path)
+static struct packed_git *open_pack(const char *path)
 {
char arg[PATH_MAX];
int len;
struct packed_git *p;
 
len = strlcpy(arg, path, PATH_MAX);
-   if (len >= PATH_MAX)
-   return error("name too long: %s", path);
+   if (len >= PATH_MAX) {
+   error("name too long: %s", path);
+   return NULL;
+   }
 
/*
 * In addition to foo.idx we accept foo.pack and foo;
@@ -333,8 +342,10 @@ static int process_one_pack(const char *path)
strcpy(arg + len - 5, ".idx");
len--;
} else if (!has_extension(arg, ".idx")) {
-   if (len + 4 >= PATH_MAX)
-   return error("name too long: %s.idx", arg);
+   if (len + 4 >= PATH_MAX) {
+   error("name too long: %s.idx", arg);
+   return NULL;
+   }
strcpy(arg + len, ".idx");
len += 4;
}
@@ -345,17 +356,36 @@ static int process_one_pack(const char *path)
 */
if (len + 1 >= PATH_MAX) {
arg[len - 4] = '\0';
-   return error("name too long: %s.pack", arg);
+   error("name too long: %s.pack", arg);
+   return NULL;
}
 
p = add_packed_git(arg, len, 1);
-   if (!p)
-   return error("packfile %s not found.", arg);
+   if (!p) {
+   error("packfile %s not found.", arg);
+   return NULL;
+   }
 
install_packed_git(p);
-   if (open_pack_index(p))
-   return error("packfile %s index not opened", p->pack_name);
-   return create_pack_dictionaries(p);
+   if (open_pack_index(p)) {
+   error("packfile %s index not opened", p->pack_name);
+   return NULL;
+   }
+
+   return p;
+}
+
+static void process_one_pack(char *src_pack)
+{
+   struct packed_git *p;
+   struct idx_entry *objs;
+
+   p = open_pack(src_pack);
+   if (!p)
+   die("unable to open source pack");
+
+   objs = get_packed_object_list(p);
+   create_pack_dictionaries(p, objs);
 }
 
 int main(int argc, char *argv[])
-- 
1.8.4.38.g317e65b



Re: What's cooking in git.git (Sep 2013, #01; Tue, 3)

2013-09-05 Thread Matthieu Moy
Jens Lehmann jens.lehm...@web.de writes:

 Am 04.09.2013 21:19, schrieb Junio C Hamano:
 Matthieu Moy matthieu@grenoble-inp.fr writes:
 
 Junio C Hamano gits...@pobox.com writes:

 * mm/status-without-comment-char (2013-08-29) 5 commits
  - status: introduce status.displayCommentChar to disable display of #
  - SQUASH: do not fprintf() random strings
  - get rid of git submodule summary --for-status
  - wt-status: use argv_array API
  - builtin/stripspace.c: fix broken indentation

   Allow "git status" to omit the prefix to make its output a comment
  in a commit log editor, which is not necessary for human
  consumption.

 I'm waiting for the situation of git submodule --for-status to be
 settled to send a reroll. Don't merge this for now, and let
 bc/submodule-status-ignored converge.
 
 Thanks.  I had an impression that bc/submodule-status-ignored was
 still being discussed.  Has what I have in 'pu' settled?

 Almost (see my other mail concerning bc/submodule-status-ignored).
 If the removal of the --for-status option is reverted from patch #3
 here (which I outlined in $gmane/233764) this series can be merged
 without really depending on bc/submodule-status-ignored (even though
 the temporarily unused --for-status option may look a bit strange
 until the latter is merged).

I'll also need to reword the commit messages.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/


Re: [PATCH v3 2/2] submodule: don't print status output with ignore=all

2013-09-05 Thread Matthieu Moy
Jens Lehmann jens.lehm...@web.de writes:

 Am 04.09.2013 08:31, schrieb Matthieu Moy:
 brian m. carlson sand...@crustytoothpaste.net writes:
 
 Tests are included which verify that this change has no effect on "git
 submodule summary" without the --for-status option.
 
 I still don't understand why this is needed.

 To avoid a change in behavior for git submodule summary, as that
 never honored the submodule.*.ignore nor the diff.ignoreSubmodules
 setting (and I don't think it ever should).

I don't get it. If the goal is to keep the old behavior, then git
status shouldn't be changed either. Fixing bugs needs to change the
behavior.

IOW, why was it a bug that git status showed ignored submodules and
not a bug that git submodule summary did the same?

 Fine by me, what would you propose to clarify that? (Though I have the
 suspicion that the explanation will be three years late ;-)

I have no idea, as I do not understand the reason myself yet. I'm not a
frequent user of submodules and not a user of the ignore option at all,
so I can't tell what's best. I'd just like the new behavior to be
justified somewhere.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/


Re: [PATCH] git-gui: Modify push dialog to support Gerrit review

2013-09-05 Thread Johannes Sixt
On 9/2/2013 10:54, Joergen Edelbo wrote:
 Changes done:
 
 Remove selection of branches to push - push always HEAD.
 This can be justified by the fact that this is by far the most
 common thing to do.

What are your plans to support a topic-based workflow? Far the most
common thing to happen is that someone forgets to push completed topics.
With this change, aren't those people forced to relinquish their current
work because they have to checkout the completed topics to push them?

-- Hannes


Re: [PATCH 31/38] sha1_file.c: make use of decode_varint()

2013-09-05 Thread SZEDER Gábor
On Thu, Sep 05, 2013 at 02:19:54AM -0400, Nicolas Pitre wrote:
 ... replacing the equivalent open coded loop.
 
 Signed-off-by: Nicolas Pitre n...@fluxnic.net
 ---
  sha1_file.c | 14 +++---
  1 file changed, 3 insertions(+), 11 deletions(-)
 
 diff --git a/sha1_file.c b/sha1_file.c
 index a298933..67eb903 100644
 --- a/sha1_file.c
 +++ b/sha1_file.c
 @@ -1687,20 +1687,12 @@ static off_t get_delta_base(struct packed_git *p,
* is stupid, as then a REF_DELTA would be smaller to store.
*/
   if (type == OBJ_OFS_DELTA) {
  - unsigned used = 0;
  - unsigned char c = base_info[used++];
  - base_offset = c & 127;
  - while (c & 128) {
  - base_offset += 1;
  - if (!base_offset || MSB(base_offset, 7))
  - return 0;  /* overflow */
  - c = base_info[used++];
  - base_offset = (base_offset << 7) + (c & 127);
  - }
 + const unsigned char *cp = base_info;
  + base_offset = decode_varint(&cp);
   base_offset = delta_obj_offset - base_offset;
   if (base_offset <= 0 || base_offset >= delta_obj_offset)
   return 0;  /* out of bound */
 - *curpos += used;
 + *curpos += cp - base_info;
   } else if (type == OBJ_REF_DELTA) {
   /* The base entry _must_ be in the same pack */
   base_offset = find_pack_entry_one(base_info, p);
 -- 
 1.8.4.38.g317e65b

This patch seems to be a cleanup independent from pack v4, it applies
cleanly on master and passes all tests in itself.

Best,
Gábor



Re: [PATCH v2] git p4: implement view spec wildcards with p4 where

2013-09-05 Thread kazuki saitoh
2013/9/4 Junio C Hamano gits...@pobox.com:
 kazuki saitoh ksaitoh...@gmail.com writes:

 Currently, git p4 does not support many of the view wildcards,
 such as * and %%n.  It only knows the common ... mapping, and
 exclusions.

 Redo the entire wildcard code around the idea of
 directly querying the p4 server for the mapping.  For each
 commit, invoke p4 where with committed file paths as args and use
 the client mapping to decide where the file goes in git.

 This simplifies a lot of code, and adds support for all
 wildcards supported by p4.
 Downside is that there is probably a 20%-ish slowdown with this approach.

 [pw: redo code and tests]

 Signed-off-by: Kazuki Saitoh ksaitoh...@gmail.com
 Signed-off-by: Pete Wyckoff p...@padd.com

 This was whitespace-damaged in the message I received, so what is
 queued is after manual fix-up.  Please double check what will be
 queued on 'pu' later and Ack, and then I'll move it to 'next' and
 'master'.
Indeed, some lines were broken by line wrapping.
I use the Gmail web interface and didn't read "git format-patch --help".
Sorry for the inconvenience.

I checked 'pu' branch and it looks correct.

Acked-by: Kazuki Saitoh ksaitoh...@gmail.com


 Thanks.

 ---
  git-p4.py | 223 
 +-
  1 file changed, 59 insertions(+), 164 deletions(-)

 diff --git a/git-p4.py b/git-p4.py
 index 31e71ff..1793e86 100755
 --- a/git-p4.py
 +++ b/git-p4.py
 @@ -780,11 +780,14 @@ def getClientSpec():
  # dictionary of all client parameters
  entry = specList[0]

 +# the //client/ name
 +client_name = entry["Client"]
 +
  # just the keys that start with "View"
  view_keys = [ k for k in entry.keys() if k.startswith("View") ]

  # hold this new View
 -view = View()
 +view = View(client_name)

  # append the lines, in order, to the view
  for view_num in range(len(view_keys)):
 @@ -1555,8 +1558,8 @@ class P4Submit(Command, P4UserMap):
             for b in body:
                 labelTemplate += "\t" + b + "\n"
             labelTemplate += "View:\n"
 -            for mapping in clientSpec.mappings:
 -                labelTemplate += "\t%s\n" % mapping.depot_side.path
 +            for depot_side in clientSpec.mappings:
 +                labelTemplate += "\t%s\n" % depot_side

             if self.dry_run:
                 print "Would create p4 label %s for tag" % name
 @@ -1568,7 +1571,7 @@ class P4Submit(Command, P4UserMap):

                 # Use the label
                 p4_system(["tag", "-l", name] +
 -                          ["%s@%s" % (mapping.depot_side.path, changelist) for mapping in clientSpec.mappings])
 +                          ["%s@%s" % (depot_side, changelist) for depot_side in clientSpec.mappings])

             if verbose:
                 print "created p4 label for tag %s" % name
 @@ -1796,117 +1799,16 @@ class View(object):
     """Represent a p4 view ("p4 help views"), and map files in a
        repo according to the view."""

 -    class Path(object):
 -        """A depot or client path, possibly containing wildcards.
 -           The only one supported is ... at the end, currently.
 -           Initialize with the full path, with //depot or //client."""
 -
 -        def __init__(self, path, is_depot):
 -            self.path = path
 -            self.is_depot = is_depot
 -            self.find_wildcards()
 -            # remember the prefix bit, useful for relative mappings
 -            m = re.match("(//[^/]+/)", self.path)
 -            if not m:
 -                die("Path %s does not start with //prefix/" % self.path)
 -            prefix = m.group(1)
 -            if not self.is_depot:
 -                # strip //client/ on client paths
 -                self.path = self.path[len(prefix):]
 -
 -        def find_wildcards(self):
 -            """Make sure wildcards are valid, and set up internal
 -               variables."""
 -
 -            self.ends_triple_dot = False
 -            # There are three wildcards allowed in p4 views
 -            # (see "p4 help views").  This code knows how to
 -            # handle "..." (only at the end), but cannot deal with
 -            # %%n or *.  Only check the depot_side, as p4 should
 -            # validate that the client_side matches too.
 -            if re.search(r'%%[1-9]', self.path):
 -                die("Can't handle %%n wildcards in view: %s" % self.path)
 -            if self.path.find("*") >= 0:
 -                die("Can't handle * wildcards in view: %s" % self.path)
 -            triple_dot_index = self.path.find("...")
 -            if triple_dot_index >= 0:
 -                if triple_dot_index != len(self.path) - 3:
 -                    die("Can handle only single ... wildcard, at end: %s" %
 -                        self.path)
 -                self.ends_triple_dot = True
 -
 -        def ensure_compatible(self, other_path):
 -            """Make sure the wildcards agree."""
 -            if self.ends_triple_dot != other_path.ends_triple_dot:
 -

Re: [PATCH 0/3] Reject non-ff pulls by default

2013-09-05 Thread John Keeping
On Wed, Sep 04, 2013 at 03:59:18PM -0700, Junio C Hamano wrote:
 Are there cases where you do not want to either rebase nor merge?
 If so what do you want to do after git pull fetches from the other
 side?  Nothing?

One other thing that I can see being useful occasionally is:

git rebase @{u}@{1} --onto @{u}

which allows local commits to be replayed onto a rewritten upstream
branch.

Although I agree with your side note below that people doing this may be
better off fetching and then updating their local branch, particularly
if @{1} is not the correct reflog entry for the upstream when they
created the branch.

   Side note: a knee-jerk response to a yes answer to the
   last question from me has always been then why are you
   running 'git pull' in the first place. The next paragraph is
   my attempt to extend my imagination a bit, stepping outside
   that reaction.


Re: [PATCH v3 2/2] submodule: don't print status output with ignore=all

2013-09-05 Thread Matthieu Moy
Matthieu Moy matthieu@grenoble-inp.fr writes:

 Jens Lehmann jens.lehm...@web.de writes:

 Fine by me, what would you propose to clarify that? (Though I have the
 suspicion that the explanation will be three years late ;-)

 I have no idea, as I do not understand the reason myself yet. I'm not a
 frequent user of submodules and not a user of the ignore option at all,
 so I can't tell what's best. I'd just like the new behavior to be
 justified somewhere.

I also think that the documentation in Documentation/config.txt could be
updated. Perhaps adding something like

To view the summary for submodules ignored by 'git status', one
can use 'git submodules summary' which shows a similar output
but does not honor this option.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/


RE: [PATCH] git-gui: Modify push dialog to support Gerrit review

2013-09-05 Thread Jørgen Edelbo
 -Original Message-
 From: Johannes Sixt [mailto:j.s...@viscovery.net]
 Sent: 5. september 2013 10:57
 
 Please do not top-post.
 
 On 9/5/2013 10:29, Jørgen Edelbo wrote:
  -Original Message- From: Johannes Sixt
  On 9/2/2013 10:54, Joergen Edelbo wrote:
  Changes done:
 
  Remove selection of branches to push - push always HEAD. This can be
  justified by the fact that this is by far the most common thing to do.
 
  What are your plans to support a topic-based workflow? Far the most
  common thing to happen is that someone forgets to push completed
  topics. With this change, aren't those people forced to relinguish
  their current work because they have to checkout the completed topics
  to push them?
 
  I am not quite sure what your concern is.
 
 When I have completed topics A and B, but forgot to push them, and now I
 am working on topic C, how do I push topics A and B?
 
 You say I can only push HEAD. I understand this that I have to stop work on C
 (perhaps commit or stash any unfinished work), then checkout A, push it,
 checkout B, push it, checkout C and unstash the unfinished work. If my
 understanding is correct, the new restriction is a no-go.

Ok, this way of working is not supported. It just never occurred to me that
you would work this way. Forgetting to push something that you have just 
completed is very far from what I am used to. I think it comes most
naturally to push what you have done before changing topic. The reason I make a commit
is to get it out of the door.

 
 -- Hannes

- Jørgen


Re: [PATCH] git-gui: Modify push dialog to support Gerrit review

2013-09-05 Thread Johannes Sixt
On 9/5/2013 11:18, Jørgen Edelbo wrote:
 Forgetting to push something that you have just 
 completed is very far from what I am used to.

Forgetting to push is just one of many reasons why a branch that is not
equal to HEAD is not yet pushed... The new restriction is just too tight.

-- Hannes


[PATCH v4 2/5] wt-status: use argv_array API

2013-09-05 Thread Matthieu Moy
No behavior change, but two slight code reorganizations: argv_array_push
doesn't accept NULL strings, and it duplicates its argument, hence
summary_limit must be written to before being inserted into argv.

Signed-off-by: Matthieu Moy matthieu@imag.fr
Signed-off-by: Junio C Hamano gits...@pobox.com
---
 wt-status.c | 26 ++
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/wt-status.c b/wt-status.c
index cb24f1f..958a53c 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -8,6 +8,7 @@
 #include diffcore.h
 #include quote.h
 #include run-command.h
+#include argv-array.h
 #include remote.h
 #include refs.h
 #include submodule.h
@@ -663,29 +664,30 @@ static void wt_status_print_submodule_summary(struct wt_status *s, int uncommitted)
char summary_limit[64];
char index[PATH_MAX];
const char *env[] = { NULL, NULL };
-   const char *argv[8];
-
-   env[0] = index;
-   argv[0] = "submodule";
-   argv[1] = "summary";
-   argv[2] = uncommitted ? "--files" : "--cached";
-   argv[3] = "--for-status";
-   argv[4] = "--summary-limit";
-   argv[5] = summary_limit;
-   argv[6] = uncommitted ? NULL : (s->amend ? "HEAD^" : "HEAD");
-   argv[7] = NULL;
+   struct argv_array argv = ARGV_ARRAY_INIT;
 
 sprintf(summary_limit, "%d", s->submodule_summary);
 snprintf(index, sizeof(index), "GIT_INDEX_FILE=%s", s->index_file);
 
+   env[0] = index;
+   argv_array_push(&argv, "submodule");
+   argv_array_push(&argv, "summary");
+   argv_array_push(&argv, uncommitted ? "--files" : "--cached");
+   argv_array_push(&argv, "--for-status");
+   argv_array_push(&argv, "--summary-limit");
+   argv_array_push(&argv, summary_limit);
+   if (!uncommitted)
+   argv_array_push(&argv, s->amend ? "HEAD^" : "HEAD");
+
 memset(&sm_summary, 0, sizeof(sm_summary));
-   sm_summary.argv = argv;
+   sm_summary.argv = argv.argv;
 sm_summary.env = env;
 sm_summary.git_cmd = 1;
 sm_summary.no_stdin = 1;
 fflush(s->fp);
 sm_summary.out = dup(fileno(s->fp));	/* run_command closes it */
 run_command(&sm_summary);
+   argv_array_clear(&argv);
 }
 
 static void wt_status_print_other(struct wt_status *s,
-- 
1.8.4.4.g70bf5e8.dirty



[PATCH v4 5/5] tests: don't set status.oldStyle file-wide

2013-09-05 Thread Matthieu Moy
The previous commit set status.oldStyle file-wide in t7060-wtstatus.sh
and t7508-status.sh to make the patch small. However, now that
status.oldStyle is not the default, it is better to disable it in tests
so that the most common situation is also the most tested.

While we're there, move the "cat >expect <<EOF" blocks inside the
tests.

Signed-off-by: Matthieu Moy matthieu@imag.fr
---
 t/t7060-wtstatus.sh | 113 ---
 t/t7508-status.sh   | 889 ++--
 2 files changed, 493 insertions(+), 509 deletions(-)

diff --git a/t/t7060-wtstatus.sh b/t/t7060-wtstatus.sh
index ddebd28..7d467c0 100755
--- a/t/t7060-wtstatus.sh
+++ b/t/t7060-wtstatus.sh
@@ -4,10 +4,6 @@ test_description='basic work tree status reporting'
 
 . ./test-lib.sh
 
-test_expect_success 'use status.oldStyle by default ' '
-   git config --global status.oldStyle true
-'
-
 test_expect_success setup '
	git config --global advice.statusuoption false &&
	test_commit A &&
@@ -33,20 +29,19 @@ test_expect_success 'Report new path with conflict' '
test_cmp expect actual
 '
 
-cat >expect <<EOF
-# On branch side
-# You have unmerged paths.
-#   (fix conflicts and run "git commit")
-#
-# Unmerged paths:
-#   (use "git add/rm <file>..." as appropriate to mark resolution)
-#
-#	deleted by us:  foo
-#
+test_expect_success 'M/D conflict does not segfault' '
+	cat >expect <<EOF &&
+On branch side
+You have unmerged paths.
+  (fix conflicts and run "git commit")
+
+Unmerged paths:
+  (use "git add/rm <file>..." as appropriate to mark resolution)
+
+	deleted by us:  foo
+
 no changes added to commit (use "git add" and/or "git commit -a")
 EOF
-
-test_expect_success 'M/D conflict does not segfault' '
 	mkdir mdconflict &&
(
	cd mdconflict &&
@@ -139,19 +134,19 @@ test_expect_success 'status when conflicts with add and rm advice (deleted by th
	test_commit on_second main.txt on_second &&
	test_commit master conflict.txt master &&
	test_must_fail git merge second_branch &&
-	cat >expected <<-\EOF &&
-	# On branch master
-	# You have unmerged paths.
-	#   (fix conflicts and run "git commit")
-	#
-	# Unmerged paths:
-	#   (use "git add/rm <file>..." as appropriate to mark resolution)
-	#
-	#	both added: conflict.txt
-	#	deleted by them:main.txt
-	#
-	no changes added to commit (use "git add" and/or "git commit -a")
-	EOF
+	cat >expected <<\EOF &&
+On branch master
+You have unmerged paths.
+  (fix conflicts and run "git commit")
+
+Unmerged paths:
+  (use "git add/rm <file>..." as appropriate to mark resolution)
+
+	both added: conflict.txt
+	deleted by them:main.txt
+
+no changes added to commit (use "git add" and/or "git commit -a")
+EOF
 	git status --untracked-files=no >actual &&
 	test_i18ncmp expected actual
 '
@@ -172,20 +167,20 @@ test_expect_success 'prepare for conflicts' '
 
 test_expect_success 'status when conflicts with add and rm advice (both deleted)' '
	test_must_fail git merge conflict &&
-	cat >expected <<-\EOF &&
-	# On branch conflict_second
-	# You have unmerged paths.
-	#   (fix conflicts and run "git commit")
-	#
-	# Unmerged paths:
-	#   (use "git add/rm <file>..." as appropriate to mark resolution)
-	#
-	#	both deleted:   main.txt
-	#	added by them:  sub_master.txt
-	#	added by us:sub_second.txt
-	#
-	no changes added to commit (use "git add" and/or "git commit -a")
-	EOF
+	cat >expected <<\EOF &&
+On branch conflict_second
+You have unmerged paths.
+  (fix conflicts and run "git commit")
+
+Unmerged paths:
+  (use "git add/rm <file>..." as appropriate to mark resolution)
+
+	both deleted:   main.txt
+	added by them:  sub_master.txt
+	added by us:sub_second.txt
+
+no changes added to commit (use "git add" and/or "git commit -a")
+EOF
	git status --untracked-files=no >actual &&
test_i18ncmp expected actual
 '
@@ -196,22 +191,22 @@ test_expect_success 'status when conflicts with only rm advice (both deleted)' '
	test_must_fail git merge conflict &&
	git add sub_master.txt &&
	git add sub_second.txt &&
-	cat >expected <<-\EOF &&
-	# On branch conflict_second
-	# You have unmerged paths.
-	#   (fix conflicts and run "git commit")
-	#
-	# Changes to be committed:
-	#
-	#	new file:   sub_master.txt
-	#
-	# Unmerged paths:
-	#   (use "git rm <file>..." to mark resolution)
-	#
-	#	both deleted:   main.txt
-	#
-	# Untracked files not listed (use -u option to show untracked files)
-	EOF
+	cat >expected <<\EOF &&
+On branch conflict_second
+You have unmerged paths.
+  (fix conflicts and run "git commit")
+
+Changes to be committed:
+
+	new file:   sub_master.txt
+

[PATCH v4 0/5] Disable git status comment prefix

2013-09-05 Thread Matthieu Moy
Compared to v2, this changes essentially:

* The prefix is actually disabled by default in this version. As a
  consequence, the option is renamed to status.oldStyle.

* Since this is the default, the tests are updated to test the new
  defaults. In a first patch, I'm setting status.oldStyle=true in test
  files that require it (to keep the patch short), and the last patch
  actually updates the test expected results. This was actually useful
  as I did find (and fix) a few bugs updating the tests:

  - the --columns option was still showing the comment prefix

  - "git commit --dry-run" and failed "git commit" were still
    displaying comments to stdout.

* The --for-status option is kept as a no-op.

Matthieu Moy (5):
  builtin/stripspace.c: fix broken indentation
  wt-status: use argv_array API
  submodule summary: ignore --for-status option
  status: disable display of '#' comment prefix by default
  tests: don't set status.oldStyle file-wide

 Documentation/config.txt   |   7 +
 builtin/commit.c   |  10 +
 builtin/stripspace.c   |   8 +-
 git-submodule.sh   |  13 +-
 t/t3001-ls-files-others-exclude.sh |   2 +-
 t/t7060-wtstatus.sh| 109 +++--
 t/t7401-submodule-summary.sh   |  12 +-
 t/t7508-status.sh  | 944 +++--
 wt-status.c|  85 +++-
 wt-status.h|   1 +
 10 files changed, 640 insertions(+), 551 deletions(-)

-- 
1.8.4.4.g70bf5e8.dirty

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] Reject non-ff pulls by default

2013-09-05 Thread John Keeping
On Thu, Sep 05, 2013 at 07:01:03AM -0400, John Szakmeister wrote:
 On Wed, Sep 4, 2013 at 6:59 PM, Junio C Hamano gits...@pobox.com wrote:
 [snip]
  When git pull stops because what was fetched in FETCH_HEAD does
  not fast-forward, then what did _you_ do (and with the knowledge you
  currently have, what would you do)?  In a single project, would you
  choose to sometimes rebase and sometimes merge, and if so, what is
  the choice depend on?  When I am on these selected branches, I want
  to merge, but on other branches I want to rebase?
 
 Our team isn't quite proficient enough yet to have a completely rebase
 workflow... though we might have less of a problem if we did.  So,
 several interesting points.  Most of the time, `git pull` would be a
 fast-forward merge.  We typically perform the merges of topic branches
 server-side--we have a build server that checks to make sure the result
 would be successful--and we just hit the big green button on the Merge
 button for the pull request (we use GitHub Enterprise at the moment).
 
 However, nearly as often, we just merge the branch locally because
 someone on the team is doing some manual testing, and it's just
 convenient to finish the process on the command line.  What
 occasionally happens is that you merge the topic locally, but someone
 else has introduced a new commit to master.  We try to preserve the
 mainline ordering of commits, so `git pull` doing a merge underneath
 the hood is undesirable (it moves the newly introduced commit off to
 the side).  Rebasing your current master branch is not the answer
 either, because it picks up the commits introduced by the topic branch
 and rebases those too--at least with the -p option (and without it, the
 results are just as bad).  Instead, we want to unfold our work,
 fast-forward merge the upstream, and then replay our actions--namely
 remerge the topic branch.  It often ends up translating to this:
 
$ git reset --hard HEAD~1
$ git merge --ff-only @{u}
$ git merge topic
$ git push
 
 So what I really want isn't quite rebase.  I'm not sure any of the
 proposed solutions would work.  It'd be really nice to replay only the
 mainline commits, without affecting commits introduced from a topic
 branch.

Does git rebase --preserve-merges do what you want here?


Re: [PATCH v4 0/5] Disable git status comment prefix

2013-09-05 Thread Matthieu Moy
Oops, this series forgot to update t7512-status-help.sh, which now
fails.

I'll send a reroll that updates it later (patch below).

diff --git a/t/t7512-status-help.sh b/t/t7512-status-help.sh
index 31a798f..0688d58 100755
--- a/t/t7512-status-help.sh
+++ b/t/t7512-status-help.sh
@@ -25,18 +25,18 @@ test_expect_success 'prepare for conflicts' '
 
 test_expect_success 'status when conflicts unresolved' '
	test_must_fail git merge master &&
-	cat >expected <<-\EOF &&
-	# On branch conflicts
-	# You have unmerged paths.
-	#   (fix conflicts and run "git commit")
-	#
-	# Unmerged paths:
-	#   (use "git add <file>..." to mark resolution)
-	#
-	#	both modified:  main.txt
-	#
-	no changes added to commit (use "git add" and/or "git commit -a")
-	EOF
+	cat >expected <<\EOF &&
+On branch conflicts
+You have unmerged paths.
+  (fix conflicts and run "git commit")
+
+Unmerged paths:
+  (use "git add <file>..." to mark resolution)
+
+	both modified:  main.txt
+
+no changes added to commit (use "git add" and/or "git commit -a")
+EOF
	git status --untracked-files=no >actual &&
test_i18ncmp expected actual
 '
@@ -47,17 +47,17 @@ test_expect_success 'status when conflicts resolved before commit' '
	test_must_fail git merge master &&
	echo one >main.txt &&
	git add main.txt &&
-	cat >expected <<-\EOF &&
-	# On branch conflicts
-	# All conflicts fixed but you are still merging.
-	#   (use "git commit" to conclude merge)
-	#
-	# Changes to be committed:
-	#
-	#	modified:   main.txt
-	#
-	# Untracked files not listed (use -u option to show untracked files)
-	EOF
+	cat >expected <<\EOF &&
+On branch conflicts
+All conflicts fixed but you are still merging.
+  (use "git commit" to conclude merge)
+
+Changes to be committed:
+
+	modified:   main.txt
+
+Untracked files not listed (use -u option to show untracked files)
+EOF
	git status --untracked-files=no >actual &&
test_i18ncmp expected actual
 '
@@ -76,21 +76,21 @@ test_expect_success 'status when rebase in progress before resolving conflicts' '
	test_when_finished "git rebase --abort" &&
	ONTO=$(git rev-parse --short HEAD^^) &&
	test_must_fail git rebase HEAD^ --onto HEAD^^ &&
-	cat >expected <<-EOF &&
-	# rebase in progress; onto $ONTO
-	# You are currently rebasing branch '\''rebase_conflicts'\'' on '\''$ONTO'\''.
-	#   (fix conflicts and then run "git rebase --continue")
-	#   (use "git rebase --skip" to skip this patch)
-	#   (use "git rebase --abort" to check out the original branch)
-	#
-	# Unmerged paths:
-	#   (use "git reset HEAD <file>..." to unstage)
-	#   (use "git add <file>..." to mark resolution)
-	#
-	#	both modified:  main.txt
-	#
-	no changes added to commit (use "git add" and/or "git commit -a")
-	EOF
+	cat >expected <<EOF &&
+rebase in progress; onto $ONTO
+You are currently rebasing branch '\''rebase_conflicts'\'' on '\''$ONTO'\''.
+  (fix conflicts and then run "git rebase --continue")
+  (use "git rebase --skip" to skip this patch)
+  (use "git rebase --abort" to check out the original branch)
+
+Unmerged paths:
+  (use "git reset HEAD <file>..." to unstage)
+  (use "git add <file>..." to mark resolution)
+
+	both modified:  main.txt
+
+no changes added to commit (use "git add" and/or "git commit -a")
+EOF
	git status --untracked-files=no >actual &&
test_i18ncmp expected actual
 '
@@ -103,18 +103,18 @@ test_expect_success 'status when rebase in progress before rebase --continue' '
	test_must_fail git rebase HEAD^ --onto HEAD^^ &&
	echo three >main.txt &&
	git add main.txt &&
-	cat >expected <<-EOF &&
-	# rebase in progress; onto $ONTO
-	# You are currently rebasing branch '\''rebase_conflicts'\'' on '\''$ONTO'\''.
-	#   (all conflicts fixed: run "git rebase --continue")
-	#
-	# Changes to be committed:
-	#   (use "git reset HEAD <file>..." to unstage)
-	#
-	#	modified:   main.txt
-	#
-	# Untracked files not listed (use -u option to show untracked files)
-	EOF
+	cat >expected <<EOF &&
+rebase in progress; onto $ONTO
+You are currently rebasing branch '\''rebase_conflicts'\'' on '\''$ONTO'\''.
+  (all conflicts fixed: run "git rebase --continue")
+
+Changes to be committed:
+  (use "git reset HEAD <file>..." to unstage)
+
+	modified:   main.txt
+
+Untracked files not listed (use -u option to show untracked files)
+EOF
	git status --untracked-files=no >actual &&
test_i18ncmp expected actual
 '
@@ -135,21 +135,21 @@ test_expect_success 'status during rebase -i when conflicts unresolved' '
	test_when_finished "git rebase --abort" &&
	ONTO=$(git rev-parse --short rebase_i_conflicts) &&
	test_must_fail git rebase -i rebase_i_conflicts &&
-	cat >expected <<-EOF &&
-   # rebase 

Re: [PATCH v4 06/11] replace: bypass the type check if -f option is used

2013-09-05 Thread Christian Couder
On Wed, Sep 4, 2013 at 10:44 PM, Junio C Hamano gits...@pobox.com wrote:
 Christian Couder chrisc...@tuxfamily.org writes:

 If -f option, which means '--force', is used, we can allow an object
 to be replaced with one of a different type, as the user should know
 what (s)he is doing.

 Signed-off-by: Christian Couder chrisc...@tuxfamily.org
 ---

 This does not matter in a larger picture, but between 1/11 and this
 patch, there is a window where an operation that has been useful in
 some workflows becomes unavailable to the user.

 For future reference, it would be better to do this as a part of
 1/11, to make sure that there always is an escape hatch available to
 the users.

Ok, I will squash patches 6/11, 7/11 and 8/11 with 1/11, 2/11 and 3/11
respectively.

Thanks,
Christian.


Re: [PATCH v4 10/11] Documentation/replace: list long option names

2013-09-05 Thread Christian Couder
On Wed, Sep 4, 2013 at 10:45 PM, Junio C Hamano gits...@pobox.com wrote:
 Christian Couder chrisc...@tuxfamily.org writes:

 Signed-off-by: Christian Couder chrisc...@tuxfamily.org
 ---
  Documentation/git-replace.txt | 3 +++
  1 file changed, 3 insertions(+)

 diff --git a/Documentation/git-replace.txt b/Documentation/git-replace.txt
 index a2bd2ee..414000e 100644
 --- a/Documentation/git-replace.txt
 +++ b/Documentation/git-replace.txt
 @@ -54,13 +54,16 @@ achieve the same effect as the `--no-replace-objects` option.
  OPTIONS
  ---
  -f::
 +--force::
   If an existing replace ref for the same object exists, it will
   be overwritten (instead of failing).

  -d::
 +--delete::
   Delete existing replace refs for the given objects.

  -l <pattern>::
 +--list <pattern>::
   List replace refs for objects that match the given pattern (or
   all if no pattern is given).
    Typing "git replace" without arguments, also lists all replace

 This should be in the same commit as what adds them.

Ok, I will squash 10/11 into 9/11.

Thanks,
Christian.


Re: [PATCH 0/3] Reject non-ff pulls by default

2013-09-05 Thread Greg Troxel

Junio C Hamano gits...@pobox.com writes:

 Peff already covered (1)---it is highly doubtful that a merge is
 almost always wrong.  In fact, if that _were_ the case, we should
 simply be defaulting to rebase, not failing the command and asking
 between merge and rebase like jc/pull-training-wheel topic did.

 We simply do not know what the user wants, as it heavily depends on
 the project, so we ask the user to choose one (and stick to it).

From my experience leading the first large project using git at BBN,
evolving a workflow (most work on topic branches, which are rebased,
banning 'git pull'-created merge commits, and explicit merge commits to
preserve --first-parent, basically), and seeing many people struggle to
learn all this, my take is that a user who does not understand non-ff
merge vs ff-merge vs rebase will end up doing the wrong thing.  So two
thoughts:

  In the glorious future, perhaps git could have a way to express a
  machine-parseable representation of the workflow and rules for a repo,
  so that these choices could be made accordingly.

  In the current world, I think it makes sense to error out when there
  are multiple reasonable choices depending on workflow.

One of my team members, Richard Hansen, has argued to us that 'git pull'
is harmful, essentially because it creates non-ff merges sometimes,
while our rules say those should be rebased out.  So we use

[alias]
up = !git remote update -p && git merge --ff-only @{u}

which acts like pull if ff merge works, and otherwise errors out.

I think the key question is: can a user who doesn't really understand
the implications of ff vs non-ff merges and the local workflow rules
actually function ok, or do they need to stop and go back and
understand.  I'm in the "you just have to take the time to understand"
camp, which led to us having a semi-custom syllabus from github training
covering rebase, explicit vs ff merges and the consequences for
first-parent history, etc.

Therefore, I think git pull should do the update (perhaps of just the
remote corresponding to @{u}, perhaps without -p) and a --ff-only merge,
absent a configuration asking for non-ff merge or rebase.  (Arguably, an
ff merge is a degenerate case of rebase and also of the ff/non-ff merge,
so it's safe with either policy.)

Greg




Zero padded file modes...

2013-09-05 Thread John Szakmeister
I went to clone a repository from GitHub today and discovered
something interesting:

:: git clone https://github.com/liebke/incanter.git
Cloning into 'incanter'...
remote: Counting objects: 10457, done.
remote: Compressing objects: 100% (3018/3018), done.
error: object 4946e1ba09ba5655202a7a5d81ae106b08411061: contains
zero-padded file modes
fatal: Error in object
fatal: index-pack failed

At first, it surprised me that no one has seen the issue before,
but then I remembered I have transfer.fsckObjects=true in my
config.  Turning it off, I was able to clone.  Running `git
fsck` I see:

:: git fsck
Checking object directories: 100% (256/256), done.
warning in tree 4946e1ba09ba5655202a7a5d81ae106b08411061: contains
zero-padded file modes
warning in tree 553c5e006e53a8360126f053c3ade3d1d063c2f5: contains
zero-padded file modes
warning in tree 0a2e7f55d7f8e1fa5469e6d83ff20365881eed1a: contains
zero-padded file modes
Checking objects: 100% (10560/10560), done.

So there appears to be several instances of the issue in the
tree.  Looking in the archives, I ran across this thread:

http://comments.gmane.org/gmane.comp.version-control.git/143288

In there, Nicolas Pitre says:

 This is going to screw up pack v4 (yes, someday I'll have the
 time to make it real).

I don't know if this is still true, but given that patches are
being sent out about it, I thought it relevant.

Also, searching on the issue, you'll find that a number of
repositories have suffered from this problem, and it appears the
only fix will result in different commit ids.  Given all of
that, should Git be updated to cope with zero padded modes?  Or
is there some way of fixing the issue that doesn't involve
changing commit ids?

Thanks!

-John


[PATCH v2] cherry-pick: allow - as abbreviation of '@{-1}'

2013-09-05 Thread Hiroshige Umino
"-" abbreviation is handy for cherry-pick like checkout and merge.

It's also good for uniformity that a "-" stands as
the name of the previous branch where a branch name is
accepted and it could not mean any other things like stdin.

Signed-off-by: Hiroshige Umino hiroshig...@gmail.com
---
 builtin/revert.c  |  2 ++
 t/t3500-cherry.sh | 15 +++
 2 files changed, 17 insertions(+)

diff --git a/builtin/revert.c b/builtin/revert.c
index 8e87acd..52c35e7 100644
--- a/builtin/revert.c
+++ b/builtin/revert.c
@@ -202,6 +202,8 @@ int cmd_cherry_pick(int argc, const char **argv, const char 
*prefix)
memset(opts, 0, sizeof(opts));
opts.action = REPLAY_PICK;
git_config(git_default_config, NULL);
+	if (!strcmp(argv[1], "-"))
+		argv[1] = "@{-1}";
parse_args(argc, argv, opts);
res = sequencer_pick_revisions(opts);
	if (res < 0)
diff --git a/t/t3500-cherry.sh b/t/t3500-cherry.sh
index f038f34..547dbf8 100755
--- a/t/t3500-cherry.sh
+++ b/t/t3500-cherry.sh
@@ -55,4 +55,19 @@ test_expect_success \
  expr $(echo $(git cherry master my-topic-branch) ) : + [^ ]* - .*
 '
 
+test_expect_success \
+'cherry-pick - does not work initially' \
+'test_must_fail git cherry-pick -
+'
+
+test_expect_success \
+'cherry-pick the commit in the previous branch' \
+'git branch other &&
+ test_commit commit-to-pick newfile content &&
+ echo content >expected &&
+ git checkout other &&
+ git cherry-pick - &&
+ test_cmp expected newfile
+'
+
 test_done
-- 
1.8.3.4



Re: [PATCH 0/3] Reject non-ff pulls by default

2013-09-05 Thread Richard Hansen
On 2013-09-04 18:59, Junio C Hamano wrote:
 Philip Oakley philipoak...@iee.org writes:
 
 From: Junio C Hamano gits...@pobox.com
 John Keeping j...@keeping.me.uk writes:

 I think there are two distinct uses for pull, which boil down to:

 (1) git pull
 ...
 Peff already covered (1)---it is highly doubtful that a merge is
 almost always wrong.  In fact, if that _were_ the case, we should
 simply be defaulting to rebase, not failing the command and asking
 between merge and rebase like jc/pull-training-wheel topic did.

 We simply do not know what the user wants, as it heavily depends on
 the project, so we ask the user to choose one (and stick to it).

 We only offer a limited list. It won't be sufficient for all use
 cases. It wasn't for me.
 
 Very interesting. Tell us more.

I'm a bit late to the discussion, but I wanted to chime in.  I detest
'git pull' and discourage everyone I meet from using it.  See:
http://stackoverflow.com/questions/15316601/why-is-git-pull-considered-harmful
for my reasons.

Instead, I encourage people to do this:

   git config --global alias.up '!git remote update -p; git merge
--ff-only @{u}'

and tell them to run 'git up' whenever they would be tempted to use a
plain 'git pull'.

I usually work with a central repository with topic branches.  I follow
this rule of thumb:
  * When merging a same-named branch (e.g., origin/foo into foo, foo
into origin/foo), it should always be a fast-forward.  This may
require rebasing.
  * When merging a differently-named branch (e.g., feature.xyz into
master), it should never be a fast-forward.
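The rule of thumb above, as a runnable sketch in a throwaway repo (branch and file names are illustrative only):

```shell
# throwaway repo with one feature branch
cd "$(mktemp -d)" && git init -q
git config user.email you@example.com && git config user.name you
echo base >file && git add file && git commit -qm base
git checkout -qb feature.xyz
echo xyz >>file && git commit -qam "xyz part 1"
git checkout -q -                      # back to the integration branch

# differently-named branch: always record a merge commit
git merge --no-ff -m "merge feature.xyz" feature.xyz

# same-named branch (origin/* in real life): insist on a fast-forward,
# rebasing first if this refuses
git merge --ff-only feature.xyz        # trivially up to date here
```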

In distributed workflows, I think of 'git pull collaborator-repo
their-branch' as merging a differently-named branch (I wouldn't be
merging if they hadn't told me that a separate feature they were working
on is complete), so I generally want the merge commit.  But when I do a
'git pull' without extra arguments, I'm updating a same-named branch so
I never want a merge.

When merging a differently-named branch, I prefer the merge --no-ff to
be preceded by a rebase to get a nice, pretty graph:

   * merge feature.xyz  <- master
   |\
   | * xyz part 3/3
   | * xyz part 2/3
   | * xyz part 1/3
   |/
   * merge feature.foo
   |\
   | * foo part 2/2
   | * foo part 1/2
   |/
   * merge feature.bar
   |\
   ...

The explicit merge has several benefits:
  * It clearly communicates to others that the feature is done.
  * It makes it easier to revert the entire feature by reverting the
merge if necessary.
  * It allows our continuous integration tool to skip over the
work-in-progress commits and test only complete features.
  * It makes it easier to review the entire feature in one diff.
  * 'git log --first-parent' shows a high-level summary of the changes
over time, while a normal 'git log' shows the details.

 
 When git pull stops because what was fetched in FETCH_HEAD does
 not fast-forward, then what did _you_ do (and with the knowledge you
 currently have, what would you do)?

I stop and review what's going on, then make a decision:
  * usually it's a rebase
  * sometimes it's a rebase --onto (because the branch was
force-updated to undo a particularly bad commit)
  * sometimes it's a rebase -p (because there's an explicit merge of a
different branch that I want to keep)
  * sometimes it's a reset --hard (my changes were made obsolete by a
different upstream change)
  * sometimes it's a merge
  * sometimes I do nothing.  This is a fairly regular pattern:  I'm in
the middle of working on something that I know will conflict with
some changes that were just pushed upstream, and I want to finish
my changes before starting the rebase.  My collaborator contacts me
and asks, "Would you take a look at the changes I just pushed?"  If
I type 'git pull' out of habit to get the commits, then I'll make a
mess of my work-in-progress work tree.  If I type 'git up' out of
habit, then the merge --ff-only will fail as expected and I can
quickly review the commits without messing with my work tree or
HEAD.

Even if I always rebase or always merge, I want to briefly review what
changed in the remote branch *before* I start the rebase.  This helps me
understand the conflicts I might encounter.

Thus, ff-only always works for me.  I might have to type a second merge
or rebase command, but that's OK -- it gives me an opportunity to think
about what I want first.  Non-ff merges are rare enough that the
interruption isn't annoying at all.

 In a single project, would you
 choose to sometimes rebase and sometimes merge, and if so, what is
 the choice depend on?  When I am on these selected branches, I want
 to merge, but on other branches I want to rebase?

My choice depends on the circumstances of the divergence.  It's never as
simple as branch X always has this policy while branch Y has that policy.

 Are there cases where you do not want to either 

[PATCH] Documentation/git-merge.txt: fix formatting of example block

2013-09-05 Thread Andreas Schwab
You need at least four dashes in a line to have it recognized as listing
block delimiter by asciidoc.

Signed-off-by: Andreas Schwab sch...@linux-m68k.org
---
 Documentation/git-merge.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-merge.txt b/Documentation/git-merge.txt
index 8c7f2f6..a74c371 100644
--- a/Documentation/git-merge.txt
+++ b/Documentation/git-merge.txt
@@ -186,11 +186,11 @@ In such a case, you can unwrap the tag yourself before 
feeding it
 to `git merge`, or pass `--ff-only` when you do not have any work on
 your own. e.g.
 

+
 git fetch origin
 git merge v1.2.3^0
 git merge --ff-only v1.2.3

+
 
 
 HOW CONFLICTS ARE PRESENTED
-- 
1.8.4

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
And now for something completely different.


Re: Zero padded file modes...

2013-09-05 Thread Jeff King
On Thu, Sep 05, 2013 at 10:00:39AM -0400, John Szakmeister wrote:

 I went to clone a repository from GitHub today and discovered
 something interesting:
 
 :: git clone https://github.com/liebke/incanter.git
 Cloning into 'incanter'...
 remote: Counting objects: 10457, done.
 remote: Compressing objects: 100% (3018/3018), done.
 error: object 4946e1ba09ba5655202a7a5d81ae106b08411061: contains
 zero-padded file modes
 fatal: Error in object
 fatal: index-pack failed

Yep. These were mostly caused by a bug in Grit that is long-fixed.  But
the objects remain in many histories. It would have been painful to rewrite
them back then, and it would be even more painful now.

  This is going to screw up pack v4 (yes, someday I'll have the
  time to make it real).
 
 I don't know if this is still true, but given that patches are
 being sent out about it, I thought it relevant.

I haven't looked carefully at the pack v4 patches yet, but I suspect
that yes, it's still a problem. The premise of pack v4 is that we can do
better by not storing the raw git object bytes, but rather storing
specialized representations of the various components. For example, by
using an integer to store the mode rather than the ascii representation.
But that representation does not represent the "oops, I have a 0-padded
mode" quirk. And we have to be able to recover the original object, byte
for byte, from the v4 representation (to verify sha1, or to generate a
loose object or v2 pack).

There are basically two solutions:

  1. Add a single-bit flag for "I am 0-padded" in the real data. We
 could probably even squeeze it into the same integer.

  2. Have a "classic" section of the pack that stores the raw object
 bytes. For objects which do not match our expectations, store them
 raw instead of in v4 format. They will not get the benefit of v4
 optimizations, but if they are the minority of objects, that will
 only end up with a slight slow-down.
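A toy sketch of option (1) in shell arithmetic, purely to illustrate the idea (this is not the actual pack v4 encoding): the quirk rides in the low bit and the real mode is shifted up, so the original spelling can be reproduced byte for byte.

```shell
# low bit: "the original tree spelled this mode with a leading zero"
encode_mode() {            # encode_mode <octal-mode> <zero-padded: 0|1>
	echo $(( (0$1 << 1) | $2 ))
}
decode_mode() {            # prints "<octal-mode> <zero-padded>"
	printf '%o %d\n' $(( $1 >> 1 )) $(( $1 & 1 ))
}

w=$(encode_mode 40000 1)   # a directory that was written as "040000"
decode_mode "$w"           # prints: 40000 1
```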

As I said, I have not looked carefully at the v4 patches, so maybe they
handle this case already. But of the two solutions, I prefer (2). Doing
(1) can solve _this_ problem, but it complicates the format, and does
nothing for any future compatibility issues. Whereas (2) is easy to
implement, since it is basically just pack v2 (and implementations would
need a pack v2 reader anyway).

-Peff


Re: Zero padded file modes...

2013-09-05 Thread Duy Nguyen
On Thu, Sep 5, 2013 at 10:36 PM, Jeff King p...@peff.net wrote:
  This is going to screw up pack v4 (yes, someday I'll have the
  time to make it real).

 I don't know if this is still true, but given that patches are
 being sent out about it, I thought it relevant.

 I haven't looked carefully at the pack v4 patches yet, but I suspect
 that yes, it's still a problem. The premise of pack v4 is that we can do
 better by not storing the raw git object bytes, but rather storing
 specialized representations of the various components. For example, by
 using an integer to store the mode rather than the ascii representation.
 But that representation does not represent the oops, I have a 0-padded
 mode quirk. And we have to be able to recover the original object, byte
 for byte, from the v4 representation (to verify sha1, or to generate a
 loose object or v2 pack).

 There are basically two solutions:

   1. Add a single-bit flag for I am 0-padded in the real data. We
  could probably even squeeze it into the same integer.

   2. Have a classic section of the pack that stores the raw object
  bytes. For objects which do not match our expectations, store them
  raw instead of in v4 format. They will not get the benefit of v4
  optimizations, but if they are the minority of objects, that will
  only end up with a slight slow-down.

3. Detect this situation and fall back to v2.

4. Update v4 to allow storing raw tree entries mixing with v4-encoded
tree entries. This is something between (1) and (2)

 As I said, I have not looked carefully at the v4 patches, so maybe they
 handle this case already. But of the two solutions, I prefer (2). Doing
 (1) can solve _this_ problem, but it complicates the format, and does
 nothing for any future compatibility issues. Whereas (2) is easy to
 implement, since it is basically just pack v2 (and implementations would
 need a pack v2 reader anyway).

I think (4) fits better in v4 design and probably not hard to do. Nico
recently added a code to embed a tree entry inline, but the mode must
be encoded (and can't contain leading zeros). We could have another
code to store mode in ascii. This also makes me wonder if we might
have similar problems with timezones, which are also specially encoded
in v4..

(3) is probably easiest. We need to scan through all tree entries
first when creating v4 anyway. If we detect any anomalies, just switch
back to v2 generation. The user will be forced to rewrite history in
order to take full advantage of v4 (they can have a pack of weird
trees in v2 and the rest in v4 pack, but that's not optimal).
-- 
Duy


[PATCH V3] check-ignore: Add option to ignore index contents

2013-09-05 Thread Dave Williams
check-ignore currently shows how .gitignore rules would treat untracked
paths. Tracked paths do not generate useful output.  This prevents
debugging of why a path became tracked unexpectedly unless that path is
first removed from the index with `git rm --cached <path>`.

This option (-i, --no-index) simply bypasses the check for the path
being in the index and hence allows tracked paths to be checked too.

Whilst this behaviour deviates from the characteristics of `git add` and
`git status`, its use case is unlikely to cause any user confusion.

Test scripts are augmented to check this option against the standard
ignores to ensure correct behaviour.
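A hypothetical session showing the intended use (the `build/` layout is invented; `-i`/`--no-index` is the switch this patch adds):

```shell
# throwaway repo: a tracked file that .gitignore says to ignore
cd "$(mktemp -d)" && git init -q
printf 'build/\n' >.gitignore
mkdir build && echo generated >build/output.txt
git add -f build/output.txt          # force-add past the ignore rule

# without --no-index the tracked path is skipped: no output, exit code 1
git check-ignore -v build/output.txt || echo "tracked: ignore check skipped"

# with --no-index the matching rule is shown
git check-ignore -v --no-index build/output.txt
```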

Signed-off-by: Dave Williams d...@opensourcesolutions.co.uk
---
In V3 I have taken on board comments from Junio Hamano and Eric
Sunshine from V2 ($gmane/233660).

In particular, Junio queried my approach in builtin/check-ignore.c
that bypassed the functions that check a path is in the index as well as
avoiding reading the index in the first place.

In my view removing the bypass makes the code slightly less clear,
relies on the_index being initialized and the functions using it to exit
quickly when it has no content. Nevertheless I have bowed to his better
domain knowledge and after undertaking brief analysis to check the
assumptions I have removed the extra conditional steps.  This has
simplified the patch. The revised code appears to have the same
behaviour as before and passes the test script (t/t0008-ignores.sh).

Regarding the test script I have tidied up the variables containing the
separate option switches so they don't contain leading spaces; instead I
have added spaces as needed when formatting the command line.  This
not only improves my patch but also the existing code which was a little
inconsistent in this respect.

Finally I have rebased from the latest commit on master to pick up
unrelated recent changes made to builtin/check-ignore.c and updated my
code to be consistent with this.

Hopefully I have put these patch notes in the right place now! Let me
know if not.

Dave


 Documentation/git-check-ignore.txt |  7 
 builtin/check-ignore.c |  6 ++-
 t/t0008-ignores.sh | 75 +-
 3 files changed, 77 insertions(+), 11 deletions(-)

diff --git a/Documentation/git-check-ignore.txt b/Documentation/git-check-ignore.txt
index d2df487..96c591f 100644
--- a/Documentation/git-check-ignore.txt
+++ b/Documentation/git-check-ignore.txt
@@ -45,6 +45,13 @@ OPTIONS
not be possible to distinguish between paths which match a
pattern and those which don't.
 
+-i, --no-index::
+   Don't look in the index when undertaking the checks. This can
+   be used to debug why a path became tracked by e.g. `git add .`
+   and was not ignored by the rules as expected by the user or when
+   developing patterns including negation to match a path previously
+   added with `git add -f`.
+
 OUTPUT
 --
 
diff --git a/builtin/check-ignore.c b/builtin/check-ignore.c
index 25aa2a5..f58f384 100644
--- a/builtin/check-ignore.c
+++ b/builtin/check-ignore.c
@@ -5,7 +5,7 @@
 #include pathspec.h
 #include parse-options.h
 
-static int quiet, verbose, stdin_paths, show_non_matching;
+static int quiet, verbose, stdin_paths, show_non_matching, no_index;
 static const char * const check_ignore_usage[] = {
 "git check-ignore [options] pathname...",
 "git check-ignore [options] --stdin < <list-of-paths>",
@@ -24,6 +24,8 @@ static const struct option check_ignore_options[] = {
 		 N_("terminate input and output records by a NUL character")),
 	OPT_BOOL('n', "non-matching", &show_non_matching,
 		 N_("show non-matching input paths")),
+	OPT_BOOL('i', "no-index", &no_index,
+		 N_("ignore index when checking")),
 	OPT_END()
 };
 
@@ -157,7 +159,7 @@ int cmd_check_ignore(int argc, const char **argv, const char *prefix)
 		die(_("--non-matching is only valid with --verbose"));
 
 	/* read_cache() is only necessary so we can watch out for submodules. */
-	if (read_cache() < 0)
+	if (!no_index && read_cache() < 0)
 		die(_("index file corrupt"));
 
 	memset(&dir, 0, sizeof(dir));
diff --git a/t/t0008-ignores.sh b/t/t0008-ignores.sh
index c29342d..760c7bf 100755
--- a/t/t0008-ignores.sh
+++ b/t/t0008-ignores.sh
@@ -66,11 +66,11 @@ test_check_ignore () {
 
 	init_vars &&
 	rm -f "$HOME/stdout" "$HOME/stderr" "$HOME/cmd" &&
-	echo git $global_args check-ignore $quiet_opt $verbose_opt $non_matching_opt $args \
+	echo git $global_args check-ignore $quiet_opt $verbose_opt $non_matching_opt $no_index_opt $args \
 		>"$HOME/cmd" &&
 	echo "$expect_code" >"$HOME/expected-exit-code" &&
 	test_expect_code "$expect_code" \
-		git $global_args check-ignore $quiet_opt $verbose_opt $non_matching_opt $args \
+		git $global_args check-ignore $quiet_opt $verbose_opt $non_matching_opt $no_index_opt $args \

Re: Zero padded file modes...

2013-09-05 Thread A Large Angry SCM



On 09/05/2013 11:36 AM, Jeff King wrote:
[...]


I haven't looked carefully at the pack v4 patches yet, but I suspect
that yes, it's still a problem. The premise of pack v4 is that we can do
better by not storing the raw git object bytes, but rather storing
specialized representations of the various components. For example, by
using an integer to store the mode rather than the ascii representation.
But that representation does not represent the "oops, I have a 0-padded
mode" quirk. And we have to be able to recover the original object, byte
for byte, from the v4 representation (to verify sha1, or to generate a
loose object or v2 pack).
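
The round-trip problem can be made concrete with a small sketch (not git code; it just assumes v4 stores the mode as an integer, as described above):

```python
def decode_mode(ascii_mode):
    # Both "40000" and "040000" decode to the same integer...
    return int(ascii_mode, 8)

def reencode_mode(mode):
    # ...and re-encoding always yields the canonical unpadded form, so
    # the 0-padded quirk cannot be recovered from the integer alone.
    return '%o' % mode

assert decode_mode('040000') == decode_mode('40000') == 0o40000
assert reencode_mode(decode_mode('040000')) == '40000'   # padding lost
```

Since the re-encoded bytes differ from the original, the SHA-1 of the reconstructed object would no longer match.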

There are basically two solutions:

   1. Add a single-bit flag for "I am 0-padded" in the real data. We
  could probably even squeeze it into the same integer.

   2. Have a "classic" section of the pack that stores the raw object
  bytes. For objects which do not match our expectations, store them
  raw instead of in v4 format. They will not get the benefit of v4
  optimizations, but if they are the minority of objects, that will
  only end up with a slight slow-down.

As I said, I have not looked carefully at the v4 patches, so maybe they
handle this case already. But of the two solutions, I prefer (2). Doing
(1) can solve _this_ problem, but it complicates the format, and does
nothing for any future compatibility issues. Whereas (2) is easy to
implement, since it is basically just pack v2 (and implementations would
need a pack v2 reader anyway).


3. Keep those objects in v2 packs instead of the v4 pack. Transfers 
would have to be v3 or multi-pack transfers would need to be supported.


4. Don't use v4 packs with projects that have crufty objects. Projects 
with such objects may choose to pay the cost to upgrade to v4 
compatibility.


There's nothing that requires the next pack format to support all of the 
broken stuff that's happened over the years.



Re: Zero padded file modes...

2013-09-05 Thread Jeff King
On Thu, Sep 05, 2013 at 11:18:24PM +0700, Nguyen Thai Ngoc Duy wrote:

  There are basically two solutions:
 
    1. Add a single-bit flag for "I am 0-padded" in the real data. We
       could probably even squeeze it into the same integer.
 
    2. Have a "classic" section of the pack that stores the raw object
   bytes. For objects which do not match our expectations, store them
   raw instead of in v4 format. They will not get the benefit of v4
   optimizations, but if they are the minority of objects, that will
   only end up with a slight slow-down.
 
 3. Detect this situation and fall back to v2.
 
 4. Update v4 to allow storing raw tree entries mixing with v4-encoded
 tree entries. This is something between (1) and (2)

I wouldn't want to do (3). At some point pack v4 may become the standard
format, but there will be some repositories which will never be allowed
to adopt it.

For (4), yes, that could work. But like (1), it only solves problems in
tree entries. What happens if we have a quirky commit object that needs
the same treatment (e.g., a timezone that does not fit into the commit
name dictionary properly)?

 I think (4) fits better in v4 design and probably not hard to do. Nico
 recently added a code to embed a tree entry inline, but the mode must
 be encoded (and can't contain leading zeros). We could have another
 code to store mode in ascii. This also makes me wonder if we might
 have similar problems with timezones, which are also specially encoded
 in v4.

Yeah, that might be more elegant.

 (3) is probably easiest. We need to scan through all tree entries
 first when creating v4 anyway. If we detect any anomalies, just switch
 back to v2 generation. The user will be forced to rewrite history in
 order to take full advantage of v4 (they can have a pack of weird
 trees in v2 and the rest in v4 pack, but that's not optimal).

Splitting across two packs isn't great, though. What if v4 eventually
becomes the normal on-the-wire format? I'd rather have some method for
just embedding what are essentially v2 objects into the v4 pack, which
would give us future room for handling these sorts of things.

But like I said, I haven't looked closely yet, so maybe there are
complications with that. In the meantime, I'll defer to the judgement of
people who know what they are talking about. :)

-Peff


Re: Zero padded file modes...

2013-09-05 Thread John Szakmeister
On Thu, Sep 5, 2013 at 11:36 AM, Jeff King p...@peff.net wrote:
 On Thu, Sep 05, 2013 at 10:00:39AM -0400, John Szakmeister wrote:

 I went to clone a repository from GitHub today and discovered
 something interesting:

 :: git clone https://github.com/liebke/incanter.git
 Cloning into 'incanter'...
 remote: Counting objects: 10457, done.
 remote: Compressing objects: 100% (3018/3018), done.
 error: object 4946e1ba09ba5655202a7a5d81ae106b08411061: contains
 zero-padded file modes
 fatal: Error in object
 fatal: index-pack failed

 Yep. These were mostly caused by a bug in Grit that is long-fixed.  But
 the objects remain in many histories. It would have been painful to rewrite
 them back then, and it would be even more painful now.

I guess there's still the other side of the question though.  Are
these repositories busted in the sense that something no longer works?
It doesn't appear to be the case, but I've not used it extensively, so
I can't say for certain one way or another.  In the sense that the
content is not strictly compliant, transfer.fsckObjects did its job,
but I wonder if fsck needs to be a little more tolerant now (at least
with respect to transfer objects)?

I can certainly cope with the issue--it's not a problem for me to flip
the flag on the command line.  I think it'd be nice to have
transfer.fsckObjects be the default at some point, considering how
little people run fsck otherwise and how long these sorts of issues go
undiscovered.  Issues like the above seem to stand in the way of that
happening though.

-John


Re: [PATCH v2] Document pack v4 format

2013-09-05 Thread Nicolas Pitre
On Thu, 5 Sep 2013, Duy Nguyen wrote:

 On Thu, Sep 5, 2013 at 12:39 PM, Nicolas Pitre n...@fluxnic.net wrote:
  Now the pack index v3 probably needs to be improved a little, again to
  accommodate completion of thin packs.  Given that the main SHA1 table is
  now in the main pack file, it should be possible to still carry a small
  SHA1 table in the index file that corresponds to the appended objects
  only. This means that a SHA1 search will have to first use the main SHA1
  table in the pack file as it is done now, and if not found then use the
  SHA1 table in the index file if it exists.  And of course
  nth_packed_object_sha1() will have to be adjusted accordingly.
 
 What if the sender prepares the sha-1 table to contain missing objects
 in advance? The sender should know what base objects are missing. Then
 we only need to append objects at the receiving end and verify that
 all new objects are also present in the sha-1 table.

I do like this idea very much.  And that doesn't increase the thin pack 
size as the larger SHA1 table will be compensated by a smaller sha1ref 
encoding in those objects referring to the missing ones.
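
The two-level lookup this implies can be sketched as follows (hypothetical data layout — sorted lists of hex object names stand in for the real SHA-1 tables):

```python
import bisect

def find_object(pack_sha1_table, index_sha1_table, sha1):
    # Search the pack's main sorted SHA-1 table first; fall back to the
    # small sorted table of appended objects carried in the index file.
    for base, table in ((0, pack_sha1_table),
                        (len(pack_sha1_table), index_sha1_table)):
        i = bisect.bisect_left(table, sha1)
        if i < len(table) and table[i] == sha1:
            return base + i   # position nth_packed_object_sha1() would use
    return -1                 # not in this pack

main = ['1a', '9c']       # objects in the pack's own table
appended = ['5e']         # completion objects, listed in the index only
print(find_object(main, appended, '5e'))   # 2: found in the appended table
```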



Nicolas


Re: Zero padded file modes...

2013-09-05 Thread Nicolas Pitre
On Thu, 5 Sep 2013, Jeff King wrote:

 There are basically two solutions:
 
   1. Add a single-bit flag for "I am 0-padded" in the real data. We
  could probably even squeeze it into the same integer.
 
   2. Have a "classic" section of the pack that stores the raw object
  bytes. For objects which do not match our expectations, store them
  raw instead of in v4 format. They will not get the benefit of v4
  optimizations, but if they are the minority of objects, that will
  only end up with a slight slow-down.

That is basically what I just suggested.  But instead of a special 
section, simply using a special object type number would do it.

I'm even wondering if that couldn't be used for fixing a thin pack 
instead of the special provision I just added last night.


Nicolas


Re: [PATCH v2] Document pack v4 format

2013-09-05 Thread Duy Nguyen
On Thu, Sep 5, 2013 at 12:39 PM, Nicolas Pitre n...@fluxnic.net wrote:
 Now the pack index v3 probably needs to be improved a little, again to
 accommodate completion of thin packs.  Given that the main SHA1 table is
 now in the main pack file, it should be possible to still carry a small
 SHA1 table in the index file that corresponds to the appended objects
 only. This means that a SHA1 search will have to first use the main SHA1
 table in the pack file as it is done now, and if not found then use the
 SHA1 table in the index file if it exists.  And of course
 nth_packed_object_sha1() will have to be adjusted accordingly.

What if the sender prepares the sha-1 table to contain missing objects
in advance? The sender should know what base objects are missing. Then
we only need to append objects at the receiving end and verify that
all new objects are also present in the sha-1 table.
-- 
Duy


Re: Zero padded file modes...

2013-09-05 Thread Nicolas Pitre
On Thu, 5 Sep 2013, Jeff King wrote:

 On Thu, Sep 05, 2013 at 11:18:24PM +0700, Nguyen Thai Ngoc Duy wrote:
 
   There are basically two solutions:
  
 1. Add a single-bit flag for "I am 0-padded" in the real data. We
    could probably even squeeze it into the same integer.
 
 2. Have a "classic" section of the pack that stores the raw object
bytes. For objects which do not match our expectations, store them
raw instead of in v4 format. They will not get the benefit of v4
optimizations, but if they are the minority of objects, that will
only end up with a slight slow-down.
  
  3. Detect this situation and fall back to v2.
  
  4. Update v4 to allow storing raw tree entries mixing with v4-encoded
  tree entries. This is something between (1) and (2)
 
 I wouldn't want to do (3). At some point pack v4 may become the standard
 format, but there will be some repositories which will never be allowed
 to adopt it.
 
 For (4), yes, that could work. But like (1), it only solves problems in
 tree entries. What happens if we have a quirky commit object that needs
 the same treatment (e.g., a timezone that does not fit into the commit
 name dictionary properly)?
 
  I think (4) fits better in v4 design and probably not hard to do. Nico
  recently added a code to embed a tree entry inline, but the mode must
  be encoded (and can't contain leading zeros). We could have another
  code to store mode in ascii. This also makes me wonder if we might
  have similar problems with timezones, which are also specially encoded
  in v4.
 
 Yeah, that might be more elegant.
 
  (3) is probably easiest. We need to scan through all tree entries
  first when creating v4 anyway. If we detect any anomalies, just switch
  back to v2 generation. The user will be forced to rewrite history in
  order to take full advantage of v4 (they can have a pack of weird
  trees in v2 and the rest in v4 pack, but that's not optimal).
 
 Splitting across two packs isn't great, though. What if v4 eventually
 becomes the normal on-the-wire format? I'd rather have some method for
 just embedding what are essentially v2 objects into the v4 pack, which
 would give us future room for handling these sorts of things.
 
 But like I said, I haven't looked closely yet, so maybe there are
 complications with that. In the meantime, I'll defer to the judgement of
 people who know what they are talking about. :)

None of the above is particularly appealing to me.

Pack v4 has to enforce some standardization in the object encoding to be 
efficient.  Some compromises have been applied to accommodate the fixing 
of a thin pack, although I was initially tempted to simply dodge the 
issue and allow thin packs in a repository.

On this particular mode issue, I remember making a fuss at the time when 
this was discovered because the github implementation did generate such 
tree objects at the time.

So instead of compromising the pack v4 object encoding further, I'd 
simply suggest adding a special object type which is in fact simply the 
pack v2 representation i.e. the canonical object version, deflated.  
Right now pack v4 encodes only 5 object types: commit, tree, blob, delta 
and tag.  Only the commit and tree objects have their representation 
transcoded.  So that means we only need to add native_commit and 
native_tree object types.

Then, anything that doesn't fit the strict expectation for transcoding a 
tree or a commit object is simply included as is without transcoding 
just like in pack v2.
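
A sketch of the dispatch this implies (the five existing type names are from the message above; the numeric values and the two native_* escape codes are invented for illustration):

```python
OBJ_COMMIT, OBJ_TREE, OBJ_BLOB, OBJ_DELTA, OBJ_TAG = range(5)
OBJ_NATIVE_COMMIT, OBJ_NATIVE_TREE = 5, 6   # hypothetical escape types

def storage_type(obj_type, transcodable):
    # Commits and trees that do not meet the strict transcoding
    # expectations fall back to their canonical (deflated) form;
    # blobs, deltas and tags are stored as before either way.
    if obj_type == OBJ_COMMIT and not transcodable:
        return OBJ_NATIVE_COMMIT
    if obj_type == OBJ_TREE and not transcodable:
        return OBJ_NATIVE_TREE
    return obj_type

print(storage_type(OBJ_TREE, False))   # 6: a zero-padded tree stays native
```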


Nicolas


Re: [PATCH 05/38] pack v4: add commit object parsing

2013-09-05 Thread Nicolas Pitre
On Thu, 5 Sep 2013, SZEDER Gábor wrote:

 Hi,
 
 
 On Thu, Sep 05, 2013 at 02:19:28AM -0400, Nicolas Pitre wrote:
  Let's create another dictionary table to hold the author and committer
  entries.  We use the same table format used for tree entries where the
  16 bit data prefix is conveniently used to store the timezone value.
  
  In order to copy straight from a commit object buffer, dict_add_entry()
  is modified to get the string length as the provided string pointer is
  not always null terminated.
  
  Signed-off-by: Nicolas Pitre n...@fluxnic.net
  ---
   packv4-create.c | 98 
  +++--
   1 file changed, 89 insertions(+), 9 deletions(-)
  
  diff --git a/packv4-create.c b/packv4-create.c
  index eccd9fc..5c08871 100644
  --- a/packv4-create.c
  +++ b/packv4-create.c
  @@ -1,5 +1,5 @@
   /*
  - * packv4-create.c: management of dictionary tables used in pack v4
  + * packv4-create.c: creation of dictionary tables and objects used in pack v4
*
* (C) Nicolas Pitre n...@fluxnic.net
*
  @@ -80,9 +80,9 @@ static void rehash_entries(struct dict_table *t)
  }
   }
   
  -int dict_add_entry(struct dict_table *t, int val, const char *str)
  +int dict_add_entry(struct dict_table *t, int val, const char *str, int str_len)
   {
  -	int i, val_len = 2, str_len = strlen(str) + 1;
  +	int i, val_len = 2;
   
  	if (t->ptr + val_len + str_len > t->size) {
 
 We need a +1 here on the left side, i.e.
 
  if (t->ptr + val_len + str_len + 1 > t->size) {

Absolutely, good catch.
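
The layout behind that off-by-one can be modelled like this (a sketch, not the C code): each entry is a 2-byte value prefix, the string bytes, and a NUL terminator, so once str_len no longer includes the NUL the capacity check must count one extra byte.

```python
def dict_add_entry(buf, end, val, s):
    # Append one entry (2-byte value prefix + string + NUL) to the
    # bytearray `buf` at offset `end`; return the new end offset.
    # `s` need not be NUL-terminated, mirroring the new str_len
    # parameter of the patched dict_add_entry().
    needed = 2 + len(s) + 1            # the "+ 1" pays for the NUL
    if end + needed > len(buf):
        raise MemoryError('table full; the C code would realloc here')
    buf[end:end + 2] = val.to_bytes(2, 'big')
    buf[end + 2:end + 2 + len(s)] = s
    buf[end + 2 + len(s)] = 0
    return end + needed

buf = bytearray(16)
end = dict_add_entry(buf, 0, 0x0102, b'abc')
print(end, bytes(buf[:end]))   # 6 b'\x01\x02abc\x00'
```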

 Sidenote: couldn't we call the 'ptr' field something else, like
 end_offset or end_idx?  It took me some headscratching to figure out
 why is it OK to compare a pointer to an integer above, or use a
 pointer without dereferencing as an index into an array below (because
 ptr is, well, not a pointer after all).

Indeed.  This is a remnant of an earlier implementation which didn't use 
realloc() and therefore this used to be a real pointer.

Both issues now addressed in my tree.

Thanks


Nicolas


Re: [PATCH] git-gui: Modify push dialog to support Gerrit review

2013-09-05 Thread Junio C Hamano
Jørgen Edelbo j...@napatech.com writes:

 You say I can only push HEAD. I understand this that I have to stop work on C
 (perhaps commit or stash any unfinished work), then checkout A, push it,
 checkout B, push it, checkout C and unstash the unfinished work. If my
 understanding is correct, the new restriction is a no-go.

 Ok, this way of working is not supported. It just never occurred to me that
 you would work this way. Forgetting to push something that you have just 
 completed is very far from what I am used to. I think it comes most natural
 to push what you have done before changing topic. The reason I make a commit
 is to get it out of the door.

People work in very different ways, and mine and yours are extreme
opposites.  I almost never push out a commit that is just off the
press, I use topic branches heavily and work on multiple topics
(either related or unrelated) in parallel all the time, so it is
very usual for me to push out more than one branches with a single
push---by definition, if we support pushing only the current branch
out, working on more than one topics and after perfecting these,
push them together cannot be done.

If one sets push.default to something other than "matching", and
does not have any remote.*.push refspec in the configuration, J6t's
"I forgot to push while I was on that branch" and my "I deliberately
did not push those branches out while I was on them, but now I know
I am ready to push them out" cannot be done without explicit
refspecs on the command line.

But stepping back a bit, I think this "I commit because I want to
get it out the door" is coming from the same desire that led to a
possible design mistake that made various push.default settings push
only the current branch out.

The possible design mistake that has been disturbing me is the
following.

The push.default setting controls two unrelated things, and that
is one too many.  Between the matching modes and the rest, it tells
what branch(es) are pushed out (either all the branches with the
same name or the current branch).  That is one thing, and I tend
to agree that push only the current branch would be a sane default
many people would prefer (and that is the reason we made it the
default for Git 2.0).

Among the various non-"matching" modes, it also determines where a
branch that is pushed out would go ("current" pushes to the same
name, "upstream" pushes to @{upstream}, etc.).  But this "once I
choose what branch to push out, the branch being pushed out knows
where it wants to go" does not take effect if one explicitly names
what to push, e.g. in this sequence where one forgets to push
'frotz' out:

j6t$ git checkout frotz
... work work work ...
j6t$ git checkout xyzzy
... work work work ...
... realize 'frotz' is done
j6t$ git push origin frotz

The last push may work differently from the push in this sequence:

j6t$ git checkout frotz
... work work work ...
j6t$ git push ;# or git push origin

In the latter sequence, the niceties of push.upstream to push
'frotz' out to the frotz@{upstream} (and your "git push origin
refs/heads/frotz:refs/for/frotz" mapping done in git-gui) will take
effect, but in the former, the refspec "frotz" on the command line
is taken as a short-hand for "frotz:frotz".

We may want to teach git push that the command line refspec that
names a local branch may not be pushing to the same name but wants
to go through the same mapping used when git push is done while
the branch is checked out, with some mechanism.
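
The asymmetry can be modelled in a few lines (a sketch of the behaviour described above; `push_map` is a hypothetical branch-to-destination table standing in for the configured mapping, not an actual git configuration mechanism):

```python
def push_destination(refspec_arg, current_branch, push_map):
    if refspec_arg is None:
        # Bare "git push": the current branch's configured mapping
        # (push.default niceties, git-gui's refs/for/* rewrite) applies.
        return push_map.get(current_branch,
                            'refs/heads/%s' % current_branch)
    # Explicit "git push origin frotz": taken as frotz:frotz, so the
    # configured mapping is silently bypassed.
    return 'refs/heads/%s' % refspec_arg

gerrit = {'frotz': 'refs/for/frotz'}
print(push_destination(None, 'frotz', gerrit))      # refs/for/frotz
print(push_destination('frotz', 'xyzzy', gerrit))   # refs/heads/frotz
```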

It is a separate but very related topic, I think.





Re: [PATCH] git-gui: Modify push dialog to support Gerrit review

2013-09-05 Thread Junio C Hamano
Joergen Edelbo j...@napatech.com writes:

 +proc get_remote_branch {} {
 + global push_branchtype push_branch push_new
 + set branch {}
 + switch -- $push_branchtype {
 + existing { set branch $push_branch }
 + create   { set branch $push_new }
 + }
 +   return $branch
 +}
 +
 +proc get_remote_ref_spec {} {
 + global gerrit_review
 + set push_branch [get_remote_branch]
 + if {$gerrit_review} {
 + return refs/for/$push_branch
 + } else {
 + return refs/heads/$push_branch
   }
 +}

I am puzzled.  This may be fine for those who use Git-GUI and
nothing else to push, but will not help whose who use both Git-GUI
and the command line.

Isn't the right way to improve the situation to let the command line
tools know how the user wants to push things out and just have
Git-GUI delegate the choice to the underlying git push?

For example, if you are working on a topic 'frotz', and if the
location you push is managed by Gerrit, isn't it the case that you
always want to push it to refs/for/frotz, whether you are pushing
via Git-GUI or from the command line?

I think we discussed during 1.8.4 cycle a configuration like this:

    [branch "frotz"]
        push = refs/heads/frotz:refs/for/frotz

as part of the triangular workflow topic that allows you to
specify that when 'frotz' is pushed out, it goes to
refs/for/frotz, or something like that, but I do not recall what
came out of that.


Re: [PATCH] Documentation/git-merge.txt: fix formatting of example block

2013-09-05 Thread Junio C Hamano
Thanks.  It seems that this was broken from the beginning at
77c72780 (Documentation: merging a tag is a special case,
2013-03-21).

Will apply.


Re: [PATCH 1/3] pathspec: catch prepending :(prefix) on pathspec with short magic

2013-09-05 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy  pclo...@gmail.com writes:

 :(prefix) is in the long form. Suppose people pass :!foo with '!'
 being the short form of magic 'bar', the code will happily turn it to
 :(prefix..)!foo, which makes '!' part of the path and no longer a magic.

 The correct form must be ':(prefix..,bar)foo', but as so far we
 haven't had any magic in short form yet (*), the code to convert from
 short form to long one will be inactive anyway. Let's postpone it
 until a real short form magic appears.

 (*) The short form magic '/' is a special case and won't be caught by
 this die(), which is correct. When '/' magic is detected, prefixlen is
 set back to 0 and the whole if (prefixlen..) block is skipped.
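
The rewrite under discussion can be sketched as follows (simplified — the real parsing lives in pathspec.c and handles much more than this):

```python
def add_prefix_magic(pathspec, prefixlen):
    # Long form ':(magic)path' gains ',prefix:N' inside the parens.
    if pathspec.startswith(':('):
        end = pathspec.index(')')
        return '%s,prefix:%d%s' % (pathspec[:end], prefixlen, pathspec[end:])
    # A short-form magic such as ':!path' has no '(...)' group to
    # extend, which is why the patch dies with a BUG instead of
    # silently making '!' part of the path.
    raise ValueError('BUG: prefixing on short magic is not supported')

print(add_prefix_magic(':(top)foo', 3))   # :(top,prefix:3)foo
```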

 Signed-off-by: Nguyễn Thái Ngọc Duy pclo...@gmail.com
 ---
  fixes on top of nd/magic-pathspec.

  pathspec.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

 diff --git a/pathspec.c b/pathspec.c
 index d9f4143..62fde50 100644
 --- a/pathspec.c
 +++ b/pathspec.c
  @@ -231,7 +231,9 @@ static unsigned prefix_pathspec(struct pathspec_item *item,
  	const char *start = elt;
  	if (prefixlen && !literal_global) {
   /* Preserve the actual prefix length of each pattern */
  -	if (long_magic_end) {
  +	if (short_magic)
  +		die("BUG: prefixing on short magic is not supported");
  +	else if (long_magic_end) {
  		strbuf_add(&sb, start, long_magic_end - start);
  		strbuf_addf(&sb, ",prefix:%d", prefixlen);
  		start = long_magic_end;

Good.

I wonder if we should start thinking about removing the big
NEEDSWORK comment in front of this function.

Also the pathspec_magic[] array was the table-driven way that was
meant to be enhanced to fit future needs to parse all supported
types of pathspec magic, but it seems that prefix:12 magic is
implemented using a custom/special case code.  We may want to see if
we want to enrich the parser to fold this to match the table-driven
approach better someday---this is not urgent as we are not adding
any new pathspec magic now.

Will queue.  Thanks.


Re: [PATCH] git-svn: Fix termination issues for remote svn connections

2013-09-05 Thread Junio C Hamano
Uli Heller uli.hel...@daemons-point.com writes:

 When using git-svn in combination with serf-1.2.1 core dumps are
 created on termination. This is caused by a bug in serf, a fix for
 the bug exists (see https://code.google.com/p/serf/source/detail?r=2146).
 Nevertheless, I think it makes sense to fix the issue within the
 git perl module Ra.pm, too. The change frees the private copy of
 the remote access object on termination which prevents the error
 from happening.

 Note: Since subversion-1.8.0 and later do require serf-1.2.1 or later,
 the core dumps typically do show up when upgrading to a recent version
 of subversion.

 Credits: Jonathan Lambrechts for proposing a fix to Ra.pm.
 Evgeny Kotkov and Ivan Zhakov for fixing the issue in serf and
 pointing me to that fix.
 ---

Thanks.  Please sign-off your patch.

I am Cc'ing Kyle McKay who apparently had some experience working
with git-svn with newer svn that can only use serf, hoping that we
can get an independent opinion/test just to be sure.  Also Cc'ed is
Eric Wong who has been the official git-svn area expert, but I
understand that Eric hasn't needed to use git-svn for quite a while,
so it is perfectly fine if he does not have any comment on this one.

We may want to find a volunteer to move git svn forward as a new
area expert (aka subsystem maintainer), by the way.



  perl/Git/SVN/Ra.pm | 5 +
  1 file changed, 5 insertions(+)

 diff --git a/perl/Git/SVN/Ra.pm b/perl/Git/SVN/Ra.pm
 index 75ecc42..78dd346 100644
 --- a/perl/Git/SVN/Ra.pm
 +++ b/perl/Git/SVN/Ra.pm
 @@ -32,6 +32,11 @@ BEGIN {
   }
  }

 +END {
 + $RA = undef;
 + $ra_invalid = 1;
 +}
 +
  sub _auth_providers () {
   my @rv = (
 SVN::Client::get_simple_provider(),


Re: Zero padded file modes...

2013-09-05 Thread Jeff King
On Thu, Sep 05, 2013 at 01:09:34PM -0400, Nicolas Pitre wrote:

 On Thu, 5 Sep 2013, Jeff King wrote:
 
  There are basically two solutions:
  
  1. Add a single-bit flag for "I am 0-padded" in the real data. We
     could probably even squeeze it into the same integer.
 
  2. Have a "classic" section of the pack that stores the raw object
   bytes. For objects which do not match our expectations, store them
   raw instead of in v4 format. They will not get the benefit of v4
   optimizations, but if they are the minority of objects, that will
   only end up with a slight slow-down.
 
 That is basically what I just suggested.  But instead of a special 
 section, simply using a special object type number would do it.

Yeah, I think we are in agreement. I only suggested a separate section
because I hadn't carefully read the v4 patches yet, and didn't know if
there was room in the normal sequence. A special object number seems
much more elegant.

-Peff


Re: [PATCH 37/38] pack v4: introduce escape hatches in the name and path indexes

2013-09-05 Thread Nicolas Pitre
On Thu, 5 Sep 2013, Nicolas Pitre wrote:

 If the path or name index is zero, this means the entry data is to be
 found inline rather than being located in the dictionary table. This is
 there to allow easy completion of thin packs without having to add new
 table entries which would have required a full rewrite of the pack data.
 
 Signed-off-by: Nicolas Pitre n...@fluxnic.net

I'm now dropping this patch.  Please also remove this from your 
documentation patch.

I think that we've found a way to better support thin packs.

You said:

 What if the sender prepares the sha-1 table to contain missing objects
 in advance? The sender should know what base objects are missing. Then
 we only need to append objects at the receiving end and verify that
 all new objects are also present in the sha-1 table.

So the SHA1 table is covered.

Missing objects in a thin pack cannot themselves be deltas.  We add
their undeltified form at the end of a pack for the pack to be complete.
Therefore those missing objects serve only as base objects for other 
deltas.

Although it is possible to have deltified commit objects in pack v2, I
don't think this happens very often.  There are no deltified commit
objects in pack v4.

Blob objects are the same in pack v2 and pack v4.  No dictionary 
references are needed.

That leaves only tree objects.  And because we've also discussed the 
need to have non-transcoded object representations for those odd cases
such as zero-padded file modes, we might as well simply use that for the
appended tree objects already needed to complete a thin pack.  At least 
the strings in tree entries will be compressed that way.

Problem solved, and one less special case in the code.

What do you think?


Nicolas


Re: [PATCH 0/3] Reject non-ff pulls by default

2013-09-05 Thread John Keeping
On Thu, Sep 05, 2013 at 12:18:45PM -0700, Junio C Hamano wrote:
 John Keeping j...@keeping.me.uk writes:
 
  On Wed, Sep 04, 2013 at 03:59:18PM -0700, Junio C Hamano wrote:
  Are there cases where you do not want to either rebase nor merge?
  If so what do you want to do after git pull fetches from the other
  side?  Nothing?
 
  One other thing that I can see being useful occasionally is:
 
  git rebase @{u}@{1} --onto @{u}
 
  which allows local commits to be replayed onto a rewritten upstream
  branch.
 
 Sure, that would make sense.
 
 I somehow thought that rebase by default looked in the reflog to do
 exactly that. Perhaps I am not remembering correctly.

It just does @{upstream} by default, which tends to get messy if the
upstream has been rewritten.


Re: [PATCH 0/3] Reject non-ff pulls by default

2013-09-05 Thread Junio C Hamano
John Keeping j...@keeping.me.uk writes:

 On Wed, Sep 04, 2013 at 03:59:18PM -0700, Junio C Hamano wrote:
 Are there cases where you do not want to either rebase nor merge?
 If so what do you want to do after git pull fetches from the other
 side?  Nothing?

 One other thing that I can see being useful occasionally is:

 git rebase @{u}@{1} --onto @{u}

 which allows local commits to be replayed onto a rewritten upstream
 branch.

Sure, that would make sense.

I somehow thought that rebase by default looked in the reflog to do
exactly that. Perhaps I am not remembering correctly.


Re: [PATCH v4 0/5] Disable git status comment prefix

2013-09-05 Thread Junio C Hamano
Matthieu Moy matthieu@imag.fr writes:

 Compared to v2, this changes essentially:

 * The prefix is actually disabled by default in this version. As a
   consequence, the option is renamed to status.oldStyle.

 * Since this is the default, the tests are updated to test the new
   defaults. In a first patch, I'm setting status.oldStyle=true in test
   files that require it (to keep the patch short), and the last patch
   actually updates the test expected results. This was actually useful
   as I did find (and fix) a few bugs updating the tests:

   - the --columns option was still showing the comment prefix

   - git commit --dry-run and failed git commit were still
 displaying comments to stdout.

 * The --for-status option is kept as a no-op.

Much nicer.  Will replace and queue.

One caveat, though.  The name "oldStyle" will become problematic,
when we want to remove some wart in the output format long after
this "no comment prefix by default" series lands.  Some people may
expect setting oldStyle=true would give output from 1.8.4 era, while
some others would expect that it would give output from 1.8.5 era
that does not have comment prefix but still has that wart we will
remove down the line.

It is a common mistake to think that one is making the final change
to the code and that the result of that final change will stay the
latest forever.  I am sure I've done that a few times and later
regretted it.



Re: [PATCH v4 03/11] t6050-replace: test that objects are of the same type

2013-09-05 Thread Christian Couder
From: Junio C Hamano gits...@pobox.com

 Christian Couder chrisc...@tuxfamily.org writes:
  
 +test_expect_success 'replaced and replacement objects must be of the same type' '
 +	test_must_fail git replace mytag $HASH1 2>err &&
 +	grep "mytag. points to a replaced object of type .tag" err &&
 +	grep "$HASH1. points to a replacement object of type .commit" err
 
 Hmm, would these messages ever get translated?  I think it is
 sufficient to make sure that the proposed replacement fails for
 these cases.

Ok, I will get rid of the grep statements.

Thanks,
Christian.


Re: Zero padded file modes...

2013-09-05 Thread Jeff King
On Thu, Sep 05, 2013 at 01:13:40PM -0400, John Szakmeister wrote:

  Yep. These were mostly caused by a bug in Grit that is long-fixed.  But
  the objects remain in many histories. It would have been painful to rewrite
  them back then, and it would be even more painful now.
 
 I guess there's still the other side of the question though.  Are
 these repositories busted in the sense that something no longer works?

No, as far as I know, everything still works fine. However, some diffs
may be suboptimal, because we may have two different sha1s for the same
subtree (so we may descend into the tree unnecessarily only to find that
they are equivalent). And by the same token, any scripts doing
non-recursive diffs may erroneously mark the trees as differing, even
though they do not contain any differing files.

But neither is a big problem in practice. If you had two clients in
active use which were flip-flopping a sub-tree back and forth between
representations, it would be a problem. But we are talking about a few
isolated incidents far back in history.
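To see why equivalent subtrees can hash differently, here is a small
illustrative sketch (not git's actual code): serializing the same subtree
entry with a canonical mode ("40000") versus a zero-padded one ("040000")
yields two different tree object names, which is why a diff may descend
into a subtree only to find identical contents.

```python
import hashlib

def tree_object_sha1(mode: bytes, name: bytes, entry_sha1: bytes) -> str:
    """Hash a single-entry git tree object: "tree <len>\\0<mode> <name>\\0<sha1>"."""
    body = mode + b' ' + name + b'\x00' + entry_sha1
    header = b'tree %d\x00' % len(body)
    return hashlib.sha1(header + body).hexdigest()

sub = b'\xab' * 20  # stand-in SHA-1 of the (identical) subtree

canonical = tree_object_sha1(b'40000', b'subdir', sub)
padded = tree_object_sha1(b'040000', b'subdir', sub)

# Same subtree content, different tree object names.
assert canonical != padded
```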

 It doesn't appear to be the case, but I've not used it extensively so
 I can't say for certain one way or another.  In the sense that the
 content is not strictly compliant, transfer.fsckObjects did its job,
 but I wonder if fsck needs to be a little more tolerant now (at least
 with respect to transfer objects)?

Fsck actually treats this as a warning, not an error. It is
transfer.fsckObjects (via index-pack --strict) that actually treats
warnings as errors.

It's possible that this should be loosened to allow through problems
marked as FSCK_WARN (with a message, kind of like...a warning). Though
it may also make sense to revisit some of the classifications in fsck
(e.g., many of the warnings are indicative of seriously broken objects).

GitHub uses transfer.fsckObjects, rejecting all warnings[1]. In practice
it is not usually a big deal, as people are happy to fix up their
objects _before_ they get widely published. The biggest push-back we get
is when somebody tries to re-push history they got from another GitHub
repo, and then says "But why are you complaining? You served this crappy
broken history?" And it's a fair point. If you are forking an existing
project with irregularities in its history (but not joining its existing
fork network), it's not really an option to simply rewrite the history
you are basing on.

-Peff

[1] Actually, we do let through 0-padded modes with a warning,
explicitly because of the problem mentioned above.


Re: [PATCH v4 0/5] Disable git status comment prefix

2013-09-05 Thread Matthieu Moy
Junio C Hamano gits...@pobox.com writes:

 One caveat, though.  The name "oldStyle" will become problematic,
 when we want to remove some wart in the output format long after
 this "no comment prefix by default" series lands.  Some people may
 expect setting oldStyle=true would give output from 1.8.4 era, while
 some others would expect that it would give output from 1.8.5 era
 that does not have comment prefix but still has that wart we will
 remove down the line.

I'm fine with any name actually (since it is enabled by default, people
don't need to know the name to benefit from the new output). Maybe
status.displayCommentPrefix was the best name after all.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/


Re: [PATCH] git-svn: Fix termination issues for remote svn connections

2013-09-05 Thread Eric Wong
Junio C Hamano gits...@pobox.com wrote:
 Uli Heller uli.hel...@daemons-point.com writes:
  Nevertheless, I think it makes sense to fix the issue within the
  git perl module Ra.pm, too. The change frees the private copy of
  the remote access object on termination which prevents the error
  from happening.

 Thanks.  Please sign-off your patch.
 
 I am Cc'ing Kyle McKay who apparently had some experience working
 with git-svn with newer svn that can only use serf, hoping that we
 can get an independent opinion/test just to be sure.  Also Cc'ed is
 Eric Wong who has been the official git-svn area expert, but I
 understand that Eric hasn't needed to use git-svn for quite a while,
 so it is perfectly fine if he does not have any comment on this one.
 
 We may want to find a volunteer to move git svn forward as a new
 area expert (aka subsystem maintainer), by the way.

Correct, git-svn has the effect of being self-obsoleting.

I agree with adding a workaround for broken things, however
I suggest a code comment explaining why it is necessary.
The commit message is important, too, but might get harder to track
down if there's code movement/refactoring in the future.

  +END {
  +   $RA = undef;
  +   $ra_invalid = 1;
  +}


Re: [PATCH v2] Document pack v4 format

2013-09-05 Thread Junio C Hamano
Nicolas Pitre n...@fluxnic.net writes:

 On Thu, 5 Sep 2013, Duy Nguyen wrote:

 On Thu, Sep 5, 2013 at 12:39 PM, Nicolas Pitre n...@fluxnic.net wrote:
  Now the pack index v3 probably needs to be improved a little, again to
  accommodate completion of thin packs.  Given that the main SHA1 table is
  now in the main pack file, it should be possible to still carry a small
  SHA1 table in the index file that corresponds to the appended objects
  only. This means that a SHA1 search will have to first use the main SHA1
  table in the pack file as it is done now, and if not found then use the
  SHA1 table in the index file if it exists.  And of course
  nth_packed_object_sha1() will have to be adjusted accordingly.
 
 What if the sender prepares the sha-1 table to contain missing objects
 in advance? The sender should know what base objects are missing. Then
 we only need to append objects at the receiving end and verify that
 all new objects are also present in the sha-1 table.

 I do like this idea very much.  And that doesn't increase the thin pack 
 size as the larger SHA1 table will be compensated by a smaller sha1ref 
 encoding in those objects referring to the missing ones.

Let me see if I understand the proposal correctly.  Compared to a
normal pack-v4 stream, a thin pack-v4 stream:

 - has all the SHA-1 object names involved in the stream in its main
   object name table---most importantly, names of objects that
   thin optimization omits from the pack data body are included;

 - uses the SHA-1 object name table offset to refer to other
   objects, even to ones that thin stream will not transfer in the
   pack data body;

 - is completed at the receiving end by appending the data for the
   objects that were not transferred due to the thin optimization.

So the invariant "all objects contained in the pack" in:

 - A table of sorted SHA-1 object names for all objects contained in
   the pack.

that appears in Documentation/technical/pack-format.txt is still
kept at the end, and more importantly, any object that is mentioned
in this table can be reconstructed by using pack data in the same
packfile without referencing anything else.  Most importantly, if we
were to build a v2 .idx file for the resulting .pack, the list of
object names in the .idx file would be identical to the object names
in this table in the .pack file.

If that is the case, I too like this.

I briefly wondered if it makes sense to mention objects that are
often referred to that do not exist in the pack in this table
(e.g. new commits included in this pack refer to a tree object that
has not changed for ages---their trees mention this subtree using a
SHA-1 reference encoding and being able to name the old,
unchanging tree with an index to the object table may save space),
but that would break the above invariant in a big way---some objects
mentioned in the table may not exist in the packfile itself---and it
probably is not a good idea.  Unlike that broken idea, the "include
names of the objects that will be appended anyway" approach to help
fattening a thin-pack makes very good sense to me.
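The two-level lookup Nicolas described upthread (search the pack's main
SHA-1 table first, then fall back to the small table of appended objects
carried in the index file) can be sketched as follows; this is a
hypothetical illustration of the idea, not pack v4 code:

```python
from bisect import bisect_left

def find_sha1(sha1: bytes, main_table: list, appended_table: list):
    """Return ("main"|"appended", index) if found, else None.

    Both tables are sorted lists of 20-byte SHA-1s, mirroring the
    "main table in the pack, small appended table in the index" scheme.
    """
    for name, table in (("main", main_table), ("appended", appended_table)):
        i = bisect_left(table, sha1)          # binary search, as in the .idx lookup
        if i < len(table) and table[i] == sha1:
            return (name, i)
    return None

main = sorted([b'\x01' * 20, b'\x7f' * 20])   # objects in the pack proper
extra = sorted([b'\x42' * 20])                # objects appended to complete a thin pack

assert find_sha1(b'\x01' * 20, main, extra) == ("main", 0)
assert find_sha1(b'\x42' * 20, main, extra) == ("appended", 0)
assert find_sha1(b'\xff' * 20, main, extra) is None
```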




Re: [PATCH v4 7/8] update-ref: support multiple simultaneous updates

2013-09-05 Thread Brad King
On 09/04/2013 05:27 PM, Junio C Hamano wrote:
 I am not saying the above is the best format, but the point is that
 the mode of the operation defines the structure

Great, thanks for your comments.  Based on that I've prototyped a
new format.  Rather than jumping straight to the patch, here is my
proposed format documentation for review:

-
With `--stdin`, update-ref reads instructions from standard input and
performs all modifications together.  Specify commands of the form:

create SP <ref> SP <newvalue> LF
update SP <ref> SP <newvalue> [SP <oldvalue>] LF
delete SP <ref> [SP <oldvalue>] LF
verify SP <ref> [SP <oldvalue>] LF
option SP <opt> LF

Quote fields containing whitespace as if they were strings in C source
code.  Alternatively, use `-z` to specify commands without quoting:

create SP <ref> NUL <newvalue> NUL
update SP <ref> NUL <newvalue> NUL [<oldvalue>] NUL
delete SP <ref> NUL [<oldvalue>] NUL
verify SP <ref> NUL [<oldvalue>] NUL
option SP <opt> NUL

Lines of any other format or a repeated ref produce an error.
Command meanings are:

create::
Create <ref> with <newvalue> only if it does not exist.

update::
Update <ref> to be <newvalue>, verifying <oldvalue> if given.
Specify a zero <newvalue> to delete a ref and/or a zero
<oldvalue> to make sure that a ref does not exist.

delete::
Delete <ref>, verifying <oldvalue> if given.

verify::
Verify <ref> against <oldvalue> but do not change it.  If
<oldvalue> is zero or missing, the ref must not exist.

option::
Specify an option to take effect for following commands.
Valid options are `deref` and `no-deref` to specify whether
to dereference symbolic refs.

Use 40 "0" or the empty string to specify a zero value, except that
with `-z` an empty <oldvalue> is considered missing.

If all refs can be locked with matching oldvalues
simultaneously, all modifications are performed.  Otherwise, no
modifications are performed.  Note that while each individual
ref is updated or deleted atomically, a concurrent reader may
still see a subset of the modifications.
-
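As an illustration of how a caller might serialize the proposed `-z`
command stream (hypothetical helper names; the syntax shown is the
proposal under review in this thread, not necessarily the final shipped
format):

```python
# Build a -z command stream for the proposed `git update-ref --stdin -z`
# format quoted above: NUL separators replace the SP/LF quoting rules,
# and an empty oldvalue means "missing" in this form.

NUL = b'\x00'

def z_update(ref: str, newvalue: str, oldvalue: str = "") -> bytes:
    # update SP <ref> NUL <newvalue> NUL [<oldvalue>] NUL
    return (b'update ' + ref.encode() + NUL
            + newvalue.encode() + NUL
            + oldvalue.encode() + NUL)

stream = z_update("refs/heads/master", "1" * 40, "2" * 40)

fields = stream.split(NUL)
assert fields[0] == b'update refs/heads/master'
assert fields[1] == b'1' * 40   # new value
assert fields[2] == b'2' * 40   # old value to verify
```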

Thanks,
-Brad


Re: [PATCH v2] Document pack v4 format

2013-09-05 Thread Nicolas Pitre
On Thu, 5 Sep 2013, Junio C Hamano wrote:

 Nicolas Pitre n...@fluxnic.net writes:
 
  On Thu, 5 Sep 2013, Duy Nguyen wrote:
 
  On Thu, Sep 5, 2013 at 12:39 PM, Nicolas Pitre n...@fluxnic.net wrote:
   Now the pack index v3 probably needs to be improved a little, again to
   accommodate completion of thin packs.  Given that the main SHA1 table is
   now in the main pack file, it should be possible to still carry a small
   SHA1 table in the index file that corresponds to the appended objects
   only. This means that a SHA1 search will have to first use the main SHA1
   table in the pack file as it is done now, and if not found then use the
   SHA1 table in the index file if it exists.  And of course
   nth_packed_object_sha1() will have to be adjusted accordingly.
  
  What if the sender prepares the sha-1 table to contain missing objects
  in advance? The sender should know what base objects are missing. Then
  we only need to append objects at the receiving end and verify that
  all new objects are also present in the sha-1 table.
 
  I do like this idea very much.  And that doesn't increase the thin pack 
  size as the larger SHA1 table will be compensated by a smaller sha1ref 
  encoding in those objects referring to the missing ones.
 
 Let me see if I understand the proposal correctly.  Compared to a
 normal pack-v4 stream, a thin pack-v4 stream:
 
  - has all the SHA-1 object names involved in the stream in its main
object name table---most importantly, names of objects that
thin optimization omits from the pack data body are included;
 
  - uses the SHA-1 object name table offset to refer to other
objects, even to ones that thin stream will not transfer in the
pack data body;
 
  - is completed at the receiving end by appending the data for the
objects that were not transferred due to the thin optimization.
 
 So the invariant "all objects contained in the pack" in:
 
  - A table of sorted SHA-1 object names for all objects contained in
the pack.
 
 that appears in Documentation/technical/pack-format.txt is still
 kept at the end, and more importantly, any object that is mentioned
 in this table can be reconstructed by using pack data in the same
 packfile without referencing anything else.  Most importantly, if we
 were to build a v2 .idx file for the resulting .pack, the list of
 object names in the .idx file would be identical to the object names
 in this table in the .pack file.

That is right.

 If that is the case, I too like this.
 
 I briefly wondered if it makes sense to mention objects that are
 often referred to that do not exist in the pack in this table
 (e.g. new commits included in this pack refer to a tree object that
 has not changed for ages---their trees mention this subtree using a
 SHA-1 reference encoding and being able to name the old,
 unchanging tree with an index to the object table may save space),
 but that would break the above invariant in a big way---some objects
 mentioned in the table may not exist in the packfile itself---and it
 probably is not a good idea.

Yet, if an old tree that doesn't change is often referred to, it should 
be possible to have only one such reference in the whole pack and all 
the other trees can use a delta copy sequence to refer to it.  At this 
point whether or not the tree being referred to is listed inline or in 
the SHA1 table doesn't make a big difference.

 Unlike that broken idea, the "include
 names of the objects that will be appended anyway" approach to help
 fattening a thin-pack makes very good sense to me.
 
 


Re: RE: [PATCH] git-gui: Modify push dialog to support Gerrit review

2013-09-05 Thread Heiko Voigt
On Thu, Sep 05, 2013 at 09:18:25AM +, Jørgen Edelbo wrote:
  -Original Message-
  From: Johannes Sixt [mailto:j.s...@viscovery.net]
  Sent: 5. september 2013 10:57
  
  Please do not top-post.
  
  Am 9/5/2013 10:29, schrieb Jørgen Edelbo:
   -Original Message- From: Johannes Sixt
   Am 9/2/2013 10:54, schrieb Joergen Edelbo:
   Changes done:
  
    Remove selection of branches to push - push always HEAD. This can be
    justified by the fact that this is by far the most common thing to do.
  
    What are your plans to support a topic-based workflow? By far the most
    common thing to happen is that someone forgets to push completed
    topics. With this change, aren't those people forced to relinquish
    their current work because they have to checkout the completed topics
    to push them?
  
   I am not quite sure what your concern is.
  
  When I have completed topics A and B, but forgot to push them, and now I
  am working on topic C, how do I push topics A and B?
  
  You say I can only push HEAD. I understand this to mean that I have to stop
  work on C
  (perhaps commit or stash any unfinished work), then checkout A, push it,
  checkout B, push it, checkout C and unstash the unfinished work. If my
  understanding is correct, the new restriction is a no-go.
 
 Ok, this way of working is not supported. It just never occurred to me that
 you would work this way. Forgetting to push something that you have just 
 completed is very far from what I am used to. I think it comes most natural
 to push what you have done before changing topic. The reason I make a commit
 is to get it out of the door.

FWIW, I also think that we should keep the box which allows you to
select the branch to push. I did not realize that you were removing it
when I first glanced at your patch.

Even if your reasoning that pushing the currently checked-out branch is
the common case is correct: this box has been around for too long, so it
will annoy people who got used to being able to select the branch to push.

Another problem: It is not very intuitive to only select the branch to
push to. You can do that on the command line but IMO using

git push origin HEAD:refs/heads/branchname

is way less common than

git push origin branchname

and I think that should also be reflected in the gui. It might be more
common for a gerrit user but for the typical git user without gerrit it
is not.

So to make it easy for the user to transition from gui to commandline
and back with your patch I would expect: The user selects a branch
to push. The new Destination Branches section automatically selects/shows
the same name for the default case as destination (like the cli). So
if I only select the branch to push it behaves the same as before.

If you detect (I assume that is possible somehow) that the remote is a
gerrit remote: Push for Gerrit review would automatically be ticked and
the branch a git pull would merge (e.g. the one from branch.name.merge)
is selected as the destination branch under refs/for/... . If there is
no config for that, fallback to master.

This is what I would expect with no further extension of the current git
command line behavior and config options. So that way your patch will be
an *extension* and not a change of behavior.

Another unrelated thing that is currently left out: You can transport
the local branchname when pushing to the magical gerrit refs/for/... . I
would like to see that appended as well. But opposed to the branch
selection that is not a show stopper for the patch more a side note.

Cheers Heiko


Re: [PATCH v4 7/8] update-ref: support multiple simultaneous updates

2013-09-05 Thread Junio C Hamano
Brad King brad.k...@kitware.com writes:

 On 09/04/2013 05:27 PM, Junio C Hamano wrote:
 I am not saying the above is the best format, but the point is that
 the mode of the operation defines the structure

 Great, thanks for your comments.  Based on that I've prototyped a
 new format.  Rather than jumping straight to the patch, here is my
 proposed format documentation for review:

 -
 With `--stdin`, update-ref reads instructions from standard input and
 performs all modifications together.  Specify commands of the form:

   create SP <ref> SP <newvalue> LF
   update SP <ref> SP <newvalue> [SP <oldvalue>] LF
   delete SP <ref> [SP <oldvalue>] LF
   verify SP <ref> [SP <oldvalue>] LF
   option SP <opt> LF

 Quote fields containing whitespace as if they were strings in C source
 code.  Alternatively, use `-z` to specify commands without quoting:

   create SP <ref> NUL <newvalue> NUL
   update SP <ref> NUL <newvalue> NUL [<oldvalue>] NUL
   delete SP <ref> NUL [<oldvalue>] NUL
   verify SP <ref> NUL [<oldvalue>] NUL
   option SP <opt> NUL

That SP in '-z' format looks strange.  Was there a reason why NUL
was inappropriate?

 Lines of any other format or a repeated ref produce an error.
 Command meanings are:

 create::
   Create <ref> with <newvalue> only if it does not exist.

 update::
   Update <ref> to be <newvalue>, verifying <oldvalue> if given.
   Specify a zero <newvalue> to delete a ref and/or a zero
   <oldvalue> to make sure that a ref does not exist.

 delete::
   Delete <ref>, verifying <oldvalue> if given.

 verify::
   Verify <ref> against <oldvalue> but do not change it.  If
   <oldvalue> is zero or missing, the ref must not exist.

 option::
   Specify an option to take effect for following commands.
   Valid options are `deref` and `no-deref` to specify whether
   to dereference symbolic refs.

This last one is somewhat peculiar, especially because it says
following command*s*.

How would I request to update refs HEAD and next in an all-or-none
fashion, while applying 'no-deref' only to HEAD but not next?

update refs/heads/next newvalue
option --no-deref
update HEAD newvalue

sounds somewhat convoluted.

When I said "create or update" in the message you are responding to,
I did not mean that we should have two separate commands.  The
regular command line does create-or-update; if it exists already, it
is an update, and if it does not, it is a create.

If we were to have two, I would say we should have:

create-or-update ref newvalue [oldvalue]
create-or-update-no-deref ref newvalue [oldvalue]

An old value of 0{40} would mean "I should be the one creating; if
somebody else created it while I was preparing this request, please
fail."

Similarly for delete:

delete ref [oldvalue]
delete-no-deref ref [oldvalue]

Also how would one set the reflog message for the proposed update?


Re: [PATCH 0/3] Reject non-ff pulls by default

2013-09-05 Thread Philip Oakley

From: Junio C Hamano gits...@pobox.com

Philip Oakley philipoak...@iee.org writes:


From: Junio C Hamano gits...@pobox.com

John Keeping j...@keeping.me.uk writes:


I think there are two distinct uses for pull, which boil down to:

(1) git pull

...
Peff already covered (1)---it is highly doubtful that a merge is
almost always wrong.  In fact, if that _were_ the case, we should
simply be defaulting to rebase, not failing the command and asking
between merge and rebase like jc/pull-training-wheel topic did.

We simply do not know what the user wants, as it heavily depends on
the project, so we ask the user to choose one (and stick to it).


We only offer a limited list. It won't be sufficient for all use
cases. It wasn't for me.


Very interesting. Tell us more.


What I do now is avoid Pull because of the hassle of fixing anything
that may have gone wrong.

Instead I now use a 'git fetch', followed by a 'push . (+etc:etc)' once 
I understand what I've got, or what I need to do different if wasn't a 
simple fast forward 'pull'.



When git pull stops because what was fetched in FETCH_HEAD does
not fast-forward, then what did _you_ do (and with the knowledge you
currently have, what would you do)?  In a single project, would you
choose to sometimes rebase and sometimes merge, and if so, what is
the choice depend on?  When I am on these selected branches, I want
to merge, but on other branches I want to rebase?



In my case I have two home machines (main Windows machine and an 
occasional Linux laptop, though not directly networked together) and 
github as my level group, and have MSysGit and git/git (on github) as 
true upstreams, though they haven't been named that way [Aside: we are 
short of a good name for one's 'across-stream server' that one uses for 
backup/transfer such as github].


In general I now use a forced update to bring my local machine up to date 
relative to whatever is upstream or on my across stream server, such as 
when transferring development from one machine to the other (where 
overwrite is the desired action) - e.g. when testing on the Linux laptop 
and a few corrections, before patch preparation on the Windows machine 
(different levels of familiarity).


I occasionally will need to rebase my topic onto an updated git/master 
or git/pu if it is to be submitted upstream (patches to the list) or if 
upstream has moved, though I want to choose where I will rebase the 
topic onto. I don't need merging in that scenario, as I see those via 
your git repo ;-)


It's not clear to me that a single default that uses a merge or rebase, 
without a 'stop if' criteria would be of any help in my situation.


My thoughts are that the options on a fetch-pull are for the branch to 
be:

* Overwritten (--force) (i.e. a conflict scenario)
* Stop if not-ff (conflict scenario, this patch series)
* rebase existing onto tracked [not a conflict in terms of initiation]
* merge existing into tracked [not a conflict in terms of initiation]
* fast-forward (bring tracked onto existing) [desired]

Philip



Problem setting up a shared git repository

2013-09-05 Thread Eyal Zinder
I hope it's not too inappropriate to send a random question your way, but I've 
exhausted all other means and am quite lost at the moment.. 

I'm trying to setup a distributed development repository with a central 
repository acting as the production copy.  I'm doing so on a Windows file share 
with no SSH / HTTP accessibility.  Basically each developer will have their own 
copy of the project, and the shared drive should have a copy of the master copy 
(prod/master branch), along with the work-tree files.  

The idea is that any developer should be able to do independent development, 
staged commits, etc.. then push to the central (origin) repository, and 
production scripts will reflect these changes upon a push.  

I got pretty close to this setup by creating a bare repository on the file 
share server (f:\GitDBs\foo.git), then cloning the bare repository onto the 
production path like so: 
git clone f:\GitDBs\foo.git foo  

I cloned the bare repository just the same onto my local dev path.. and 
proceeded with development. This worked fine, and I was able to push / pull 
changes into origin (bare repo), and then I would go to my prod (f:\foo) 
repository (clone of bare f:\GitDBs\foo.git), then pull the changes.. 

The problem I faced later on was in parallel development, when changes were 
made to a file in one repository, and at the same time other changes made to 
the same file in another repository..  I couldn't push changes from the dev\foo 
to prod\foo or to origin.. 

I'm completely lost at the moment.. I try to set --git-dir or --work-tree and I 
get mixed results.. either the setting is not allowed when working in bare 
repositories, or I can't run certain operations from the work-tree and 
git-dir.. like "git status", which in a split git-dir / work-tree environment 
(on Windows) returns an error saying the operation is invalid in a 
bare repository, or specifies that the work-tree is not recognized as a 
repository... :( 

Please help me.. 
I'm new to Git, but love it already.. I would hate it if I had to work without 
it.. 

Thank you in advance for all your help!


- Eyal


Re: [PATCH 37/38] pack v4: introduce escape hatches in the name and path indexes

2013-09-05 Thread Nicolas Pitre
On Thu, 5 Sep 2013, Nicolas Pitre wrote:

 On Thu, 5 Sep 2013, Nicolas Pitre wrote:
 
  If the path or name index is zero, this means the entry data is to be
  found inline rather than being located in the dictionary table. This is
  there to allow easy completion of thin packs without having to add new
  table entries which would have required a full rewrite of the pack data.
  
  Signed-off-by: Nicolas Pitre n...@fluxnic.net
 
 I'm now dropping this patch.  Please also remove this from your 
 documentation patch.

Well... I couldn't resist another little change that has been nagging me 
for a while.

Both the author and committer time stamps are very closely related most 
of the time.  So the committer time stamp is now encoded as a difference 
against the author time stamp with the LSB indicating a negative 
difference.

On git.git this saves 0.3% on the pack size.  Not much, but still 
impressive for only a time stamp.
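
As an illustrative sketch (not the actual packv4-create.c code), the scheme reads: store the absolute difference between the two timestamps shifted left by one bit, with the LSB set when the committer time precedes the author time:

```shell
# Illustrative sketch of the described encoding (not git's code):
# committer time is stored as |committer - author| << 1, with the
# low bit set when the difference is negative.
encode_commit_time () {
	author=$1 committer=$2
	diff=$(( committer - author ))
	if test "$diff" -lt 0
	then
		echo $(( (-diff << 1) | 1 ))	# LSB set: committer earlier
	else
		echo $(( diff << 1 ))		# LSB clear
	fi
}

decode_commit_time () {
	author=$1 encoded=$2
	magnitude=$(( encoded >> 1 ))
	if test $(( encoded & 1 )) -eq 1
	then
		echo $(( author - magnitude ))
	else
		echo $(( author + magnitude ))
	fi
}

encode_commit_time 1378400000 1378400005   # prints 10
decode_commit_time 1378400000 10           # prints 1378400005
```

Since the two timestamps are usually seconds or minutes apart, the stored value is tiny, and the pack's variable-length integers encode it in one or two bytes — hence the size saving.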


Nicolas


Re: [PATCH v3 07/11] Documentation/replace: tell that -f option bypasses the type check

2013-09-05 Thread Philip Oakley

From: Philip Oakley philipoak...@iee.org
Sent: Saturday, August 31, 2013 11:16 PM

From: Christian Couder chrisc...@tuxfamily.org

Signed-off-by: Christian Couder chrisc...@tuxfamily.org
---
Documentation/git-replace.txt | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-replace.txt b/Documentation/git-replace.txt
index 736b48c..a2bd2ee 100644
--- a/Documentation/git-replace.txt
+++ b/Documentation/git-replace.txt
@@ -21,10 +21,12 @@ replaced. The content of the 'replace' reference is the SHA-1 of the
 replacement object.
 
 The replaced object and the replacement object must be of the same type.
-There is no other restriction on them.
+This restriction can be bypassed using `-f`.
 
 Unless `-f` is given, the 'replace' reference must not yet exist.
 
+There is no other restriction on the replaced and replacement objects.


Is this trying to allude to the fact that merge commits may be 
exchanged with non-merge commits? I strongly believe that this ability 
to exchange merge and non-merge commits should be stated _explicitly_ 
to counteract the false beliefs that are listed out on the internet.


It's probably better stated in a separate patch for that explicit 
purpose to avoid mixed messages within this commit.




Not sure how this method of preparing a comment patch will pan out..

---8<

From a0c0e765cfd969c9c8a6ff3a2cb6b2f1391d2e7d Mon Sep 17 00:00:00 2001

From: Philip Oakley philipoak...@iee.org
Date: Thu, 5 Sep 2013 22:54:04 +0100
Subject: [PATCH] Doc: 'replace' merge and non-merge commits

Signed-off-by: Philip Oakley philipoak...@iee.org
---

This is supplemental to Christian Couder's 'replace' patch series
(2013-09-03 69dada4 (Christian Couder): t6050-replace: use some long
option names).

It adds the clarification that merge and non-merge commits are
replaceable.

Merges are often treated as special case objects so tell users that they 
are not special here.


---
Documentation/git-replace.txt | 1 +
1 file changed, 1 insertion(+)

diff --git a/Documentation/git-replace.txt b/Documentation/git-replace.txt
index 414000e..f373ab4 100644
--- a/Documentation/git-replace.txt
+++ b/Documentation/git-replace.txt
@@ -26,6 +26,7 @@ This restriction can be bypassed using `-f`.
 Unless `-f` is given, the 'replace' reference must not yet exist.
 
 There is no other restriction on the replaced and replacement objects.
+Merge commits can be replaced by non-merge commits and vice versa.
 
 Replacement references will be used by default by all Git commands
 except those doing reachability traversal (prune, pack transfer and
--
1.8.1.msysgit.1




Re: [PATCH V3] check-ignore: Add option to ignore index contents

2013-09-05 Thread Junio C Hamano
Dave Williams d...@opensourcesolutions.co.uk writes:

 check-ignore currently shows how .gitignore rules would treat untracked
 paths. Tracked paths do not generate useful output.  This prevents
 debugging of why a path became tracked unexpectedly unless that path is
 first removed from the index with `git rm --cached path`.

 This option (-i, --no-index) simply by-passes the check for the path
 being in the index and hence allows tracked paths to be checked too.

Now that the long option name is --no-index, it makes me wonder if -i
is a good synonym for it, and the longer I stare at it, the more
convinced I become that it is a bad choice.

Do we even need a short-and-sweet one-letter option for this?  I'd
prefer starting with only the long option.

I came up with a squashable tweak to remove -i on top of this
patch; will tentatively queue this patch with it to 'pu'.
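
To illustrate the behaviour under discussion (a sketch assuming the option lands as described; file names here are made up):

```shell
set -e
workdir=$(mktemp -d)
cd "$workdir"
git init -q

echo '*.log' >.gitignore
echo hello >debug.log
git add -f debug.log                 # force-add a path matching an ignore rule

# Tracked paths normally produce no output (exit status 1):
git check-ignore debug.log || true

# With --no-index the index check is skipped, so the rule that would
# have ignored the path is reported:
git check-ignore --no-index debug.log   # prints: debug.log
```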

 In particular Junio queried my approach in
 builtin/git-check-ignores.c that bypassed the functions that check
 a path is in the index as well as avoiding reading the index in
 the first place.

Thanks.

I think this version is cleaner without those "if (!no_index)" used
for special casing in the codeflow.  An empty index is a valid
state, in which the in-core index starts when any git program
begins.  Not reading the index should be the only thing necessary to
mimic the state in which nothing has been added to the index, and
if that is not the case, we have found a bug in the existing code.

 Regarding the test script I have tidied up the variables containing the
 separate option switches so they don't contain leading spaces, instead I
 have added spaces as needed when formatting the command line.  This
 not only improves my patch but also the existing code which was a little
 inconsistent in this respect.

Yeah, that's very much appreciated.

 Finally I have rebased from the latest commit on master to pick up
 unrelated recent changes made to builtin/check-ignores.c and updated my
 code to be consistent with this.

 Hopefully I have put these patch notes in the right place now! Let me
 know if not.

 Dave

Nicely done.


Re: [PATCH v2] cherry-pick: allow - as abbreviation of '@{-1}'

2013-09-05 Thread Junio C Hamano
Hiroshige Umino hiroshig...@gmail.com writes:

 "-" abbreviation is handy for cherry-pick like checkout and merge.

 It's also good for uniformity that a "-" stands as
 the name of the previous branch where a branch name is
 accepted and it could not mean any other things like stdin.

 Signed-off-by: Hiroshige Umino hiroshig...@gmail.com
 ---

This makes sense to me.

The test t3500 is about git cherry command, so I came up with a
tweak to move it to t3501, which is about cherry-pick, on top of
this patch.  Will tentatively queue this patch with that tweak to
'pu'.

Thanks.

  builtin/revert.c  |  2 ++
  t/t3500-cherry.sh | 15 +++
  2 files changed, 17 insertions(+)

 diff --git a/builtin/revert.c b/builtin/revert.c
 index 8e87acd..52c35e7 100644
 --- a/builtin/revert.c
 +++ b/builtin/revert.c
 @@ -202,6 +202,8 @@ int cmd_cherry_pick(int argc, const char **argv, const char *prefix)
   memset(opts, 0, sizeof(opts));
   opts.action = REPLAY_PICK;
   git_config(git_default_config, NULL);
 + if (!strcmp(argv[1], "-"))
 +  argv[1] = "@{-1}";
   parse_args(argc, argv, opts);
   res = sequencer_pick_revisions(opts);
  if (res < 0)
 diff --git a/t/t3500-cherry.sh b/t/t3500-cherry.sh
 index f038f34..547dbf8 100755
 --- a/t/t3500-cherry.sh
 +++ b/t/t3500-cherry.sh
 @@ -55,4 +55,19 @@ test_expect_success \
  expr "$(echo $(git cherry master my-topic-branch) )" : "+ [^ ]* - .*"
  '
  
 +test_expect_success \
 +'cherry-pick - does not work initially' \
 +'test_must_fail git cherry-pick -
 +'
 +
 +test_expect_success \
 +'cherry-pick the commit in the previous branch' \
 +'git branch other &&
 + test_commit commit-to-pick newfile content &&
 + echo content >expected &&
 + git checkout other &&
 + git cherry-pick - &&
 + test_cmp expected newfile
 +'
 +
  test_done
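
For readers unfamiliar with the @{-1} syntax, here is a quick demonstration of what the patch enables (a sketch assuming a git with this change applied; branch and file names are examples):

```shell
set -e
workdir=$(mktemp -d)
cd "$workdir"
git init -q repo && cd repo
export GIT_AUTHOR_NAME=t GIT_AUTHOR_EMAIL=t@example.com
export GIT_COMMITTER_NAME=t GIT_COMMITTER_EMAIL=t@example.com

git commit -q --allow-empty -m base
main=$(git symbolic-ref --short HEAD)   # master or main

git checkout -q -b topic
echo content >newfile
git add newfile
git commit -q -m commit-to-pick

git checkout -q "$main"
git cherry-pick -                       # same as: git cherry-pick @{-1}
```

After the final command, the commit made on 'topic' (the previous branch) has been replayed onto the current branch.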


Re: [PATCH v3 07/11] Documentation/replace: tell that -f option bypasses the type check

2013-09-05 Thread Junio C Hamano
Philip Oakley philipoak...@iee.org writes:

 From: Philip Oakley philipoak...@iee.org
 Sent: Saturday, August 31, 2013 11:16 PM
 From: Christian Couder chrisc...@tuxfamily.org
 Signed-off-by: Christian Couder chrisc...@tuxfamily.org
 ---
 Documentation/git-replace.txt | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

 diff --git a/Documentation/git-replace.txt b/Documentation/git-replace.txt
 index 736b48c..a2bd2ee 100644
 --- a/Documentation/git-replace.txt
 +++ b/Documentation/git-replace.txt
 @@ -21,10 +21,12 @@ replaced. The content of the 'replace' reference is the SHA-1 of the
 replacement object.

 The replaced object and the replacement object must be of the same type.
 -There is no other restriction on them.
 +This restriction can be bypassed using `-f`.

 Unless `-f` is given, the 'replace' reference must not yet exist.

 +There is no other restriction on the replaced and replacement objects.

 Is this trying to allude to the fact that merge commits may be
 exchanged with non-merge commits? I strongly believe that this
 ability to exchange merge and non-merge commits should be stated
 _explicitly_ to counteract the false beliefs that are listed out on
 the internet.

 It's probably better stated in a separate patch for that explicit
 purpose to avoid mixed messages within this commit.


 Not sure how this method of preparing a comment patch will pan out..

 ---8<

Make it --- 8< --- perhaps to balance out the perforation on both
sides.

 From a0c0e765cfd969c9c8a6ff3a2cb6b2f1391d2e7d Mon Sep 17 00:00:00 2001

Not needed nor wanted.

 From: Philip Oakley philipoak...@iee.org

Not needed but does not hurt.

 Date: Thu, 5 Sep 2013 22:54:04 +0100

Is OK but redundant given that your message has a timestamp when we
saw your patch for the first time anyway.

 Subject: [PATCH] Doc: 'replace' merge and non-merge commits

 Signed-off-by: Philip Oakley philipoak...@iee.org
 ---

 This is supplemental to Christian Couder's 'replace' patch series
 (2013-09-03 69dada4 (Christian Couder): t6050-replace: use some long
 option names).

 It adds the clarification that merge and non-merge commits are
 replaceable.

 Merges are often treated as special case objects so tell users that
 they are not special here.

I think the last paragraph deserves to be in the proposed commit log
message proper.  It explains why it is a good idea to have the added
line in the documentation very well.

 ---
 Documentation/git-replace.txt | 1 +
 1 file changed, 1 insertion(+)

 diff --git a/Documentation/git-replace.txt b/Documentation/git-replace.txt
 index 414000e..f373ab4 100644
 --- a/Documentation/git-replace.txt
 +++ b/Documentation/git-replace.txt
 @@ -26,6 +26,7 @@ This restriction can be bypassed using `-f`.
 Unless `-f` is given, the 'replace' reference must not yet exist.

 There is no other restriction on the replaced and replacement objects.
 +Merge commits can be replaced by non-merge commits and vice versa.

 Replacement references will be used by default by all Git commands
 except those doing reachability traversal (prune, pack transfer and


Re: [PATCH] git-svn: Fix termination issues for remote svn connections

2013-09-05 Thread Junio C Hamano
Eric Wong normalper...@yhbt.net writes:

 Junio C Hamano gits...@pobox.com wrote:
 Uli Heller uli.hel...@daemons-point.com writes:
  Nevertheless, I think it makes sense to fix the issue within the
  git perl module Ra.pm, too. The change frees the private copy of
  the remote access object on termination which prevents the error
  from happening.

 Thanks.  Please sign-off your patch.
 
 I am Cc'ing Kyle McKay who apparently had some experience working
 with git-svn with newer svn that can only use serf, hoping that we
 can get an independent opinion/test just to be sure.  Also Cc'ed is
 Eric Wong who has been the official git-svn area expert, but I
 understand that Eric hasn't needed to use git-svn for quite a while,
 so it is perfectly fine if he does not have any comment on this one.
 
 We may want to find a volunteer to move git svn forward as a new
 area expert (aka subsystem maintainer), by the way.

 Correct, git-svn has the effect of being self-obsoleting.

 I agree with adding a workaround for broken things, however
 I suggest a code comment explaining why it is necessary.
 The commit message is important, too, but might get harder to track
 down if there's code movement/refactoring in the future.

Thanks for a good suggestion.  I agree that this addition is a good
example where in-code comment would really help the future readers.


  +END {
  +  $RA = undef;
  +  $ra_invalid = 1;
  +}


Re: [PATCH v4 0/5] Disable git status comment prefix

2013-09-05 Thread Junio C Hamano
Matthieu Moy matthieu@grenoble-inp.fr writes:

 Junio C Hamano gits...@pobox.com writes:

 One caveat, though.  The name oldStyle will become problematic,
 when we want to remove some wart in the output format long after
 this no comment prefix by default series lands.  Some people may
 expect setting oldStyle=true would give output from 1.8.4 era, while
 some others would expect that it would give output from 1.8.5 era
 that does not have comment prefix but still has that wart we will
 remove down the line.

 I'm fine with any name actually (since it is enabled by default, people
 don't need to know the name to benefit from the new output). Maybe
 status.displayCommentPrefix was the best name after all.

Yeah, I think so.  It makes it clear what kind of old behaviour the
user is asking for.
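
For reference, with that name the old behaviour would be requested like so (a sketch assuming status.displayCommentPrefix is the name that ships):

```shell
set -e
workdir=$(mktemp -d)
cd "$workdir"
git init -q

# Ask for the old commented output style explicitly:
git -c status.displayCommentPrefix=true status | head -n 1
# first line reads "# On branch ..." (branch name depends on init.defaultBranch)
```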


Re: [PATCH v4 0/5] Disable git status comment prefix

2013-09-05 Thread Jeff King
On Thu, Sep 05, 2013 at 09:36:47PM +0200, Matthieu Moy wrote:

 Junio C Hamano gits...@pobox.com writes:
 
  One caveat, though.  The name oldStyle will become problematic,
  when we want to remove some wart in the output format long after
  this no comment prefix by default series lands.  Some people may
  expect setting oldStyle=true would give output from 1.8.4 era, while
  some others would expect that it would give output from 1.8.5 era
  that does not have comment prefix but still has that wart we will
  remove down the line.
 
 I'm fine with any name actually (since it is enabled by default, people
 don't need to know the name to benefit from the new output). Maybe
 status.displayCommentPrefix was the best name after all.

FWIW, I had the same thought as Junio. I much prefer something like
status.displayCommentPrefix for clarity and future-proofing.

As for the feature itself, I am still undecided whether I like it. I've
tried looking at the output of the series, and it feels weird to me.

Part of it is undoubtedly that my brain is simply used to the other way.
But it also seems to drop some of the vertical whitespace, which makes
things feel very crowded. E.g., before:

  # On branch private
  # Your branch and 'origin/next' have diverged,
  # and have 472 and 59 different commits each, respectively.
  #
  # Untracked files:
  #   t/foo
  #   test-obj-pool
  #   test-string-pool
  #   test-treap
  #   test-url-normalize
  nothing added to commit but untracked files present

after:

  On branch private
  Your branch and 'origin/next' have diverged,
  and have 472 and 59 different commits each, respectively.
  Untracked files:
  t/foo
  test-obj-pool
  test-string-pool
  test-treap
  test-url-normalize
  nothing added to commit but untracked files present

The blank before "Untracked files" was dropped (was this intentional? I
didn't see any note of it in the discussion), and the bottom "nothing
added" line butts against the untracked list more obviously, because
they now all have the same comment indentation.

I wonder if it would look a little nicer as:

  On branch private
  Your branch and 'origin/next' have diverged,
  and have 472 and 59 different commits each, respectively.

  Untracked files:
  t/foo
  test-obj-pool
  test-string-pool
  test-treap
  test-url-normalize

  nothing added to commit but untracked files present

As an orthogonal thing to your patch, I feel like the first two items
(branch and ahead/behind) are kind of squished and oddly formatted (in
both the original and yours). Could we do something like:

  Your branch 'private' and its upstream 'origin/next' have diverged,
  and have 472 and 59 different commits each, respectively.

when we are going to print both?  That's 69 characters, which might
overrun 80 if you have long branch names, but we could also line-break
it differently.

That doesn't need to be part of your topic, but while we are talking
about the format of the message, maybe it is worth thinking about.

-Peff


  1   2   >