date:20170808

Re: [PATCH] builtin/add: add detail to a 'cannot chmod' error message

2017-08-08 Thread Junio C Hamano

Ramsay Jones  writes:

> In addition to adding the missing newline, add the x-ecutable bit
> 'mode change' character to the error message. This message now has
> the same form as similar messages output by 'update-index'.
>
> Signed-off-by: Ramsay Jones 
> ---
>
> Hi Junio,
>
> This is v2 of the earlier "add a newline" patch. Thanks!

Thanks; here is me reminding myself to apply this with Jonathan's
reviewed-by in the morning.

>
> ATB,
> Ramsay Jones
>
>  builtin/add.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/builtin/add.c b/builtin/add.c
> index e888fb8c5..5d5773d5c 100644
> --- a/builtin/add.c
> +++ b/builtin/add.c
> @@ -32,7 +32,7 @@ struct update_callback_data {
>   int add_errors;
>  };
>  
> -static void chmod_pathspec(struct pathspec *pathspec, int force_mode)
> +static void chmod_pathspec(struct pathspec *pathspec, char flip)
>  {
>   int i;
>  
> @@ -42,8 +42,8 @@ static void chmod_pathspec(struct pathspec *pathspec, int 
> force_mode)
>   if (pathspec && !ce_path_match(ce, pathspec, NULL))
>   continue;
>  
> - if (chmod_cache_entry(ce, force_mode) < 0)
> - fprintf(stderr, "cannot chmod '%s'", ce->name);
> + if (chmod_cache_entry(ce, flip) < 0)
> + fprintf(stderr, "cannot chmod %cx '%s'\n", flip, 
> ce->name);
>   }
>  }

Re: [PATCH] t1200: remove t1200-tutorial.sh

2017-08-08 Thread Junio C Hamano

Jonathan Nieder  writes:

> Correction: the tutorial is now called gitcore-tutorial and mostly
> survives.  A search for -p --root failed because of v1.5.5.1~19^2
> (core-tutorial.txt: Fix showing the current behaviour, 2008-04-10).

Yeah, I was wondering why neither of you find it while reading your
exchanges on a phone.

> That said, the conclusion that this test has mostly bitrotted as far
> as its original purpose goes still applies.
>
> An alternative method of addressing the goal you described would be to
> fuzz object-id shaped things out of the sample output.  I don't have a
> strong preference, given how little this test contributes to coverage
> (as you mentioned) and how difficult it is to make it continue to
> match the documentation it is trying to test.
>
> Thanks and sorry for the confusion,

OK, so can one of you give the final version of the patch with
updated description?

It _would_ be nice to have an executable tutorial/documentation or
docs with built-in tests like some other systems have, but I realize
that it would be a different matter.  We'd need some toolchain to
mark up tests in our tutorial, extract and run them, and compare the
result, which we currently do not have and it won't fit under our
test framework very well anyway.

Thanks.

Re: [PATCH] builtin/add: add a missing newline to an stderr message

2017-08-08 Thread Junio C Hamano

Jonathan Nieder  writes:

> I don't believe the force_mode without an 'x' provides a clear signal
> to the end user.  Perhaps you meant %cx?

Indeed you are right.  I think I saw Ramsay's v2 that has the 'x',
so let's use that version.

Thanks.

Re: [PATCH] sha1_file: avoid comparison if no packed hash matches the first byte

2017-08-08 Thread Junio C Hamano

Jeff King  writes:

> On Tue, Aug 08, 2017 at 06:52:31PM -0400, Jeff King wrote:
>
>> > Interesting.  I see that we still have the conditional code to call
>> > out to sha1-lookup.c::sha1_entry_pos().  Do we need a similar change
>> > over there, I wonder?  Alternatively, as we have had the experimental
>> > sha1-lookup.c::sha1_entry_pos() long enough without anybody using it,
>> > perhaps we should write it off as a failed experiment and retire it?
>> 
>> There is also sha1_pos(), which seems to have the same problem (and is
>> used in several places).
>
> Actually, I take it back. The problem happens when we enter the loop
> with no entries to look at. But both sha1_pos() and sha1_entry_pos()
> return early before hitting their do-while loops in that case.

Ah, I was not looking at that part of the code.  Thanks.

I still wonder if we want to retire that conditional invocation of
sha1_entry_pos(), though.

Re: [PATCH] t4062: stop using repetition in regex

2017-08-08 Thread Junio C Hamano

René Scharfe  writes:

> Am 09.08.2017 um 00:26 schrieb Junio C Hamano:
>> ... but in the meantime, I think replacing the test with "0$" to
>> force the scanner to find either the end of line or the end of the
>> buffer may be a good workaround.  We do not have to care how many of
>> random bytes are in front of the last "0" in order to ensure that
>> the regexec_buf() does not overstep to 4097th byte, while seeing
>> that regexec() that does not know how long the haystack is has to do
>> so, no?
>
> Our regexec() calls strlen() (see my other reply).
>
> Using "0$" looks like the best option to me.

Yeah, it seems that way.  If we want to be close/faithful to the
original, we could do "^0*$", but the part that is essential to
trigger the old bug is not the "we have many zeroes" (or "we have
4096 zeroes") part, but "zero is at the end of the string" part, so
"0$" would be the minimal pattern that also would work for OBSD.

Dscho?

[PATCH v2 10/25] pack: move install_packed_git()

2017-08-08 Thread Jonathan Tan

Signed-off-by: Jonathan Tan 
---
 cache.h |  1 -
 pack.h  |  4 ++--
 packfile.c  | 11 ++-
 sha1_file.c |  9 -
 4 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/cache.h b/cache.h
index bf93477e8..41562dc0b 100644
--- a/cache.h
+++ b/cache.h
@@ -1611,7 +1611,6 @@ extern void (*report_garbage)(unsigned seen_bits, const 
char *path);
 
 extern void prepare_packed_git(void);
 extern void reprepare_packed_git(void);
-extern void install_packed_git(struct packed_git *pack);
 
 /*
  * Give a rough count of objects in the repository. This sacrifices accuracy
diff --git a/pack.h b/pack.h
index c1f3ff32d..576c4fc7c 100644
--- a/pack.h
+++ b/pack.h
@@ -124,8 +124,6 @@ extern char *sha1_pack_name(const unsigned char *sha1);
  */
 extern char *sha1_pack_index_name(const unsigned char *sha1);
 
-extern unsigned int pack_open_fds;
-
 extern void pack_report(void);
 
 /*
@@ -152,4 +150,6 @@ extern unsigned char *use_pack(struct packed_git *, struct 
pack_window **, off_t
 extern void unuse_pack(struct pack_window **);
 extern struct packed_git *add_packed_git(const char *path, size_t path_len, 
int local);
 
+extern void install_packed_git(struct packed_git *pack);
+
 #endif
diff --git a/packfile.c b/packfile.c
index efe0ed3e8..4eb65e460 100644
--- a/packfile.c
+++ b/packfile.c
@@ -28,7 +28,7 @@ static unsigned int pack_used_ctr;
 static unsigned int pack_mmap_calls;
 static unsigned int peak_pack_open_windows;
 static unsigned int pack_open_windows;
-unsigned int pack_open_fds;
+static unsigned int pack_open_fds;
 static unsigned int pack_max_fds;
 static size_t peak_pack_mapped;
 static size_t pack_mapped;
@@ -658,3 +658,12 @@ struct packed_git *add_packed_git(const char *path, size_t 
path_len, int local)
hashclr(p->sha1);
return p;
 }
+
+void install_packed_git(struct packed_git *pack)
+{
+   if (pack->pack_fd != -1)
+   pack_open_fds++;
+
+   pack->next = packed_git;
+   packed_git = pack;
+}
diff --git a/sha1_file.c b/sha1_file.c
index 7f12b1ee0..b956ca0c9 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -717,15 +717,6 @@ void *xmmap(void *start, size_t length,
return ret;
 }
 
-void install_packed_git(struct packed_git *pack)
-{
-   if (pack->pack_fd != -1)
-   pack_open_fds++;
-
-   pack->next = packed_git;
-   packed_git = pack;
-}
-
 void (*report_garbage)(unsigned seen_bits, const char *path);
 
 static void report_helper(const struct string_list *list,
-- 
2.14.0.434.g98096fd7a8-goog

[PATCH v2 06/25] pack: move pack-closing functions

2017-08-08 Thread Jonathan Tan

The function close_pack_fd() needs to be temporarily made global. Its
scope will be restored to static in a subsequent commit.

Signed-off-by: Jonathan Tan 
---
 builtin/am.c|  1 +
 builtin/clone.c |  1 +
 builtin/fetch.c |  1 +
 builtin/merge.c |  1 +
 cache.h |  8 
 pack.h  |  9 +
 packfile.c  | 54 ++
 sha1_file.c | 55 ---
 8 files changed, 67 insertions(+), 63 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index c973bd96d..c38dd10a3 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -31,6 +31,7 @@
 #include "mailinfo.h"
 #include "apply.h"
 #include "string-list.h"
+#include "pack.h"
 
 /**
  * Returns 1 if the file is empty or does not exist, 0 otherwise.
diff --git a/builtin/clone.c b/builtin/clone.c
index 08b5cc433..53410a45d 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -25,6 +25,7 @@
 #include "remote.h"
 #include "run-command.h"
 #include "connected.h"
+#include "pack.h"
 
 /*
  * Overall FIXMEs:
diff --git a/builtin/fetch.c b/builtin/fetch.c
index c87e59f3b..196a3bfc4 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -17,6 +17,7 @@
 #include "connected.h"
 #include "argv-array.h"
 #include "utf8.h"
+#include "pack.h"
 
 static const char * const builtin_fetch_usage[] = {
N_("git fetch [] [ [...]]"),
diff --git a/builtin/merge.c b/builtin/merge.c
index 900bafdb4..9cff4b276 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -32,6 +32,7 @@
 #include "gpg-interface.h"
 #include "sequencer.h"
 #include "string-list.h"
+#include "pack.h"
 
 #define DEFAULT_TWOHEAD (1<<0)
 #define DEFAULT_OCTOPUS (1<<1)
diff --git a/cache.h b/cache.h
index 5d6839525..25a21a61f 100644
--- a/cache.h
+++ b/cache.h
@@ -1637,15 +1637,7 @@ extern int odb_mkstemp(struct strbuf *template, const 
char *pattern);
  */
 extern int odb_pack_keep(const char *name);
 
-/*
- * munmap the index file for the specified packfile (if it is
- * currently mmapped).
- */
-extern void close_pack_index(struct packed_git *);
-
 extern unsigned char *use_pack(struct packed_git *, struct pack_window **, 
off_t, unsigned long *);
-extern void close_pack_windows(struct packed_git *);
-extern void close_all_packs(void);
 extern void unuse_pack(struct pack_window **);
 extern void clear_delta_base_cache(void);
 extern struct packed_git *add_packed_git(const char *path, size_t path_len, 
int local);
diff --git a/pack.h b/pack.h
index c16220586..fd4668528 100644
--- a/pack.h
+++ b/pack.h
@@ -147,4 +147,13 @@ extern int unuse_one_window(struct packed_git *current);
 
 extern void release_pack_memory(size_t);
 
+extern void close_pack_windows(struct packed_git *);
+extern int close_pack_fd(struct packed_git *);
+/*
+ * munmap the index file for the specified packfile (if it is
+ * currently mmapped).
+ */
+extern void close_pack_index(struct packed_git *);
+extern void close_all_packs(void);
+
 #endif
diff --git a/packfile.c b/packfile.c
index 8daa74ad1..c8e2dbdee 100644
--- a/packfile.c
+++ b/packfile.c
@@ -257,3 +257,57 @@ void release_pack_memory(size_t need)
while (need >= (cur - pack_mapped) && unuse_one_window(NULL))
; /* nothing */
 }
+
+void close_pack_windows(struct packed_git *p)
+{
+   while (p->windows) {
+   struct pack_window *w = p->windows;
+
+   if (w->inuse_cnt)
+   die("pack '%s' still has open windows to it",
+   p->pack_name);
+   munmap(w->base, w->len);
+   pack_mapped -= w->len;
+   pack_open_windows--;
+   p->windows = w->next;
+   free(w);
+   }
+}
+
+int close_pack_fd(struct packed_git *p)
+{
+   if (p->pack_fd < 0)
+   return 0;
+
+   close(p->pack_fd);
+   pack_open_fds--;
+   p->pack_fd = -1;
+
+   return 1;
+}
+
+void close_pack_index(struct packed_git *p)
+{
+   if (p->index_data) {
+   munmap((void *)p->index_data, p->index_size);
+   p->index_data = NULL;
+   }
+}
+
+static void close_pack(struct packed_git *p)
+{
+   close_pack_windows(p);
+   close_pack_fd(p);
+   close_pack_index(p);
+}
+
+void close_all_packs(void)
+{
+   struct packed_git *p;
+
+   for (p = packed_git; p; p = p->next)
+   if (p->do_not_close)
+   die("BUG: want to close pack marked 'do-not-close'");
+   else
+   close_pack(p);
+}
diff --git a/sha1_file.c b/sha1_file.c
index 644876e4e..e2927244f 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -717,53 +717,6 @@ void *xmmap(void *start, size_t length,
return ret;
 }
 
-void close_pack_windows(struct packed_git *p)
-{
-   while (p->windows) {
-   struct pack_window *w = p->windows;
-
-   if (w->inuse_cnt)
-   die("pack '%s' still has open windows to

[PATCH v2 11/25] pack: move {,re}prepare_packed_git and approximate_object_count

2017-08-08 Thread Jonathan Tan

Signed-off-by: Jonathan Tan 
---
 builtin/gc.c   |   1 +
 cache.h|  15 
 http-backend.c |   1 +
 pack.h |  15 
 packfile.c | 216 +
 path.c |   1 +
 server-info.c  |   1 +
 sha1_file.c| 214 
 8 files changed, 235 insertions(+), 229 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index e6b84475a..f4fe023d3 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -19,6 +19,7 @@
 #include "sigchain.h"
 #include "argv-array.h"
 #include "commit.h"
+#include "pack.h"
 
 #define FAILED_RUN "failed to run %s"
 
diff --git a/cache.h b/cache.h
index 41562dc0b..f020dfade 100644
--- a/cache.h
+++ b/cache.h
@@ -1603,21 +1603,6 @@ struct pack_entry {
struct packed_git *p;
 };
 
-/* A hook to report invalid files in pack directory */
-#define PACKDIR_FILE_PACK 1
-#define PACKDIR_FILE_IDX 2
-#define PACKDIR_FILE_GARBAGE 4
-extern void (*report_garbage)(unsigned seen_bits, const char *path);
-
-extern void prepare_packed_git(void);
-extern void reprepare_packed_git(void);
-
-/*
- * Give a rough count of objects in the repository. This sacrifices accuracy
- * for speed.
- */
-unsigned long approximate_object_count(void);
-
 extern struct packed_git *find_sha1_pack(const unsigned char *sha1,
 struct packed_git *packs);
 
diff --git a/http-backend.c b/http-backend.c
index 519025d2c..12f7d421f 100644
--- a/http-backend.c
+++ b/http-backend.c
@@ -9,6 +9,7 @@
 #include "string-list.h"
 #include "url.h"
 #include "argv-array.h"
+#include "pack.h"
 
 static const char content_type[] = "Content-Type";
 static const char content_length[] = "Content-Length";
diff --git a/pack.h b/pack.h
index 576c4fc7c..cad5ed488 100644
--- a/pack.h
+++ b/pack.h
@@ -152,4 +152,19 @@ extern struct packed_git *add_packed_git(const char *path, 
size_t path_len, int
 
 extern void install_packed_git(struct packed_git *pack);
 
+/* A hook to report invalid files in pack directory */
+#define PACKDIR_FILE_PACK 1
+#define PACKDIR_FILE_IDX 2
+#define PACKDIR_FILE_GARBAGE 4
+extern void (*report_garbage)(unsigned seen_bits, const char *path);
+
+extern void prepare_packed_git(void);
+extern void reprepare_packed_git(void);
+
+/*
+ * Give a rough count of objects in the repository. This sacrifices accuracy
+ * for speed.
+ */
+unsigned long approximate_object_count(void);
+
 #endif
diff --git a/packfile.c b/packfile.c
index 4eb65e460..a517172f7 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1,6 +1,8 @@
 #include "cache.h"
 #include "mru.h"
 #include "pack.h"
+#include "dir.h"
+#include "mergesort.h"
 
 char *odb_pack_name(struct strbuf *buf,
const unsigned char *sha1,
@@ -667,3 +669,217 @@ void install_packed_git(struct packed_git *pack)
pack->next = packed_git;
packed_git = pack;
 }
+
+void (*report_garbage)(unsigned seen_bits, const char *path);
+
+static void report_helper(const struct string_list *list,
+ int seen_bits, int first, int last)
+{
+   if (seen_bits == (PACKDIR_FILE_PACK|PACKDIR_FILE_IDX))
+   return;
+
+   for (; first < last; first++)
+   report_garbage(seen_bits, list->items[first].string);
+}
+
+static void report_pack_garbage(struct string_list *list)
+{
+   int i, baselen = -1, first = 0, seen_bits = 0;
+
+   if (!report_garbage)
+   return;
+
+   string_list_sort(list);
+
+   for (i = 0; i < list->nr; i++) {
+   const char *path = list->items[i].string;
+   if (baselen != -1 &&
+   strncmp(path, list->items[first].string, baselen)) {
+   report_helper(list, seen_bits, first, i);
+   baselen = -1;
+   seen_bits = 0;
+   }
+   if (baselen == -1) {
+   const char *dot = strrchr(path, '.');
+   if (!dot) {
+   report_garbage(PACKDIR_FILE_GARBAGE, path);
+   continue;
+   }
+   baselen = dot - path + 1;
+   first = i;
+   }
+   if (!strcmp(path + baselen, "pack"))
+   seen_bits |= 1;
+   else if (!strcmp(path + baselen, "idx"))
+   seen_bits |= 2;
+   }
+   report_helper(list, seen_bits, first, list->nr);
+}
+
+static void prepare_packed_git_one(char *objdir, int local)
+{
+   struct strbuf path = STRBUF_INIT;
+   size_t dirnamelen;
+   DIR *dir;
+   struct dirent *de;
+   struct string_list garbage = STRING_LIST_INIT_DUP;
+
+   strbuf_addstr(, objdir);
+   strbuf_addstr(, "/pack");
+   dir = opendir(path.buf);
+   if (!dir) {
+   if (errno != ENOENT)
+   error_errno("unable to open

[PATCH v2 12/25] pack: move unpack_object_header()

2017-08-08 Thread Jonathan Tan

Signed-off-by: Jonathan Tan 
---
 cache.h |  1 -
 pack.h  |  2 ++
 packfile.c  | 25 +
 sha1_file.c | 25 -
 4 files changed, 27 insertions(+), 26 deletions(-)

diff --git a/cache.h b/cache.h
index f020dfade..9c70759a6 100644
--- a/cache.h
+++ b/cache.h
@@ -1661,7 +1661,6 @@ extern off_t find_pack_entry_one(const unsigned char 
*sha1, struct packed_git *)
 
 extern int is_pack_valid(struct packed_git *);
 extern void *unpack_entry(struct packed_git *, off_t, enum object_type *, 
unsigned long *);
-extern unsigned long unpack_object_header_buffer(const unsigned char *buf, 
unsigned long len, enum object_type *type, unsigned long *sizep);
 extern unsigned long get_size_from_delta(struct packed_git *, struct 
pack_window **, off_t);
 extern int unpack_object_header(struct packed_git *, struct pack_window **, 
off_t *, unsigned long *);
 
diff --git a/pack.h b/pack.h
index cad5ed488..4a7f88a38 100644
--- a/pack.h
+++ b/pack.h
@@ -167,4 +167,6 @@ extern void reprepare_packed_git(void);
  */
 unsigned long approximate_object_count(void);
 
+extern unsigned long unpack_object_header_buffer(const unsigned char *buf, 
unsigned long len, enum object_type *type, unsigned long *sizep);
+
 #endif
diff --git a/packfile.c b/packfile.c
index a517172f7..6e4f1c6e3 100644
--- a/packfile.c
+++ b/packfile.c
@@ -883,3 +883,28 @@ void reprepare_packed_git(void)
prepare_packed_git_run_once = 0;
prepare_packed_git();
 }
+
+unsigned long unpack_object_header_buffer(const unsigned char *buf,
+   unsigned long len, enum object_type *type, unsigned long *sizep)
+{
+   unsigned shift;
+   unsigned long size, c;
+   unsigned long used = 0;
+
+   c = buf[used++];
+   *type = (c >> 4) & 7;
+   size = c & 15;
+   shift = 4;
+   while (c & 0x80) {
+   if (len <= used || bitsizeof(long) <= shift) {
+   error("bad object header");
+   size = used = 0;
+   break;
+   }
+   c = buf[used++];
+   size += (c & 0x7f) << shift;
+   shift += 7;
+   }
+   *sizep = size;
+   return used;
+}
diff --git a/sha1_file.c b/sha1_file.c
index bbce60f1c..1f4b4ba2c 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -913,31 +913,6 @@ void *map_sha1_file(const unsigned char *sha1, unsigned 
long *size)
return map_sha1_file_1(NULL, sha1, size);
 }
 
-unsigned long unpack_object_header_buffer(const unsigned char *buf,
-   unsigned long len, enum object_type *type, unsigned long *sizep)
-{
-   unsigned shift;
-   unsigned long size, c;
-   unsigned long used = 0;
-
-   c = buf[used++];
-   *type = (c >> 4) & 7;
-   size = c & 15;
-   shift = 4;
-   while (c & 0x80) {
-   if (len <= used || bitsizeof(long) <= shift) {
-   error("bad object header");
-   size = used = 0;
-   break;
-   }
-   c = buf[used++];
-   size += (c & 0x7f) << shift;
-   shift += 7;
-   }
-   *sizep = size;
-   return used;
-}
-
 static int unpack_sha1_short_header(git_zstream *stream,
unsigned char *map, unsigned long mapsize,
void *buffer, unsigned long bufsiz)
-- 
2.14.0.434.g98096fd7a8-goog

[PATCH v2 09/25] pack: move add_packed_git()

2017-08-08 Thread Jonathan Tan

Signed-off-by: Jonathan Tan 
---
 cache.h |  1 -
 connected.c |  1 +
 pack.h  |  1 +
 packfile.c  | 53 +
 sha1_file.c | 61 -
 5 files changed, 55 insertions(+), 62 deletions(-)

diff --git a/cache.h b/cache.h
index 4812f3a63..bf93477e8 100644
--- a/cache.h
+++ b/cache.h
@@ -1638,7 +1638,6 @@ extern int odb_mkstemp(struct strbuf *template, const 
char *pattern);
 extern int odb_pack_keep(const char *name);
 
 extern void clear_delta_base_cache(void);
-extern struct packed_git *add_packed_git(const char *path, size_t path_len, 
int local);
 
 /*
  * Make sure that a pointer access into an mmap'd index file is within bounds,
diff --git a/connected.c b/connected.c
index 136c2ac16..3e3f0148c 100644
--- a/connected.c
+++ b/connected.c
@@ -3,6 +3,7 @@
 #include "sigchain.h"
 #include "connected.h"
 #include "transport.h"
+#include "pack.h"
 
 /*
  * If we feed all the commits we want to verify to this command
diff --git a/pack.h b/pack.h
index 3876e9ae6..c1f3ff32d 100644
--- a/pack.h
+++ b/pack.h
@@ -150,5 +150,6 @@ extern int open_packed_git(struct packed_git *p);
 
 extern unsigned char *use_pack(struct packed_git *, struct pack_window **, 
off_t, unsigned long *);
 extern void unuse_pack(struct pack_window **);
+extern struct packed_git *add_packed_git(const char *path, size_t path_len, 
int local);
 
 #endif
diff --git a/packfile.c b/packfile.c
index 93526ea7b..efe0ed3e8 100644
--- a/packfile.c
+++ b/packfile.c
@@ -605,3 +605,56 @@ void unuse_pack(struct pack_window **w_cursor)
*w_cursor = NULL;
}
 }
+
+static void try_to_free_pack_memory(size_t size)
+{
+   release_pack_memory(size);
+}
+
+struct packed_git *add_packed_git(const char *path, size_t path_len, int local)
+{
+   static int have_set_try_to_free_routine;
+   struct stat st;
+   size_t alloc;
+   struct packed_git *p;
+
+   if (!have_set_try_to_free_routine) {
+   have_set_try_to_free_routine = 1;
+   set_try_to_free_routine(try_to_free_pack_memory);
+   }
+
+   /*
+* Make sure a corresponding .pack file exists and that
+* the index looks sane.
+*/
+   if (!strip_suffix_mem(path, _len, ".idx"))
+   return NULL;
+
+   /*
+* ".pack" is long enough to hold any suffix we're adding (and
+* the use xsnprintf double-checks that)
+*/
+   alloc = st_add3(path_len, strlen(".pack"), 1);
+   p = alloc_packed_git(alloc);
+   memcpy(p->pack_name, path, path_len);
+
+   xsnprintf(p->pack_name + path_len, alloc - path_len, ".keep");
+   if (!access(p->pack_name, F_OK))
+   p->pack_keep = 1;
+
+   xsnprintf(p->pack_name + path_len, alloc - path_len, ".pack");
+   if (stat(p->pack_name, ) || !S_ISREG(st.st_mode)) {
+   free(p);
+   return NULL;
+   }
+
+   /* ok, it looks sane as far as we can check without
+* actually mapping the pack file.
+*/
+   p->pack_size = st.st_size;
+   p->pack_local = local;
+   p->mtime = st.st_mtime;
+   if (path_len < 40 || get_sha1_hex(path + path_len - 40, p->sha1))
+   hashclr(p->sha1);
+   return p;
+}
diff --git a/sha1_file.c b/sha1_file.c
index 12501ef06..7f12b1ee0 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -717,67 +717,6 @@ void *xmmap(void *start, size_t length,
return ret;
 }
 
-static struct packed_git *alloc_packed_git(int extra)
-{
-   struct packed_git *p = xmalloc(st_add(sizeof(*p), extra));
-   memset(p, 0, sizeof(*p));
-   p->pack_fd = -1;
-   return p;
-}
-
-static void try_to_free_pack_memory(size_t size)
-{
-   release_pack_memory(size);
-}
-
-struct packed_git *add_packed_git(const char *path, size_t path_len, int local)
-{
-   static int have_set_try_to_free_routine;
-   struct stat st;
-   size_t alloc;
-   struct packed_git *p;
-
-   if (!have_set_try_to_free_routine) {
-   have_set_try_to_free_routine = 1;
-   set_try_to_free_routine(try_to_free_pack_memory);
-   }
-
-   /*
-* Make sure a corresponding .pack file exists and that
-* the index looks sane.
-*/
-   if (!strip_suffix_mem(path, _len, ".idx"))
-   return NULL;
-
-   /*
-* ".pack" is long enough to hold any suffix we're adding (and
-* the use xsnprintf double-checks that)
-*/
-   alloc = st_add3(path_len, strlen(".pack"), 1);
-   p = alloc_packed_git(alloc);
-   memcpy(p->pack_name, path, path_len);
-
-   xsnprintf(p->pack_name + path_len, alloc - path_len, ".keep");
-   if (!access(p->pack_name, F_OK))
-   p->pack_keep = 1;
-
-   xsnprintf(p->pack_name + path_len, alloc - path_len, ".pack");
-   if (stat(p->pack_name, ) || !S_ISREG(st.st_mode)) {
-

[PATCH v2 20/25] pack: move find_pack_entry_one(), is_pack_valid()

2017-08-08 Thread Jonathan Tan

Signed-off-by: Jonathan Tan 
---
 cache.h |  8 --
 pack.h  | 10 ++--
 packfile.c  | 85 +
 sha1_file.c | 84 
 4 files changed, 93 insertions(+), 94 deletions(-)

diff --git a/cache.h b/cache.h
index 7686ccb30..b944aca69 100644
--- a/cache.h
+++ b/cache.h
@@ -1618,14 +1618,6 @@ extern int odb_mkstemp(struct strbuf *template, const 
char *pattern);
  */
 extern int odb_pack_keep(const char *name);
 
-/*
- * If the object named sha1 is present in the specified packfile,
- * return its offset within the packfile; otherwise, return 0.
- */
-extern off_t find_pack_entry_one(const unsigned char *sha1, struct packed_git 
*);
-
-extern int is_pack_valid(struct packed_git *);
-
 /*
  * Iterate over the files in the loose-object parts of the object
  * directory "path", triggering the following callbacks:
diff --git a/pack.h b/pack.h
index e0e206e3c..f5bd94813 100644
--- a/pack.h
+++ b/pack.h
@@ -144,8 +144,6 @@ extern void close_pack_windows(struct packed_git *);
 extern void close_pack_index(struct packed_git *);
 extern void close_all_packs(void);
 
-extern int open_packed_git(struct packed_git *p);
-
 extern unsigned char *use_pack(struct packed_git *, struct pack_window **, 
off_t, unsigned long *);
 extern void unuse_pack(struct pack_window **);
 extern struct packed_git *add_packed_git(const char *path, size_t path_len, 
int local);
@@ -212,4 +210,12 @@ extern void check_pack_index_ptr(const struct packed_git 
*p, const void *ptr);
  */
 extern off_t nth_packed_object_offset(const struct packed_git *, uint32_t n);
 
+/*
+ * If the object named sha1 is present in the specified packfile,
+ * return its offset within the packfile; otherwise, return 0.
+ */
+extern off_t find_pack_entry_one(const unsigned char *sha1, struct packed_git 
*);
+
+extern int is_pack_valid(struct packed_git *);
+
 #endif
diff --git a/packfile.c b/packfile.c
index 94c8af991..71017d2ec 100644
--- a/packfile.c
+++ b/packfile.c
@@ -6,6 +6,7 @@
 #include "delta.h"
 #include "list.h"
 #include "streaming.h"
+#include "sha1-lookup.h"
 
 char *odb_pack_name(struct strbuf *buf,
const unsigned char *sha1,
@@ -1698,3 +1699,87 @@ off_t nth_packed_object_offset(const struct packed_git 
*p, uint32_t n)
   ntohl(*((uint32_t *)(index + 4)));
}
 }
+
+off_t find_pack_entry_one(const unsigned char *sha1,
+ struct packed_git *p)
+{
+   const uint32_t *level1_ofs = p->index_data;
+   const unsigned char *index = p->index_data;
+   unsigned hi, lo, stride;
+   static int use_lookup = -1;
+   static int debug_lookup = -1;
+
+   if (debug_lookup < 0)
+   debug_lookup = !!getenv("GIT_DEBUG_LOOKUP");
+
+   if (!index) {
+   if (open_pack_index(p))
+   return 0;
+   level1_ofs = p->index_data;
+   index = p->index_data;
+   }
+   if (p->index_version > 1) {
+   level1_ofs += 2;
+   index += 8;
+   }
+   index += 4 * 256;
+   hi = ntohl(level1_ofs[*sha1]);
+   lo = ((*sha1 == 0x0) ? 0 : ntohl(level1_ofs[*sha1 - 1]));
+   if (p->index_version > 1) {
+   stride = 20;
+   } else {
+   stride = 24;
+   index += 4;
+   }
+
+   if (debug_lookup)
+   printf("%02x%02x%02x... lo %u hi %u nr %"PRIu32"\n",
+  sha1[0], sha1[1], sha1[2], lo, hi, p->num_objects);
+
+   if (use_lookup < 0)
+   use_lookup = !!getenv("GIT_USE_LOOKUP");
+   if (use_lookup) {
+   int pos = sha1_entry_pos(index, stride, 0,
+lo, hi, p->num_objects, sha1);
+   if (pos < 0)
+   return 0;
+   return nth_packed_object_offset(p, pos);
+   }
+
+   do {
+   unsigned mi = (lo + hi) / 2;
+   int cmp = hashcmp(index + mi * stride, sha1);
+
+   if (debug_lookup)
+   printf("lo %u hi %u rg %u mi %u\n",
+  lo, hi, hi - lo, mi);
+   if (!cmp)
+   return nth_packed_object_offset(p, mi);
+   if (cmp > 0)
+   hi = mi;
+   else
+   lo = mi+1;
+   } while (lo < hi);
+   return 0;
+}
+
+int is_pack_valid(struct packed_git *p)
+{
+   /* An already open pack is known to be valid. */
+   if (p->pack_fd != -1)
+   return 1;
+
+   /* If the pack has one window completely covering the
+* file size, the pack is known to be valid even if
+* the descriptor is not currently open.
+*/
+   if (p->windows) {
+   struct pack_window *w = p->windows;
+
+   if (!w->offset && w->len ==

[PATCH v2 21/25] pack: move find_sha1_pack()

2017-08-08 Thread Jonathan Tan

Signed-off-by: Jonathan Tan 
---
 cache.h   |  3 ---
 http-push.c   |  1 +
 http-walker.c |  1 +
 pack.h|  3 +++
 packfile.c| 13 +
 sha1_file.c   | 13 -
 6 files changed, 18 insertions(+), 16 deletions(-)

diff --git a/cache.h b/cache.h
index b944aca69..06a8caae6 100644
--- a/cache.h
+++ b/cache.h
@@ -1600,9 +1600,6 @@ struct pack_entry {
struct packed_git *p;
 };
 
-extern struct packed_git *find_sha1_pack(const unsigned char *sha1,
-struct packed_git *packs);
-
 /*
  * Create a temporary file rooted in the object database directory, or
  * die on failure. The filename is taken from "pattern", which should have the
diff --git a/http-push.c b/http-push.c
index c91f40a61..4e8a227d1 100644
--- a/http-push.c
+++ b/http-push.c
@@ -11,6 +11,7 @@
 #include "list-objects.h"
 #include "sigchain.h"
 #include "argv-array.h"
+#include "pack.h"
 
 #ifdef EXPAT_NEEDS_XMLPARSE_H
 #include 
diff --git a/http-walker.c b/http-walker.c
index ee049cb13..d6f0af944 100644
--- a/http-walker.c
+++ b/http-walker.c
@@ -4,6 +4,7 @@
 #include "http.h"
 #include "list.h"
 #include "transport.h"
+#include "pack.h"
 
 struct alt_base {
char *base;
diff --git a/pack.h b/pack.h
index f5bd94813..0517d6542 100644
--- a/pack.h
+++ b/pack.h
@@ -218,4 +218,7 @@ extern off_t find_pack_entry_one(const unsigned char *sha1, 
struct packed_git *)
 
 extern int is_pack_valid(struct packed_git *);
 
+extern struct packed_git *find_sha1_pack(const unsigned char *sha1,
+struct packed_git *packs);
+
 #endif
diff --git a/packfile.c b/packfile.c
index 71017d2ec..f16b56262 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1783,3 +1783,16 @@ int is_pack_valid(struct packed_git *p)
/* Force the pack to open to prove its valid. */
return !open_packed_git(p);
 }
+
+struct packed_git *find_sha1_pack(const unsigned char *sha1,
+ struct packed_git *packs)
+{
+   struct packed_git *p;
+
+   for (p = packs; p; p = p->next) {
+   if (find_pack_entry_one(sha1, p))
+   return p;
+   }
+   return NULL;
+
+}
diff --git a/sha1_file.c b/sha1_file.c
index 75b9ceb39..229358663 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1126,19 +1126,6 @@ static int find_pack_entry(const unsigned char *sha1, 
struct pack_entry *e)
return 0;
 }
 
-struct packed_git *find_sha1_pack(const unsigned char *sha1,
- struct packed_git *packs)
-{
-   struct packed_git *p;
-
-   for (p = packs; p; p = p->next) {
-   if (find_pack_entry_one(sha1, p))
-   return p;
-   }
-   return NULL;
-
-}
-
 static int sha1_loose_object_info(const unsigned char *sha1,
  struct object_info *oi,
  int flags)
-- 
2.14.0.434.g98096fd7a8-goog

[PATCH v2 19/25] pack: move check_pack_index_ptr(), nth_packed_object_offset()

2017-08-08 Thread Jonathan Tan

Signed-off-by: Jonathan Tan 
---
 cache.h | 16 
 pack.h  | 16 
 packfile.c  | 33 +
 sha1_file.c | 33 -
 4 files changed, 49 insertions(+), 49 deletions(-)

diff --git a/cache.h b/cache.h
index f083d532e..7686ccb30 100644
--- a/cache.h
+++ b/cache.h
@@ -1618,22 +1618,6 @@ extern int odb_mkstemp(struct strbuf *template, const 
char *pattern);
  */
 extern int odb_pack_keep(const char *name);
 
-/*
- * Make sure that a pointer access into an mmap'd index file is within bounds,
- * and can provide at least 8 bytes of data.
- *
- * Note that this is only necessary for variable-length segments of the file
- * (like the 64-bit extended offset table), as we compare the size to the
- * fixed-length parts when we open the file.
- */
-extern void check_pack_index_ptr(const struct packed_git *p, const void *ptr);
-
-/*
- * Return the offset of the nth object within the specified packfile.
- * The index must already be opened.
- */
-extern off_t nth_packed_object_offset(const struct packed_git *, uint32_t n);
-
 /*
  * If the object named sha1 is present in the specified packfile,
  * return its offset within the packfile; otherwise, return 0.
diff --git a/pack.h b/pack.h
index 023c97b37..e0e206e3c 100644
--- a/pack.h
+++ b/pack.h
@@ -196,4 +196,20 @@ extern const unsigned char *nth_packed_object_sha1(struct 
packed_git *, uint32_t
  */
 extern const struct object_id *nth_packed_object_oid(struct object_id *, 
struct packed_git *, uint32_t n);
 
+/*
+ * Make sure that a pointer access into an mmap'd index file is within bounds,
+ * and can provide at least 8 bytes of data.
+ *
+ * Note that this is only necessary for variable-length segments of the file
+ * (like the 64-bit extended offset table), as we compare the size to the
+ * fixed-length parts when we open the file.
+ */
+extern void check_pack_index_ptr(const struct packed_git *p, const void *ptr);
+
+/*
+ * Return the offset of the nth object within the specified packfile.
+ * The index must already be opened.
+ */
+extern off_t nth_packed_object_offset(const struct packed_git *, uint32_t n);
+
 #endif
diff --git a/packfile.c b/packfile.c
index b16cf648a..94c8af991 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1665,3 +1665,36 @@ const struct object_id *nth_packed_object_oid(struct 
object_id *oid,
hashcpy(oid->hash, hash);
return oid;
 }
+
+void check_pack_index_ptr(const struct packed_git *p, const void *vptr)
+{
+   const unsigned char *ptr = vptr;
+   const unsigned char *start = p->index_data;
+   const unsigned char *end = start + p->index_size;
+   if (ptr < start)
+   die(_("offset before start of pack index for %s (corrupt 
index?)"),
+   p->pack_name);
+   /* No need to check for underflow; .idx files must be at least 8 bytes 
*/
+   if (ptr >= end - 8)
+   die(_("offset beyond end of pack index for %s (truncated 
index?)"),
+   p->pack_name);
+}
+
+off_t nth_packed_object_offset(const struct packed_git *p, uint32_t n)
+{
+   const unsigned char *index = p->index_data;
+   index += 4 * 256;
+   if (p->index_version == 1) {
+   return ntohl(*((uint32_t *)(index + 24 * n)));
+   } else {
+   uint32_t off;
+   index += 8 + p->num_objects * (20 + 4);
+   off = ntohl(*((uint32_t *)(index + 4 * n)));
+   if (!(off & 0x8000))
+   return off;
+   index += p->num_objects * 4 + (off & 0x7fff) * 8;
+   check_pack_index_ptr(p, index);
+   return (((uint64_t)ntohl(*((uint32_t *)(index + 0 << 32) |
+  ntohl(*((uint32_t *)(index + 4)));
+   }
+}
diff --git a/sha1_file.c b/sha1_file.c
index 4cd2b1809..0f4d68c5a 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1073,39 +1073,6 @@ int parse_sha1_header(const char *hdr, unsigned long 
*sizep)
return parse_sha1_header_extended(hdr, , 0);
 }
 
-void check_pack_index_ptr(const struct packed_git *p, const void *vptr)
-{
-   const unsigned char *ptr = vptr;
-   const unsigned char *start = p->index_data;
-   const unsigned char *end = start + p->index_size;
-   if (ptr < start)
-   die(_("offset before start of pack index for %s (corrupt 
index?)"),
-   p->pack_name);
-   /* No need to check for underflow; .idx files must be at least 8 bytes 
*/
-   if (ptr >= end - 8)
-   die(_("offset beyond end of pack index for %s (truncated 
index?)"),
-   p->pack_name);
-}
-
-off_t nth_packed_object_offset(const struct packed_git *p, uint32_t n)
-{
-   const unsigned char *index = p->index_data;
-   index += 4 * 256;
-   if (p->index_version == 1) {
-   return ntohl(*((uint32_t *)(index + 24 * n)));
-   } else {
-

[PATCH v2 16/25] sha1_file: remove read_packed_sha1()

2017-08-08 Thread Jonathan Tan

Use read_object() in its place instead. This avoids duplication of code.

This makes force_object_loose() slightly slower (because of a redundant
check of loose object storage), but only in the error case.

Signed-off-by: Jonathan Tan 
---
 sha1_file.c | 26 +-
 1 file changed, 1 insertion(+), 25 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 9eadda388..9e5444334 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2091,30 +2091,6 @@ int sha1_object_info(const unsigned char *sha1, unsigned 
long *sizep)
return type;
 }
 
-static void *read_packed_sha1(const unsigned char *sha1,
- enum object_type *type, unsigned long *size)
-{
-   struct pack_entry e;
-   void *data;
-
-   if (!find_pack_entry(sha1, ))
-   return NULL;
-   data = cache_or_unpack_entry(e.p, e.offset, size, type);
-   if (!data) {
-   /*
-* We're probably in deep shit, but let's try to fetch
-* the required object anyway from another pack or loose.
-* This should happen only in the presence of a corrupted
-* pack, and is better than failing outright.
-*/
-   error("failed to read object %s at offset %"PRIuMAX" from %s",
- sha1_to_hex(sha1), (uintmax_t)e.offset, e.p->pack_name);
-   mark_bad_packed_object(e.p, sha1);
-   data = read_object(sha1, type, size);
-   }
-   return data;
-}
-
 int pretend_sha1_file(void *buf, unsigned long len, enum object_type type,
  unsigned char *sha1)
 {
@@ -2497,7 +2473,7 @@ int force_object_loose(const unsigned char *sha1, time_t 
mtime)
 
if (has_loose_object(sha1))
return 0;
-   buf = read_packed_sha1(sha1, , );
+   buf = read_object(sha1, , );
if (!buf)
return error("cannot read sha1_file for %s", sha1_to_hex(sha1));
hdrlen = xsnprintf(hdr, sizeof(hdr), "%s %lu", typename(type), len) + 1;
-- 
2.14.0.434.g98096fd7a8-goog

[PATCH v2 18/25] pack: move nth_packed_object_{sha1,oid}

2017-08-08 Thread Jonathan Tan

Signed-off-by: Jonathan Tan 
---
 cache.h | 14 --
 pack.h  | 14 ++
 packfile.c  | 31 +++
 sha1_file.c | 31 ---
 4 files changed, 45 insertions(+), 45 deletions(-)

diff --git a/cache.h b/cache.h
index b14098bf1..f083d532e 100644
--- a/cache.h
+++ b/cache.h
@@ -1628,20 +1628,6 @@ extern int odb_pack_keep(const char *name);
  */
 extern void check_pack_index_ptr(const struct packed_git *p, const void *ptr);
 
-/*
- * Return the SHA-1 of the nth object within the specified packfile.
- * Open the index if it is not already open.  The return value points
- * at the SHA-1 within the mmapped index.  Return NULL if there is an
- * error.
- */
-extern const unsigned char *nth_packed_object_sha1(struct packed_git *, 
uint32_t n);
-/*
- * Like nth_packed_object_sha1, but write the data into the object specified by
- * the the first argument.  Returns the first argument on success, and NULL on
- * error.
- */
-extern const struct object_id *nth_packed_object_oid(struct object_id *, 
struct packed_git *, uint32_t n);
-
 /*
  * Return the offset of the nth object within the specified packfile.
  * The index must already be opened.
diff --git a/pack.h b/pack.h
index 2e6f357c3..023c97b37 100644
--- a/pack.h
+++ b/pack.h
@@ -182,4 +182,18 @@ extern void *unpack_entry(struct packed_git *, off_t, enum 
object_type *, unsign
 extern void mark_bad_packed_object(struct packed_git *p, const unsigned char 
*sha1);
 extern const struct packed_git *has_packed_and_bad(const unsigned char *sha1);
 
+/*
+ * Return the SHA-1 of the nth object within the specified packfile.
+ * Open the index if it is not already open.  The return value points
+ * at the SHA-1 within the mmapped index.  Return NULL if there is an
+ * error.
+ */
+extern const unsigned char *nth_packed_object_sha1(struct packed_git *, 
uint32_t n);
+/*
+ * Like nth_packed_object_sha1, but write the data into the object specified by
+ * the the first argument.  Returns the first argument on success, and NULL on
+ * error.
+ */
+extern const struct object_id *nth_packed_object_oid(struct object_id *, 
struct packed_git *, uint32_t n);
+
 #endif
diff --git a/packfile.c b/packfile.c
index a3745f9df..b16cf648a 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1634,3 +1634,34 @@ void *unpack_entry(struct packed_git *p, off_t 
obj_offset,
 
return data;
 }
+
+const unsigned char *nth_packed_object_sha1(struct packed_git *p,
+   uint32_t n)
+{
+   const unsigned char *index = p->index_data;
+   if (!index) {
+   if (open_pack_index(p))
+   return NULL;
+   index = p->index_data;
+   }
+   if (n >= p->num_objects)
+   return NULL;
+   index += 4 * 256;
+   if (p->index_version == 1) {
+   return index + 24 * n + 4;
+   } else {
+   index += 8;
+   return index + 20 * n;
+   }
+}
+
+const struct object_id *nth_packed_object_oid(struct object_id *oid,
+ struct packed_git *p,
+ uint32_t n)
+{
+   const unsigned char *hash = nth_packed_object_sha1(p, n);
+   if (!hash)
+   return NULL;
+   hashcpy(oid->hash, hash);
+   return oid;
+}
diff --git a/sha1_file.c b/sha1_file.c
index fe7e0db76..4cd2b1809 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1073,37 +1073,6 @@ int parse_sha1_header(const char *hdr, unsigned long 
*sizep)
return parse_sha1_header_extended(hdr, , 0);
 }
 
-const unsigned char *nth_packed_object_sha1(struct packed_git *p,
-   uint32_t n)
-{
-   const unsigned char *index = p->index_data;
-   if (!index) {
-   if (open_pack_index(p))
-   return NULL;
-   index = p->index_data;
-   }
-   if (n >= p->num_objects)
-   return NULL;
-   index += 4 * 256;
-   if (p->index_version == 1) {
-   return index + 24 * n + 4;
-   } else {
-   index += 8;
-   return index + 20 * n;
-   }
-}
-
-const struct object_id *nth_packed_object_oid(struct object_id *oid,
- struct packed_git *p,
- uint32_t n)
-{
-   const unsigned char *hash = nth_packed_object_sha1(p, n);
-   if (!hash)
-   return NULL;
-   hashcpy(oid->hash, hash);
-   return oid;
-}
-
 void check_pack_index_ptr(const struct packed_git *p, const void *vptr)
 {
const unsigned char *ptr = vptr;
-- 
2.14.0.434.g98096fd7a8-goog

[PATCH v2 15/25] sha1_file: set whence in storage-specific info fn

2017-08-08 Thread Jonathan Tan

Move the setting of oi->whence to sha1_loose_object_info() and
packed_object_info(). This allows sha1_object_info_extended() to not
need to know about the delta base cache.

Signed-off-by: Jonathan Tan 
---
 sha1_file.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index f3bcdae17..9eadda388 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1473,6 +1473,9 @@ int packed_object_info(struct packed_git *p, off_t 
obj_offset,
hashclr(oi->delta_base_sha1);
}
 
+   oi->whence = in_delta_base_cache(p, obj_offset) ? OI_DBCACHED :
+ OI_PACKED;
+
 out:
unuse_pack(_curs);
return type;
@@ -2002,6 +2005,7 @@ static int sha1_loose_object_info(const unsigned char 
*sha1,
if (oi->sizep == _scratch)
oi->sizep = NULL;
strbuf_release();
+   oi->whence = OI_LOOSE;
return (status < 0) ? status : 0;
 }
 
@@ -2039,10 +2043,8 @@ int sha1_object_info_extended(const unsigned char *sha1, 
struct object_info *oi,
 
if (!find_pack_entry(real, )) {
/* Most likely it's a loose object. */
-   if (!sha1_loose_object_info(real, oi, flags)) {
-   oi->whence = OI_LOOSE;
+   if (!sha1_loose_object_info(real, oi, flags))
return 0;
-   }
 
/* Not a loose object; someone else may have just packed it. */
if (flags & OBJECT_INFO_QUICK) {
@@ -2065,10 +2067,7 @@ int sha1_object_info_extended(const unsigned char *sha1, 
struct object_info *oi,
if (rtype < 0) {
mark_bad_packed_object(e.p, real);
return sha1_object_info_extended(real, oi, 0);
-   } else if (in_delta_base_cache(e.p, e.offset)) {
-   oi->whence = OI_DBCACHED;
-   } else {
-   oi->whence = OI_PACKED;
+   } else if (oi->whence == OI_PACKED) {
oi->u.packed.offset = e.offset;
oi->u.packed.pack = e.p;
oi->u.packed.is_delta = (rtype == OBJ_REF_DELTA ||
-- 
2.14.0.434.g98096fd7a8-goog

[PATCH v2 17/25] pack: move packed_object_info(), unpack_entry()

2017-08-08 Thread Jonathan Tan

Both sha1_file.c and packfile.c now need read_object(), so a copy of
read_object() was created in packfile.c.

This patch makes both mark_bad_packed_object() and has_packed_and_bad()
global. Unlike most of the other patches in this series, these 2
functions need to remain global.

Signed-off-by: Jonathan Tan 
---
 cache.h |   7 -
 pack.h  |  11 +
 packfile.c  | 660 ++
 sha1_file.c | 676 ++--
 4 files changed, 685 insertions(+), 669 deletions(-)

diff --git a/cache.h b/cache.h
index 1aea0..b14098bf1 100644
--- a/cache.h
+++ b/cache.h
@@ -1186,9 +1186,6 @@ extern void *map_sha1_file(const unsigned char *sha1, 
unsigned long *size);
 extern int unpack_sha1_header(git_zstream *stream, unsigned char *map, 
unsigned long mapsize, void *buffer, unsigned long bufsiz);
 extern int parse_sha1_header(const char *hdr, unsigned long *sizep);
 
-/* global flag to enable extra checks when accessing packed objects */
-extern int do_check_packed_object_crc;
-
 extern int check_sha1_signature(const unsigned char *sha1, void *buf, unsigned 
long size, const char *type);
 
 extern int finalize_object_file(const char *tmpfile, const char *filename);
@@ -1621,8 +1618,6 @@ extern int odb_mkstemp(struct strbuf *template, const 
char *pattern);
  */
 extern int odb_pack_keep(const char *name);
 
-extern void clear_delta_base_cache(void);
-
 /*
  * Make sure that a pointer access into an mmap'd index file is within bounds,
  * and can provide at least 8 bytes of data.
@@ -1660,7 +1655,6 @@ extern off_t nth_packed_object_offset(const struct 
packed_git *, uint32_t n);
 extern off_t find_pack_entry_one(const unsigned char *sha1, struct packed_git 
*);
 
 extern int is_pack_valid(struct packed_git *);
-extern void *unpack_entry(struct packed_git *, off_t, enum object_type *, 
unsigned long *);
 
 /*
  * Iterate over the files in the loose-object parts of the object
@@ -1771,7 +1765,6 @@ struct object_info {
 /* Do not retry packed storage after checking packed and loose storage */
 #define OBJECT_INFO_QUICK 8
 extern int sha1_object_info_extended(const unsigned char *, struct object_info 
*, unsigned flags);
-extern int packed_object_info(struct packed_git *pack, off_t offset, struct 
object_info *);
 
 /* Dumb servers support */
 extern int update_server_info(int);
diff --git a/pack.h b/pack.h
index 5e3552392..2e6f357c3 100644
--- a/pack.h
+++ b/pack.h
@@ -171,4 +171,15 @@ extern unsigned long unpack_object_header_buffer(const 
unsigned char *buf, unsig
 extern unsigned long get_size_from_delta(struct packed_git *, struct 
pack_window **, off_t);
 extern int unpack_object_header(struct packed_git *, struct pack_window **, 
off_t *, unsigned long *);
 
+extern void clear_delta_base_cache(void);
+
+/* global flag to enable extra checks when accessing packed objects */
+extern int do_check_packed_object_crc;
+
+extern int packed_object_info(struct packed_git *pack, off_t offset, struct 
object_info *);
+extern void *unpack_entry(struct packed_git *, off_t, enum object_type *, 
unsigned long *);
+
+extern void mark_bad_packed_object(struct packed_git *p, const unsigned char 
*sha1);
+extern const struct packed_git *has_packed_and_bad(const unsigned char *sha1);
+
 #endif
diff --git a/packfile.c b/packfile.c
index a4db78ea0..a3745f9df 100644
--- a/packfile.c
+++ b/packfile.c
@@ -4,6 +4,8 @@
 #include "dir.h"
 #include "mergesort.h"
 #include "delta.h"
+#include "list.h"
+#include "streaming.h"
 
 char *odb_pack_name(struct strbuf *buf,
const unsigned char *sha1,
@@ -974,3 +976,661 @@ int unpack_object_header(struct packed_git *p,
 
return type;
 }
+
+void mark_bad_packed_object(struct packed_git *p, const unsigned char *sha1)
+{
+   unsigned i;
+   for (i = 0; i < p->num_bad_objects; i++)
+   if (!hashcmp(sha1, p->bad_object_sha1 + GIT_SHA1_RAWSZ * i))
+   return;
+   p->bad_object_sha1 = xrealloc(p->bad_object_sha1,
+ st_mult(GIT_MAX_RAWSZ,
+ st_add(p->num_bad_objects, 1)));
+   hashcpy(p->bad_object_sha1 + GIT_SHA1_RAWSZ * p->num_bad_objects, sha1);
+   p->num_bad_objects++;
+}
+
+const struct packed_git *has_packed_and_bad(const unsigned char *sha1)
+{
+   struct packed_git *p;
+   unsigned i;
+
+   for (p = packed_git; p; p = p->next)
+   for (i = 0; i < p->num_bad_objects; i++)
+   if (!hashcmp(sha1, p->bad_object_sha1 + 20 * i))
+   return p;
+   return NULL;
+}
+
+static off_t get_delta_base(struct packed_git *p,
+   struct pack_window **w_curs,
+   off_t *curpos,
+   enum object_type type,
+   off_t delta_obj_offset)
+{
+

[PATCH v2 23/25] pack: move has_sha1_pack()

2017-08-08 Thread Jonathan Tan

Signed-off-by: Jonathan Tan 
---
 builtin/prune-packed.c | 1 +
 cache.h| 2 --
 diff.c | 1 +
 pack.h | 2 ++
 packfile.c | 6 ++
 revision.c | 1 +
 sha1_file.c| 6 --
 7 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/builtin/prune-packed.c b/builtin/prune-packed.c
index ac978ad40..79130aa2e 100644
--- a/builtin/prune-packed.c
+++ b/builtin/prune-packed.c
@@ -2,6 +2,7 @@
 #include "cache.h"
 #include "progress.h"
 #include "parse-options.h"
+#include "pack.h"
 
 static const char * const prune_packed_usage[] = {
N_("git prune-packed [-n | --dry-run] [-q | --quiet]"),
diff --git a/cache.h b/cache.h
index 06a8caae6..d96d36d50 100644
--- a/cache.h
+++ b/cache.h
@@ -1190,8 +1190,6 @@ extern int check_sha1_signature(const unsigned char 
*sha1, void *buf, unsigned l
 
 extern int finalize_object_file(const char *tmpfile, const char *filename);
 
-extern int has_sha1_pack(const unsigned char *sha1);
-
 /*
  * Open the loose object at path, check its sha1, and return the contents,
  * type, and size. If the object is a blob, then "contents" may return NULL,
diff --git a/diff.c b/diff.c
index 85e714f6c..6bbc46326 100644
--- a/diff.c
+++ b/diff.c
@@ -20,6 +20,7 @@
 #include "string-list.h"
 #include "argv-array.h"
 #include "graph.h"
+#include "pack.h"
 
 #ifdef NO_FAST_WORKING_DIRECTORY
 #define FAST_WORKING_DIRECTORY 0
diff --git a/pack.h b/pack.h
index 1021a781c..ce0e15deb 100644
--- a/pack.h
+++ b/pack.h
@@ -223,4 +223,6 @@ extern struct packed_git *find_sha1_pack(const unsigned 
char *sha1,
 
 extern int find_pack_entry(const unsigned char *sha1, struct pack_entry *e);
 
+extern int has_sha1_pack(const unsigned char *sha1);
+
 #endif
diff --git a/packfile.c b/packfile.c
index 0f1e3338b..507f65236 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1849,3 +1849,9 @@ int find_pack_entry(const unsigned char *sha1, struct 
pack_entry *e)
}
return 0;
 }
+
+int has_sha1_pack(const unsigned char *sha1)
+{
+   struct pack_entry e;
+   return find_pack_entry(sha1, );
+}
diff --git a/revision.c b/revision.c
index 6603af944..2868c4fc8 100644
--- a/revision.c
+++ b/revision.c
@@ -19,6 +19,7 @@
 #include "dir.h"
 #include "cache-tree.h"
 #include "bisect.h"
+#include "pack.h"
 
 volatile show_early_output_fn_t show_early_output;
 
diff --git a/sha1_file.c b/sha1_file.c
index 1a505eae5..2610ea057 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1629,12 +1629,6 @@ int has_pack_index(const unsigned char *sha1)
return 1;
 }
 
-int has_sha1_pack(const unsigned char *sha1)
-{
-   struct pack_entry e;
-   return find_pack_entry(sha1, );
-}
-
 int has_sha1_file_with_flags(const unsigned char *sha1, int flags)
 {
if (!startup_info->have_repository)
-- 
2.14.0.434.g98096fd7a8-goog

[PATCH v2 24/25] pack: move has_pack_index()

2017-08-08 Thread Jonathan Tan

Signed-off-by: Jonathan Tan 
---
 cache.h | 2 --
 pack.h  | 2 ++
 packfile.c  | 8 
 sha1_file.c | 8 
 4 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/cache.h b/cache.h
index d96d36d50..656b39d51 100644
--- a/cache.h
+++ b/cache.h
@@ -1225,8 +1225,6 @@ extern int has_object_file_with_flags(const struct 
object_id *oid, int flags);
  */
 extern int has_loose_object_nonlocal(const unsigned char *sha1);
 
-extern int has_pack_index(const unsigned char *sha1);
-
 extern void assert_sha1_type(const unsigned char *sha1, enum object_type 
expect);
 
 /* Helper to check and "touch" a file */
diff --git a/pack.h b/pack.h
index ce0e15deb..2c2a347ba 100644
--- a/pack.h
+++ b/pack.h
@@ -225,4 +225,6 @@ extern int find_pack_entry(const unsigned char *sha1, 
struct pack_entry *e);
 
 extern int has_sha1_pack(const unsigned char *sha1);
 
+extern int has_pack_index(const unsigned char *sha1);
+
 #endif
diff --git a/packfile.c b/packfile.c
index 507f65236..28a16206c 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1855,3 +1855,11 @@ int has_sha1_pack(const unsigned char *sha1)
struct pack_entry e;
return find_pack_entry(sha1, );
 }
+
+int has_pack_index(const unsigned char *sha1)
+{
+   struct stat st;
+   if (stat(sha1_pack_index_name(sha1), ))
+   return 0;
+   return 1;
+}
diff --git a/sha1_file.c b/sha1_file.c
index 2610ea057..8584f6cf2 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1621,14 +1621,6 @@ int force_object_loose(const unsigned char *sha1, time_t 
mtime)
return ret;
 }
 
-int has_pack_index(const unsigned char *sha1)
-{
-   struct stat st;
-   if (stat(sha1_pack_index_name(sha1), ))
-   return 0;
-   return 1;
-}
-
 int has_sha1_file_with_flags(const unsigned char *sha1, int flags)
 {
if (!startup_info->have_repository)
-- 
2.14.0.434.g98096fd7a8-goog

[PATCH v2 25/25] pack: move for_each_packed_object()

2017-08-08 Thread Jonathan Tan

Signed-off-by: Jonathan Tan 
---
 builtin/cat-file.c |  1 +
 cache.h|  7 +--
 pack.h | 11 +++
 packfile.c | 40 
 reachable.c|  1 +
 sha1_file.c| 40 
 6 files changed, 54 insertions(+), 46 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 96b786e48..316ef5c98 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -12,6 +12,7 @@
 #include "streaming.h"
 #include "tree-walk.h"
 #include "sha1-array.h"
+#include "pack.h"
 
 struct batch_options {
int enabled;
diff --git a/cache.h b/cache.h
index 656b39d51..6c3822783 100644
--- a/cache.h
+++ b/cache.h
@@ -1660,17 +1660,12 @@ int for_each_loose_file_in_objdir_buf(struct strbuf 
*path,
  void *data);
 
 /*
- * Iterate over loose and packed objects in both the local
+ * Iterate over loose objects in both the local
  * repository and any alternates repositories (unless the
  * LOCAL_ONLY flag is set).
  */
 #define FOR_EACH_OBJECT_LOCAL_ONLY 0x1
-typedef int each_packed_object_fn(const struct object_id *oid,
- struct packed_git *pack,
- uint32_t pos,
- void *data);
 extern int for_each_loose_object(each_loose_object_fn, void *, unsigned flags);
-extern int for_each_packed_object(each_packed_object_fn, void *, unsigned 
flags);
 
 struct object_info {
/* Request */
diff --git a/pack.h b/pack.h
index 2c2a347ba..905b05be5 100644
--- a/pack.h
+++ b/pack.h
@@ -227,4 +227,15 @@ extern int has_sha1_pack(const unsigned char *sha1);
 
 extern int has_pack_index(const unsigned char *sha1);
 
+/*
+ * Iterate over packed objects in both the local
+ * repository and any alternates repositories (unless the
+ * FOR_EACH_OBJECT_LOCAL_ONLY flag, defined in cache.h, is set).
+ */
+typedef int each_packed_object_fn(const struct object_id *oid,
+ struct packed_git *pack,
+ uint32_t pos,
+ void *data);
+extern int for_each_packed_object(each_packed_object_fn, void *, unsigned 
flags);
+
 #endif
diff --git a/packfile.c b/packfile.c
index 28a16206c..031a40828 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1863,3 +1863,43 @@ int has_pack_index(const unsigned char *sha1)
return 0;
return 1;
 }
+
+static int for_each_object_in_pack(struct packed_git *p, each_packed_object_fn 
cb, void *data)
+{
+   uint32_t i;
+   int r = 0;
+
+   for (i = 0; i < p->num_objects; i++) {
+   struct object_id oid;
+
+   if (!nth_packed_object_oid(, p, i))
+   return error("unable to get sha1 of object %u in %s",
+i, p->pack_name);
+
+   r = cb(, p, i, data);
+   if (r)
+   break;
+   }
+   return r;
+}
+
+int for_each_packed_object(each_packed_object_fn cb, void *data, unsigned 
flags)
+{
+   struct packed_git *p;
+   int r = 0;
+   int pack_errors = 0;
+
+   prepare_packed_git();
+   for (p = packed_git; p; p = p->next) {
+   if ((flags & FOR_EACH_OBJECT_LOCAL_ONLY) && !p->pack_local)
+   continue;
+   if (open_pack_index(p)) {
+   pack_errors = 1;
+   continue;
+   }
+   r = for_each_object_in_pack(p, cb, data);
+   if (r)
+   break;
+   }
+   return r ? r : pack_errors;
+}
diff --git a/reachable.c b/reachable.c
index c62efbfd4..ef606ae17 100644
--- a/reachable.c
+++ b/reachable.c
@@ -9,6 +9,7 @@
 #include "cache-tree.h"
 #include "progress.h"
 #include "list-objects.h"
+#include "pack.h"
 
 struct connectivity_progress {
struct progress *progress;
diff --git a/sha1_file.c b/sha1_file.c
index 8584f6cf2..3f3f9174f 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2013,46 +2013,6 @@ int for_each_loose_object(each_loose_object_fn cb, void 
*data, unsigned flags)
return foreach_alt_odb(loose_from_alt_odb, );
 }
 
-static int for_each_object_in_pack(struct packed_git *p, each_packed_object_fn 
cb, void *data)
-{
-   uint32_t i;
-   int r = 0;
-
-   for (i = 0; i < p->num_objects; i++) {
-   struct object_id oid;
-
-   if (!nth_packed_object_oid(, p, i))
-   return error("unable to get sha1 of object %u in %s",
-i, p->pack_name);
-
-   r = cb(, p, i, data);
-   if (r)
-   break;
-   }
-   return r;
-}
-
-int for_each_packed_object(each_packed_object_fn cb, void *data, unsigned 
flags)
-{
-   struct packed_git *p;
-   int r = 0;
-   int pack_errors = 0;
-
-   prepare_packed_git();
-

[PATCH v2 22/25] pack: move find_pack_entry() and make it global

2017-08-08 Thread Jonathan Tan

This function needs to be global as it is used by sha1_file.c and will
be used by packfile.c.

Signed-off-by: Jonathan Tan 
---
 pack.h  |  2 ++
 packfile.c  | 53 +
 sha1_file.c | 53 -
 3 files changed, 55 insertions(+), 53 deletions(-)

diff --git a/pack.h b/pack.h
index 0517d6542..1021a781c 100644
--- a/pack.h
+++ b/pack.h
@@ -221,4 +221,6 @@ extern int is_pack_valid(struct packed_git *);
 extern struct packed_git *find_sha1_pack(const unsigned char *sha1,
 struct packed_git *packs);
 
+extern int find_pack_entry(const unsigned char *sha1, struct pack_entry *e);
+
 #endif
diff --git a/packfile.c b/packfile.c
index f16b56262..0f1e3338b 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1796,3 +1796,56 @@ struct packed_git *find_sha1_pack(const unsigned char 
*sha1,
return NULL;
 
 }
+
+static int fill_pack_entry(const unsigned char *sha1,
+  struct pack_entry *e,
+  struct packed_git *p)
+{
+   off_t offset;
+
+   if (p->num_bad_objects) {
+   unsigned i;
+   for (i = 0; i < p->num_bad_objects; i++)
+   if (!hashcmp(sha1, p->bad_object_sha1 + 20 * i))
+   return 0;
+   }
+
+   offset = find_pack_entry_one(sha1, p);
+   if (!offset)
+   return 0;
+
+   /*
+* We are about to tell the caller where they can locate the
+* requested object.  We better make sure the packfile is
+* still here and can be accessed before supplying that
+* answer, as it may have been deleted since the index was
+* loaded!
+*/
+   if (!is_pack_valid(p))
+   return 0;
+   e->offset = offset;
+   e->p = p;
+   hashcpy(e->sha1, sha1);
+   return 1;
+}
+
+/*
+ * Iff a pack file contains the object named by sha1, return true and
+ * store its location to e.
+ */
+int find_pack_entry(const unsigned char *sha1, struct pack_entry *e)
+{
+   struct mru_entry *p;
+
+   prepare_packed_git();
+   if (!packed_git)
+   return 0;
+
+   for (p = packed_git_mru->head; p; p = p->next) {
+   if (fill_pack_entry(sha1, e, p->item)) {
+   mru_mark(packed_git_mru, p);
+   return 1;
+   }
+   }
+   return 0;
+}
diff --git a/sha1_file.c b/sha1_file.c
index 229358663..1a505eae5 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1073,59 +1073,6 @@ int parse_sha1_header(const char *hdr, unsigned long 
*sizep)
return parse_sha1_header_extended(hdr, , 0);
 }
 
-static int fill_pack_entry(const unsigned char *sha1,
-  struct pack_entry *e,
-  struct packed_git *p)
-{
-   off_t offset;
-
-   if (p->num_bad_objects) {
-   unsigned i;
-   for (i = 0; i < p->num_bad_objects; i++)
-   if (!hashcmp(sha1, p->bad_object_sha1 + 20 * i))
-   return 0;
-   }
-
-   offset = find_pack_entry_one(sha1, p);
-   if (!offset)
-   return 0;
-
-   /*
-* We are about to tell the caller where they can locate the
-* requested object.  We better make sure the packfile is
-* still here and can be accessed before supplying that
-* answer, as it may have been deleted since the index was
-* loaded!
-*/
-   if (!is_pack_valid(p))
-   return 0;
-   e->offset = offset;
-   e->p = p;
-   hashcpy(e->sha1, sha1);
-   return 1;
-}
-
-/*
- * Iff a pack file contains the object named by sha1, return true and
- * store its location to e.
- */
-static int find_pack_entry(const unsigned char *sha1, struct pack_entry *e)
-{
-   struct mru_entry *p;
-
-   prepare_packed_git();
-   if (!packed_git)
-   return 0;
-
-   for (p = packed_git_mru->head; p; p = p->next) {
-   if (fill_pack_entry(sha1, e, p->item)) {
-   mru_mark(packed_git_mru, p);
-   return 1;
-   }
-   }
-   return 0;
-}
-
 static int sha1_loose_object_info(const unsigned char *sha1,
  struct object_info *oi,
  int flags)
-- 
2.14.0.434.g98096fd7a8-goog

[PATCH v2 01/25] pack: move pack name-related functions

2017-08-08 Thread Jonathan Tan

Currently, sha1_file.c and cache.h contain many functions, both related
to and unrelated to packfiles. This makes both files very large and
causes an unclear separation of concerns.

Create a new file, packfile.c, to hold all packfile-related functions
currently in sha1_file.c, and designate pack.h to hold these
packfile-related functions.

In this commit, the pack name-related functions are moved. Subsequent
commits will move the other functions.

Signed-off-by: Jonathan Tan 
---
 Makefile |  1 +
 builtin/pack-redundant.c |  1 +
 cache.h  | 23 ---
 pack.h   | 23 +++
 packfile.c   | 23 +++
 sha1_file.c  | 22 --
 6 files changed, 48 insertions(+), 45 deletions(-)
 create mode 100644 packfile.c

diff --git a/Makefile b/Makefile
index 461c845d3..5cdecaa17 100644
--- a/Makefile
+++ b/Makefile
@@ -816,6 +816,7 @@ LIB_OBJS += notes-merge.o
 LIB_OBJS += notes-utils.o
 LIB_OBJS += object.o
 LIB_OBJS += oidset.o
+LIB_OBJS += packfile.o
 LIB_OBJS += pack-bitmap.o
 LIB_OBJS += pack-bitmap-write.o
 LIB_OBJS += pack-check.o
diff --git a/builtin/pack-redundant.c b/builtin/pack-redundant.c
index cb1df1c76..df36d10e7 100644
--- a/builtin/pack-redundant.c
+++ b/builtin/pack-redundant.c
@@ -7,6 +7,7 @@
 */
 
 #include "builtin.h"
+#include "pack.h"
 
 #define BLKSIZE 512
 
diff --git a/cache.h b/cache.h
index 71fe09264..1f0f47819 100644
--- a/cache.h
+++ b/cache.h
@@ -902,20 +902,6 @@ extern void check_repository_format(void);
  */
 extern const char *sha1_file_name(const unsigned char *sha1);
 
-/*
- * Return the name of the (local) packfile with the specified sha1 in
- * its name.  The return value is a pointer to memory that is
- * overwritten each time this function is called.
- */
-extern char *sha1_pack_name(const unsigned char *sha1);
-
-/*
- * Return the name of the (local) pack index file with the specified
- * sha1 in its name.  The return value is a pointer to memory that is
- * overwritten each time this function is called.
- */
-extern char *sha1_pack_index_name(const unsigned char *sha1);
-
 /*
  * Return an abbreviated sha1 unique within this repository's object database.
  * The result will be at least `len` characters long, and will be NUL
@@ -1648,15 +1634,6 @@ extern void pack_report(void);
  */
 extern int odb_mkstemp(struct strbuf *template, const char *pattern);
 
-/*
- * Generate the filename to be used for a pack file with checksum "sha1" and
- * extension "ext". The result is written into the strbuf "buf", overwriting
- * any existing contents. A pointer to buf->buf is returned as a convenience.
- *
- * Example: odb_pack_name(out, sha1, "idx") => 
".git/objects/pack/pack-1234..idx"
- */
-extern char *odb_pack_name(struct strbuf *buf, const unsigned char *sha1, 
const char *ext);
-
 /*
  * Create a pack .keep file named "name" (which should generally be the output
  * of odb_pack_name). Returns a file descriptor opened for writing, or -1 on
diff --git a/pack.h b/pack.h
index 8294341af..63bfde00c 100644
--- a/pack.h
+++ b/pack.h
@@ -101,4 +101,27 @@ extern int read_pack_header(int fd, struct pack_header *);
 extern struct sha1file *create_tmp_packfile(char **pack_tmp_name);
 extern void finish_tmp_packfile(struct strbuf *name_buffer, const char 
*pack_tmp_name, struct pack_idx_entry **written_list, uint32_t nr_written, 
struct pack_idx_option *pack_idx_opts, unsigned char sha1[]);
 
+/*
+ * Generate the filename to be used for a pack file with checksum "sha1" and
+ * extension "ext". The result is written into the strbuf "buf", overwriting
+ * any existing contents. A pointer to buf->buf is returned as a convenience.
+ *
+ * Example: odb_pack_name(out, sha1, "idx") => 
".git/objects/pack/pack-1234..idx"
+ */
+extern char *odb_pack_name(struct strbuf *buf, const unsigned char *sha1, 
const char *ext);
+
+/*
+ * Return the name of the (local) packfile with the specified sha1 in
+ * its name.  The return value is a pointer to memory that is
+ * overwritten each time this function is called.
+ */
+extern char *sha1_pack_name(const unsigned char *sha1);
+
+/*
+ * Return the name of the (local) pack index file with the specified
+ * sha1 in its name.  The return value is a pointer to memory that is
+ * overwritten each time this function is called.
+ */
+extern char *sha1_pack_index_name(const unsigned char *sha1);
+
 #endif
diff --git a/packfile.c b/packfile.c
new file mode 100644
index 0..0d191dfd6
--- /dev/null
+++ b/packfile.c
@@ -0,0 +1,23 @@
+#include "cache.h"
+
+char *odb_pack_name(struct strbuf *buf,
+   const unsigned char *sha1,
+   const char *ext)
+{
+   strbuf_reset(buf);
+   strbuf_addf(buf, "%s/pack/pack-%s.%s", get_object_directory(),
+   sha1_to_hex(sha1), ext);
+   return buf->buf;
+}
+
+char *sha1_pack_name(const unsigned char *sha1)
+{
+

[PATCH v2 02/25] pack: move static state variables

2017-08-08 Thread Jonathan Tan

sha1_file.c declares some static variables that store packfile-related
state. Move them to packfile.c.

They are temporarily made global, but subsequent commits will restore
their scope back to static.

Signed-off-by: Jonathan Tan 
---
 pack.h  |  9 +
 packfile.c  | 14 ++
 sha1_file.c | 13 -
 3 files changed, 23 insertions(+), 13 deletions(-)

diff --git a/pack.h b/pack.h
index 63bfde00c..7fcd45f7b 100644
--- a/pack.h
+++ b/pack.h
@@ -124,4 +124,13 @@ extern char *sha1_pack_name(const unsigned char *sha1);
  */
 extern char *sha1_pack_index_name(const unsigned char *sha1);
 
+extern unsigned int pack_used_ctr;
+extern unsigned int pack_mmap_calls;
+extern unsigned int peak_pack_open_windows;
+extern unsigned int pack_open_windows;
+extern unsigned int pack_open_fds;
+extern unsigned int pack_max_fds;
+extern size_t peak_pack_mapped;
+extern size_t pack_mapped;
+
 #endif
diff --git a/packfile.c b/packfile.c
index 0d191dfd6..0f46e0617 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1,4 +1,5 @@
 #include "cache.h"
+#include "mru.h"
 
 char *odb_pack_name(struct strbuf *buf,
const unsigned char *sha1,
@@ -21,3 +22,16 @@ char *sha1_pack_index_name(const unsigned char *sha1)
static struct strbuf buf = STRBUF_INIT;
return odb_pack_name(, sha1, "idx");
 }
+
+unsigned int pack_used_ctr;
+unsigned int pack_mmap_calls;
+unsigned int peak_pack_open_windows;
+unsigned int pack_open_windows;
+unsigned int pack_open_fds;
+unsigned int pack_max_fds;
+size_t peak_pack_mapped;
+size_t pack_mapped;
+struct packed_git *packed_git;
+
+static struct mru packed_git_mru_storage;
+struct mru *packed_git_mru = _git_mru_storage;
diff --git a/sha1_file.c b/sha1_file.c
index 7e511ce9e..4d95e21eb 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -682,19 +682,6 @@ static int has_loose_object(const unsigned char *sha1)
return check_and_freshen(sha1, 0);
 }
 
-static unsigned int pack_used_ctr;
-static unsigned int pack_mmap_calls;
-static unsigned int peak_pack_open_windows;
-static unsigned int pack_open_windows;
-static unsigned int pack_open_fds;
-static unsigned int pack_max_fds;
-static size_t peak_pack_mapped;
-static size_t pack_mapped;
-struct packed_git *packed_git;
-
-static struct mru packed_git_mru_storage;
-struct mru *packed_git_mru = _git_mru_storage;
-
 void pack_report(void)
 {
fprintf(stderr,
-- 
2.14.0.434.g98096fd7a8-goog

[PATCH v2 00/25] Move exported packfile funcs to its own file

2017-08-08 Thread Jonathan Tan

Here is the complete patch set. I have only moved the exported functions
that operate with packfiles and their static helpers - for example,
static functions like freshen_packed_object() that are used only by
non-pack-specific functions are not moved.

In the end, 3 functions needed to be made global. They are
find_pack_entry(), mark_bad_packed_object(), and has_packed_and_bad().

Of the 3, find_pack_entry() is probably legitimately promoted. But I
think that the latter two functions needing to be accessed from
sha1_file.c points to a design that could be improved - they are only
used when packed_object_info() detects corruption, and used for marking
as bad and printing messages to the user respectively, which
packed_object_info() should probably do itself. But I have not made this
change in this patch set.

(Other than the 3 functions above, there are some variables and
functions that are temporarily made global, but reduced back to static
when the wide scope is no longer needed.)

Jonathan Tan (25):
  pack: move pack name-related functions
  pack: move static state variables
  pack: move pack_report()
  pack: move open_pack_index(), parse_pack_index()
  pack: move release_pack_memory()
  pack: move pack-closing functions
  pack: move use_pack()
  pack: move unuse_pack()
  pack: move add_packed_git()
  pack: move install_packed_git()
  pack: move {,re}prepare_packed_git and approximate_object_count
  pack: move unpack_object_header()
  pack: move get_size_from_delta()
  pack: move unpack_object_header()
  sha1_file: set whence in storage-specific info fn
  sha1_file: remove read_packed_sha1()
  pack: move packed_object_info(), unpack_entry()
  pack: move nth_packed_object_{sha1,oid}
  pack: move check_pack_index_ptr(), nth_packed_object_offset()
  pack: move find_pack_entry_one(), is_pack_valid()
  pack: move find_sha1_pack()
  pack: move find_pack_entry() and make it global
  pack: move has_sha1_pack()
  pack: move has_pack_index()
  pack: move for_each_packed_object()

 Makefile |1 +
 builtin/am.c |1 +
 builtin/cat-file.c   |1 +
 builtin/clone.c  |1 +
 builtin/count-objects.c  |1 +
 builtin/fetch.c  |1 +
 builtin/gc.c |1 +
 builtin/merge.c  |1 +
 builtin/pack-redundant.c |1 +
 builtin/prune-packed.c   |1 +
 cache.h  |  122 +--
 connected.c  |1 +
 diff.c   |1 +
 git-compat-util.h|2 -
 http-backend.c   |1 +
 http-push.c  |1 +
 http-walker.c|1 +
 pack.h   |  137 +++
 packfile.c   | 1905 +++
 path.c   |1 +
 reachable.c  |1 +
 revision.c   |1 +
 server-info.c|1 +
 sha1_file.c  | 2484 ++
 sha1_name.c  |1 +
 streaming.c  |1 +
 26 files changed, 2350 insertions(+), 2321 deletions(-)
 create mode 100644 packfile.c

-- 
2.14.0.434.g98096fd7a8-goog

[PATCH v2 08/25] pack: move unuse_pack()

2017-08-08 Thread Jonathan Tan

Signed-off-by: Jonathan Tan 
---
 cache.h | 1 -
 pack.h  | 1 +
 packfile.c  | 9 +
 sha1_file.c | 9 -
 4 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/cache.h b/cache.h
index dd9f9a9ae..4812f3a63 100644
--- a/cache.h
+++ b/cache.h
@@ -1637,7 +1637,6 @@ extern int odb_mkstemp(struct strbuf *template, const 
char *pattern);
  */
 extern int odb_pack_keep(const char *name);
 
-extern void unuse_pack(struct pack_window **);
 extern void clear_delta_base_cache(void);
 extern struct packed_git *add_packed_git(const char *path, size_t path_len, 
int local);
 
diff --git a/pack.h b/pack.h
index bf2b99bf9..3876e9ae6 100644
--- a/pack.h
+++ b/pack.h
@@ -149,5 +149,6 @@ extern void close_all_packs(void);
 extern int open_packed_git(struct packed_git *p);
 
 extern unsigned char *use_pack(struct packed_git *, struct pack_window **, 
off_t, unsigned long *);
+extern void unuse_pack(struct pack_window **);
 
 #endif
diff --git a/packfile.c b/packfile.c
index 85cb65558..93526ea7b 100644
--- a/packfile.c
+++ b/packfile.c
@@ -596,3 +596,12 @@ unsigned char *use_pack(struct packed_git *p,
*left = win->len - xsize_t(offset);
return win->base + offset;
 }
+
+void unuse_pack(struct pack_window **w_cursor)
+{
+   struct pack_window *w = *w_cursor;
+   if (w) {
+   w->inuse_cnt--;
+   *w_cursor = NULL;
+   }
+}
diff --git a/sha1_file.c b/sha1_file.c
index 8f17a07e9..12501ef06 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -717,15 +717,6 @@ void *xmmap(void *start, size_t length,
return ret;
 }
 
-void unuse_pack(struct pack_window **w_cursor)
-{
-   struct pack_window *w = *w_cursor;
-   if (w) {
-   w->inuse_cnt--;
-   *w_cursor = NULL;
-   }
-}
-
 static struct packed_git *alloc_packed_git(int extra)
 {
struct packed_git *p = xmalloc(st_add(sizeof(*p), extra));
-- 
2.14.0.434.g98096fd7a8-goog

[PATCH v2 14/25] pack: move unpack_object_header()

2017-08-08 Thread Jonathan Tan

Signed-off-by: Jonathan Tan 
---
 cache.h |  1 -
 pack.h  |  1 +
 packfile.c  | 26 ++
 sha1_file.c | 26 --
 4 files changed, 27 insertions(+), 27 deletions(-)

diff --git a/cache.h b/cache.h
index e29918c75..1aea0 100644
--- a/cache.h
+++ b/cache.h
@@ -1661,7 +1661,6 @@ extern off_t find_pack_entry_one(const unsigned char 
*sha1, struct packed_git *)
 
 extern int is_pack_valid(struct packed_git *);
 extern void *unpack_entry(struct packed_git *, off_t, enum object_type *, 
unsigned long *);
-extern int unpack_object_header(struct packed_git *, struct pack_window **, 
off_t *, unsigned long *);
 
 /*
  * Iterate over the files in the loose-object parts of the object
diff --git a/pack.h b/pack.h
index 69c92d8d2..5e3552392 100644
--- a/pack.h
+++ b/pack.h
@@ -169,5 +169,6 @@ unsigned long approximate_object_count(void);
 
 extern unsigned long unpack_object_header_buffer(const unsigned char *buf, 
unsigned long len, enum object_type *type, unsigned long *sizep);
 extern unsigned long get_size_from_delta(struct packed_git *, struct 
pack_window **, off_t);
+extern int unpack_object_header(struct packed_git *, struct pack_window **, 
off_t *, unsigned long *);
 
 #endif
diff --git a/packfile.c b/packfile.c
index 511afad85..a4db78ea0 100644
--- a/packfile.c
+++ b/packfile.c
@@ -948,3 +948,29 @@ unsigned long get_size_from_delta(struct packed_git *p,
/* Read the result size */
return get_delta_hdr_size(, delta_head+sizeof(delta_head));
 }
+
+int unpack_object_header(struct packed_git *p,
+struct pack_window **w_curs,
+off_t *curpos,
+unsigned long *sizep)
+{
+   unsigned char *base;
+   unsigned long left;
+   unsigned long used;
+   enum object_type type;
+
+   /* use_pack() assures us we have [base, base + 20) available
+* as a range that we can look at.  (Its actually the hash
+* size that is assured.)  With our object header encoding
+* the maximum deflated object size is 2^137, which is just
+* insane, so we know won't exceed what we have been given.
+*/
+   base = use_pack(p, w_curs, *curpos, );
+   used = unpack_object_header_buffer(base, left, , sizep);
+   if (!used) {
+   type = OBJ_BAD;
+   } else
+   *curpos += used;
+
+   return type;
+}
diff --git a/sha1_file.c b/sha1_file.c
index 7d354d9b6..f3bcdae17 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1170,32 +1170,6 @@ static const unsigned char *get_delta_base_sha1(struct 
packed_git *p,
return NULL;
 }
 
-int unpack_object_header(struct packed_git *p,
-struct pack_window **w_curs,
-off_t *curpos,
-unsigned long *sizep)
-{
-   unsigned char *base;
-   unsigned long left;
-   unsigned long used;
-   enum object_type type;
-
-   /* use_pack() assures us we have [base, base + 20) available
-* as a range that we can look at.  (Its actually the hash
-* size that is assured.)  With our object header encoding
-* the maximum deflated object size is 2^137, which is just
-* insane, so we know won't exceed what we have been given.
-*/
-   base = use_pack(p, w_curs, *curpos, );
-   used = unpack_object_header_buffer(base, left, , sizep);
-   if (!used) {
-   type = OBJ_BAD;
-   } else
-   *curpos += used;
-
-   return type;
-}
-
 static int retry_bad_packed_offset(struct packed_git *p, off_t obj_offset)
 {
int type;
-- 
2.14.0.434.g98096fd7a8-goog

[PATCH v2 03/25] pack: move pack_report()

2017-08-08 Thread Jonathan Tan

Signed-off-by: Jonathan Tan 
---
 cache.h |  2 --
 pack.h  |  2 ++
 packfile.c  | 24 
 sha1_file.c | 24 
 4 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/cache.h b/cache.h
index 1f0f47819..c7f802e4a 100644
--- a/cache.h
+++ b/cache.h
@@ -1624,8 +1624,6 @@ unsigned long approximate_object_count(void);
 extern struct packed_git *find_sha1_pack(const unsigned char *sha1,
 struct packed_git *packs);
 
-extern void pack_report(void);
-
 /*
  * Create a temporary file rooted in the object database directory, or
  * die on failure. The filename is taken from "pattern", which should have the
diff --git a/pack.h b/pack.h
index 7fcd45f7b..6098bfe40 100644
--- a/pack.h
+++ b/pack.h
@@ -133,4 +133,6 @@ extern unsigned int pack_max_fds;
 extern size_t peak_pack_mapped;
 extern size_t pack_mapped;
 
+extern void pack_report(void);
+
 #endif
diff --git a/packfile.c b/packfile.c
index 0f46e0617..60d9fc3b0 100644
--- a/packfile.c
+++ b/packfile.c
@@ -35,3 +35,27 @@ struct packed_git *packed_git;
 
 static struct mru packed_git_mru_storage;
 struct mru *packed_git_mru = _git_mru_storage;
+
+#define SZ_FMT PRIuMAX
+static inline uintmax_t sz_fmt(size_t s) { return s; }
+
+void pack_report(void)
+{
+   fprintf(stderr,
+   "pack_report: getpagesize()= %10" SZ_FMT "\n"
+   "pack_report: core.packedGitWindowSize = %10" SZ_FMT "\n"
+   "pack_report: core.packedGitLimit  = %10" SZ_FMT "\n",
+   sz_fmt(getpagesize()),
+   sz_fmt(packed_git_window_size),
+   sz_fmt(packed_git_limit));
+   fprintf(stderr,
+   "pack_report: pack_used_ctr= %10u\n"
+   "pack_report: pack_mmap_calls  = %10u\n"
+   "pack_report: pack_open_windows= %10u / %10u\n"
+   "pack_report: pack_mapped  = "
+   "%10" SZ_FMT " / %10" SZ_FMT "\n",
+   pack_used_ctr,
+   pack_mmap_calls,
+   pack_open_windows, peak_pack_open_windows,
+   sz_fmt(pack_mapped), sz_fmt(peak_pack_mapped));
+}
diff --git a/sha1_file.c b/sha1_file.c
index 4d95e21eb..0de39f480 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -29,9 +29,6 @@
 #include "mergesort.h"
 #include "quote.h"
 
-#define SZ_FMT PRIuMAX
-static inline uintmax_t sz_fmt(size_t s) { return s; }
-
 const unsigned char null_sha1[20];
 const struct object_id null_oid;
 const struct object_id empty_tree_oid = {
@@ -682,27 +679,6 @@ static int has_loose_object(const unsigned char *sha1)
return check_and_freshen(sha1, 0);
 }
 
-void pack_report(void)
-{
-   fprintf(stderr,
-   "pack_report: getpagesize()= %10" SZ_FMT "\n"
-   "pack_report: core.packedGitWindowSize = %10" SZ_FMT "\n"
-   "pack_report: core.packedGitLimit  = %10" SZ_FMT "\n",
-   sz_fmt(getpagesize()),
-   sz_fmt(packed_git_window_size),
-   sz_fmt(packed_git_limit));
-   fprintf(stderr,
-   "pack_report: pack_used_ctr= %10u\n"
-   "pack_report: pack_mmap_calls  = %10u\n"
-   "pack_report: pack_open_windows= %10u / %10u\n"
-   "pack_report: pack_mapped  = "
-   "%10" SZ_FMT " / %10" SZ_FMT "\n",
-   pack_used_ctr,
-   pack_mmap_calls,
-   pack_open_windows, peak_pack_open_windows,
-   sz_fmt(pack_mapped), sz_fmt(peak_pack_mapped));
-}
-
 /*
  * Open and mmap the index file at path, perform a couple of
  * consistency checks, then record its information to p.  Return 0 on
-- 
2.14.0.434.g98096fd7a8-goog

[PATCH v2 05/25] pack: move release_pack_memory()

2017-08-08 Thread Jonathan Tan

The function unuse_one_window() needs to be temporarily made global. Its
scope will be restored to static in a subsequent commit.

Signed-off-by: Jonathan Tan 
---
 git-compat-util.h |  2 --
 pack.h|  4 
 packfile.c| 49 +
 sha1_file.c   | 49 -
 4 files changed, 53 insertions(+), 51 deletions(-)

diff --git a/git-compat-util.h b/git-compat-util.h
index db9c22de7..201056e2d 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -749,8 +749,6 @@ const char *inet_ntop(int af, const void *src, char *dst, 
size_t size);
 extern int git_atexit(void (*handler)(void));
 #endif
 
-extern void release_pack_memory(size_t);
-
 typedef void (*try_to_free_t)(size_t);
 extern try_to_free_t set_try_to_free_routine(try_to_free_t);
 
diff --git a/pack.h b/pack.h
index 5be0ed42a..c16220586 100644
--- a/pack.h
+++ b/pack.h
@@ -143,4 +143,8 @@ extern int open_pack_index(struct packed_git *);
 
 extern struct packed_git *parse_pack_index(unsigned char *sha1, const char 
*idx_path);
 
+extern int unuse_one_window(struct packed_git *current);
+
+extern void release_pack_memory(size_t);
+
 #endif
diff --git a/packfile.c b/packfile.c
index 6edc43228..8daa74ad1 100644
--- a/packfile.c
+++ b/packfile.c
@@ -208,3 +208,52 @@ struct packed_git *parse_pack_index(unsigned char *sha1, 
const char *idx_path)
 
return p;
 }
+
+static void scan_windows(struct packed_git *p,
+   struct packed_git **lru_p,
+   struct pack_window **lru_w,
+   struct pack_window **lru_l)
+{
+   struct pack_window *w, *w_l;
+
+   for (w_l = NULL, w = p->windows; w; w = w->next) {
+   if (!w->inuse_cnt) {
+   if (!*lru_w || w->last_used < (*lru_w)->last_used) {
+   *lru_p = p;
+   *lru_w = w;
+   *lru_l = w_l;
+   }
+   }
+   w_l = w;
+   }
+}
+
+int unuse_one_window(struct packed_git *current)
+{
+   struct packed_git *p, *lru_p = NULL;
+   struct pack_window *lru_w = NULL, *lru_l = NULL;
+
+   if (current)
+   scan_windows(current, _p, _w, _l);
+   for (p = packed_git; p; p = p->next)
+   scan_windows(p, _p, _w, _l);
+   if (lru_p) {
+   munmap(lru_w->base, lru_w->len);
+   pack_mapped -= lru_w->len;
+   if (lru_l)
+   lru_l->next = lru_w->next;
+   else
+   lru_p->windows = lru_w->next;
+   free(lru_w);
+   pack_open_windows--;
+   return 1;
+   }
+   return 0;
+}
+
+void release_pack_memory(size_t need)
+{
+   size_t cur = pack_mapped;
+   while (need >= (cur - pack_mapped) && unuse_one_window(NULL))
+   ; /* nothing */
+}
diff --git a/sha1_file.c b/sha1_file.c
index 2e414f5f5..644876e4e 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -679,55 +679,6 @@ static int has_loose_object(const unsigned char *sha1)
return check_and_freshen(sha1, 0);
 }
 
-static void scan_windows(struct packed_git *p,
-   struct packed_git **lru_p,
-   struct pack_window **lru_w,
-   struct pack_window **lru_l)
-{
-   struct pack_window *w, *w_l;
-
-   for (w_l = NULL, w = p->windows; w; w = w->next) {
-   if (!w->inuse_cnt) {
-   if (!*lru_w || w->last_used < (*lru_w)->last_used) {
-   *lru_p = p;
-   *lru_w = w;
-   *lru_l = w_l;
-   }
-   }
-   w_l = w;
-   }
-}
-
-static int unuse_one_window(struct packed_git *current)
-{
-   struct packed_git *p, *lru_p = NULL;
-   struct pack_window *lru_w = NULL, *lru_l = NULL;
-
-   if (current)
-   scan_windows(current, _p, _w, _l);
-   for (p = packed_git; p; p = p->next)
-   scan_windows(p, _p, _w, _l);
-   if (lru_p) {
-   munmap(lru_w->base, lru_w->len);
-   pack_mapped -= lru_w->len;
-   if (lru_l)
-   lru_l->next = lru_w->next;
-   else
-   lru_p->windows = lru_w->next;
-   free(lru_w);
-   pack_open_windows--;
-   return 1;
-   }
-   return 0;
-}
-
-void release_pack_memory(size_t need)
-{
-   size_t cur = pack_mapped;
-   while (need >= (cur - pack_mapped) && unuse_one_window(NULL))
-   ; /* nothing */
-}
-
 static void mmap_limit_check(size_t length)
 {
static size_t limit = 0;
-- 
2.14.0.434.g98096fd7a8-goog

[PATCH v2 13/25] pack: move get_size_from_delta()

2017-08-08 Thread Jonathan Tan

Signed-off-by: Jonathan Tan 
---
 cache.h |  1 -
 pack.h  |  1 +
 packfile.c  | 40 
 sha1_file.c | 39 ---
 4 files changed, 41 insertions(+), 40 deletions(-)

diff --git a/cache.h b/cache.h
index 9c70759a6..e29918c75 100644
--- a/cache.h
+++ b/cache.h
@@ -1661,7 +1661,6 @@ extern off_t find_pack_entry_one(const unsigned char 
*sha1, struct packed_git *)
 
 extern int is_pack_valid(struct packed_git *);
 extern void *unpack_entry(struct packed_git *, off_t, enum object_type *, 
unsigned long *);
-extern unsigned long get_size_from_delta(struct packed_git *, struct 
pack_window **, off_t);
 extern int unpack_object_header(struct packed_git *, struct pack_window **, 
off_t *, unsigned long *);
 
 /*
diff --git a/pack.h b/pack.h
index 4a7f88a38..69c92d8d2 100644
--- a/pack.h
+++ b/pack.h
@@ -168,5 +168,6 @@ extern void reprepare_packed_git(void);
 unsigned long approximate_object_count(void);
 
 extern unsigned long unpack_object_header_buffer(const unsigned char *buf, 
unsigned long len, enum object_type *type, unsigned long *sizep);
+extern unsigned long get_size_from_delta(struct packed_git *, struct 
pack_window **, off_t);
 
 #endif
diff --git a/packfile.c b/packfile.c
index 6e4f1c6e3..511afad85 100644
--- a/packfile.c
+++ b/packfile.c
@@ -3,6 +3,7 @@
 #include "pack.h"
 #include "dir.h"
 #include "mergesort.h"
+#include "delta.h"
 
 char *odb_pack_name(struct strbuf *buf,
const unsigned char *sha1,
@@ -908,3 +909,42 @@ unsigned long unpack_object_header_buffer(const unsigned 
char *buf,
*sizep = size;
return used;
 }
+
+unsigned long get_size_from_delta(struct packed_git *p,
+ struct pack_window **w_curs,
+ off_t curpos)
+{
+   const unsigned char *data;
+   unsigned char delta_head[20], *in;
+   git_zstream stream;
+   int st;
+
+   memset(, 0, sizeof(stream));
+   stream.next_out = delta_head;
+   stream.avail_out = sizeof(delta_head);
+
+   git_inflate_init();
+   do {
+   in = use_pack(p, w_curs, curpos, _in);
+   stream.next_in = in;
+   st = git_inflate(, Z_FINISH);
+   curpos += stream.next_in - in;
+   } while ((st == Z_OK || st == Z_BUF_ERROR) &&
+stream.total_out < sizeof(delta_head));
+   git_inflate_end();
+   if ((st != Z_STREAM_END) && stream.total_out != sizeof(delta_head)) {
+   error("delta data unpack-initial failed");
+   return 0;
+   }
+
+   /* Examine the initial part of the delta to figure out
+* the result size.
+*/
+   data = delta_head;
+
+   /* ignore base size */
+   get_delta_hdr_size(, delta_head+sizeof(delta_head));
+
+   /* Read the result size */
+   return get_delta_hdr_size(, delta_head+sizeof(delta_head));
+}
diff --git a/sha1_file.c b/sha1_file.c
index 1f4b4ba2c..7d354d9b6 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1099,45 +1099,6 @@ int parse_sha1_header(const char *hdr, unsigned long 
*sizep)
return parse_sha1_header_extended(hdr, , 0);
 }
 
-unsigned long get_size_from_delta(struct packed_git *p,
- struct pack_window **w_curs,
- off_t curpos)
-{
-   const unsigned char *data;
-   unsigned char delta_head[20], *in;
-   git_zstream stream;
-   int st;
-
-   memset(, 0, sizeof(stream));
-   stream.next_out = delta_head;
-   stream.avail_out = sizeof(delta_head);
-
-   git_inflate_init();
-   do {
-   in = use_pack(p, w_curs, curpos, _in);
-   stream.next_in = in;
-   st = git_inflate(, Z_FINISH);
-   curpos += stream.next_in - in;
-   } while ((st == Z_OK || st == Z_BUF_ERROR) &&
-stream.total_out < sizeof(delta_head));
-   git_inflate_end();
-   if ((st != Z_STREAM_END) && stream.total_out != sizeof(delta_head)) {
-   error("delta data unpack-initial failed");
-   return 0;
-   }
-
-   /* Examine the initial part of the delta to figure out
-* the result size.
-*/
-   data = delta_head;
-
-   /* ignore base size */
-   get_delta_hdr_size(, delta_head+sizeof(delta_head));
-
-   /* Read the result size */
-   return get_delta_hdr_size(, delta_head+sizeof(delta_head));
-}
-
 static off_t get_delta_base(struct packed_git *p,
struct pack_window **w_curs,
off_t *curpos,
-- 
2.14.0.434.g98096fd7a8-goog

[PATCH v2 04/25] pack: move open_pack_index(), parse_pack_index()

2017-08-08 Thread Jonathan Tan

alloc_packed_git() in packfile.c is duplicated from sha1_file.c. In a
subsequent commit, alloc_packed_git() will be removed from sha1_file.c.

Signed-off-by: Jonathan Tan 
---
 builtin/count-objects.c |   1 +
 cache.h |   8 ---
 pack.h  |   8 +++
 packfile.c  | 149 
 sha1_file.c | 140 -
 sha1_name.c |   1 +
 6 files changed, 159 insertions(+), 148 deletions(-)

diff --git a/builtin/count-objects.c b/builtin/count-objects.c
index 1d82e61f2..185d3190a 100644
--- a/builtin/count-objects.c
+++ b/builtin/count-objects.c
@@ -10,6 +10,7 @@
 #include "builtin.h"
 #include "parse-options.h"
 #include "quote.h"
+#include "pack.h"
 
 static unsigned long garbage;
 static off_t size_garbage;
diff --git a/cache.h b/cache.h
index c7f802e4a..5d6839525 100644
--- a/cache.h
+++ b/cache.h
@@ -1603,8 +1603,6 @@ struct pack_entry {
struct packed_git *p;
 };
 
-extern struct packed_git *parse_pack_index(unsigned char *sha1, const char 
*idx_path);
-
 /* A hook to report invalid files in pack directory */
 #define PACKDIR_FILE_PACK 1
 #define PACKDIR_FILE_IDX 2
@@ -1639,12 +1637,6 @@ extern int odb_mkstemp(struct strbuf *template, const 
char *pattern);
  */
 extern int odb_pack_keep(const char *name);
 
-/*
- * mmap the index file for the specified packfile (if it is not
- * already mmapped).  Return 0 on success.
- */
-extern int open_pack_index(struct packed_git *);
-
 /*
  * munmap the index file for the specified packfile (if it is
  * currently mmapped).
diff --git a/pack.h b/pack.h
index 6098bfe40..5be0ed42a 100644
--- a/pack.h
+++ b/pack.h
@@ -135,4 +135,12 @@ extern size_t pack_mapped;
 
 extern void pack_report(void);
 
+/*
+ * mmap the index file for the specified packfile (if it is not
+ * already mmapped).  Return 0 on success.
+ */
+extern int open_pack_index(struct packed_git *);
+
+extern struct packed_git *parse_pack_index(unsigned char *sha1, const char 
*idx_path);
+
 #endif
diff --git a/packfile.c b/packfile.c
index 60d9fc3b0..6edc43228 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1,5 +1,6 @@
 #include "cache.h"
 #include "mru.h"
+#include "pack.h"
 
 char *odb_pack_name(struct strbuf *buf,
const unsigned char *sha1,
@@ -59,3 +60,151 @@ void pack_report(void)
pack_open_windows, peak_pack_open_windows,
sz_fmt(pack_mapped), sz_fmt(peak_pack_mapped));
 }
+
+/*
+ * Open and mmap the index file at path, perform a couple of
+ * consistency checks, then record its information to p.  Return 0 on
+ * success.
+ */
+static int check_packed_git_idx(const char *path, struct packed_git *p)
+{
+   void *idx_map;
+   struct pack_idx_header *hdr;
+   size_t idx_size;
+   uint32_t version, nr, i, *index;
+   int fd = git_open(path);
+   struct stat st;
+
+   if (fd < 0)
+   return -1;
+   if (fstat(fd, )) {
+   close(fd);
+   return -1;
+   }
+   idx_size = xsize_t(st.st_size);
+   if (idx_size < 4 * 256 + 20 + 20) {
+   close(fd);
+   return error("index file %s is too small", path);
+   }
+   idx_map = xmmap(NULL, idx_size, PROT_READ, MAP_PRIVATE, fd, 0);
+   close(fd);
+
+   hdr = idx_map;
+   if (hdr->idx_signature == htonl(PACK_IDX_SIGNATURE)) {
+   version = ntohl(hdr->idx_version);
+   if (version < 2 || version > 2) {
+   munmap(idx_map, idx_size);
+   return error("index file %s is version %"PRIu32
+" and is not supported by this binary"
+" (try upgrading GIT to a newer version)",
+path, version);
+   }
+   } else
+   version = 1;
+
+   nr = 0;
+   index = idx_map;
+   if (version > 1)
+   index += 2;  /* skip index header */
+   for (i = 0; i < 256; i++) {
+   uint32_t n = ntohl(index[i]);
+   if (n < nr) {
+   munmap(idx_map, idx_size);
+   return error("non-monotonic index %s", path);
+   }
+   nr = n;
+   }
+
+   if (version == 1) {
+   /*
+* Total size:
+*  - 256 index entries 4 bytes each
+*  - 24-byte entries * nr (20-byte sha1 + 4-byte offset)
+*  - 20-byte SHA1 of the packfile
+*  - 20-byte SHA1 file checksum
+*/
+   if (idx_size != 4*256 + nr * 24 + 20 + 20) {
+   munmap(idx_map, idx_size);
+   return error("wrong index v1 file size in %s", path);
+   }
+   } else if (version == 2) {
+   /*
+* Minimum size:
+

[PATCH v2 07/25] pack: move use_pack()

2017-08-08 Thread Jonathan Tan

The function open_packed_git() needs to be temporarily made global. Its
scope will be restored to static in a subsequent commit.

Signed-off-by: Jonathan Tan 
---
 cache.h |   1 -
 pack.h  |  14 +--
 packfile.c  | 303 ++--
 sha1_file.c | 285 
 streaming.c |   1 +
 5 files changed, 299 insertions(+), 305 deletions(-)

diff --git a/cache.h b/cache.h
index 25a21a61f..dd9f9a9ae 100644
--- a/cache.h
+++ b/cache.h
@@ -1637,7 +1637,6 @@ extern int odb_mkstemp(struct strbuf *template, const 
char *pattern);
  */
 extern int odb_pack_keep(const char *name);
 
-extern unsigned char *use_pack(struct packed_git *, struct pack_window **, 
off_t, unsigned long *);
 extern void unuse_pack(struct pack_window **);
 extern void clear_delta_base_cache(void);
 extern struct packed_git *add_packed_git(const char *path, size_t path_len, 
int local);
diff --git a/pack.h b/pack.h
index fd4668528..bf2b99bf9 100644
--- a/pack.h
+++ b/pack.h
@@ -124,14 +124,7 @@ extern char *sha1_pack_name(const unsigned char *sha1);
  */
 extern char *sha1_pack_index_name(const unsigned char *sha1);
 
-extern unsigned int pack_used_ctr;
-extern unsigned int pack_mmap_calls;
-extern unsigned int peak_pack_open_windows;
-extern unsigned int pack_open_windows;
 extern unsigned int pack_open_fds;
-extern unsigned int pack_max_fds;
-extern size_t peak_pack_mapped;
-extern size_t pack_mapped;
 
 extern void pack_report(void);
 
@@ -143,12 +136,9 @@ extern int open_pack_index(struct packed_git *);
 
 extern struct packed_git *parse_pack_index(unsigned char *sha1, const char 
*idx_path);
 
-extern int unuse_one_window(struct packed_git *current);
-
 extern void release_pack_memory(size_t);
 
 extern void close_pack_windows(struct packed_git *);
-extern int close_pack_fd(struct packed_git *);
 /*
  * munmap the index file for the specified packfile (if it is
  * currently mmapped).
@@ -156,4 +146,8 @@ extern int close_pack_fd(struct packed_git *);
 extern void close_pack_index(struct packed_git *);
 extern void close_all_packs(void);
 
+extern int open_packed_git(struct packed_git *p);
+
+extern unsigned char *use_pack(struct packed_git *, struct pack_window **, 
off_t, unsigned long *);
+
 #endif
diff --git a/packfile.c b/packfile.c
index c8e2dbdee..85cb65558 100644
--- a/packfile.c
+++ b/packfile.c
@@ -24,14 +24,14 @@ char *sha1_pack_index_name(const unsigned char *sha1)
return odb_pack_name(, sha1, "idx");
 }
 
-unsigned int pack_used_ctr;
-unsigned int pack_mmap_calls;
-unsigned int peak_pack_open_windows;
-unsigned int pack_open_windows;
+static unsigned int pack_used_ctr;
+static unsigned int pack_mmap_calls;
+static unsigned int peak_pack_open_windows;
+static unsigned int pack_open_windows;
 unsigned int pack_open_fds;
-unsigned int pack_max_fds;
-size_t peak_pack_mapped;
-size_t pack_mapped;
+static unsigned int pack_max_fds;
+static size_t peak_pack_mapped;
+static size_t pack_mapped;
 struct packed_git *packed_git;
 
 static struct mru packed_git_mru_storage;
@@ -228,7 +228,7 @@ static void scan_windows(struct packed_git *p,
}
 }
 
-int unuse_one_window(struct packed_git *current)
+static int unuse_one_window(struct packed_git *current)
 {
struct packed_git *p, *lru_p = NULL;
struct pack_window *lru_w = NULL, *lru_l = NULL;
@@ -274,7 +274,7 @@ void close_pack_windows(struct packed_git *p)
}
 }
 
-int close_pack_fd(struct packed_git *p)
+static int close_pack_fd(struct packed_git *p)
 {
if (p->pack_fd < 0)
return 0;
@@ -311,3 +311,288 @@ void close_all_packs(void)
else
close_pack(p);
 }
+
+/*
+ * The LRU pack is the one with the oldest MRU window, preferring packs
+ * with no used windows, or the oldest mtime if it has no windows allocated.
+ */
+static void find_lru_pack(struct packed_git *p, struct packed_git **lru_p, 
struct pack_window **mru_w, int *accept_windows_inuse)
+{
+   struct pack_window *w, *this_mru_w;
+   int has_windows_inuse = 0;
+
+   /*
+* Reject this pack if it has windows and the previously selected
+* one does not.  If this pack does not have windows, reject
+* it if the pack file is newer than the previously selected one.
+*/
+   if (*lru_p && !*mru_w && (p->windows || p->mtime > (*lru_p)->mtime))
+   return;
+
+   for (w = this_mru_w = p->windows; w; w = w->next) {
+   /*
+* Reject this pack if any of its windows are in use,
+* but the previously selected pack did not have any
+* inuse windows.  Otherwise, record that this pack
+* has windows in use.
+*/
+   if (w->inuse_cnt) {
+   if (*accept_windows_inuse)
+   has_windows_inuse = 1;
+   else
+

[PATCH] builtin/add: add detail to a 'cannot chmod' error message

2017-08-08 Thread Ramsay Jones


In addition to adding the missing newline, add the x-ecutable bit
'mode change' character to the error message. This message now has
the same form as similar messages output by 'update-index'.

Signed-off-by: Ramsay Jones 
---

Hi Junio,

This is v2 of the earlier "add a newline" patch. Thanks!

ATB,
Ramsay Jones

 builtin/add.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/builtin/add.c b/builtin/add.c
index e888fb8c5..5d5773d5c 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -32,7 +32,7 @@ struct update_callback_data {
int add_errors;
 };
 
-static void chmod_pathspec(struct pathspec *pathspec, int force_mode)
+static void chmod_pathspec(struct pathspec *pathspec, char flip)
 {
int i;
 
@@ -42,8 +42,8 @@ static void chmod_pathspec(struct pathspec *pathspec, int 
force_mode)
if (pathspec && !ce_path_match(ce, pathspec, NULL))
continue;
 
-   if (chmod_cache_entry(ce, force_mode) < 0)
-   fprintf(stderr, "cannot chmod '%s'", ce->name);
+   if (chmod_cache_entry(ce, flip) < 0)
+   fprintf(stderr, "cannot chmod %cx '%s'\n", flip, 
ce->name);
}
 }
 
-- 
2.14.0

Re: [PATCH] t1200: remove t1200-tutorial.sh

2017-08-08 Thread Jonathan Nieder

Jonathan Nieder wrote:
> Stefan Beller wrote:
>>> Stefan Beller wrote:

 Nowadays there are better tutorials out there such as "Git from bottom up"
 or others, easily found online. Additionally to that a tutorial in our
 test suite is not as easy to discover as e.g. online tutorials.
[...]
>> 2ae6c70674 (Adapt tutorial to cygwin and add test case, 2005-10-13)
>> seemed to imply that it was testing some part for Documentation/tutorial.txt
>> though.
>
> Oh, good point.
>
> v1.2.0~121 (New tutorial, 2006-01-22) means that the test is no longer
> testing what is in the tutorial in any meaningful sense.  That's why
> my search for "git whatchanged -p --root" in manpages didn't find
> anything.

Correction: the tutorial is now called gitcore-tutorial and mostly
survives.  A search for -p --root failed because of v1.5.5.1~19^2
(core-tutorial.txt: Fix showing the current behaviour, 2008-04-10).

That said, the conclusion that this test has mostly bitrotted as far
as its original purpose goes still applies.

An alternative method of addressing the goal you described would be to
fuzz object-id shaped things out of the sample output.  I don't have a
strong preference, given how little this test contributes to coverage
(as you mentioned) and how difficult it is to make it continue to
match the documentation it is trying to test.

Thanks and sorry for the confusion,
Jonathan

Re: [PATCH] t1200: remove t1200-tutorial.sh

2017-08-08 Thread Jonathan Nieder

Stefan Beller wrote:
> On Tue, Aug 8, 2017 at 5:07 PM, Jonathan Nieder  wrote:
>> Stefan Beller wrote:

>>> Nowadays there are better tutorials out there such as "Git from bottom up"
>>> or others, easily found online. Additionally to that a tutorial in our
>>> test suite is not as easy to discover as e.g. online tutorials.
[...]
>>> Signed-off-by: Stefan Beller 
>>> ---
>>>  t/t1200-tutorial.sh | 268 
>>> 
>>>  1 file changed, 268 deletions(-)
>>>  delete mode 100755 t/t1200-tutorial.sh
>>
>> Interesting.  When I first saw the diffstat I assumed you were talking
>> about a test that validates the examples in some manpage are correct.
>> But this is not that.
[...]
> 2ae6c70674 (Adapt tutorial to cygwin and add test case, 2005-10-13)
> seemed to imply that it was testing some part for Documentation/tutorial.txt
> though.

Oh, good point.

v1.2.0~121 (New tutorial, 2006-01-22) means that the test is no longer
testing what is in the tutorial in any meaningful sense.  That's why
my search for "git whatchanged -p --root" in manpages didn't find
anything.

So what your patch does still seems reasonable (we have lived fine
without a test validating the examples in that tutorial, and if we
really want a test validating the examples then we should find a way
to automatically extract it), but the description is misleading.

With a corrected description, my Reviewed-by would apply.

Thanks for catching it.

Jonathan

Re: [PATCH] t1200: remove t1200-tutorial.sh

2017-08-08 Thread Stefan Beller

On Tue, Aug 8, 2017 at 5:07 PM, Jonathan Nieder  wrote:
> Hi,
>
> Stefan Beller wrote:
>
>> Nowadays there are better tutorials out there such as "Git from bottom up"
>> or others, easily found online. Additionally to that a tutorial in our
>> test suite is not as easy to discover as e.g. online tutorials.
>>
>> This test/tutorial was discovered by the patch author in the effort to
>> migrate our tests in preparation to switch the hashing function.
>> Transforming this tutorial to be agnostic of the underlying hash function
>> would hurt its readability, hence being even less useful as a tutorial.
>>
>> Instead delete this test as
>> (a) the functionality is tested elsewhere as well and
>> (b) reducing the test suite to its core improves performance, which
>> aids developers in keeping their development velocity.
>>
>> Signed-off-by: Stefan Beller 
>> ---
>>  t/t1200-tutorial.sh | 268 
>> 
>>  1 file changed, 268 deletions(-)
>>  delete mode 100755 t/t1200-tutorial.sh
>
> Interesting.  When I first saw the diffstat I assumed you were talking
> about a test that validates the examples in some manpage are correct.
> But this is not that.
>
> There indeed appear to be other good tests for these commands, even
> "git whatchanged", so for what it's worth,
>
> Reviewed-by: Jonathan Nieder 
>
> Thanks.

2ae6c70674 (Adapt tutorial to cygwin and add test case, 2005-10-13)
seemed to imply that it was testing some part for Documentation/tutorial.txt
though.

Re: [PATCH] t1200: remove t1200-tutorial.sh

2017-08-08 Thread Jonathan Nieder

Hi,

Stefan Beller wrote:

> Nowadays there are better tutorials out there such as "Git from bottom up"
> or others, easily found online. Additionally to that a tutorial in our
> test suite is not as easy to discover as e.g. online tutorials.
>
> This test/tutorial was discovered by the patch author in the effort to
> migrate our tests in preparation to switch the hashing function.
> Transforming this tutorial to be agnostic of the underlying hash function
> would hurt its readability, hence being even less useful as a tutorial.
>
> Instead delete this test as
> (a) the functionality is tested elsewhere as well and
> (b) reducing the test suite to its core improves performance, which
> aids developers in keeping their development velocity.
>
> Signed-off-by: Stefan Beller 
> ---
>  t/t1200-tutorial.sh | 268 
> 
>  1 file changed, 268 deletions(-)
>  delete mode 100755 t/t1200-tutorial.sh

Interesting.  When I first saw the diffstat I assumed you were talking
about a test that validates the examples in some manpage are correct.
But this is not that.

There indeed appear to be other good tests for these commands, even
"git whatchanged", so for what it's worth,

Reviewed-by: Jonathan Nieder 

Thanks.

Re: reftable [v6]: new ref storage format

2017-08-08 Thread Shawn Pearce

On Tue, Aug 8, 2017 at 4:34 PM, Junio C Hamano  wrote:
> Shawn Pearce  writes:
>
>> Given that the index can now also be multi-level, I don't expect to
>> see a 2G index. A 2G index forces the reader to load the entire 2G to
>> take advantage of the restart table. It may be more efficient for such
>> a reader to have had the writer make a mutli-level index, instead of a
>> single monster index block. And so perhaps the writer shouldn't make a
>> 2G index block that she is forced to buffer. :)
>
> Ah, OK, then it is sensible to have all table blocks to have the
> same format, and restart at the beginning to help readers would be a
> fine choice.  For the same "let's make them as consistent" sake, I
> am tempted to suggest that we lift "the index block can be 2G" and
> have it also be within uint_24(), perhaps?  Otherwise the readers
> would have to read (or mmap) the whole 2G.

Gah. I just finished moving the restart table back to the end of the block. :)

However, I think I can agree with the index fitting into the uint24
size of 15M, and asking writers making an index that exceeds that to
use multi-level indexing.

Re: [PATCH] builtin/add: add a missing newline to an stderr message

2017-08-08 Thread Ramsay Jones



On 08/08/17 22:45, René Scharfe wrote:
> Am 08.08.2017 um 23:36 schrieb Ramsay Jones:
>>
>> Signed-off-by: Ramsay Jones 
>> ---
>>
>> Hi Junio,
>>
>> I noticed this while looking into the t3700 failure on cygwin tonight.
>> Also, I couldn't decide whether or not to add the i18n '_()' brackets
>> around the message. In the end I didn't, but will happily add them
>> if you think I should.
>>
>> Thanks!
>>
>> ATB,
>> Ramsay Jones
>>
>>   builtin/add.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/builtin/add.c b/builtin/add.c
>> index e888fb8c5..385b53ae7 100644
>> --- a/builtin/add.c
>> +++ b/builtin/add.c
>> @@ -43,7 +43,7 @@ static void chmod_pathspec(struct pathspec *pathspec, int 
>> force_mode)
>>  continue;
>>   
>>  if (chmod_cache_entry(ce, force_mode) < 0)
>> -fprintf(stderr, "cannot chmod '%s'", ce->name);
>> +fprintf(stderr, "cannot chmod '%s'\n", ce->name);
>>  }
>>   }
>>   
> 
> FYI: I brought this up yesterday in the original thread, along with a
> few other observations:
> 
>   https://public-inbox.org/git/3c61d9f6-e0fd-22a4-68e0-89fd9ce9b...@web.de/

Ah, I missed that.

Hmm, I just looked at the code in builtin/update-index.c. Yes, it
would probably be a good idea to harmonize the messages - but just
where did 'flip' come from? ;-)

ATB,
Ramsay Jones

Re: reftable [v6]: new ref storage format

2017-08-08 Thread Stefan Beller

On Mon, Aug 7, 2017 at 11:30 AM, Shawn Pearce  wrote:
> On Mon, Aug 7, 2017 at 11:27 AM, Stefan Beller  wrote:
>> On Sun, Aug 6, 2017 at 6:47 PM, Shawn Pearce  wrote:
>>> 6th iteration of the reftable storage format.
>>>
>>> You can read a rendered version of this here:
>>> https://googlers.googlesource.com/sop/jgit/+/reftable/Documentation/technical/reftable.md
>>>
>>> The index may be organized into a multi-level index, where ...
>>> which may in turn point to either index blocks (3rd level) or ref blocks 
>>> (leaf level).
>>
>> So we allow 3 levels at most?
>
> No, its just an example. Large ref sets with small block size need 4
> levels. Or more.

A malicious (or buggy) writer can produce indexes pointing to
each other producing a circle. (Who would do that?)

A reader should  - instead of segfaulting due to unbounded
recursion or being stuck in an infinite loop - ignore the indexes
in this case and fallback to the slow non-indexed behavior,
i.e. while the file format allows for unbounded levels, a reader
should not.

[PATCH] t1200: remove t1200-tutorial.sh

2017-08-08 Thread Stefan Beller

Nowadays there are better tutorials out there such as "Git from bottom up"
or others, easily found online. Additionally to that a tutorial in our
test suite is not as easy to discover as e.g. online tutorials.

This test/tutorial was discovered by the patch author in the effort to
migrate our tests in preparation to switch the hashing function.
Transforming this tutorial to be agnostic of the underlying hash function
would hurt its readability, hence being even less useful as a tutorial.

Instead delete this test as
(a) the functionality is tested elsewhere as well and
(b) reducing the test suite to its core improves performance, which
aids developers in keeping their development velocity.

Signed-off-by: Stefan Beller 
---
 t/t1200-tutorial.sh | 268 
 1 file changed, 268 deletions(-)
 delete mode 100755 t/t1200-tutorial.sh

diff --git a/t/t1200-tutorial.sh b/t/t1200-tutorial.sh
deleted file mode 100755
index 397ccb6909..00
--- a/t/t1200-tutorial.sh
+++ /dev/null
@@ -1,268 +0,0 @@
-#!/bin/sh
-#
-# Copyright (c) 2005 Johannes Schindelin
-#
-
-test_description='A simple turial in the form of a test case'
-
-. ./test-lib.sh
-
-test_expect_success 'blob'  '
-   echo "Hello World" > hello &&
-   echo "Silly example" > example &&
-
-   git update-index --add hello example &&
-
-   test blob = "$(git cat-file -t 557db03)"
-'
-
-test_expect_success 'blob 557db03' '
-   test "Hello World" = "$(git cat-file blob 557db03)"
-'
-
-echo "It's a new day for git" >>hello
-cat > diff.expect << EOF
-diff --git a/hello b/hello
-index 557db03..263414f 100644
 a/hello
-+++ b/hello
-@@ -1 +1,2 @@
- Hello World
-+It's a new day for git
-EOF
-
-test_expect_success 'git diff-files -p' '
-   git diff-files -p > diff.output &&
-   test_cmp diff.expect diff.output
-'
-
-test_expect_success 'git diff' '
-   git diff > diff.output &&
-   test_cmp diff.expect diff.output
-'
-
-test_expect_success 'tree' '
-   tree=$(git write-tree 2>/dev/null) &&
-   test 8988da15d077d4829fc51d8544c097def6644dbb = $tree
-'
-
-test_expect_success 'git diff-index -p HEAD' '
-   test_tick &&
-   tree=$(git write-tree) &&
-   commit=$(echo "Initial commit" | git commit-tree $tree) &&
-   git update-ref HEAD $commit &&
-   git diff-index -p HEAD > diff.output &&
-   test_cmp diff.expect diff.output
-'
-
-test_expect_success 'git diff HEAD' '
-   git diff HEAD > diff.output &&
-   test_cmp diff.expect diff.output
-'
-
-cat > whatchanged.expect << EOF
-commit VARIABLE
-Author: VARIABLE
-Date:   VARIABLE
-
-Initial commit
-
-diff --git a/example b/example
-new file mode 100644
-index 000..f24c74a
 /dev/null
-+++ b/example
-@@ -0,0 +1 @@
-+Silly example
-diff --git a/hello b/hello
-new file mode 100644
-index 000..557db03
 /dev/null
-+++ b/hello
-@@ -0,0 +1 @@
-+Hello World
-EOF
-
-test_expect_success 'git whatchanged -p --root' '
-   git whatchanged -p --root |
-   sed -e "1s/^\(.\{7\}\).\{40\}/\1VARIABLE/" \
-   -e "2,3s/^\(.\{8\}\).*$/\1VARIABLE/" \
-   > whatchanged.output &&
-   test_cmp whatchanged.expect whatchanged.output
-'
-
-test_expect_success 'git tag my-first-tag' '
-   git tag my-first-tag &&
-   test_cmp .git/refs/heads/master .git/refs/tags/my-first-tag
-'
-
-test_expect_success 'git checkout -b mybranch' '
-   git checkout -b mybranch &&
-   test_cmp .git/refs/heads/master .git/refs/heads/mybranch
-'
-
-cat > branch.expect < branch.output &&
-   test_cmp branch.expect branch.output
-'
-
-test_expect_success 'git resolve now fails' '
-   git checkout mybranch &&
-   echo "Work, work, work" >>hello &&
-   test_tick &&
-   git commit -m "Some work." -i hello &&
-
-   git checkout master &&
-
-   echo "Play, play, play" >>hello &&
-   echo "Lots of fun" >>example &&
-   test_tick &&
-   git commit -m "Some fun." -i hello example &&
-
-   test_must_fail git merge -m "Merge work in mybranch" mybranch
-'
-
-cat > hello << EOF
-Hello World
-It's a new day for git
-Play, play, play
-Work, work, work
-EOF
-
-cat > show-branch.expect << EOF
-* [master] Merge work in mybranch
- ! [mybranch] Some work.
---
--  [master] Merge work in mybranch
-*+ [mybranch] Some work.
-*  [master^] Some fun.
-EOF
-
-test_expect_success 'git show-branch' '
-   test_tick &&
-   git commit -m "Merge work in mybranch" -i hello &&
-   git show-branch --topo-order --more=1 master mybranch \
-   > show-branch.output &&
-   test_cmp show-branch.expect show-branch.output
-'
-
-cat > resolve.expect << EOF
-Updating VARIABLE..VARIABLE
-FASTFORWARD (no commit created; -m option ignored)
- example | 1 +
- hello   | 1 +
- 2 files changed, 2 insertions(+)
-EOF
-
-test_expect_success 'git resolve' '
-   git checkout mybranch &&
-   git merge -m "Merge upstream changes." master |

Re: reftable [v6]: new ref storage format

2017-08-08 Thread Junio C Hamano

Shawn Pearce  writes:

> Given that the index can now also be multi-level, I don't expect to
> see a 2G index. A 2G index forces the reader to load the entire 2G to
> take advantage of the restart table. It may be more efficient for such
> a reader to have had the writer make a mutli-level index, instead of a
> single monster index block. And so perhaps the writer shouldn't make a
> 2G index block that she is forced to buffer. :)

Ah, OK, then it is sensible to have all table blocks to have the
same format, and restart at the beginning to help readers would be a
fine choice.  For the same "let's make them as consistent" sake, I
am tempted to suggest that we lift "the index block can be 2G" and
have it also be within uint_24(), perhaps?  Otherwise the readers
would have to read (or mmap) the whole 2G.

Re: [PATCH] sha1_file: avoid comparison if no packed hash matches the first byte

2017-08-08 Thread Jeff King

On Tue, Aug 08, 2017 at 06:52:31PM -0400, Jeff King wrote:

> > Interesting.  I see that we still have the conditional code to call
> > out to sha1-lookup.c::sha1_entry_pos().  Do we need a similar change
> > over there, I wonder?  Alternatively, as we have had the experimental
> > sha1-lookup.c::sha1_entry_pos() long enough without anybody using it,
> > perhaps we should write it off as a failed experiment and retire it?
> 
> There is also sha1_pos(), which seems to have the same problem (and is
> used in several places).

Actually, I take it back. The problem happens when we enter the loop
with no entries to look at. But both sha1_pos() and sha1_entry_pos()
return early before hitting their do-while loops in that case.

-Peff

Re: [PATCH] sha1_file: avoid comparison if no packed hash matches the first byte

2017-08-08 Thread Jeff King

On Tue, Aug 08, 2017 at 03:43:13PM -0700, Junio C Hamano wrote:

> > @@ -2812,7 +2812,7 @@ off_t find_pack_entry_one(const unsigned char *sha1,
> > hi = mi;
> > else
> > lo = mi+1;
> > -   } while (lo < hi);
> > +   }
> > return 0;
> >  }
> 
> Interesting.  I see that we still have the conditional code to call
> out to sha1-lookup.c::sha1_entry_pos().  Do we need a similar change
> over there, I wonder?  Alternatively, as we have had the experimental
> sha1-lookup.c::sha1_entry_pos() long enough without anybody using it,
> perhaps we should write it off as a failed experiment and retire it?

There is also sha1_pos(), which seems to have the same problem (and is
used in several places).

-Peff

Re: [PATCH] sha1_file: avoid comparison if no packed hash matches the first byte

2017-08-08 Thread Junio C Hamano

René Scharfe  writes:

> find_pack_entry_one() uses the fan-out table of pack indexes to find out
> which entries match the first byte of the searched hash and does a
> binary search on this subset of the main index table.
>
> If there are no matching entries then lo and hi will have the same
> value.  The binary search still starts and compares the hash of the
> following entry (which has a non-matching first byte, so won't cause any
> trouble), or whatever comes after the sorted list of entries.
>
> The probability of that stray comparison matching by mistake is low, but
> let's not take any chances and check when entering the binary search
> loop if we're actually done already.
>
> Signed-off-by: Rene Scharfe 
> ---
>  sha1_file.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/sha1_file.c b/sha1_file.c
> index b60ae15f70..11ee69a99d 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -2799,7 +2799,7 @@ off_t find_pack_entry_one(const unsigned char *sha1,
>   return nth_packed_object_offset(p, pos);
>   }
>  
> - do {
> + while (lo < hi) {
>   unsigned mi = (lo + hi) / 2;
>   int cmp = hashcmp(index + mi * stride, sha1);
>  
> @@ -2812,7 +2812,7 @@ off_t find_pack_entry_one(const unsigned char *sha1,
>   hi = mi;
>   else
>   lo = mi+1;
> - } while (lo < hi);
> + }
>   return 0;
>  }

Interesting.  I see that we still have the conditional code to call
out to sha1-lookup.c::sha1_entry_pos().  Do we need a similar change
over there, I wonder?  Alternatively, as we have had the experimental
sha1-lookup.c::sha1_entry_pos() long enough without anybody using it,
perhaps we should write it off as a failed experiment and retire it?

Re: [PATCH] sha1_file: avoid comparison if no packed hash matches the first byte

2017-08-08 Thread Jonathan Nieder

René Scharfe wrote:

> find_pack_entry_one() uses the fan-out table of pack indexes to find out
> which entries match the first byte of the searched hash and does a
> binary search on this subset of the main index table.
>
> If there are no matching entries then lo and hi will have the same
> value.  The binary search still starts and compares the hash of the
> following entry (which has a non-matching first byte, so won't cause any
> trouble), or whatever comes after the sorted list of entries.
>
> The probability of that stray comparison matching by mistake is low, but
> let's not take any chances and check when entering the binary search
> loop if we're actually done already.
>
> Signed-off-by: Rene Scharfe 
> ---
>  sha1_file.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Thanks for a clear explanation.

Sanity checking: is this correct in the sha1[0] == 0 case?  In that
case, we have lo = 0, hi = the 0th offset from the fanout table.  The
offsets in the fanout table are defined as "the number of objects in
the corresponding pack, the first byte of whose object name is less
than or equal to N."  So hi == lo would mean there are no objects with
id starting with 0, as hoped.

Or in other words, the [lo, hi) interval we're trying to search is
indeed a half-open interval.

Reviewed-by: Jonathan Nieder

Re: [PATCH] t4062: stop using repetition in regex

2017-08-08 Thread René Scharfe

Am 09.08.2017 um 00:26 schrieb Junio C Hamano:
> Junio C Hamano  writes:
> 
>> So I find Dscho's concern quite valid, even though I do believe you
>> when you say the code somehow segfaults.  I just can not tell
>> how/why it would segfault, though---it is possible that regexec()
>> implementation is stupid and does not realize that it can return early
>> reporting success without looking at the rest of the buffer, but
>> somehow I find it unlikely.
>>
>> Puzzled.
>>
>>> You get different results?  How is that possible?  The search string is
>>> NUL-terminated in each case, while the point of the test is that the
>>> file contents isn't, right?
> 
> Intellectual curiosity tells me we may want to find out why it
> fails, but in the meantime, I think replacing the test with "0$" to
> force the scanner to find either the end of line or the end of the
> buffer may be a good workaround.  We do not have to care how many of
> random bytes are in front of the last "0" in order to ensure that
> the regexec_buf() does not overstep to 4097th byte, while seeing
> that regexec() that does not know how long the haystack is has to do
> so, no?

Our regexec() calls strlen() (see my other reply).

Using "0$" looks like the best option to me.

René

Re: [PATCH] builtin/add: add a missing newline to an stderr message

2017-08-08 Thread Jonathan Nieder

Ramsay Jones wrote:

> Signed-off-by: Ramsay Jones 

Reviewed-by: Jonathan Nieder 

Thanks for catching it.

Re: [PATCH] builtin/add: add a missing newline to an stderr message

2017-08-08 Thread Jonathan Nieder

Junio C Hamano wrote:
> René Scharfe  writes:

>>> diff --git a/builtin/add.c b/builtin/add.c
>>> index e888fb8c5..385b53ae7 100644
>>> --- a/builtin/add.c
>>> +++ b/builtin/add.c
>>> @@ -43,7 +43,7 @@ static void chmod_pathspec(struct pathspec *pathspec, int 
>>> force_mode)
>>> continue;
>>>   
>>> if (chmod_cache_entry(ce, force_mode) < 0)
>>> -   fprintf(stderr, "cannot chmod '%s'", ce->name);
>>> +   fprintf(stderr, "cannot chmod '%s'\n", ce->name);
>>> }
>>>   }
>>
>> FYI: I brought this up yesterday in the original thread, along with a
>> few other observations:
>>
>>   https://public-inbox.org/git/3c61d9f6-e0fd-22a4-68e0-89fd9ce9b...@web.de/
>>
>> Not sure if the discussion can or should be revived after all this
>> time, though; just sending patches like yours might be the way to go.
>
> Thanks, so it should become
>
>   fprintf(stderr, "cannot chmod %c '%s'\n", force_mode, ce->name);
>
> in the final version to be queued?

I don't believe the force_mode without an 'x' provides a clear signal
to the end user.  Perhaps you meant %cx?

Thanks,
Jonathan

Re: reftable [v6]: new ref storage format

2017-08-08 Thread Shawn Pearce

On Tue, Aug 8, 2017 at 12:25 PM, Junio C Hamano  wrote:
> Shawn Pearce  writes:
>
>> For `log_type = 0x4..0x7` the `log_chained` section is used instead to
>> compress information that already appeared in a prior log record.  The
>> `log_chained` always includes `old_id` for this record, as `new_id` is
>> implied by the prior (by file order, more recent) record's `old_id`.
>>
>> The `not_same_committer` block appears if `log_type & 0x1` is true,
>> `not_same_message` block appears if `log_type & 0x2` is true.  When
>> one of these blocks is missing, its values are implied by the prior
>> (more recent) log record.
>
> Two comments.
>
>  * not-same-committer would be what I would use when I switch
>timezones, even if I stay to be me, right?

Correct. This is based on the theory that the timezone in a reflog is
actually the system timezone, not your timezone. If you push to a
remote system, that system's reflog will be using that system's
timezone, not your timezone. So you aren't really that different, and
we can compress the timezone part away. Also, if you do move
timezones, you are likely to remain in that timezone for some period
of time, and such we can compress many log records again with the same
timezone+name+email.

Its ancient history from my research with "pack v4", but people don't
really change timezones very often in the Git committer data. I
suspect its even more true with reflog data.

>  I am just wondering
>if it is clear to everybody that "committer" in that phrase is a
>short-hand for "committer information other than the timestamp".

Maybe not. I will try to come up with another shorthand name for this.

>  * Should the set of entries that are allowed to use of "chained"
>log be related to the set of entries that appear in the restart
>table in any way?  For a reader that scans starting at a restart
>point, it would be very cumbersome if the entry were chained from
>the previous entry, as it would force it to backtrack entries to
>find the first non-chained log entry.  A simple "log_chained must
>not be used for an entry that appear in the restart table" rule
>would solve that, but I didn't see it in the document.

Good catch!  This is implemented as you described in JGit (for the
reasons you described), but not documented. I'll fix it.

Re: [PATCH] t4062: stop using repetition in regex

2017-08-08 Thread René Scharfe

Am 09.08.2017 um 00:09 schrieb Junio C Hamano:
> René Scharfe  writes:
> 
>> Am 08.08.2017 um 16:49 schrieb Johannes Schindelin:
>>> Hi René,
>>>
>>> On Tue, 8 Aug 2017, René Scharfe wrote:
>>>
 OpenBSD's regex library has a repetition limit (RE_DUP_MAX) of 255.
 That's the minimum acceptable value according to POSIX.  In t4062 we use
 4096 repetitions in the test "-G matches", though, causing it to fail.

 Do the same as the test "-S --pickaxe-regex" in the same file and search
 for a single zero instead.  That still suffices to trigger the buffer
 overrun in older versions (checked with b7d36ffca02^ and --valgrind on
 Linux), simplifies the test a bit, and avoids exceeding OpenBSD's limit.
>>>
>>> I am afraid not. The 4096 is precisely the page size required to trigger
>>> the bug on Windows against which this regression test tries to safeguard.
>>
>> Checked with b7d36ffca02^ on MinGW now as well and found that it
>> segfaults with the proposed change ten out of ten times.
> 
> That is a strange result but I can believe it.
> 
> The reason why I find it strange is that the test wants to run
> diff_grep() in diffcore-pickaxe.c with one == NULL (because we are
> looking at difference between an initial empty commit and the
> current commit that adds 4096-zeroes.txt file), which makes the
> current blob (i.e. a page of '0' that may be mmap(2)ed without
> trailing NUL to terminate it) scanned via regexec() to look for the
> search string.
> 
> I can understand why Dscho originally did "^0{4096}$"; it is to
> ensure that the whole page is scanned for 4096 zeroes and then the
> code would somehow make sure that there is no more byte until the
> end of line, which will force regexec (but not regexec_buf that knows
> where the buffer ends) to look at the 4097th byte that does not exist.
> 
> If you change the pattern to just "0" that is not anchored, I'd expect
> regexec() that does not know how big the haystack is to just find "0"
> at the first byte and happily return without segfaulting (because it
> does not even have to scan the remainder of the buffer).
> 
> So I find Dscho's concern quite valid, even though I do believe you
> when you say the code somehow segfaults.  I just can not tell
> how/why it would segfault, though---it is possible that regexec()
> implementation is stupid and does not realize that it can return early
> reporting success without looking at the rest of the buffer, but
> somehow I find it unlikely.
> 
> Puzzled.

Good point.  Valgrind reports:

==57466== Process terminating with default action of signal 11 (SIGSEGV): 
dumping core
==57466==  Access not within mapped region at address 0x4027000
==57466==at 0x4C2EDF4: strlen (vg_replace_strmem.c:458)
==57466==by 0x59D9F76: regexec@@GLIBC_2.3.4 (regexec.c:240)
==57466==by 0x54D96E: diff_grep (diffcore-pickaxe.c:0)
==57466==by 0x54DAC3: pickaxe_match (diffcore-pickaxe.c:149)

And you can see in our version in compat/regex/regexec.c:241 that the
first thing regexec() does is calling strlen().

So to avoid depending on that implementation detail we'd need to use
a search string that won't be found (e.g. "1") or with unlimited
repetition (e.g. "0*"), right?

René

Re: reftable [v6]: new ref storage format

2017-08-08 Thread Shawn Pearce

On Tue, Aug 8, 2017 at 12:01 PM, Junio C Hamano  wrote:
> I notice that you raised the location of restart table within a
> block in this iteration (or maybe it happened in v5).
>
> This forces you to hold all contents in core before the first byte
> is written out.  You start from the first entry (which will become
> the first restart entry), emit a handful as prefix compressed
> entries, emit a full entry (which will become the next restart
> entry), ... until you have enough to fill both the data and the
> restart table, then start writing from the header (which needs the
> length of the block), restart table and then data.
>
> I think it is OK to do so for the blocks whose size is limited to
> 16M, but I wonder if it is sensible to do the same for the index
> block whose limit is 2G.  If you keep the restart table after the
> data, then you could stream out the entries as you emit, write the
> restart table, and then seek back to fix the length in the header,
> without holding the 2G in core, no?

Yes. I'm torn on the ordering: restart table first or restart table last.

The advantage of it first is the reader can immediately work with it,
without necessarily touching the rest of the block. The disadvantage
is a writer can only stream at block sizes, as the writer is forced to
buffer the entire block. As it happens my implementation in JGit
buffers the entire block anyway, so this didn't really factor as an
issue for me.

Given that the index can now also be multi-level, I don't expect to
see a 2G index. A 2G index forces the reader to load the entire 2G to
take advantage of the restart table. It may be more efficient for such
a reader to have had the writer make a mutli-level index, instead of a
single monster index block. And so perhaps the writer shouldn't make a
2G index block that she is forced to buffer. :)

Perhaps I'll move it back to the tail of the block. I can see the
streaming writer code is maybe more straightforward that way.

Re: [PATCH] t4062: stop using repetition in regex

2017-08-08 Thread Junio C Hamano

Junio C Hamano  writes:

> So I find Dscho's concern quite valid, even though I do believe you
> when you say the code somehow segfaults.  I just can not tell
> how/why it would segfault, though---it is possible that regexec()
> implementation is stupid and does not realize that it can return early
> reporting success without looking at the rest of the buffer, but
> somehow I find it unlikely.
>
> Puzzled.
>
>> You get different results?  How is that possible?  The search string is
>> NUL-terminated in each case, while the point of the test is that the
>> file contents isn't, right?

Intellectual curiosity tells me we may want to find out why it
fails, but in the meantime, I think replacing the test with "0$" to
force the scanner to find either the end of line or the end of the
buffer may be a good workaround.  We do not have to care how many of
random bytes are in front of the last "0" in order to ensure that
the regexec_buf() does not overstep to 4097th byte, while seeing
that regexec() that does not know how long the haystack is has to do
so, no?

Re: [PATCH] builtin/add: add a missing newline to an stderr message

2017-08-08 Thread Junio C Hamano

René Scharfe  writes:

>> diff --git a/builtin/add.c b/builtin/add.c
>> index e888fb8c5..385b53ae7 100644
>> --- a/builtin/add.c
>> +++ b/builtin/add.c
>> @@ -43,7 +43,7 @@ static void chmod_pathspec(struct pathspec *pathspec, int 
>> force_mode)
>>  continue;
>>   
>>  if (chmod_cache_entry(ce, force_mode) < 0)
>> -fprintf(stderr, "cannot chmod '%s'", ce->name);
>> +fprintf(stderr, "cannot chmod '%s'\n", ce->name);
>>  }
>>   }
>
> FYI: I brought this up yesterday in the original thread, along with a
> few other observations:
>
>   https://public-inbox.org/git/3c61d9f6-e0fd-22a4-68e0-89fd9ce9b...@web.de/
>
> Not sure if the discussion can or should be revived after all this
> time, though; just sending patches like yours might be the way to go.

Thanks, so it should become

fprintf(stderr, "cannot chmod %c '%s'\n", force_mode, ce->name);

in the final version to be queued?

Re: [PATCH] t4062: stop using repetition in regex

2017-08-08 Thread Junio C Hamano

René Scharfe  writes:

> Am 08.08.2017 um 16:49 schrieb Johannes Schindelin:
>> Hi René,
>> 
>> On Tue, 8 Aug 2017, René Scharfe wrote:
>> 
>>> OpenBSD's regex library has a repetition limit (RE_DUP_MAX) of 255.
>>> That's the minimum acceptable value according to POSIX.  In t4062 we use
>>> 4096 repetitions in the test "-G matches", though, causing it to fail.
>>>
>>> Do the same as the test "-S --pickaxe-regex" in the same file and search
>>> for a single zero instead.  That still suffices to trigger the buffer
>>> overrun in older versions (checked with b7d36ffca02^ and --valgrind on
>>> Linux), simplifies the test a bit, and avoids exceeding OpenBSD's limit.
>> 
>> I am afraid not. The 4096 is precisely the page size required to trigger
>> the bug on Windows against which this regression test tries to safeguard.
>
> Checked with b7d36ffca02^ on MinGW now as well and found that it
> segfaults with the proposed change ten out of ten times.

That is a strange result but I can believe it.

The reason why I find it strange is that the test wants to run
diff_grep() in diffcore-pickaxe.c with one == NULL (because we are
looking at difference between an initial empty commit and the
current commit that adds 4096-zeroes.txt file), which makes the
current blob (i.e. a page of '0' that may be mmap(2)ed without
trailing NUL to terminate it) scanned via regexec() to look for the
search string.

I can understand why Dscho originally did "^0{4096}$"; it is to
ensure that the whole page is scanned for 4096 zeroes and then the
code would somehow make sure that there is no more byte until the
end of line, which will force regexec (but not regexec_buf that knows
where the buffer ends) to look at the 4097th byte that does not exist.

If you change the pattern to just "0" that is not anchored, I'd expect
regexec() that does not know how big the haystack is to just find "0"
at the first byte and happily return without segfaulting (because it
does not even have to scan the remainder of the buffer).

So I find Dscho's concern quite valid, even though I do believe you
when you say the code somehow segfaults.  I just can not tell
how/why it would segfault, though---it is possible that regexec()
implementation is stupid and does not realize that it can return early
reporting success without looking at the rest of the buffer, but
somehow I find it unlikely.

Puzzled.

> You get different results?  How is that possible?  The search string is
> NUL-terminated in each case, while the point of the test is that the
> file contents isn't, right?

[PATCH] sha1_file: avoid comparison if no packed hash matches the first byte

2017-08-08 Thread René Scharfe

find_pack_entry_one() uses the fan-out table of pack indexes to find out
which entries match the first byte of the searched hash and does a
binary search on this subset of the main index table.

If there are no matching entries then lo and hi will have the same
value.  The binary search still starts and compares the hash of the
following entry (which has a non-matching first byte, so won't cause any
trouble), or whatever comes after the sorted list of entries.

The probability of that stray comparison matching by mistake is low, but
let's not take any chances and check when entering the binary search
loop if we're actually done already.

Signed-off-by: Rene Scharfe 
---
 sha1_file.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index b60ae15f70..11ee69a99d 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2799,7 +2799,7 @@ off_t find_pack_entry_one(const unsigned char *sha1,
return nth_packed_object_offset(p, pos);
}
 
-   do {
+   while (lo < hi) {
unsigned mi = (lo + hi) / 2;
int cmp = hashcmp(index + mi * stride, sha1);
 
@@ -2812,7 +2812,7 @@ off_t find_pack_entry_one(const unsigned char *sha1,
hi = mi;
else
lo = mi+1;
-   } while (lo < hi);
+   }
return 0;
 }
 
-- 
2.14.0

Re: [PATCH] Convert size datatype to size_t

2017-08-08 Thread Junio C Hamano

Martin Koegler  writes:

> diff --git a/apply.c b/apply.c
> index f2d5991..ea97fd2 100644
> --- a/apply.c
> +++ b/apply.c
> @@ -3085,7 +3085,7 @@ static int apply_binary_fragment(struct apply_state 
> *state,
>struct patch *patch)
>  {
>   struct fragment *fragment = patch->fragments;
> - unsigned long len;
> + size_t len;
>   void *dst;
>  
>   if (!fragment)

This variable is made size_t because it receives the result size of
patch_delta().  And it later is assigned to img->len field, which
already is size_t before this patch.

Curiously, the patch_delta() invocation with this patch reads like
this:

dst = patch_delta(img->buf, img->len, fragment->patch,
  fragment->size, );

where patch_delta() is updated (correctly) to:

void *patch_delta(const void *src_buf, size_t src_size,
  const void *delta_buf, size_t delta_size,
  size_t *dst_size)

with this patch.  But "size" field in "struct fragment" is still
"int".  So we'd need to update it as well, not necessarily in this
patch (which is already too big) but as part of a larger whole.

> @@ -3174,7 +3174,7 @@ static int apply_binary(struct apply_state *state,
>   if (has_sha1_file(oid.hash)) {
>   /* We already have the postimage */
>   enum object_type type;
> - unsigned long size;
> + size_t size;
>   char *result;
>  
>   result = read_sha1_file(oid.hash, , );

This is to receive the resulting size from read_sha1_file().  It is
assigned to img->len, which is already size_t, so all is good here.

> @@ -3236,7 +3236,7 @@ static int read_blob_object(struct strbuf *buf, const 
> struct object_id *oid, uns
>   strbuf_addf(buf, "Subproject commit %s\n", oid_to_hex(oid));
>   } else {
>   enum object_type type;
> - unsigned long sz;
> + size_t sz;
>   char *result;
>  
>   result = read_sha1_file(oid->hash, , );

By reading the remainder of this function, this conversion also is
good.  sz that is now size_t is used as the size attached to an
existing strbuf like so:

result = read_sha1_file(oid->hash, , );
if (!result)
return -1;
/* XXX read_sha1_file NUL-terminates */
strbuf_attach(buf, result, sz, sz + 1);

in the part beyond the post context of this hunk.  In the longer
term, sz+1 we see here may want to become the overflow-safe variant
st_add().

As you said in the comment after three-dashes in the patch, a lot
more work is needed and your patch is a good starting point.

I am not sure if we can split the patch somehow to make it easier to
review.  The deceptively small part of your patch, i.e.

 apply.c  |  6 +++---

needs the above analysis to see if they are correct and what more
work is necessary, and there are 65 more files with ~190 lines
changed.

Re: [PATCH] builtin/add: add a missing newline to an stderr message

2017-08-08 Thread René Scharfe

Am 08.08.2017 um 23:36 schrieb Ramsay Jones:
> 
> Signed-off-by: Ramsay Jones 
> ---
> 
> Hi Junio,
> 
> I noticed this while looking into the t3700 failure on cygwin tonight.
> Also, I couldn't decide whether or not to add the i18n '_()' brackets
> around the message. In the end I didn't, but will happily add them
> if you think I should.
> 
> Thanks!
> 
> ATB,
> Ramsay Jones
> 
>   builtin/add.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/builtin/add.c b/builtin/add.c
> index e888fb8c5..385b53ae7 100644
> --- a/builtin/add.c
> +++ b/builtin/add.c
> @@ -43,7 +43,7 @@ static void chmod_pathspec(struct pathspec *pathspec, int 
> force_mode)
>   continue;
>   
>   if (chmod_cache_entry(ce, force_mode) < 0)
> - fprintf(stderr, "cannot chmod '%s'", ce->name);
> + fprintf(stderr, "cannot chmod '%s'\n", ce->name);
>   }
>   }
>   

FYI: I brought this up yesterday in the original thread, along with a
few other observations:

  https://public-inbox.org/git/3c61d9f6-e0fd-22a4-68e0-89fd9ce9b...@web.de/

Not sure if the discussion can or should be revived after all this
time, though; just sending patches like yours might be the way to go.

René

[PATCH] builtin/add: add a missing newline to an stderr message

2017-08-08 Thread Ramsay Jones


Signed-off-by: Ramsay Jones 
---

Hi Junio,

I noticed this while looking into the t3700 failure on cygwin tonight.
Also, I couldn't decide whether or not to add the i18n '_()' brackets
around the message. In the end I didn't, but will happily add them
if you think I should.

Thanks!

ATB,
Ramsay Jones

 builtin/add.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/add.c b/builtin/add.c
index e888fb8c5..385b53ae7 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -43,7 +43,7 @@ static void chmod_pathspec(struct pathspec *pathspec, int 
force_mode)
continue;
 
if (chmod_cache_entry(ce, force_mode) < 0)
-   fprintf(stderr, "cannot chmod '%s'", ce->name);
+   fprintf(stderr, "cannot chmod '%s'\n", ce->name);
}
 }
 
-- 
2.14.0

Re: [PATCH] Convert size datatype to size_t

2017-08-08 Thread Junio C Hamano

Martin Koegler  writes:

> From: Martin Koegler 
>
> It changes the signature of the core object access function
> including any other functions to assure a clean compile if
> sizeof(size_t) != sizeof(unsigned long).
>
> Signed-off-by: Martin Koegler 
> ---
> ...
>  66 files changed, 193 insertions(+), 191 deletions(-)

Wow, that's a lot of changes.  What version did you worked this
patch on?  The reason I ask is because...

> diff --git a/diff-delta.c b/diff-delta.c
> index cd238c8..3d5e1ef 100644

... I do not see any version of Git that had blob cd238c8 at that
path, so "git am -3" is having hard time applying this pach to allow
me reviewing it.

Re: [RFC PATCH 00/10] An attempt to move packfile funcs to its own file

2017-08-08 Thread Junio C Hamano

Jonathan Tan  writes:

> What do you mean by "keep the exposed surface area small enough"? If you
> mean the total number of exposed functions in sha1_file and pack (once
> everything is done), I think it will be almost the same as that
> currently in sha1_file.
> ...
> During this patch set, there might be some functions that need to be
> temporarily made global, but those are reverted to static in the end.

That is exactly what I meant.

> As stated above, I don't think so, but I'll make a list of the functions
> needing to be made global.

Good.

Thanks.

Re: [PATCH 2/2] filter-branch: Handle rewritting (very) old style tags which lack tagger

2017-08-08 Thread Junio C Hamano

Ian Campbell  writes:

> Such as v2.6.12-rc2..v2.6.13-rc3 in the Linux kernel source tree.
>
> Insert a fake tag header, since newer `git mktag` wont accept the input
> otherwise:
>
> $ git cat-file tag v2.6.12-rc2
> object 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2
> type commit
> tag v2.6.12-rc2
>
> Linux v2.6.12-rc2 release
> -BEGIN PGP SIGNATURE-
> Version: GnuPG v1.2.4 (GNU/Linux)
>
> iD8DBQBCbW8ZF3YsRnbiHLsRAgFRAKCq/TkuDaEombFABkPqYgGCgWN2lQCcC0qc
> wznDbFU45A54dZC8RZ5JxyE=
> =ESRP
> -END PGP SIGNATURE-
>
> $ git cat-file tag v2.6.12-rc2 | git mktag
> error: char76: could not find "tagger "
> fatal: invalid tag signature file
> $ git cat-file tag v2.6.13-rc4 | git mktag
> 7eab951de91d95875ba34ec4c599f37e1208db93
>
> Signed-off-by: Ian Campbell 
> ---
>  git-filter-branch.sh | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/git-filter-branch.sh b/git-filter-branch.sh
> index d07db3fee..6927aa2da 100755
> --- a/git-filter-branch.sh
> +++ b/git-filter-branch.sh
> @@ -540,6 +540,9 @@ if [ "$filter_tag_name" ]; then
>   new_sha1=$( ( printf 'object %s\ntype commit\ntag %s\n' 
> \
>   "$new_sha1" "$new_ref"
>   git cat-file tag "$ref" |
> + awk '/^tagger/  { tagged=1 }
> +  /^$/   { if (!tagged && !done) { print 
> "tagger Unknown  0 +" } ; done=1 }
> +  // { print }' |
>   sed -n \
>   -e '1,/^$/{
> /^object /d

What the change wants to do makes perfect sense, but piping output
from awk into sed looks somewhat gross.  Perhaps we'd want to roll
what the existing sed script is trying to do into this new awk
script?

Re: [PATCH 1/2] filter-branch: Add --state-branch to hold pickled copy of ref map

2017-08-08 Thread Junio C Hamano

Ian Campbell  writes:

> Allowing for incremental updates of large trees.

"by doing what" is missing.  And ...

>
> I have been using this as part of the device tree extraction from the Linux
> kernel source since 2013, about time I sent the patch upstream!

... this does not help understanding what is going on.  It belongs
to the space after three dashes.

Perhaps

Subject: filter-branch: stash away ref map in a branch

With "--state-branch=" option, the mapping from
old object names and filtered ones in ./map/ directory is
stashed away in the object database, and the one from the
previous run is read to populate the ./map/ directory,
allowing for incremental updates of large trees.

or something?

>
> Signed-off-by: Ian Campbell 
> ---
>  git-filter-branch.sh | 39 ++-
>  1 file changed, 38 insertions(+), 1 deletion(-)
>
> diff --git a/git-filter-branch.sh b/git-filter-branch.sh
> index 3a74602ef..d07db3fee 100755
> --- a/git-filter-branch.sh
> +++ b/git-filter-branch.sh
> @@ -86,7 +86,7 @@ USAGE="[--setup ] [--env-filter ]
>   [--parent-filter ] [--msg-filter ]
>   [--commit-filter ] [--tag-name-filter ]
>   [--subdirectory-filter ] [--original ]
> - [-d ] [-f | --force]
> + [-d ] [-f | --force] [--state-branch ]
>   [--] [...]"
>  
>  OPTIONS_SPEC=
> @@ -106,6 +106,7 @@ filter_msg=cat
>  filter_commit=
>  filter_tag_name=
>  filter_subdir=
> +state_branch=
>  orig_namespace=refs/original/
>  force=
>  prune_empty=
> @@ -181,6 +182,9 @@ do
>   --original)
>   orig_namespace=$(expr "$OPTARG/" : '\(.*[^/]\)/*$')/
>   ;;
> + --state-branch)
> + state_branch="$OPTARG"
> + ;;
>   *)
>   usage
>   ;;
> @@ -252,6 +256,20 @@ export GIT_INDEX_FILE
>  # map old->new commit ids for rewriting parents
>  mkdir ../map || die "Could not create map/ directory"
>  
> +if [ -n "$state_branch" ] ; then
> + state_commit=`git show-ref -s "$state_branch"`

I hate to nitpick styles, especially on this script that already has
existing violations, but for completeness:

Style: we prefer to write $(command substitution) instead.
Style: we prefer to write "if test", not "if [".
Style: we prefer to avoid ';' and write "if test condtion" and
   "then" on different lines.

It is a bit curious use of "show-ref".  It is not wrong per-se, but
"git rev-parse" may be more common.  I do not care too deeply either
way, though.

Don't we want to make sure the value given to --state-branch is a
full refname, not just a branch name?  What happens when you say 

filter-branch --state-branch master

by mistake?  "show-ref -s" is likely to show your refs/heads/master,
and other master branches that appear as remote-tracking branches for
the remotes you interact with.

> + if [ -n "$state_commit" ] ; then
> + echo "Populating map from $state_branch ($state_commit)" 1>&2
> + git show "$state_commit":filter.map |
> + perl -n -e 'm/(.*):(.*)/ or die;
> + open F, ">../map/$1" or die;
> + print F "$2" or die;
> + close(F) or die'

The process calling this perl script, which carefully diagnoses
malformed input and dies, does not seem to do anything when it sees
errors.  Intended?

> + else
> + echo "Branch $state_branch does not exist. Will create" 1>&2
> + fi
> +fi
> +
>  # we need "--" only if there are no path arguments in $@
>  nonrevs=$(git rev-parse --no-revs "$@") || exit
>  if test -z "$nonrevs"
> @@ -544,6 +562,25 @@ if [ "$filter_tag_name" ]; then
>   done
>  fi
>  
> +if [ -n "$state_branch" ] ; then
> + echo "Saving rewrite state to $state_branch" 1>&2
> + STATE_BLOB=$(ls ../map |
> + perl -n -e 'chomp();
> + open F, "<../map/$_" or die;
> + chomp($f = ); print "$_:$f\n";' |

I see it somewhat gross to pipe the output of "/bin/ls" to a Perl
script, instead of iterating over "while (<../map/*>)" inside the
script itself.

> + git hash-object -w --stdin )
> + STATE_TREE=$(/bin/echo -e "100644 blob $STATE_BLOB\tfilter.map" | git 
> mktree)
> + STATE_PARENT=$(git show-ref -s "$state_branch")

Don't you already have this in $state_commit?

One advantage of reading $state_branch again at this point is to
detect mistakes of running more than one filter-branch (which may
cause you to read $STATE_PARENT that is different from $state_commit
you read earlier), but I do not think that is being done here, so...

> + unset GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_AUTHOR_DATE
> + unset GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL GIT_COMMITTER_DATE

Hmph.  I can see that you are trying not to be affected by the
committers and authors of the commits on the branch being filtered
(which are set

Re: [RFC PATCH 01/10] pack: move pack name-related functions

2017-08-08 Thread Jonathan Tan

On Tue, 8 Aug 2017 13:36:24 -0700
Stefan Beller  wrote:

> On Tue, Aug 8, 2017 at 12:32 PM, Jonathan Tan  
> wrote:
> > Currently, sha1_file.c and cache.h contain many functions, both related
> > to and unrelated to packfiles. This makes both files very large and
> > causes an unclear separation of concerns.
> >
> > Create a new file, pack.c, to hold all packfile-related functions
> > currently in sha1_file.c, and designate pack.h to hold these
> > packfile-related functions.
> 
> There are also packed refs, so one could (like I did) think that
> pack.c is for generic packing of things, maybe packfile.c
> would be more clear?

Good point. I'll use packfile.c and packfile.h in the next version.

Re: [RFC PATCH 04/10] pack: move open_pack_index(), parse_pack_index()

2017-08-08 Thread Jonathan Tan

On Tue, 08 Aug 2017 13:19:23 -0700
Junio C Hamano  wrote:

> Jonathan Tan  writes:
> 
> > Signed-off-by: Jonathan Tan 
> > ---
> >  builtin/count-objects.c |   1 +
> >  cache.h |   8 ---
> >  pack.c  | 149 
> > 
> >  pack.h  |   8 +++
> >  sha1_file.c | 140 -
> >  sha1_name.c |   1 +
> >  6 files changed, 159 insertions(+), 148 deletions(-)
> 
> This patch is a bit strange...
> 
> > diff --git a/pack.c b/pack.c
> > index 60d9fc3b0..6edc43228 100644
> > --- a/pack.c
> > +++ b/pack.c
> > ...
> > +static struct packed_git *alloc_packed_git(int extra)
> > +{
> > +   struct packed_git *p = xmalloc(st_add(sizeof(*p), extra));
> > +   memset(p, 0, sizeof(*p));
> > +   p->pack_fd = -1;
> > +   return p;
> > +}
> > +
> > +struct packed_git *parse_pack_index(unsigned char *sha1, const char 
> > *idx_path)
> > +{
> > +   const char *path = sha1_pack_name(sha1);
> > +   size_t alloc = st_add(strlen(path), 1);
> > +   struct packed_git *p = alloc_packed_git(alloc);
> > +
> > +   memcpy(p->pack_name, path, alloc); /* includes NUL */
> > +   hashcpy(p->sha1, sha1);
> > +   if (check_packed_git_idx(idx_path, p)) {
> > +   free(p);
> > +   return NULL;
> > +   }
> > +
> > +   return p;
> > +}
> 
> We see these two functions appear in pack.c
> 
> > diff --git a/sha1_file.c b/sha1_file.c
> > index 0de39f480..2e414f5f5 100644
> > --- a/sha1_file.c
> > +++ b/sha1_file.c
> > ...
> > @@ -1300,22 +1176,6 @@ struct packed_git *add_packed_git(const char *path, 
> > size_t path_len, int local)
> > return p;
> >  }
> >  
> > -struct packed_git *parse_pack_index(unsigned char *sha1, const char 
> > *idx_path)
> > -{
> > -   const char *path = sha1_pack_name(sha1);
> > -   size_t alloc = st_add(strlen(path), 1);
> > -   struct packed_git *p = alloc_packed_git(alloc);
> > -
> > -   memcpy(p->pack_name, path, alloc); /* includes NUL */
> > -   hashcpy(p->sha1, sha1);
> > -   if (check_packed_git_idx(idx_path, p)) {
> > -   free(p);
> > -   return NULL;
> > -   }
> > -
> > -   return p;
> > -}
> > -
> 
> And we see parse_pack_index() came from sha1_file.c
> 
> But where did alloc_packed_git() come from?  Was the patch split
> incorrectly or something?
> 
> When I applied the whole series and did
> 
> git blame -s -w -M -C -C master.. pack.c
> 
> expecting that pretty much everything has come from sha1_file.c but
> noticed that some lines were actually blamed to a version of pack.c
> and these functions were among them.

alloc_packed_git() in pack.c is a duplicate of the function of the same
name in sha1_file.c in this patch, because at this patch, there are
still functions in both files using this function. A subsequent patch in
this patch set will remove it from pack.c.

I'll add a note explaining this to this patch in the next version.

Re: [RFC PATCH 00/10] An attempt to move packfile funcs to its own file

2017-08-08 Thread Jonathan Tan

On Tue, 08 Aug 2017 13:05:05 -0700
Junio C Hamano  wrote:

> Jonathan Tan  writes:
> 
> > While investigating annotating packfiles and loose objects to support
> > connectivity checks in partial clones [1], I decided to make the effort
> > to separate packfile-related code from sha1_file.c into its own file, to
> > make it easier to both code such changes and review them. Here is the
> > beginning of those efforts.
> >
> > Is this something worth doing, and if yes, is this in the right
> > direction?
> 
> Overall I think it is a good idea to slim down sha1_file.c *if* we
> can keep the exposed surface area small enough.

What do you mean by "keep the exposed surface area small enough"? If you
mean the total number of exposed functions in sha1_file and pack (once
everything is done), I think it will be almost the same as that
currently in sha1_file.

find_pack_entry() and has_packed_and_bad() (not yet in this patch set)
might need to be changed from static to global, but those are the only 2
I can think of. Anyway, I'll report the functions that need to be
changed from static to global at the end.

During this patch set, there might be some functions that need to be
temporarily made global, but those are reverted to static in the end.

> I wonder if the names "pack.[ch]" communicate well that these are
> "object access layer that is about reading from packfiles".  The
> writer side is called "pack-objects.[ch]".

This file will end up being slightly broader than reading from packfiles
- in particular, things like pack_report() (reporting some statistics
not only on the in-repo packfiles themselves) and parse_pack_index()
(which parses an idx file obtained through http) are there too. Hence
the generic name, but I agree that there might be a better name (or
better set of names).

> This may have to make some symbols that used to be private to the
> "object access" layer, which was what sha1_file.c was about, global
> symbols.  After moving things around, do we end up exposing too many
> implementation details to the world outside the "object access"
> layer?  I'd assume they are limited to the resulting pack.h and it
> would be OK as long as nobody other than sha1_file.c and pack.c
> would inculde it, though.

As stated above, I don't think so, but I'll make a list of the functions
needing to be made global.

Re: [RFC PATCH 01/10] pack: move pack name-related functions

2017-08-08 Thread Stefan Beller

On Tue, Aug 8, 2017 at 12:32 PM, Jonathan Tan  wrote:
> Currently, sha1_file.c and cache.h contain many functions, both related
> to and unrelated to packfiles. This makes both files very large and
> causes an unclear separation of concerns.
>
> Create a new file, pack.c, to hold all packfile-related functions
> currently in sha1_file.c, and designate pack.h to hold these
> packfile-related functions.

There are also packed refs, so one could (like I did) think that
pack.c is for generic packing of things, maybe packfile.c
would be more clear?

Re: [PATCH v2 5/5] sha1_file: support loading lazy objects

2017-08-08 Thread Ben Peart




On 7/31/2017 5:02 PM, Jonathan Tan wrote:

Teach sha1_file to invoke the command configured in
extensions.lazyObject whenever an object is requested and unavailable.

The usage of the hook can be suppressed through a flag when invoking
has_object_file_with_flags() and other similar functions.

This is meant as a temporary measure to ensure that all Git commands
work in such a situation. Future patches will update some commands to
either tolerate missing objects (without invoking the command) or be
more efficient in invoking this command.


To prevent fetch from downloading all missing objects, you will also 
need to add logic in check_connected.  The simplest model is to simply 
return 0 if repository_format_lazy_object is set.


/*
 * Running a with lazy_objects there will be objects that are
 * missing locally and we don't want to download a bunch of
 * commits, trees, and blobs just to make sure everything is
 * reachable locally so this option will skip reachablility
 * checks below that use rev-list.  This will stop the check
 * before uploadpack runs to determine if there is anything to
 * fetch.  Returning zero for the first check will also prevent the
 * uploadpack from happening.  It will also skip the check after
 * the fetch is finished to make sure all the objects where
 * downloaded in the pack file.  This will allow the fetch to
 * run and get all the latest tip commit ids for all the branches
 * in the fetch but not pull down commits, trees, or blobs via
 * upload pack.
 */
if (repository_format_lazy_object)
return 0;

[...]

diff --git a/sha1_file.c b/sha1_file.c
index b60ae15f7..1785c61d8 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -28,6 +28,11 @@
  #include "list.h"
  #include "mergesort.h"
  #include "quote.h"
+#include "iterator.h"
+#include "dir-iterator.h"
+#include "sha1-lookup.h"
+#include "lazy-object.h"
+#include "sha1-array.h"
  
  #define SZ_FMT PRIuMAX

  static inline uintmax_t sz_fmt(size_t s) { return s; }
@@ -2984,6 +2989,7 @@ int sha1_object_info_extended(const unsigned char *sha1, 
struct object_info *oi,
const unsigned char *real = (flags & OBJECT_INFO_LOOKUP_REPLACE) ?
lookup_replace_object(sha1) :
sha1;
+   int already_retried = 0;
  
  	if (!oi)

oi = _oi;
@@ -3008,30 +3014,38 @@ int sha1_object_info_extended(const unsigned char 
*sha1, struct object_info *oi,
}
}
  
-	if (!find_pack_entry(real, )) {

-   /* Most likely it's a loose object. */
-   if (!sha1_loose_object_info(real, oi, flags)) {
-   oi->whence = OI_LOOSE;
-   return 0;
-   }
+retry:
+   if (find_pack_entry(real, ))
+   goto found_packed;
  
-		/* Not a loose object; someone else may have just packed it. */

-   if (flags & OBJECT_INFO_QUICK) {
-   return -1;
-   } else {
-   reprepare_packed_git();
-   if (!find_pack_entry(real, ))
-   return -1;
-   }
+   /* Most likely it's a loose object. */
+   if (!sha1_loose_object_info(real, oi, flags)) {
+   oi->whence = OI_LOOSE;
+   return 0;
}
  
+	/* Not a loose object; someone else may have just packed it. */

+   reprepare_packed_git();
+   if (find_pack_entry(real, ))
+   goto found_packed;


Same feedback as before.  I like to avoid using goto's as flow control 
other than in error handling.


Also, this patch looses the OBJECT_INFO_QUICK logic which could be restored.

[...]


diff --git a/t/t0410/lazy-object b/t/t0410/lazy-object
new file mode 100755
index 0..4f4a9c38a
--- /dev/null
+++ b/t/t0410/lazy-object
@@ -0,0 +1,102 @@
+#!/usr/bin/perl
+#
+# Example implementation for the Git lazyObject protocol version 1. See
+# the documentation for extensions.lazyObject in
+# Documentation/technical/repository-version.txt
+#
+# Allows you to test the ability for blobs to be pulled from a host git repo
+# "on demand."  Called when git needs a blob it couldn't find locally due to
+# a lazy clone that only cloned the commits and trees.
+#
+# Please note, this sample is a minimal skeleton. No proper error handling
+# was implemented.
+
+use strict;
+use warnings;
+
+#
+# Point $DIR to the folder where your host git repo is located so we can pull
+# missing objects from it
+#
+my $DIR = $ARGV[0];
+


At some point, this should be based on the refactored pkt_* functions 
currently contained in the ObjectDB patch series.



+sub packet_bin_read {
+   my $buffer;
+   my $bytes_read = read STDIN, $buffer, 4;
+   if ( $bytes_read == 0 ) {
+
+   # EOF - Git stopped talking to us!
+   exit();
+   }
+   elsif ( $bytes_read != 4 ) {
+   die "invalid packet: '$buffer'";
+   }
+   my $pkt_size =

Re: [RFC PATCH 04/10] pack: move open_pack_index(), parse_pack_index()

2017-08-08 Thread Junio C Hamano

Jonathan Tan  writes:

> Signed-off-by: Jonathan Tan 
> ---
>  builtin/count-objects.c |   1 +
>  cache.h |   8 ---
>  pack.c  | 149 
> 
>  pack.h  |   8 +++
>  sha1_file.c | 140 -
>  sha1_name.c |   1 +
>  6 files changed, 159 insertions(+), 148 deletions(-)

This patch is a bit strange...

> diff --git a/pack.c b/pack.c
> index 60d9fc3b0..6edc43228 100644
> --- a/pack.c
> +++ b/pack.c
> ...
> +static struct packed_git *alloc_packed_git(int extra)
> +{
> + struct packed_git *p = xmalloc(st_add(sizeof(*p), extra));
> + memset(p, 0, sizeof(*p));
> + p->pack_fd = -1;
> + return p;
> +}
> +
> +struct packed_git *parse_pack_index(unsigned char *sha1, const char 
> *idx_path)
> +{
> + const char *path = sha1_pack_name(sha1);
> + size_t alloc = st_add(strlen(path), 1);
> + struct packed_git *p = alloc_packed_git(alloc);
> +
> + memcpy(p->pack_name, path, alloc); /* includes NUL */
> + hashcpy(p->sha1, sha1);
> + if (check_packed_git_idx(idx_path, p)) {
> + free(p);
> + return NULL;
> + }
> +
> + return p;
> +}

We see these two functions appear in pack.c

> diff --git a/sha1_file.c b/sha1_file.c
> index 0de39f480..2e414f5f5 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> ...
> @@ -1300,22 +1176,6 @@ struct packed_git *add_packed_git(const char *path, 
> size_t path_len, int local)
>   return p;
>  }
>  
> -struct packed_git *parse_pack_index(unsigned char *sha1, const char 
> *idx_path)
> -{
> - const char *path = sha1_pack_name(sha1);
> - size_t alloc = st_add(strlen(path), 1);
> - struct packed_git *p = alloc_packed_git(alloc);
> -
> - memcpy(p->pack_name, path, alloc); /* includes NUL */
> - hashcpy(p->sha1, sha1);
> - if (check_packed_git_idx(idx_path, p)) {
> - free(p);
> - return NULL;
> - }
> -
> - return p;
> -}
> -

And we see parse_pack_index() came from sha1_file.c

But where did alloc_packed_git() come from?  Was the patch split
incorrectly or something?

When I applied the whole series and did

git blame -s -w -M -C -C master.. pack.c

expecting that pretty much everything has come from sha1_file.c but
noticed that some lines were actually blamed to a version of pack.c
and these functions were among them.

Re: [PATCH v2 2/2 / RFC] branch: quote branch/ref names to improve readability

2017-08-08 Thread Stefan Beller

On Tue, Aug 8, 2017 at 12:43 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> (Though wondering for non-submodule users, if they perceive it as
>> inconsistency as other parts of the code may not follow the rigorous quoting)
>
> Do you mean that we may instead want to remove the excessive quoting
> of branch names and stuff from submodule.c code, because they are
> newer ones that broke the consistency existed before them (i.e. not
> quoting)?

No, I do not. I was just wondering if a non-submodule user
may see differences between different commands now.

For example "checkout -b" already quotes 'external data', which
would be inline with this proposal, but there may be others.
My question was rather an encouragement to check the code base
if there are other occurrences left that do not quote.

In an ideal code base we could just grep for any %s that has no
surrounding quotes, but of course it is not as easy in the real world:
* some outputs use %s construction for non-human consumption
  in e.g. the diff machinery
* sometimes we play sentence lego, stringing words together
  which also is using %s unquoted correctly.

> That certainly is tempting, but I personally find it easier to read
> a message that marks parts that holds "external data" differently
> from the message's text, so I think this patch 2/2 goes in the right
> direction.

Yes. I like the direction this patch is going.

A note on 'external data':
For branchs, paths, submodule names a single quote seems
to be best (my opinion), whereas in e.g. git-status:

Submodule 'sm1' 000...1beffeb (new submodule)

parens seem to do a better job as they describe the state,
not reproducing external data. (This is also the place
where I was reminded of potential sentence lego)

Re: [RFC PATCH 00/10] An attempt to move packfile funcs to its own file

2017-08-08 Thread Junio C Hamano

Jonathan Tan  writes:

> While investigating annotating packfiles and loose objects to support
> connectivity checks in partial clones [1], I decided to make the effort
> to separate packfile-related code from sha1_file.c into its own file, to
> make it easier to both code such changes and review them. Here is the
> beginning of those efforts.
>
> Is this something worth doing, and if yes, is this in the right
> direction?

Overall I think it is a good idea to slim down sha1_file.c *if* we
can keep the exposed surface area small enough.

I wonder if the names "pack.[ch]" communicate well that these are
"object access layer that is about reading from packfiles".  The
writer side is called "pack-objects.[ch]".

This may have to make some symbols that used to be private to the
"object access" layer, which was what sha1_file.c was about, global
symbols.  After moving things around, do we end up exposing too many
implementation details to the world outside the "object access"
layer?  I'd assume they are limited to the resulting pack.h and it
would be OK as long as nobody other than sha1_file.c and pack.c
would inculde it, though.

Thanks.

>
> [1] 
> https://public-inbox.org/git/20170804145113.5ceaf...@twelve2.svl.corp.google.com/
>
> Jonathan Tan (10):
>   pack: move pack name-related functions
>   pack: move static state variables
>   pack: move pack_report()
>   pack: move open_pack_index(), parse_pack_index()
>   pack: move release_pack_memory()
>   pack: move pack-closing functions
>   pack: move use_pack()
>   pack: move unuse_pack()
>   pack: move add_packed_git()
>   pack: move install_packed_git()
>
>  Makefile |   1 +
>  builtin/am.c |   1 +
>  builtin/clone.c  |   1 +
>  builtin/count-objects.c  |   1 +
>  builtin/fetch.c  |   1 +
>  builtin/merge.c  |   1 +
>  builtin/pack-redundant.c |   1 +
>  cache.h  |  45 
>  connected.c  |   1 +
>  git-compat-util.h|   2 -
>  pack.c   | 669 
> +++
>  pack.h   |  51 
>  sha1_file.c  | 667 --
>  sha1_name.c  |   1 +
>  streaming.c  |   1 +
>  15 files changed, 730 insertions(+), 714 deletions(-)
>  create mode 100644 pack.c

[PATCH] Convert size datatype to size_t

2017-08-08 Thread Martin Koegler

From: Martin Koegler 

It changes the signature of the core object access function
including any other functions to assure a clean compile if
sizeof(size_t) != sizeof(unsigned long).

Signed-off-by: Martin Koegler 
---

There is much more to change in the codebase. It JUST fixes the signature
of some functions.

 apply.c  |  6 +++---
 archive-tar.c|  4 ++--
 archive-zip.c|  2 +-
 archive.c|  2 +-
 archive.h|  2 +-
 blame.c  |  4 ++--
 blame.h  |  2 +-
 builtin/cat-file.c   | 10 +-
 builtin/difftool.c   |  2 +-
 builtin/fast-export.c|  6 +++---
 builtin/fmt-merge-msg.c  |  2 +-
 builtin/fsck.c   |  2 +-
 builtin/grep.c   |  6 +++---
 builtin/index-pack.c | 12 ++--
 builtin/log.c|  4 ++--
 builtin/ls-tree.c|  2 +-
 builtin/merge-tree.c |  6 +++---
 builtin/mktag.c  |  2 +-
 builtin/notes.c  |  6 +++---
 builtin/pack-objects.c   | 10 +-
 builtin/reflog.c |  2 +-
 builtin/tag.c|  4 ++--
 builtin/unpack-file.c|  2 +-
 builtin/unpack-objects.c | 14 +++---
 builtin/verify-commit.c  |  2 +-
 bundle.c |  2 +-
 cache.h  | 22 +++---
 combine-diff.c   |  9 +
 commit.c |  6 +++---
 config.c |  2 +-
 delta.h  | 26 +-
 diff-delta.c | 16 
 diff.c   | 14 +++---
 diff.h   |  2 +-
 diffcore.h   |  2 +-
 dir.c|  2 +-
 entry.c  |  4 ++--
 fast-import.c| 18 +-
 fsck.c   |  2 +-
 grep.h   |  2 +-
 http-push.c  |  2 +-
 mailmap.c|  2 +-
 match-trees.c|  4 ++--
 merge-blobs.c|  6 +++---
 merge-blobs.h|  2 +-
 merge-recursive.c|  4 ++--
 notes-cache.c|  2 +-
 notes-merge.c|  2 +-
 notes.c  |  6 +++---
 object.c |  2 +-
 pack-check.c |  2 +-
 pack-objects.h   |  6 +++---
 patch-delta.c| 11 ++-
 read-cache.c |  4 ++--
 ref-filter.c |  4 ++--
 remote-testsvn.c |  4 ++--
 rerere.c |  2 +-
 sha1_file.c  | 44 ++--
 streaming.c  |  8 
 streaming.h  |  2 +-
 submodule-config.c   |  2 +-
 t/helper/test-delta.c|  2 +-
 tag.c|  4 ++--
 tree-walk.c  |  8 
 tree.c   |  2 +-
 xdiff-interface.c|  2 +-
 66 files changed, 193 insertions(+), 191 deletions(-)

diff --git a/apply.c b/apply.c
index f2d5991..ea97fd2 100644
--- a/apply.c
+++ b/apply.c
@@ -3085,7 +3085,7 @@ static int apply_binary_fragment(struct apply_state 
*state,
 struct patch *patch)
 {
struct fragment *fragment = patch->fragments;
-   unsigned long len;
+   size_t len;
void *dst;
 
if (!fragment)
@@ -3174,7 +3174,7 @@ static int apply_binary(struct apply_state *state,
if (has_sha1_file(oid.hash)) {
/* We already have the postimage */
enum object_type type;
-   unsigned long size;
+   size_t size;
char *result;
 
result = read_sha1_file(oid.hash, , );
@@ -3236,7 +3236,7 @@ static int read_blob_object(struct strbuf *buf, const 
struct object_id *oid, uns
strbuf_addf(buf, "Subproject commit %s\n", oid_to_hex(oid));
} else {
enum object_type type;
-   unsigned long sz;
+   size_t sz;
char *result;
 
result = read_sha1_file(oid->hash, , );
diff --git a/archive-tar.c b/archive-tar.c
index c6ed96e..719673d 100644
--- a/archive-tar.c
+++ b/archive-tar.c
@@ -115,7 +115,7 @@ static int stream_blocked(const unsigned char *sha1)
 {
struct git_istream *st;
enum object_type type;
-   unsigned long sz;
+   size_t sz;
char buf[BLOCKSIZE];
ssize_t readlen;
 
@@ -240,7 +240,7 @@ static int write_tar_entry(struct archiver_args *args,
struct ustar_header header;
struct strbuf ext_header = STRBUF_INIT;
unsigned int old_mode = mode;
-   unsigned long size, size_in_header;
+   size_t size, size_in_header;
void *buffer;
int err = 0;
 
diff --git a/archive-zip.c b/archive-zip.c
index e8913e5..4492d64 100644
--- a/archive-zip.c
+++ b/archive-zip.c
@@ -295,7 +295,7 @@ static int write_zip_entry(struct archiver_args *args,
void *buffer;
struct git_istream *stream = NULL;
unsigned long flags = 0;
-

Re: [PATCH v2 2/2 / RFC] branch: quote branch/ref names to improve readability

2017-08-08 Thread Junio C Hamano

Stefan Beller  writes:

> (Though wondering for non-submodule users, if they perceive it as
> inconsistency as other parts of the code may not follow the rigorous quoting)

Do you mean that we may instead want to remove the excessive quoting
of branch names and stuff from submodule.c code, because they are
newer ones that broke the consistency existed before them (i.e. not
quoting)?

That certainly is tempting, but I personally find it easier to read
a message that marks parts that holds "external data" differently
from the message's text, so I think this patch 2/2 goes in the right
direction.

Re: [PATCH v2] t3700: fix broken test under !POSIXPERM

2017-08-08 Thread Ramsay Jones



On 08/08/17 20:32, Ramsay Jones wrote:
> 
> 
> On 08/08/17 20:21, René Scharfe wrote:
>> 76e368c378 (t3700: fix broken test under !SANITY) explains that the test
>> 'git add --chmod=[+-]x changes index with already added file' can fail
>> if xfoo3 is still present as a symlink from a previous test and deletes
>> it with rm(1).  That still leaves it present in the index, which causes
>> the test to fail if POSIXPERM is not defined.  Get rid of it by calling
>> "git reset --hard" as well, as 76e368c378 already mentioned in passing.
> 
> Hmm, I don't think its POSIXPERM (which is defined on cygwin) but
> the lack of SANITY that is the problem. The test would fail on Linux
> as well, if it skipped the prior tests marked with SANITY (they get
> rid of the xfoo3->xfoo2 symlinks, among others).

Yep, I didn't read the commit message properly! ;-)

> 
> Ack, this fixes it for me.
> 
> ATB,
> Ramsay Jones
> 
>>
>> Helped-by: Adam Dinwoodie 
>> Signed-off-by: Rene Scharfe 
>> ---
>> Change since v1: Keep the rm(1) call to avoid a problem on Cygwin.
>>
>>  t/t3700-add.sh | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/t/t3700-add.sh b/t/t3700-add.sh
>> index f3a4b4a913..0aae21d698 100755
>> --- a/t/t3700-add.sh
>> +++ b/t/t3700-add.sh
>> @@ -356,6 +356,7 @@ test_expect_success POSIXPERM,SYMLINKS 'git add 
>> --chmod=+x with symlinks' '
>>  
>>  test_expect_success 'git add --chmod=[+-]x changes index with already added 
>> file' '
>>  rm -f foo3 xfoo3 &&
>> +git reset --hard &&
>>  echo foo >foo3 &&
>>  git add foo3 &&
>>  git add --chmod=+x foo3 &&
>>
>

Re: [PATCH v2] am: fix signoff when other trailers are present

2017-08-08 Thread Junio C Hamano

Phillip Wood  writes:

>  test_expect_success 'am --signoff does not add Signed-off-by: line if 
> already there' '
> - git format-patch --stdout HEAD^ >patch3 &&
> - sed -e "/^Subject/ s,\[PATCH,Re: Re: Re: & 1/5 v2] [foo," patch3 
> >patch4 &&
> - rm -fr .git/rebase-apply &&
> - git reset --hard &&
> - git checkout HEAD^ &&
> - git am --signoff patch4 &&
> - git cat-file commit HEAD >actual &&
> - test $(grep -c "^Signed-off-by:" actual) -eq 1
> + git format-patch --stdout first >patch3 &&
> + git reset --hard first &&
> + git am --signoff  + git log --pretty=%B -2 HEAD >actual &&
> + test_cmp expected-log actual
> +'
> +
> +test_expect_success 'am --signoff adds Signed-off-by: if another author is 
> preset' '

Present?

I think the motivation for adding this is to make sure that what the
previous test checks will be true only when the one we are about to
add is already at the end.  So perhaps the previous test needs to be
retitled from "if already there" to something a bit tighter,
e.g. "if already at the end"?

Also, strictly speaking, I think this isn't testing if another
author is present---it is specifying what should happen when the
last one is not us.

> + NAME="A N Other" &&
> + EMAIL="a.n.ot...@example.com" &&
> + {
> + printf "third\n\nSigned-off-by: %s <%s>\nSigned-off-by: %s 
> <%s>\n\n" \
> + "$GIT_COMMITTER_NAME" "$GIT_COMMITTER_EMAIL" \
> + "$NAME" "$EMAIL" &&

A "printf" tip: you can do

printf "third\n\n"
printf "S-o-b: %s <%s>\n" A B C D

to get two lines of the latter.  That would clarify what the next
test does which wants to add three of them.

> + cat msg &&
> + printf "Signed-off-by: %s <%s>\nSigned-off-by: %s <%s>\n\n" \
> + "$GIT_COMMITTER_NAME" "$GIT_COMMITTER_EMAIL" \
> + "$NAME" "$EMAIL"
> + } >expected-log &&
> + git reset --hard first &&
> + GIT_COMMITTER_NAME="$NAME" GIT_COMMITTER_EMAIL="$EMAIL" \
> + git am --signoff  + git log --pretty=%B -2 HEAD >actual &&
> + test_cmp expected-log actual
> +'
> +
> +test_expect_success 'am --signoff duplicates Signed-off-by: if it is not the 
> last one' '
> + NAME="A N Other" &&
> + EMAIL="a.n.ot...@example.com" &&
> + {
> + printf "third\n\nSigned-off-by: %s <%s>\n\
> +Signed-off-by: %s <%s>\nSigned-off-by: %s <%s>\n\n" \
> + "$GIT_COMMITTER_NAME" "$GIT_COMMITTER_EMAIL" \
> + "$NAME" "$EMAIL" \
> + "$GIT_COMMITTER_NAME" "$GIT_COMMITTER_EMAIL" &&
> + cat msg &&
> + printf "Signed-off-by: %s <%s>\nSigned-off-by: %s <%s>\n\
> +Signed-off-by: %s <%s>\n\n" \
> + "$GIT_COMMITTER_NAME" "$GIT_COMMITTER_EMAIL" \
> + "$NAME" "$EMAIL" \
> + "$GIT_COMMITTER_NAME" "$GIT_COMMITTER_EMAIL"
> + } >expected-log &&
> + git format-patch --stdout first >patch3 &&
> + git reset --hard first &&
> + git am --signoff  + git log --pretty=%B -2 HEAD >actual &&
> + test_cmp expected-log actual
>  '
>  
>  test_expect_success 'am without --keep removes Re: and [PATCH] stuff' '
> + git format-patch --stdout HEAD^ >tmp &&
> + sed -e "/^Subject/ s,\[PATCH,Re: Re: Re: & 1/5 v2] [foo," tmp >patch4 &&
> + git reset --hard HEAD^ &&
> + git amgit rev-parse HEAD >expected &&
>   git rev-parse master2 >actual &&
>   test_cmp expected actual

[RFC PATCH 04/10] pack: move open_pack_index(), parse_pack_index()

2017-08-08 Thread Jonathan Tan

Signed-off-by: Jonathan Tan 
---
 builtin/count-objects.c |   1 +
 cache.h |   8 ---
 pack.c  | 149 
 pack.h  |   8 +++
 sha1_file.c | 140 -
 sha1_name.c |   1 +
 6 files changed, 159 insertions(+), 148 deletions(-)

diff --git a/builtin/count-objects.c b/builtin/count-objects.c
index 1d82e61f2..185d3190a 100644
--- a/builtin/count-objects.c
+++ b/builtin/count-objects.c
@@ -10,6 +10,7 @@
 #include "builtin.h"
 #include "parse-options.h"
 #include "quote.h"
+#include "pack.h"
 
 static unsigned long garbage;
 static off_t size_garbage;
diff --git a/cache.h b/cache.h
index c7f802e4a..5d6839525 100644
--- a/cache.h
+++ b/cache.h
@@ -1603,8 +1603,6 @@ struct pack_entry {
struct packed_git *p;
 };
 
-extern struct packed_git *parse_pack_index(unsigned char *sha1, const char 
*idx_path);
-
 /* A hook to report invalid files in pack directory */
 #define PACKDIR_FILE_PACK 1
 #define PACKDIR_FILE_IDX 2
@@ -1639,12 +1637,6 @@ extern int odb_mkstemp(struct strbuf *template, const 
char *pattern);
  */
 extern int odb_pack_keep(const char *name);
 
-/*
- * mmap the index file for the specified packfile (if it is not
- * already mmapped).  Return 0 on success.
- */
-extern int open_pack_index(struct packed_git *);
-
 /*
  * munmap the index file for the specified packfile (if it is
  * currently mmapped).
diff --git a/pack.c b/pack.c
index 60d9fc3b0..6edc43228 100644
--- a/pack.c
+++ b/pack.c
@@ -1,5 +1,6 @@
 #include "cache.h"
 #include "mru.h"
+#include "pack.h"
 
 char *odb_pack_name(struct strbuf *buf,
const unsigned char *sha1,
@@ -59,3 +60,151 @@ void pack_report(void)
pack_open_windows, peak_pack_open_windows,
sz_fmt(pack_mapped), sz_fmt(peak_pack_mapped));
 }
+
+/*
+ * Open and mmap the index file at path, perform a couple of
+ * consistency checks, then record its information to p.  Return 0 on
+ * success.
+ */
+static int check_packed_git_idx(const char *path, struct packed_git *p)
+{
+   void *idx_map;
+   struct pack_idx_header *hdr;
+   size_t idx_size;
+   uint32_t version, nr, i, *index;
+   int fd = git_open(path);
+   struct stat st;
+
+   if (fd < 0)
+   return -1;
+   if (fstat(fd, )) {
+   close(fd);
+   return -1;
+   }
+   idx_size = xsize_t(st.st_size);
+   if (idx_size < 4 * 256 + 20 + 20) {
+   close(fd);
+   return error("index file %s is too small", path);
+   }
+   idx_map = xmmap(NULL, idx_size, PROT_READ, MAP_PRIVATE, fd, 0);
+   close(fd);
+
+   hdr = idx_map;
+   if (hdr->idx_signature == htonl(PACK_IDX_SIGNATURE)) {
+   version = ntohl(hdr->idx_version);
+   if (version < 2 || version > 2) {
+   munmap(idx_map, idx_size);
+   return error("index file %s is version %"PRIu32
+" and is not supported by this binary"
+" (try upgrading GIT to a newer version)",
+path, version);
+   }
+   } else
+   version = 1;
+
+   nr = 0;
+   index = idx_map;
+   if (version > 1)
+   index += 2;  /* skip index header */
+   for (i = 0; i < 256; i++) {
+   uint32_t n = ntohl(index[i]);
+   if (n < nr) {
+   munmap(idx_map, idx_size);
+   return error("non-monotonic index %s", path);
+   }
+   nr = n;
+   }
+
+   if (version == 1) {
+   /*
+* Total size:
+*  - 256 index entries 4 bytes each
+*  - 24-byte entries * nr (20-byte sha1 + 4-byte offset)
+*  - 20-byte SHA1 of the packfile
+*  - 20-byte SHA1 file checksum
+*/
+   if (idx_size != 4*256 + nr * 24 + 20 + 20) {
+   munmap(idx_map, idx_size);
+   return error("wrong index v1 file size in %s", path);
+   }
+   } else if (version == 2) {
+   /*
+* Minimum size:
+*  - 8 bytes of header
+*  - 256 index entries 4 bytes each
+*  - 20-byte sha1 entry * nr
+*  - 4-byte crc entry * nr
+*  - 4-byte offset entry * nr
+*  - 20-byte SHA1 of the packfile
+*  - 20-byte SHA1 file checksum
+* And after the 4-byte offset table might be a
+* variable sized table containing 8-byte entries
+* for offsets larger than 2^31.
+*/
+   unsigned long min_size = 8 + 4*256 + nr*(20 + 4 + 4) + 20 + 20;
+

[RFC PATCH 08/10] pack: move unuse_pack()

2017-08-08 Thread Jonathan Tan

Signed-off-by: Jonathan Tan 
---
 cache.h | 1 -
 pack.c  | 9 +
 pack.h  | 1 +
 sha1_file.c | 9 -
 4 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/cache.h b/cache.h
index dd9f9a9ae..4812f3a63 100644
--- a/cache.h
+++ b/cache.h
@@ -1637,7 +1637,6 @@ extern int odb_mkstemp(struct strbuf *template, const 
char *pattern);
  */
 extern int odb_pack_keep(const char *name);
 
-extern void unuse_pack(struct pack_window **);
 extern void clear_delta_base_cache(void);
 extern struct packed_git *add_packed_git(const char *path, size_t path_len, 
int local);
 
diff --git a/pack.c b/pack.c
index 85cb65558..93526ea7b 100644
--- a/pack.c
+++ b/pack.c
@@ -596,3 +596,12 @@ unsigned char *use_pack(struct packed_git *p,
*left = win->len - xsize_t(offset);
return win->base + offset;
 }
+
+void unuse_pack(struct pack_window **w_cursor)
+{
+   struct pack_window *w = *w_cursor;
+   if (w) {
+   w->inuse_cnt--;
+   *w_cursor = NULL;
+   }
+}
diff --git a/pack.h b/pack.h
index bf2b99bf9..3876e9ae6 100644
--- a/pack.h
+++ b/pack.h
@@ -149,5 +149,6 @@ extern void close_all_packs(void);
 extern int open_packed_git(struct packed_git *p);
 
 extern unsigned char *use_pack(struct packed_git *, struct pack_window **, 
off_t, unsigned long *);
+extern void unuse_pack(struct pack_window **);
 
 #endif
diff --git a/sha1_file.c b/sha1_file.c
index 8f17a07e9..12501ef06 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -717,15 +717,6 @@ void *xmmap(void *start, size_t length,
return ret;
 }
 
-void unuse_pack(struct pack_window **w_cursor)
-{
-   struct pack_window *w = *w_cursor;
-   if (w) {
-   w->inuse_cnt--;
-   *w_cursor = NULL;
-   }
-}
-
 static struct packed_git *alloc_packed_git(int extra)
 {
struct packed_git *p = xmalloc(st_add(sizeof(*p), extra));
-- 
2.14.0.434.g98096fd7a8-goog

[RFC PATCH 10/10] pack: move install_packed_git()

2017-08-08 Thread Jonathan Tan

Signed-off-by: Jonathan Tan 
---
 cache.h |  1 -
 pack.c  | 11 ++-
 pack.h  |  4 ++--
 sha1_file.c |  9 -
 4 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/cache.h b/cache.h
index bf93477e8..41562dc0b 100644
--- a/cache.h
+++ b/cache.h
@@ -1611,7 +1611,6 @@ extern void (*report_garbage)(unsigned seen_bits, const 
char *path);
 
 extern void prepare_packed_git(void);
 extern void reprepare_packed_git(void);
-extern void install_packed_git(struct packed_git *pack);
 
 /*
  * Give a rough count of objects in the repository. This sacrifices accuracy
diff --git a/pack.c b/pack.c
index efe0ed3e8..4eb65e460 100644
--- a/pack.c
+++ b/pack.c
@@ -28,7 +28,7 @@ static unsigned int pack_used_ctr;
 static unsigned int pack_mmap_calls;
 static unsigned int peak_pack_open_windows;
 static unsigned int pack_open_windows;
-unsigned int pack_open_fds;
+static unsigned int pack_open_fds;
 static unsigned int pack_max_fds;
 static size_t peak_pack_mapped;
 static size_t pack_mapped;
@@ -658,3 +658,12 @@ struct packed_git *add_packed_git(const char *path, size_t 
path_len, int local)
hashclr(p->sha1);
return p;
 }
+
+void install_packed_git(struct packed_git *pack)
+{
+   if (pack->pack_fd != -1)
+   pack_open_fds++;
+
+   pack->next = packed_git;
+   packed_git = pack;
+}
diff --git a/pack.h b/pack.h
index c1f3ff32d..576c4fc7c 100644
--- a/pack.h
+++ b/pack.h
@@ -124,8 +124,6 @@ extern char *sha1_pack_name(const unsigned char *sha1);
  */
 extern char *sha1_pack_index_name(const unsigned char *sha1);
 
-extern unsigned int pack_open_fds;
-
 extern void pack_report(void);
 
 /*
@@ -152,4 +150,6 @@ extern unsigned char *use_pack(struct packed_git *, struct 
pack_window **, off_t
 extern void unuse_pack(struct pack_window **);
 extern struct packed_git *add_packed_git(const char *path, size_t path_len, 
int local);
 
+extern void install_packed_git(struct packed_git *pack);
+
 #endif
diff --git a/sha1_file.c b/sha1_file.c
index 7f12b1ee0..b956ca0c9 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -717,15 +717,6 @@ void *xmmap(void *start, size_t length,
return ret;
 }
 
-void install_packed_git(struct packed_git *pack)
-{
-   if (pack->pack_fd != -1)
-   pack_open_fds++;
-
-   pack->next = packed_git;
-   packed_git = pack;
-}
-
 void (*report_garbage)(unsigned seen_bits, const char *path);
 
 static void report_helper(const struct string_list *list,
-- 
2.14.0.434.g98096fd7a8-goog

[RFC PATCH 00/10] An attempt to move packfile funcs to its own file

2017-08-08 Thread Jonathan Tan

While investigating annotating packfiles and loose objects to support
connectivity checks in partial clones [1], I decided to make the effort
to separate packfile-related code from sha1_file.c into its own file, to
make it easier to both code such changes and review them. Here is the
beginning of those efforts.

Is this something worth doing, and if yes, is this in the right
direction?

[1] 
https://public-inbox.org/git/20170804145113.5ceaf...@twelve2.svl.corp.google.com/

Jonathan Tan (10):
  pack: move pack name-related functions
  pack: move static state variables
  pack: move pack_report()
  pack: move open_pack_index(), parse_pack_index()
  pack: move release_pack_memory()
  pack: move pack-closing functions
  pack: move use_pack()
  pack: move unuse_pack()
  pack: move add_packed_git()
  pack: move install_packed_git()

 Makefile |   1 +
 builtin/am.c |   1 +
 builtin/clone.c  |   1 +
 builtin/count-objects.c  |   1 +
 builtin/fetch.c  |   1 +
 builtin/merge.c  |   1 +
 builtin/pack-redundant.c |   1 +
 cache.h  |  45 
 connected.c  |   1 +
 git-compat-util.h|   2 -
 pack.c   | 669 +++
 pack.h   |  51 
 sha1_file.c  | 667 --
 sha1_name.c  |   1 +
 streaming.c  |   1 +
 15 files changed, 730 insertions(+), 714 deletions(-)
 create mode 100644 pack.c

-- 
2.14.0.434.g98096fd7a8-goog

[RFC PATCH 09/10] pack: move add_packed_git()

2017-08-08 Thread Jonathan Tan

Signed-off-by: Jonathan Tan 
---
 cache.h |  1 -
 connected.c |  1 +
 pack.c  | 53 +
 pack.h  |  1 +
 sha1_file.c | 61 -
 5 files changed, 55 insertions(+), 62 deletions(-)

diff --git a/cache.h b/cache.h
index 4812f3a63..bf93477e8 100644
--- a/cache.h
+++ b/cache.h
@@ -1638,7 +1638,6 @@ extern int odb_mkstemp(struct strbuf *template, const 
char *pattern);
 extern int odb_pack_keep(const char *name);
 
 extern void clear_delta_base_cache(void);
-extern struct packed_git *add_packed_git(const char *path, size_t path_len, 
int local);
 
 /*
  * Make sure that a pointer access into an mmap'd index file is within bounds,
diff --git a/connected.c b/connected.c
index 136c2ac16..3e3f0148c 100644
--- a/connected.c
+++ b/connected.c
@@ -3,6 +3,7 @@
 #include "sigchain.h"
 #include "connected.h"
 #include "transport.h"
+#include "pack.h"
 
 /*
  * If we feed all the commits we want to verify to this command
diff --git a/pack.c b/pack.c
index 93526ea7b..efe0ed3e8 100644
--- a/pack.c
+++ b/pack.c
@@ -605,3 +605,56 @@ void unuse_pack(struct pack_window **w_cursor)
*w_cursor = NULL;
}
 }
+
+static void try_to_free_pack_memory(size_t size)
+{
+   release_pack_memory(size);
+}
+
+struct packed_git *add_packed_git(const char *path, size_t path_len, int local)
+{
+   static int have_set_try_to_free_routine;
+   struct stat st;
+   size_t alloc;
+   struct packed_git *p;
+
+   if (!have_set_try_to_free_routine) {
+   have_set_try_to_free_routine = 1;
+   set_try_to_free_routine(try_to_free_pack_memory);
+   }
+
+   /*
+* Make sure a corresponding .pack file exists and that
+* the index looks sane.
+*/
+   if (!strip_suffix_mem(path, _len, ".idx"))
+   return NULL;
+
+   /*
+* ".pack" is long enough to hold any suffix we're adding (and
+* the use xsnprintf double-checks that)
+*/
+   alloc = st_add3(path_len, strlen(".pack"), 1);
+   p = alloc_packed_git(alloc);
+   memcpy(p->pack_name, path, path_len);
+
+   xsnprintf(p->pack_name + path_len, alloc - path_len, ".keep");
+   if (!access(p->pack_name, F_OK))
+   p->pack_keep = 1;
+
+   xsnprintf(p->pack_name + path_len, alloc - path_len, ".pack");
+   if (stat(p->pack_name, ) || !S_ISREG(st.st_mode)) {
+   free(p);
+   return NULL;
+   }
+
+   /* ok, it looks sane as far as we can check without
+* actually mapping the pack file.
+*/
+   p->pack_size = st.st_size;
+   p->pack_local = local;
+   p->mtime = st.st_mtime;
+   if (path_len < 40 || get_sha1_hex(path + path_len - 40, p->sha1))
+   hashclr(p->sha1);
+   return p;
+}
diff --git a/pack.h b/pack.h
index 3876e9ae6..c1f3ff32d 100644
--- a/pack.h
+++ b/pack.h
@@ -150,5 +150,6 @@ extern int open_packed_git(struct packed_git *p);
 
 extern unsigned char *use_pack(struct packed_git *, struct pack_window **, 
off_t, unsigned long *);
 extern void unuse_pack(struct pack_window **);
+extern struct packed_git *add_packed_git(const char *path, size_t path_len, 
int local);
 
 #endif
diff --git a/sha1_file.c b/sha1_file.c
index 12501ef06..7f12b1ee0 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -717,67 +717,6 @@ void *xmmap(void *start, size_t length,
return ret;
 }
 
-static struct packed_git *alloc_packed_git(int extra)
-{
-   struct packed_git *p = xmalloc(st_add(sizeof(*p), extra));
-   memset(p, 0, sizeof(*p));
-   p->pack_fd = -1;
-   return p;
-}
-
-static void try_to_free_pack_memory(size_t size)
-{
-   release_pack_memory(size);
-}
-
-struct packed_git *add_packed_git(const char *path, size_t path_len, int local)
-{
-   static int have_set_try_to_free_routine;
-   struct stat st;
-   size_t alloc;
-   struct packed_git *p;
-
-   if (!have_set_try_to_free_routine) {
-   have_set_try_to_free_routine = 1;
-   set_try_to_free_routine(try_to_free_pack_memory);
-   }
-
-   /*
-* Make sure a corresponding .pack file exists and that
-* the index looks sane.
-*/
-   if (!strip_suffix_mem(path, _len, ".idx"))
-   return NULL;
-
-   /*
-* ".pack" is long enough to hold any suffix we're adding (and
-* the use xsnprintf double-checks that)
-*/
-   alloc = st_add3(path_len, strlen(".pack"), 1);
-   p = alloc_packed_git(alloc);
-   memcpy(p->pack_name, path, path_len);
-
-   xsnprintf(p->pack_name + path_len, alloc - path_len, ".keep");
-   if (!access(p->pack_name, F_OK))
-   p->pack_keep = 1;
-
-   xsnprintf(p->pack_name + path_len, alloc - path_len, ".pack");
-   if (stat(p->pack_name, ) || !S_ISREG(st.st_mode)) {
-   free(p);
-

[RFC PATCH 03/10] pack: move pack_report()

2017-08-08 Thread Jonathan Tan

Signed-off-by: Jonathan Tan 
---
 cache.h |  2 --
 pack.c  | 24 
 pack.h  |  2 ++
 sha1_file.c | 24 
 4 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/cache.h b/cache.h
index 1f0f47819..c7f802e4a 100644
--- a/cache.h
+++ b/cache.h
@@ -1624,8 +1624,6 @@ unsigned long approximate_object_count(void);
 extern struct packed_git *find_sha1_pack(const unsigned char *sha1,
 struct packed_git *packs);
 
-extern void pack_report(void);
-
 /*
  * Create a temporary file rooted in the object database directory, or
  * die on failure. The filename is taken from "pattern", which should have the
diff --git a/pack.c b/pack.c
index 0f46e0617..60d9fc3b0 100644
--- a/pack.c
+++ b/pack.c
@@ -35,3 +35,27 @@ struct packed_git *packed_git;
 
 static struct mru packed_git_mru_storage;
 struct mru *packed_git_mru = _git_mru_storage;
+
+#define SZ_FMT PRIuMAX
+static inline uintmax_t sz_fmt(size_t s) { return s; }
+
+void pack_report(void)
+{
+   fprintf(stderr,
+   "pack_report: getpagesize()= %10" SZ_FMT "\n"
+   "pack_report: core.packedGitWindowSize = %10" SZ_FMT "\n"
+   "pack_report: core.packedGitLimit  = %10" SZ_FMT "\n",
+   sz_fmt(getpagesize()),
+   sz_fmt(packed_git_window_size),
+   sz_fmt(packed_git_limit));
+   fprintf(stderr,
+   "pack_report: pack_used_ctr= %10u\n"
+   "pack_report: pack_mmap_calls  = %10u\n"
+   "pack_report: pack_open_windows= %10u / %10u\n"
+   "pack_report: pack_mapped  = "
+   "%10" SZ_FMT " / %10" SZ_FMT "\n",
+   pack_used_ctr,
+   pack_mmap_calls,
+   pack_open_windows, peak_pack_open_windows,
+   sz_fmt(pack_mapped), sz_fmt(peak_pack_mapped));
+}
diff --git a/pack.h b/pack.h
index 7fcd45f7b..6098bfe40 100644
--- a/pack.h
+++ b/pack.h
@@ -133,4 +133,6 @@ extern unsigned int pack_max_fds;
 extern size_t peak_pack_mapped;
 extern size_t pack_mapped;
 
+extern void pack_report(void);
+
 #endif
diff --git a/sha1_file.c b/sha1_file.c
index 4d95e21eb..0de39f480 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -29,9 +29,6 @@
 #include "mergesort.h"
 #include "quote.h"
 
-#define SZ_FMT PRIuMAX
-static inline uintmax_t sz_fmt(size_t s) { return s; }
-
 const unsigned char null_sha1[20];
 const struct object_id null_oid;
 const struct object_id empty_tree_oid = {
@@ -682,27 +679,6 @@ static int has_loose_object(const unsigned char *sha1)
return check_and_freshen(sha1, 0);
 }
 
-void pack_report(void)
-{
-   fprintf(stderr,
-   "pack_report: getpagesize()= %10" SZ_FMT "\n"
-   "pack_report: core.packedGitWindowSize = %10" SZ_FMT "\n"
-   "pack_report: core.packedGitLimit  = %10" SZ_FMT "\n",
-   sz_fmt(getpagesize()),
-   sz_fmt(packed_git_window_size),
-   sz_fmt(packed_git_limit));
-   fprintf(stderr,
-   "pack_report: pack_used_ctr= %10u\n"
-   "pack_report: pack_mmap_calls  = %10u\n"
-   "pack_report: pack_open_windows= %10u / %10u\n"
-   "pack_report: pack_mapped  = "
-   "%10" SZ_FMT " / %10" SZ_FMT "\n",
-   pack_used_ctr,
-   pack_mmap_calls,
-   pack_open_windows, peak_pack_open_windows,
-   sz_fmt(pack_mapped), sz_fmt(peak_pack_mapped));
-}
-
 /*
  * Open and mmap the index file at path, perform a couple of
  * consistency checks, then record its information to p.  Return 0 on
-- 
2.14.0.434.g98096fd7a8-goog

[RFC PATCH 05/10] pack: move release_pack_memory()

2017-08-08 Thread Jonathan Tan

The function unuse_one_window() needs to be temporarily made global. Its
scope will be restored to static in a subsequent commit.

Signed-off-by: Jonathan Tan 
---
 git-compat-util.h |  2 --
 pack.c| 49 +
 pack.h|  4 
 sha1_file.c   | 49 -
 4 files changed, 53 insertions(+), 51 deletions(-)

diff --git a/git-compat-util.h b/git-compat-util.h
index db9c22de7..201056e2d 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -749,8 +749,6 @@ const char *inet_ntop(int af, const void *src, char *dst, 
size_t size);
 extern int git_atexit(void (*handler)(void));
 #endif
 
-extern void release_pack_memory(size_t);
-
 typedef void (*try_to_free_t)(size_t);
 extern try_to_free_t set_try_to_free_routine(try_to_free_t);
 
diff --git a/pack.c b/pack.c
index 6edc43228..8daa74ad1 100644
--- a/pack.c
+++ b/pack.c
@@ -208,3 +208,52 @@ struct packed_git *parse_pack_index(unsigned char *sha1, 
const char *idx_path)
 
return p;
 }
+
+static void scan_windows(struct packed_git *p,
+   struct packed_git **lru_p,
+   struct pack_window **lru_w,
+   struct pack_window **lru_l)
+{
+   struct pack_window *w, *w_l;
+
+   for (w_l = NULL, w = p->windows; w; w = w->next) {
+   if (!w->inuse_cnt) {
+   if (!*lru_w || w->last_used < (*lru_w)->last_used) {
+   *lru_p = p;
+   *lru_w = w;
+   *lru_l = w_l;
+   }
+   }
+   w_l = w;
+   }
+}
+
+int unuse_one_window(struct packed_git *current)
+{
+   struct packed_git *p, *lru_p = NULL;
+   struct pack_window *lru_w = NULL, *lru_l = NULL;
+
+   if (current)
+   scan_windows(current, _p, _w, _l);
+   for (p = packed_git; p; p = p->next)
+   scan_windows(p, _p, _w, _l);
+   if (lru_p) {
+   munmap(lru_w->base, lru_w->len);
+   pack_mapped -= lru_w->len;
+   if (lru_l)
+   lru_l->next = lru_w->next;
+   else
+   lru_p->windows = lru_w->next;
+   free(lru_w);
+   pack_open_windows--;
+   return 1;
+   }
+   return 0;
+}
+
+void release_pack_memory(size_t need)
+{
+   size_t cur = pack_mapped;
+   while (need >= (cur - pack_mapped) && unuse_one_window(NULL))
+   ; /* nothing */
+}
diff --git a/pack.h b/pack.h
index 5be0ed42a..c16220586 100644
--- a/pack.h
+++ b/pack.h
@@ -143,4 +143,8 @@ extern int open_pack_index(struct packed_git *);
 
 extern struct packed_git *parse_pack_index(unsigned char *sha1, const char 
*idx_path);
 
+extern int unuse_one_window(struct packed_git *current);
+
+extern void release_pack_memory(size_t);
+
 #endif
diff --git a/sha1_file.c b/sha1_file.c
index 2e414f5f5..644876e4e 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -679,55 +679,6 @@ static int has_loose_object(const unsigned char *sha1)
return check_and_freshen(sha1, 0);
 }
 
-static void scan_windows(struct packed_git *p,
-   struct packed_git **lru_p,
-   struct pack_window **lru_w,
-   struct pack_window **lru_l)
-{
-   struct pack_window *w, *w_l;
-
-   for (w_l = NULL, w = p->windows; w; w = w->next) {
-   if (!w->inuse_cnt) {
-   if (!*lru_w || w->last_used < (*lru_w)->last_used) {
-   *lru_p = p;
-   *lru_w = w;
-   *lru_l = w_l;
-   }
-   }
-   w_l = w;
-   }
-}
-
-static int unuse_one_window(struct packed_git *current)
-{
-   struct packed_git *p, *lru_p = NULL;
-   struct pack_window *lru_w = NULL, *lru_l = NULL;
-
-   if (current)
-   scan_windows(current, _p, _w, _l);
-   for (p = packed_git; p; p = p->next)
-   scan_windows(p, _p, _w, _l);
-   if (lru_p) {
-   munmap(lru_w->base, lru_w->len);
-   pack_mapped -= lru_w->len;
-   if (lru_l)
-   lru_l->next = lru_w->next;
-   else
-   lru_p->windows = lru_w->next;
-   free(lru_w);
-   pack_open_windows--;
-   return 1;
-   }
-   return 0;
-}
-
-void release_pack_memory(size_t need)
-{
-   size_t cur = pack_mapped;
-   while (need >= (cur - pack_mapped) && unuse_one_window(NULL))
-   ; /* nothing */
-}
-
 static void mmap_limit_check(size_t length)
 {
static size_t limit = 0;
-- 
2.14.0.434.g98096fd7a8-goog

[RFC PATCH 06/10] pack: move pack-closing functions

2017-08-08 Thread Jonathan Tan

The function close_pack_fd() needs to be temporarily made global. Its
scope will be restored to static in a subsequent commit.

Signed-off-by: Jonathan Tan 
---
In doing this, I discovered that some builtins close the packs even
though they, in theory, should not know anything about how objects are
stored. Can we remove those calls? (The tests pass with those calls
removed.)
---
 builtin/am.c|  1 +
 builtin/clone.c |  1 +
 builtin/fetch.c |  1 +
 builtin/merge.c |  1 +
 cache.h |  8 
 pack.c  | 54 ++
 pack.h  |  9 +
 sha1_file.c | 55 ---
 8 files changed, 67 insertions(+), 63 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index c973bd96d..c38dd10a3 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -31,6 +31,7 @@
 #include "mailinfo.h"
 #include "apply.h"
 #include "string-list.h"
+#include "pack.h"
 
 /**
  * Returns 1 if the file is empty or does not exist, 0 otherwise.
diff --git a/builtin/clone.c b/builtin/clone.c
index 08b5cc433..53410a45d 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -25,6 +25,7 @@
 #include "remote.h"
 #include "run-command.h"
 #include "connected.h"
+#include "pack.h"
 
 /*
  * Overall FIXMEs:
diff --git a/builtin/fetch.c b/builtin/fetch.c
index c87e59f3b..196a3bfc4 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -17,6 +17,7 @@
 #include "connected.h"
 #include "argv-array.h"
 #include "utf8.h"
+#include "pack.h"
 
 static const char * const builtin_fetch_usage[] = {
N_("git fetch [] [ [...]]"),
diff --git a/builtin/merge.c b/builtin/merge.c
index 900bafdb4..9cff4b276 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -32,6 +32,7 @@
 #include "gpg-interface.h"
 #include "sequencer.h"
 #include "string-list.h"
+#include "pack.h"
 
 #define DEFAULT_TWOHEAD (1<<0)
 #define DEFAULT_OCTOPUS (1<<1)
diff --git a/cache.h b/cache.h
index 5d6839525..25a21a61f 100644
--- a/cache.h
+++ b/cache.h
@@ -1637,15 +1637,7 @@ extern int odb_mkstemp(struct strbuf *template, const 
char *pattern);
  */
 extern int odb_pack_keep(const char *name);
 
-/*
- * munmap the index file for the specified packfile (if it is
- * currently mmapped).
- */
-extern void close_pack_index(struct packed_git *);
-
 extern unsigned char *use_pack(struct packed_git *, struct pack_window **, 
off_t, unsigned long *);
-extern void close_pack_windows(struct packed_git *);
-extern void close_all_packs(void);
 extern void unuse_pack(struct pack_window **);
 extern void clear_delta_base_cache(void);
 extern struct packed_git *add_packed_git(const char *path, size_t path_len, 
int local);
diff --git a/pack.c b/pack.c
index 8daa74ad1..c8e2dbdee 100644
--- a/pack.c
+++ b/pack.c
@@ -257,3 +257,57 @@ void release_pack_memory(size_t need)
while (need >= (cur - pack_mapped) && unuse_one_window(NULL))
; /* nothing */
 }
+
+void close_pack_windows(struct packed_git *p)
+{
+   while (p->windows) {
+   struct pack_window *w = p->windows;
+
+   if (w->inuse_cnt)
+   die("pack '%s' still has open windows to it",
+   p->pack_name);
+   munmap(w->base, w->len);
+   pack_mapped -= w->len;
+   pack_open_windows--;
+   p->windows = w->next;
+   free(w);
+   }
+}
+
+int close_pack_fd(struct packed_git *p)
+{
+   if (p->pack_fd < 0)
+   return 0;
+
+   close(p->pack_fd);
+   pack_open_fds--;
+   p->pack_fd = -1;
+
+   return 1;
+}
+
+void close_pack_index(struct packed_git *p)
+{
+   if (p->index_data) {
+   munmap((void *)p->index_data, p->index_size);
+   p->index_data = NULL;
+   }
+}
+
+static void close_pack(struct packed_git *p)
+{
+   close_pack_windows(p);
+   close_pack_fd(p);
+   close_pack_index(p);
+}
+
+void close_all_packs(void)
+{
+   struct packed_git *p;
+
+   for (p = packed_git; p; p = p->next)
+   if (p->do_not_close)
+   die("BUG: want to close pack marked 'do-not-close'");
+   else
+   close_pack(p);
+}
diff --git a/pack.h b/pack.h
index c16220586..fd4668528 100644
--- a/pack.h
+++ b/pack.h
@@ -147,4 +147,13 @@ extern int unuse_one_window(struct packed_git *current);
 
 extern void release_pack_memory(size_t);
 
+extern void close_pack_windows(struct packed_git *);
+extern int close_pack_fd(struct packed_git *);
+/*
+ * munmap the index file for the specified packfile (if it is
+ * currently mmapped).
+ */
+extern void close_pack_index(struct packed_git *);
+extern void close_all_packs(void);
+
 #endif
diff --git a/sha1_file.c b/sha1_file.c
index 644876e4e..e2927244f 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -717,53 +717,6 @@ void *xmmap(void *start, size_t length,
return ret;
 }
 
-void

[RFC PATCH 07/10] pack: move use_pack()

2017-08-08 Thread Jonathan Tan

open_packed_git is made global.

Signed-off-by: Jonathan Tan 
---
Unlike the other commits where variables and functions are made global
and then remade static, open_packed_git is not remade static (because a
function in sha1_file.c still uses it).
---
 cache.h |   1 -
 pack.c  | 303 ++--
 pack.h  |  14 +--
 sha1_file.c | 285 
 streaming.c |   1 +
 5 files changed, 299 insertions(+), 305 deletions(-)

diff --git a/cache.h b/cache.h
index 25a21a61f..dd9f9a9ae 100644
--- a/cache.h
+++ b/cache.h
@@ -1637,7 +1637,6 @@ extern int odb_mkstemp(struct strbuf *template, const 
char *pattern);
  */
 extern int odb_pack_keep(const char *name);
 
-extern unsigned char *use_pack(struct packed_git *, struct pack_window **, 
off_t, unsigned long *);
 extern void unuse_pack(struct pack_window **);
 extern void clear_delta_base_cache(void);
 extern struct packed_git *add_packed_git(const char *path, size_t path_len, 
int local);
diff --git a/pack.c b/pack.c
index c8e2dbdee..85cb65558 100644
--- a/pack.c
+++ b/pack.c
@@ -24,14 +24,14 @@ char *sha1_pack_index_name(const unsigned char *sha1)
return odb_pack_name(, sha1, "idx");
 }
 
-unsigned int pack_used_ctr;
-unsigned int pack_mmap_calls;
-unsigned int peak_pack_open_windows;
-unsigned int pack_open_windows;
+static unsigned int pack_used_ctr;
+static unsigned int pack_mmap_calls;
+static unsigned int peak_pack_open_windows;
+static unsigned int pack_open_windows;
 unsigned int pack_open_fds;
-unsigned int pack_max_fds;
-size_t peak_pack_mapped;
-size_t pack_mapped;
+static unsigned int pack_max_fds;
+static size_t peak_pack_mapped;
+static size_t pack_mapped;
 struct packed_git *packed_git;
 
 static struct mru packed_git_mru_storage;
@@ -228,7 +228,7 @@ static void scan_windows(struct packed_git *p,
}
 }
 
-int unuse_one_window(struct packed_git *current)
+static int unuse_one_window(struct packed_git *current)
 {
struct packed_git *p, *lru_p = NULL;
struct pack_window *lru_w = NULL, *lru_l = NULL;
@@ -274,7 +274,7 @@ void close_pack_windows(struct packed_git *p)
}
 }
 
-int close_pack_fd(struct packed_git *p)
+static int close_pack_fd(struct packed_git *p)
 {
if (p->pack_fd < 0)
return 0;
@@ -311,3 +311,288 @@ void close_all_packs(void)
else
close_pack(p);
 }
+
+/*
+ * The LRU pack is the one with the oldest MRU window, preferring packs
+ * with no used windows, or the oldest mtime if it has no windows allocated.
+ */
+static void find_lru_pack(struct packed_git *p, struct packed_git **lru_p, 
struct pack_window **mru_w, int *accept_windows_inuse)
+{
+   struct pack_window *w, *this_mru_w;
+   int has_windows_inuse = 0;
+
+   /*
+* Reject this pack if it has windows and the previously selected
+* one does not.  If this pack does not have windows, reject
+* it if the pack file is newer than the previously selected one.
+*/
+   if (*lru_p && !*mru_w && (p->windows || p->mtime > (*lru_p)->mtime))
+   return;
+
+   for (w = this_mru_w = p->windows; w; w = w->next) {
+   /*
+* Reject this pack if any of its windows are in use,
+* but the previously selected pack did not have any
+* inuse windows.  Otherwise, record that this pack
+* has windows in use.
+*/
+   if (w->inuse_cnt) {
+   if (*accept_windows_inuse)
+   has_windows_inuse = 1;
+   else
+   return;
+   }
+
+   if (w->last_used > this_mru_w->last_used)
+   this_mru_w = w;
+
+   /*
+* Reject this pack if it has windows that have been
+* used more recently than the previously selected pack.
+* If the previously selected pack had windows inuse and
+* we have not encountered a window in this pack that is
+* inuse, skip this check since we prefer a pack with no
+* inuse windows to one that has inuse windows.
+*/
+   if (*mru_w && *accept_windows_inuse == has_windows_inuse &&
+   this_mru_w->last_used > (*mru_w)->last_used)
+   return;
+   }
+
+   /*
+* Select this pack.
+*/
+   *mru_w = this_mru_w;
+   *lru_p = p;
+   *accept_windows_inuse = has_windows_inuse;
+}
+
+static int close_one_pack(void)
+{
+   struct packed_git *p, *lru_p = NULL;
+   struct pack_window *mru_w = NULL;
+   int accept_windows_inuse = 1;
+
+   for (p = packed_git; p; p = p->next) {
+   if (p->pack_fd == -1)
+   continue;
+

Re: [PATCH v2] t3700: fix broken test under !POSIXPERM

2017-08-08 Thread Ramsay Jones



On 08/08/17 20:21, René Scharfe wrote:
> 76e368c378 (t3700: fix broken test under !SANITY) explains that the test
> 'git add --chmod=[+-]x changes index with already added file' can fail
> if xfoo3 is still present as a symlink from a previous test and deletes
> it with rm(1).  That still leaves it present in the index, which causes
> the test to fail if POSIXPERM is not defined.  Get rid of it by calling
> "git reset --hard" as well, as 76e368c378 already mentioned in passing.

Hmm, I don't think its POSIXPERM (which is defined on cygwin) but
the lack of SANITY that is the problem. The test would fail on Linux
as well, if it skipped the prior tests marked with SANITY (they get
rid of the xfoo3->xfoo2 symlinks, among others).

Ack, this fixes it for me.

ATB,
Ramsay Jones

> 
> Helped-by: Adam Dinwoodie 
> Signed-off-by: Rene Scharfe 
> ---
> Change since v1: Keep the rm(1) call to avoid a problem on Cygwin.
> 
>  t/t3700-add.sh | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/t/t3700-add.sh b/t/t3700-add.sh
> index f3a4b4a913..0aae21d698 100755
> --- a/t/t3700-add.sh
> +++ b/t/t3700-add.sh
> @@ -356,6 +356,7 @@ test_expect_success POSIXPERM,SYMLINKS 'git add 
> --chmod=+x with symlinks' '
>  
>  test_expect_success 'git add --chmod=[+-]x changes index with already added 
> file' '
>   rm -f foo3 xfoo3 &&
> + git reset --hard &&
>   echo foo >foo3 &&
>   git add foo3 &&
>   git add --chmod=+x foo3 &&
>

[RFC PATCH 02/10] pack: move static state variables

2017-08-08 Thread Jonathan Tan

sha1_file.c declares some static variables that store packfile-related
state. Move them to pack.c.

They are temporarily made global, but subsequent commits will restore
their scope back to static.

Signed-off-by: Jonathan Tan 
---
 pack.c  | 14 ++
 pack.h  |  9 +
 sha1_file.c | 13 -
 3 files changed, 23 insertions(+), 13 deletions(-)

diff --git a/pack.c b/pack.c
index 0d191dfd6..0f46e0617 100644
--- a/pack.c
+++ b/pack.c
@@ -1,4 +1,5 @@
 #include "cache.h"
+#include "mru.h"
 
 char *odb_pack_name(struct strbuf *buf,
const unsigned char *sha1,
@@ -21,3 +22,16 @@ char *sha1_pack_index_name(const unsigned char *sha1)
static struct strbuf buf = STRBUF_INIT;
return odb_pack_name(, sha1, "idx");
 }
+
+unsigned int pack_used_ctr;
+unsigned int pack_mmap_calls;
+unsigned int peak_pack_open_windows;
+unsigned int pack_open_windows;
+unsigned int pack_open_fds;
+unsigned int pack_max_fds;
+size_t peak_pack_mapped;
+size_t pack_mapped;
+struct packed_git *packed_git;
+
+static struct mru packed_git_mru_storage;
+struct mru *packed_git_mru = _git_mru_storage;
diff --git a/pack.h b/pack.h
index 63bfde00c..7fcd45f7b 100644
--- a/pack.h
+++ b/pack.h
@@ -124,4 +124,13 @@ extern char *sha1_pack_name(const unsigned char *sha1);
  */
 extern char *sha1_pack_index_name(const unsigned char *sha1);
 
+extern unsigned int pack_used_ctr;
+extern unsigned int pack_mmap_calls;
+extern unsigned int peak_pack_open_windows;
+extern unsigned int pack_open_windows;
+extern unsigned int pack_open_fds;
+extern unsigned int pack_max_fds;
+extern size_t peak_pack_mapped;
+extern size_t pack_mapped;
+
 #endif
diff --git a/sha1_file.c b/sha1_file.c
index 7e511ce9e..4d95e21eb 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -682,19 +682,6 @@ static int has_loose_object(const unsigned char *sha1)
return check_and_freshen(sha1, 0);
 }
 
-static unsigned int pack_used_ctr;
-static unsigned int pack_mmap_calls;
-static unsigned int peak_pack_open_windows;
-static unsigned int pack_open_windows;
-static unsigned int pack_open_fds;
-static unsigned int pack_max_fds;
-static size_t peak_pack_mapped;
-static size_t pack_mapped;
-struct packed_git *packed_git;
-
-static struct mru packed_git_mru_storage;
-struct mru *packed_git_mru = _git_mru_storage;
-
 void pack_report(void)
 {
fprintf(stderr,
-- 
2.14.0.434.g98096fd7a8-goog

[RFC PATCH 01/10] pack: move pack name-related functions

2017-08-08 Thread Jonathan Tan

Currently, sha1_file.c and cache.h contain many functions, both related
to and unrelated to packfiles. This makes both files very large and
causes an unclear separation of concerns.

Create a new file, pack.c, to hold all packfile-related functions
currently in sha1_file.c, and designate pack.h to hold these
packfile-related functions.

In this commit, the pack name-related functions are moved. Subsequent
commits will move the other functions.

Signed-off-by: Jonathan Tan 
---
 Makefile |  1 +
 builtin/pack-redundant.c |  1 +
 cache.h  | 23 ---
 pack.c   | 23 +++
 pack.h   | 23 +++
 sha1_file.c  | 22 --
 6 files changed, 48 insertions(+), 45 deletions(-)
 create mode 100644 pack.c

diff --git a/Makefile b/Makefile
index 461c845d3..a7b901a18 100644
--- a/Makefile
+++ b/Makefile
@@ -816,6 +816,7 @@ LIB_OBJS += notes-merge.o
 LIB_OBJS += notes-utils.o
 LIB_OBJS += object.o
 LIB_OBJS += oidset.o
+LIB_OBJS += pack.o
 LIB_OBJS += pack-bitmap.o
 LIB_OBJS += pack-bitmap-write.o
 LIB_OBJS += pack-check.o
diff --git a/builtin/pack-redundant.c b/builtin/pack-redundant.c
index cb1df1c76..df36d10e7 100644
--- a/builtin/pack-redundant.c
+++ b/builtin/pack-redundant.c
@@ -7,6 +7,7 @@
 */
 
 #include "builtin.h"
+#include "pack.h"
 
 #define BLKSIZE 512
 
diff --git a/cache.h b/cache.h
index 71fe09264..1f0f47819 100644
--- a/cache.h
+++ b/cache.h
@@ -902,20 +902,6 @@ extern void check_repository_format(void);
  */
 extern const char *sha1_file_name(const unsigned char *sha1);
 
-/*
- * Return the name of the (local) packfile with the specified sha1 in
- * its name.  The return value is a pointer to memory that is
- * overwritten each time this function is called.
- */
-extern char *sha1_pack_name(const unsigned char *sha1);
-
-/*
- * Return the name of the (local) pack index file with the specified
- * sha1 in its name.  The return value is a pointer to memory that is
- * overwritten each time this function is called.
- */
-extern char *sha1_pack_index_name(const unsigned char *sha1);
-
 /*
  * Return an abbreviated sha1 unique within this repository's object database.
  * The result will be at least `len` characters long, and will be NUL
@@ -1648,15 +1634,6 @@ extern void pack_report(void);
  */
 extern int odb_mkstemp(struct strbuf *template, const char *pattern);
 
-/*
- * Generate the filename to be used for a pack file with checksum "sha1" and
- * extension "ext". The result is written into the strbuf "buf", overwriting
- * any existing contents. A pointer to buf->buf is returned as a convenience.
- *
- * Example: odb_pack_name(out, sha1, "idx") => 
".git/objects/pack/pack-1234..idx"
- */
-extern char *odb_pack_name(struct strbuf *buf, const unsigned char *sha1, 
const char *ext);
-
 /*
  * Create a pack .keep file named "name" (which should generally be the output
  * of odb_pack_name). Returns a file descriptor opened for writing, or -1 on
diff --git a/pack.c b/pack.c
new file mode 100644
index 0..0d191dfd6
--- /dev/null
+++ b/pack.c
@@ -0,0 +1,23 @@
+#include "cache.h"
+
+char *odb_pack_name(struct strbuf *buf,
+   const unsigned char *sha1,
+   const char *ext)
+{
+   strbuf_reset(buf);
+   strbuf_addf(buf, "%s/pack/pack-%s.%s", get_object_directory(),
+   sha1_to_hex(sha1), ext);
+   return buf->buf;
+}
+
+char *sha1_pack_name(const unsigned char *sha1)
+{
+   static struct strbuf buf = STRBUF_INIT;
+   return odb_pack_name(, sha1, "pack");
+}
+
+char *sha1_pack_index_name(const unsigned char *sha1)
+{
+   static struct strbuf buf = STRBUF_INIT;
+   return odb_pack_name(, sha1, "idx");
+}
diff --git a/pack.h b/pack.h
index 8294341af..63bfde00c 100644
--- a/pack.h
+++ b/pack.h
@@ -101,4 +101,27 @@ extern int read_pack_header(int fd, struct pack_header *);
 extern struct sha1file *create_tmp_packfile(char **pack_tmp_name);
 extern void finish_tmp_packfile(struct strbuf *name_buffer, const char 
*pack_tmp_name, struct pack_idx_entry **written_list, uint32_t nr_written, 
struct pack_idx_option *pack_idx_opts, unsigned char sha1[]);
 
+/*
+ * Generate the filename to be used for a pack file with checksum "sha1" and
+ * extension "ext". The result is written into the strbuf "buf", overwriting
+ * any existing contents. A pointer to buf->buf is returned as a convenience.
+ *
+ * Example: odb_pack_name(out, sha1, "idx") => 
".git/objects/pack/pack-1234..idx"
+ */
+extern char *odb_pack_name(struct strbuf *buf, const unsigned char *sha1, 
const char *ext);
+
+/*
+ * Return the name of the (local) packfile with the specified sha1 in
+ * its name.  The return value is a pointer to memory that is
+ * overwritten each time this function is called.
+ */
+extern char *sha1_pack_name(const unsigned char *sha1);
+
+/*
+ * Return the name of the (local) pack index

Re: reftable [v6]: new ref storage format

2017-08-08 Thread Junio C Hamano

Shawn Pearce  writes:

> For `log_type = 0x4..0x7` the `log_chained` section is used instead to
> compress information that already appeared in a prior log record.  The
> `log_chained` always includes `old_id` for this record, as `new_id` is
> implied by the prior (by file order, more recent) record's `old_id`.
>
> The `not_same_committer` block appears if `log_type & 0x1` is true,
> `not_same_message` block appears if `log_type & 0x2` is true.  When
> one of these blocks is missing, its values are implied by the prior
> (more recent) log record.

Two comments.

 * not-same-committer would be what I would use when I switch
   timezones, even if I stay to be me, right?  I am just wondering
   if it is clear to everybody that "committer" in that phrase is a
   short-hand for "committer information other than the timestamp".

 * Should the set of entries that are allowed to use of "chained"
   log be related to the set of entries that appear in the restart
   table in any way?  For a reader that scans starting at a restart
   point, it would be very cumbersome if the entry were chained from
   the previous entry, as it would force it to backtrack entries to
   find the first non-chained log entry.  A simple "log_chained must
   not be used for an entry that appear in the restart table" rule
   would solve that, but I didn't see it in the document.

[PATCH v2] t3700: fix broken test under !POSIXPERM

2017-08-08 Thread René Scharfe

76e368c378 (t3700: fix broken test under !SANITY) explains that the test
'git add --chmod=[+-]x changes index with already added file' can fail
if xfoo3 is still present as a symlink from a previous test and deletes
it with rm(1).  That still leaves it present in the index, which causes
the test to fail if POSIXPERM is not defined.  Get rid of it by calling
"git reset --hard" as well, as 76e368c378 already mentioned in passing.

Helped-by: Adam Dinwoodie 
Signed-off-by: Rene Scharfe 
---
Change since v1: Keep the rm(1) call to avoid a problem on Cygwin.

 t/t3700-add.sh | 1 +
 1 file changed, 1 insertion(+)

diff --git a/t/t3700-add.sh b/t/t3700-add.sh
index f3a4b4a913..0aae21d698 100755
--- a/t/t3700-add.sh
+++ b/t/t3700-add.sh
@@ -356,6 +356,7 @@ test_expect_success POSIXPERM,SYMLINKS 'git add --chmod=+x 
with symlinks' '
 
 test_expect_success 'git add --chmod=[+-]x changes index with already added 
file' '
rm -f foo3 xfoo3 &&
+   git reset --hard &&
echo foo >foo3 &&
git add foo3 &&
git add --chmod=+x foo3 &&
-- 
2.14.0

Re: reftable [v6]: new ref storage format

2017-08-08 Thread Junio C Hamano

I notice that you raised the location of restart table within a
block in this iteration (or maybe it happened in v5).  

This forces you to hold all contents in core before the first byte
is written out.  You start from the first entry (which will become
the first restart entry), emit a handful as prefix compressed
entries, emit a full entry (which will become the next restart
entry), ... until you have enough to fill both the data and the
restart table, then start writing from the header (which needs the
length of the block), restart table and then data.

I think it is OK to do so for the blocks whose size is limited to
16M, but I wonder if it is sensible to do the same for the index
block whose limit is 2G.  If you keep the restart table after the
data, then you could stream out the entries as you emit, write the
restart table, and then seek back to fix the length in the header,
without holding the 2G in core, no?

Re: [PATCH] git svn fetch: Create correct commit timestamp when using --localtime

2017-08-08 Thread Urs Thuermann

Junio C Hamano  writes:

> > Yep, seems alright.  Can you apply directly?
> > Been a bit preoccupied as of late.  Thanks.
> 
> Surely, I'll just add your Reviewed-by: myself ;-)

OK, thanks.  This will fix the bug I've reported here a week or so ago
(see the References header).

urs

Re: [PATCH v2 2/2 / RFC] branch: quote branch/ref names to improve readability

2017-08-08 Thread Stefan Beller

On Tue, Aug 8, 2017 at 10:11 AM, Kaartic Sivaraam
 wrote:
> Signed-off-by: Kaartic Sivaraam 
> ---
>  branch.c | 16 
>  1 file changed, 8 insertions(+), 8 deletions(-)

I like this patch.

In submodule.c we quote a lot of things (branches, submodules, paths), so
this is another step to make the output as a whole more consistent.
(Though wondering for non-submodule users, if they perceive it as
inconsistency as other parts of the code may not follow the rigorous quoting)

>
> diff --git a/branch.c b/branch.c
> index ad5a2299b..a40721f3c 100644
> --- a/branch.c
> +++ b/branch.c
> @@ -90,24 +90,24 @@ int install_branch_config(int flag, const char *local, 
> const char *origin, const
> if (shortname) {
> if (origin)
> printf_ln(rebasing ?
> - _("Branch %s set up to track remote 
> branch %s from %s by rebasing.") :
> - _("Branch %s set up to track remote 
> branch %s from %s."),
> + _("Branch '%s' set up to track 
> remote branch '%s' from '%s' by rebasing.") :
> + _("Branch '%s' set up to track 
> remote branch '%s' from '%s'."),
>   local, shortname, origin);
> else
> printf_ln(rebasing ?
> - _("Branch %s set up to track local 
> branch %s by rebasing.") :
> - _("Branch %s set up to track local 
> branch %s."),
> + _("Branch '%s' set up to track 
> local branch '%s' by rebasing.") :
> + _("Branch '%s' set up to track 
> local branch '%s'."),
>   local, shortname);
> } else {
> if (origin)
> printf_ln(rebasing ?
> - _("Branch %s set up to track remote 
> ref %s by rebasing.") :
> - _("Branch %s set up to track remote 
> ref %s."),
> + _("Branch '%s' set up to track 
> remote ref '%s' by rebasing.") :
> + _("Branch '%s' set up to track 
> remote ref '%s'."),
>   local, remote);
> else
> printf_ln(rebasing ?
> - _("Branch %s set up to track local 
> ref %s by rebasing.") :
> - _("Branch %s set up to track local 
> ref %s."),
> + _("Branch '%s' set up to track 
> local ref '%s' by rebasing.") :
> + _("Branch '%s' set up to track 
> local ref '%s'."),
>   local, remote);
> }
> }
> --
> 2.14.0.rc1.434.g6eded367a
>

Re: What's cooking in git.git (Jul 2017, #09; Mon, 31)

2017-08-08 Thread Igor Djordjevic

On 08/08/2017 15:14, Ævar Arnfjörð Bjarmason wrote:
> On Mon, Aug 07 2017, Igor Djordjevic jotted:
>> On 07/08/2017 23:25, Igor Djordjevic wrote:
>>> On 06/08/2017 22:26, Ævar Arnfjörð Bjarmason wrote:
 On Sat, Aug 05 2017, Junio C. Hamano jotted:
> I actually consider "branch" to *never* invoking a checkout.  Even
> when "git branch -m A B" happens to be done when your checked out
> branch is A and you end up being on B.  That is not a "checkout".

 I think we just have a different mental model of what "checkout"
 means. In my mind any operation that updates the HEAD to point to a new
 branch is a checkout of that branch.
>>>
>>> If I may, from a side-viewer`s point of view, it seems you`re
>>> thinking in low-level implementation details, where what Junio
>>> describes seems more as a high-level, conceptual/end-user`s point of
>>> view.
> 
> Yeah, I think that's a fair summary. Also I didn't mean to de-rail this
> whole thread on what "checkout" really means, just explain what I meant
> with previous comments, since there seemed to be confusion about that.
> 
>>> Needing to update HEAD reference once we "rename" a branch, too, what
>>> you consider a "checkout", seems to be required only because branch
>>> name _is_ the branch reference in Git, so we need to update HEAD to
>>> point to a new/renamed branch reference -- but it`s still the same
>>> branch, conceptually.
> 
> It's not *required* we could do one of three things:
> 
>  1) Do what we do now, i.e. rename the branch/reflog & check out the new
> name.
> 
>  2) Rename the branch/reflog and checkout HEAD^0, i.e. say "the branch
> is now elsewhere, but we haven't moved your commit".
> 
>  3) Just not run replace_each_worktree_head_symref() which would end up
>  on a branch with no commits, i.e. an orphan branch.
> 
> Now, I think 2 & 3 are pretty nonsensical and wouldn't ever propose we
> should do that, but it's illustrative that #1 is not some required
> inevitability in terms of explaining what's happening with the new name
> being checked out (i.e. HEAD being updated).

I think we agree, but we`re talking from different standpoints again.

You say "it *can* be done *but* it doesn`t make sense", where I say 
"it *can`t* be done *because* it doesn`t make sense".

Implementation wise, of course it can be done differently, but 
conceptually it would be wrong (or confusing, at least), thus 
different implementation, while theoretically possible, may not be a 
sane option, thus it`s impractical (not to say "impossible").

By "required", I really meant "required in order to be conceptually 
sane".

>>> Documentation for "git-checkout" states that it is used to "*Switch
>>> branches*...[snip]", and that is not what happens here.
> 
> That's just the summary at the top but not the full story of what
> git-checkout does. E.g. you can checkout a bare SHA1 which is not
> switching branches, or a tag or whatever.

I disagree, by checking out a bare SHA1 (or tag or whatever) I`d say 
you *are* still switching branches - conceptually, at least.

When you move from a named branch to (yet) unnamed one, you are 
switching branches. Same goes for when you switch from one unnamed 
branch to another (named or unnamed).

Might be "detached HEAD" is not something we call an "unnamed 
branch", but that`s what it practically is.

>>> Implementation-wise it does because we can`t do it differently at the
>>> moment, but in user`s eyes it`s still the same branch, so no switch
>>> is made as far as the user is concerned.
> 
> Kind of, it's also worthwhile to think about that in some sense no
> switch would be performed as far as the user is concerned by taking
> option #2, i.e. we'd be in the same working tree / you could still make
> commits.
> 
> You just couldn't make new commits on your "master" which is now called
> "topic" and get new commits on "topic". I think it makes sense to do
> that, but again, it's illustrative that it's not inevitable for
> discussing the implementation.

But your second paragraph exactly explains why it`s not the same with 
option #2 (I guess you mean "no HEAD update to point to renamed 
branch" here) - if user renames branch "master" to "topic", it`s 
natural to expect that next commit is made on "topic" (which is now a 
new name for "master", conceptually still being the same, just 
renamed branch).

Ending up in limbo, in a detached HEAD state, or even worse, still 
being on a branch with an old name "master" (which conceptually 
shouldn`t exist anymore, being known as "topic" now), would be 
confusing, to say the least, and conceptually wrong, I would add.

>>> In a different implementation, where branches would have permanent
>>> references other than their names, no HEAD update would be needed as
>>> the reference would still be the same, no matter the name change,
>>> making the `git branch -m` situation clear even from your standpoint,
>>> I`d say.
>>>
> Really from the end-user's point of

Re: [PATCH v2 1/2 / RFC] builtin/branch: stop supporting the use of --set-upstream option

2017-08-08 Thread Martin Ågren

On 8 August 2017 at 19:11, Kaartic Sivaraam
 wrote:
> The '--set-upstream' option of branch was deprecated in,
>
> b347d06bf branch: deprecate --set-upstream and show help if we
> detect possible mistaken use (Thu, 30 Aug 2012 19:23:13 +0200)
>
> It was deprecated for the reasons specified in the commit message of
> the referenced commit.
>
> Refactor 'branch' so that it dies with an appropraite error message
> when the '--set-upstream' is used.

appropriate. (Also, is this really a refactoring?)

>
> Note that there's a reason behind "dying with an error message" instead of
> "not accepting the '--set-upstream'". ;git branch' would still *accept*
> '--set-upstream' even after it's removal as a consequence of "unique
> prefix can be abbrievated in option names" AND '--set-upstream' is a unique
> prefix of '--set-upstream-to' when '--set-upstream' has been removed. In
> order to smooth the transition for users due to the "prefix issue" it was
> decided to make branch die when seeing the '--set-upstream' flag for a few
> years and let the users know that it would be removed some time in the future.
>
> The before/after behaviour for a simple case follows,
>
> $ git remote
> origin
>
> Before,
>
> $ git branch
> * master
>
> $ git branch --set-upstream origin/master
> The --set-upstream flag is deprecated and will be removed. Consider using 
> --track or --set-upstream-to
> Branch origin/master set up to track local branch master.
>
> $ echo $?
> 0
>
> $ git branch
> * master
>   origin/master
>
> After,
>
> $ git branch
> * master
>
> $ git branch --set-upstream origin/master
> fatal: the '--set-upstream' flag is no longer supported and will be 
> removed. Consider using '--track' or '--set-upstream-to'
>
> $ echo $?
> 128
>
> $ git branch
> * master
>
> Signed-off-by: Kaartic Sivaraam 
> ---
>  Changes in v2:
>
>  The previous patch removed the concerned option while the current patch
>  makes 'git branch' die on seeing the option.
>
>  The possibility of '--set-upstream' becoming an alias of  '--set-upstream-to'
>  was documented.
>
>  Documentation/git-branch.txt |  8 +++
>  builtin/branch.c | 21 +--
>  t/t3200-branch.sh| 50 
> 
>  t/t6040-tracking-info.sh | 16 +++---
>  4 files changed, 17 insertions(+), 78 deletions(-)
>
> diff --git a/Documentation/git-branch.txt b/Documentation/git-branch.txt
> index 81bd0a7b7..372107e0c 100644
> --- a/Documentation/git-branch.txt
> +++ b/Documentation/git-branch.txt
> @@ -195,10 +195,10 @@ start-point is either a local or remote-tracking branch.
> branch.autoSetupMerge configuration variable is true.
>
>  --set-upstream::
> -   If specified branch does not exist yet or if `--force` has been
> -   given, acts exactly like `--track`. Otherwise sets up configuration
> -   like `--track` would when creating the branch, except that where
> -   branch points to is not changed.
> +   This option is no longer supported and will be removed in the future.
> +   Consider using --track or --set-upstream-to instead.
> ++
> +Note: This could possibly become an alias of --set-upstream-to in the future.

Maybe the final note could be removed? Someone who is looking up
--set-upstream because Git just "crashed" on them will only want to know
what they should do instead. Our thoughts about the future are perhaps
not that interesting. (I sort of wonder if this option needs to be
documented at all, especially if this doesn't say anything more than
the die() just did.)

Also, I'm wondering if it should be "has been removed" instead of "will
be removed"? /Implementation-wise/, it has not been removed yet, but to
the user, it has. So maybe just "This option has been removed. Consider
using --track or --set-upstream-to instead." The same below.

I don't know if it's worth trying to use PARSE_OPT_HIDDEN in the
options-struct?

Martin

>  -u ::
>  --set-upstream-to=::
> diff --git a/builtin/branch.c b/builtin/branch.c
> index a3bd2262b..2fcb6f7e5 100644
> --- a/builtin/branch.c
> +++ b/builtin/branch.c
> @@ -755,8 +755,6 @@ int cmd_branch(int argc, const char **argv, const char 
> *prefix)
> strbuf_release();
> } else if (argc > 0 && argc <= 2) {
> struct branch *branch = branch_get(argv[0]);
> -   int branch_existed = 0, remote_tracking = 0;
> -   struct strbuf buf = STRBUF_INIT;
>
> if (!strcmp(argv[0], "HEAD"))
> die(_("it does not make sense to create 'HEAD' 
> manually"));
> @@ -768,28 +766,11 @@ int cmd_branch(int argc, const char **argv, const char 
> *prefix)
> die(_("-a and -r options to 'git branch' do not make 
> sense with a branch name"));
>
> if (track ==

Re: [RFC] clang-format: outline the git project's coding style

2017-08-08 Thread Junio C Hamano

Brandon Williams  writes:

>> > +# Add a line break after the return type of top-level functions
>> > +# int
>> > +# foo();
>> > +AlwaysBreakAfterReturnType: TopLevel
>> 
>> We do that?
>
> Haha So generally no we don't do this.  Though there are definitely many
> places in our code base where we do.  Personally this makes it a bit
> easier to read when you end up having long function names.  I also
> worked on a code base which did this and it made it incredible easy to
> grep for the definition of a function.  If you grep for 'foo()' then
> you'd get all the uses of the function including the definition but if
> you grep for '^foo()' you'd get only the definition.
>
> But that's my preference, if we end up using this tool it would probably
> make sense to change this.

Yeah, I even know people who did

int
foo(void)

for greppability of "^foo".  It took some effort to get used to that
style.

>> > +# Insert a space after a cast
>> > +# x = (int32) y;notx = (int32)y;
>> > +SpaceAfterCStyleCast: true
>> 
>> Hmph, I thought we did the latter, i.e. cast sticks to the casted
>> expression without SP.
>
> I've seen both and I wasn't sure which was the correct form to use.

We do the latter because checkpatch.pl from the kernel project tells
us to, I think.

Re: [RFC] clang-format: outline the git project's coding style

2017-08-08 Thread Brandon Williams

On 08/08, Stefan Beller wrote:
> On Tue, Aug 8, 2017 at 5:05 AM, Johannes Schindelin
>  wrote:
> > Hi Brandon,
> >
> > On Mon, 7 Aug 2017, Brandon Williams wrote:
> >
> >> Add a '.clang-format' file which outlines the git project's coding
> >> style.  This can be used with clang-format to auto-format .c and .h
> >> files to conform with git's style.
> >>
> >> Signed-off-by: Brandon Williams 
> >> ---
> >>
> >> I'm sure this sort of thing comes up every so often on the list but back at
> >> git-merge I mentioned how it would be nice to not have to worry about style
> >> when reviewing patches as that is something mechanical and best left to a
> >> machine (for the most part).
> >
> > Amen.
> >
> > If I never have to see a review mentioning an unwrapped line, I am quite
> > certain I will be quite content.
> >
> > Ciao,
> > Dscho
> 
> As a thought experiment I'd like to propose to take it one step further:
> 
>   If the code was formatted perfectly in one style such that a formatter for
>   this style would not produce changes when rerun again on the code, then
>   each individual could have a clean/smudge filter to work in their preferred
>   style, and only the exchange and storage of code is in a mutual agreed
>   style. If the mutually agreed style is close to what I prefer, I don't have 
> to
>   use clean/smudge filters.
> 
> Additionally to this patch, we'd want to either put a note into
> SubmittingPatches or Documentation/gitworkflows.txt to hint at
> how to use this formatting to just affect the patch that is currently
> worked on or rather a pre-commit hook?

I'm sure both of these things will probably happen if we decide to make
use of a code-formatter.  This RFC is really just trying to ask the
question: "Is this something we want to entertain doing?"
My response would be 'Yes' but there's many other opinions to consider
first :)

-- 
Brandon Williams

Dear Beloved Friend

2017-08-08 Thread Mrs Marois

Dear Beloved Friend,

I am Mrs Nicole Benoite Marois and I have been suffering from ovarian cancer 
disease and the doctor says that i have just few weeks to leave.  I am from 
(Paris) France but based in Benin republic since eleven years ago as a business 
woman dealing with gold exportation before the death of my husband many years 
ago.

I have $4.5 Million US Dollars at Eco-Bank here in Benin republic and I 
instructed the bank to transfer the fund to you as foreigner that will apply to 
the bank after I have gone, that they should release the fund to him/her, but 
you will assure me that you will take 50 % of the fund and give 50% to the 
orphanages home in your country for my heart to rest.

Yours fairly friend,
Mrs. Nicole Benoite Marois

Re: [RFC] clang-format: outline the git project's coding style

2017-08-08 Thread Brandon Williams

On 08/08, Junio C Hamano wrote:
> Brandon Williams  writes:
> 
> > Add a '.clang-format' file which outlines the git project's coding
> > style.  This can be used with clang-format to auto-format .c and .h
> > files to conform with git's style.
> >
> > Signed-off-by: Brandon Williams 
> > ---
> >
> > I'm sure this sort of thing comes up every so often on the list but back at
> > git-merge I mentioned how it would be nice to not have to worry about style
> > when reviewing patches as that is something mechanical and best left to a
> > machine (for the most part).  I saw that 'clang-format' was brought up on 
> > the
> > list once before a couple years ago
> > (https://public-inbox.org/git/20150121220903.ga10...@peff.net/) but nothing
> > really came of it.  I spent a little bit of time combing through the various
> > options and came up with this config based on the general style of our code
> > base.  The big issue though is that our code base isn't consistent so try as
> > you might you wont be able to come up with a config which matches 
> > everything we
> > do (mostly due to the inconsistencies in our code base).
> >
> > Anyway, I thought I'd bring this topic back up and see how people feel 
> > about it.
> 
> I gave just one pass over all the rules you have here.  I didn't
> think too deeply about implications of some of them, but most of
> them looked sensible.  Thanks for compiling this set of rules.
> 
> Having said that, it is easier to refine individual rules you have
> below than to make sure that among the develoepers there is a shared
> notion of how these rules are to be used.  If we get that part wrong,
> we'd see unpleasant consequences, e.g.
> 
>  - We may see unwanted code churn on existing codebase, only for the
>sake of satisfying the formatting rules specified here.

This is an issue when you have an inconsistent code base such as ours as
the tool, even when used to format a diff, will format the surrounding
context.

> 
>  - We may see far more style-only critique to patches posted on the
>list simply because there are more readers than writers, and it
>is likely that running the tool to nitpick other people's patches
>is far easier than writing these patches in the first place (and
>forgetting to ask the formatter to nitpick before sending them
>out).

I would hope that use of such a tool would eventually completely
eliminate style-only critiques.  

> 
>  - Human aesthetics judgement often is necessary to overrule
>mechanical rules (e.g. A human may have two pairs of 
>parameters on separate lines in a function declaration;
>BinPackParameters may try to break after ptrA, lenA, ptrB before
>lenB in such a case).

I know that when you introduce a formatter there are bound to be some
issues where a human would choose one way over the other.  If we start
using a formatter I would expect some of the penalties would need to be
tweaked overtime before we rely too heavily on it.

> 
> We need to set our expectation and a guideline to apply these rules
> straight, before introducing something like this.

> 
> 
> >  .clang-format | 166 
> > ++
> >  1 file changed, 166 insertions(+)
> >  create mode 100644 .clang-format
> >
> > diff --git a/.clang-format b/.clang-format
> > new file mode 100644
> > index 0..7f28dc259
> > --- /dev/null
> > +++ b/.clang-format
> > @@ -0,0 +1,166 @@
> > +# Defaults
> > +
> > +# Use tabs whenever we need to fill whitespace that spans at least from 
> > one tab
> > +# stop to the next one.
> > +UseTab: Always
> > +TabWidth: 8
> > +IndentWidth: 8
> > +ContinuationIndentWidth: 8
> > +ColumnLimit: 80
> 
> I often deliberately chomp my lines much shorter than this limit,
> and also I deliberately write a line that is a tad longer than this
> limit some other times, if doing so makes the result easier to read.
> And I know other develoepers with good taste do the same.  It is
> pointless to have a discussion that begins with "who uses a physical
> terminal these days that can only show 80-columns?" to tweak this
> value, I think.  It is more important to give a guideline on what to
> do when lines in your code goes over this limit.
> 
> A mechanical "formatter" would just find an appropriate point in a
> line with least "penalty" and chomp it into two.  But an appropriate
> way to make such a code that is way too deeply indented readable may
> instead be judicious use of goto's and one-time helper functions,
> for example, which mechanical tools would not do.
> 
> That is an example of what I meant above, i.e. a guideline to apply
> these rules.  We would not want to say "clang-format suggests this
> rewrite, so we should accept the resulting code that is still too
> deeply indented as-is"---using the tool to point out an issue is
> good, though.
> 
> > +
> > +# C Language specifics
> > +Language: Cpp
> 
> Hmph ;-)

Re: [RFC] clang-format: outline the git project's coding style

2017-08-08 Thread Junio C Hamano

Brandon Williams  writes:

> Add a '.clang-format' file which outlines the git project's coding
> style.  This can be used with clang-format to auto-format .c and .h
> files to conform with git's style.
>
> Signed-off-by: Brandon Williams 
> ---
>
> I'm sure this sort of thing comes up every so often on the list but back at
> git-merge I mentioned how it would be nice to not have to worry about style
> when reviewing patches as that is something mechanical and best left to a
> machine (for the most part).  I saw that 'clang-format' was brought up on the
> list once before a couple years ago
> (https://public-inbox.org/git/20150121220903.ga10...@peff.net/) but nothing
> really came of it.  I spent a little bit of time combing through the various
> options and came up with this config based on the general style of our code
> base.  The big issue though is that our code base isn't consistent so try as
> you might you wont be able to come up with a config which matches everything 
> we
> do (mostly due to the inconsistencies in our code base).
>
> Anyway, I thought I'd bring this topic back up and see how people feel about 
> it.

I gave just one pass over all the rules you have here.  I didn't
think too deeply about implications of some of them, but most of
them looked sensible.  Thanks for compiling this set of rules.

Having said that, it is easier to refine individual rules you have
below than to make sure that among the develoepers there is a shared
notion of how these rules are to be used.  If we get that part wrong,
we'd see unpleasant consequences, e.g.

 - We may see unwanted code churn on existing codebase, only for the
   sake of satisfying the formatting rules specified here.

 - We may see far more style-only critique to patches posted on the
   list simply because there are more readers than writers, and it
   is likely that running the tool to nitpick other people's patches
   is far easier than writing these patches in the first place (and
   forgetting to ask the formatter to nitpick before sending them
   out).

 - Human aesthetics judgement often is necessary to overrule
   mechanical rules (e.g. A human may have two pairs of 
   parameters on separate lines in a function declaration;
   BinPackParameters may try to break after ptrA, lenA, ptrB before
   lenB in such a case).

We need to set our expectation and a guideline to apply these rules
straight, before introducing something like this.

>  .clang-format | 166 
> ++
>  1 file changed, 166 insertions(+)
>  create mode 100644 .clang-format
>
> diff --git a/.clang-format b/.clang-format
> new file mode 100644
> index 0..7f28dc259
> --- /dev/null
> +++ b/.clang-format
> @@ -0,0 +1,166 @@
> +# Defaults
> +
> +# Use tabs whenever we need to fill whitespace that spans at least from one 
> tab
> +# stop to the next one.
> +UseTab: Always
> +TabWidth: 8
> +IndentWidth: 8
> +ContinuationIndentWidth: 8
> +ColumnLimit: 80

I often deliberately chomp my lines much shorter than this limit,
and also I deliberately write a line that is a tad longer than this
limit some other times, if doing so makes the result easier to read.
And I know other develoepers with good taste do the same.  It is
pointless to have a discussion that begins with "who uses a physical
terminal these days that can only show 80-columns?" to tweak this
value, I think.  It is more important to give a guideline on what to
do when lines in your code goes over this limit.

A mechanical "formatter" would just find an appropriate point in a
line with least "penalty" and chomp it into two.  But an appropriate
way to make such a code that is way too deeply indented readable may
instead be judicious use of goto's and one-time helper functions,
for example, which mechanical tools would not do.

That is an example of what I meant above, i.e. a guideline to apply
these rules.  We would not want to say "clang-format suggests this
rewrite, so we should accept the resulting code that is still too
deeply indented as-is"---using the tool to point out an issue is
good, though.

> +
> +# C Language specifics
> +Language: Cpp

Hmph ;-)

> +# Add a line break after the return type of top-level functions
> +# int
> +# foo();
> +AlwaysBreakAfterReturnType: TopLevel

We do that?

> +# Pack as many parameters or arguments onto the same line as possible
> +# int myFunction(int , int ,
> +#int );
> +BinPackArguments: true
> +BinPackParameters: true

I am OK with this but with the caveats I already mentioned.

> +# Insert a space after a cast
> +# x = (int32) y;notx = (int32)y;
> +SpaceAfterCStyleCast: true

Hmph, I thought we did the latter, i.e. cast sticks to the casted
expression without SP.

Thanks.

Re: [RFC] clang-format: outline the git project's coding style

2017-08-08 Thread Stefan Beller

On Tue, Aug 8, 2017 at 5:05 AM, Johannes Schindelin
 wrote:
> Hi Brandon,
>
> On Mon, 7 Aug 2017, Brandon Williams wrote:
>
>> Add a '.clang-format' file which outlines the git project's coding
>> style.  This can be used with clang-format to auto-format .c and .h
>> files to conform with git's style.
>>
>> Signed-off-by: Brandon Williams 
>> ---
>>
>> I'm sure this sort of thing comes up every so often on the list but back at
>> git-merge I mentioned how it would be nice to not have to worry about style
>> when reviewing patches as that is something mechanical and best left to a
>> machine (for the most part).
>
> Amen.
>
> If I never have to see a review mentioning an unwrapped line, I am quite
> certain I will be quite content.
>
> Ciao,
> Dscho

As a thought experiment I'd like to propose to take it one step further:

  If the code was formatted perfectly in one style such that a formatter for
  this style would not produce changes when rerun again on the code, then
  each individual could have a clean/smudge filter to work in their preferred
  style, and only the exchange and storage of code is in a mutual agreed
  style. If the mutually agreed style is close to what I prefer, I don't have to
  use clean/smudge filters.

Additionally to this patch, we'd want to either put a note into
SubmittingPatches or Documentation/gitworkflows.txt to hint at
how to use this formatting to just affect the patch that is currently
worked on or rather a pre-commit hook?

Re: t3700 broken on pu on Cygwin

2017-08-08 Thread Adam Dinwoodie

On Tue, Aug 08, 2017 at 05:32:21PM +0200, René Scharfe wrote:
> Am 08.08.2017 um 17:18 schrieb Adam Dinwoodie:
> > The t3700-add.sh test is currently failing on the pu branch on Cygwin.
> > To my surprise, the problem appears to have been introduced by a merge,
> > 867fa1d6a.  Both parents of that merge have the test succeeding, but
> > it's failing on that merge commit.
> > 
> > Failing test output below:
> 
> >  expecting success:
> > git reset --hard &&
> > echo foo >foo3 &&
> > git add foo3 &&
> > git add --chmod=+x foo3 &&
> > test_mode_in_index 100755 foo3 &&
> > echo foo >xfoo3 &&
> > chmod 755 xfoo3 &&
> > git add xfoo3 &&
> > git add --chmod=-x xfoo3 &&
> > test_mode_in_index 100644 xfoo3
> > 
> >  ++ git reset --hard
> >  HEAD is now at d12df1f commit all
> >  ++ echo foo
> >  ++ git add foo3
> >  ++ git add --chmod=+x foo3
> >  ++ test_mode_in_index 100755 foo3
> >  ++ case "$(git ls-files -s "$2")" in
> >  +++ git ls-files -s foo3
> >  ++ echo pass
> >  pass
> >  ++ echo foo
> >  ++ chmod 755 xfoo3
> >  ++ git add xfoo3
> >  ++ git add --chmod=-x xfoo3
> >  cannot chmod 'xfoo3'++ test_mode_in_index 100644 xfoo3
> >  ++ case "$(git ls-files -s "$2")" in
> >  +++ git ls-files -s xfoo3
> >  ++ echo fail
> >  fail
> >  ++ git ls-files -s xfoo3
> >  12 c5c4ca97a3a080c32920941b665e94a997901491 0   xfoo3
> >  ++ return 1
> >  + test_eval_ret_=1
> >  + want_trace
> >  + test t = t
> >  + test t = t
> >  + set +x
> >  error: last command exited with $?=1
> >  not ok 41 - git add --chmod=[+-]x changes index with already added file
> >  #
> >  #   git reset --hard &&
> >  #   echo foo >foo3 &&
> >  #   git add foo3 &&
> >  #   git add --chmod=+x foo3 &&
> >  #   test_mode_in_index 100755 foo3 &&
> >  #   echo foo >xfoo3 &&
> >  #   chmod 755 xfoo3 &&
> >  #   git add xfoo3 &&
> >  #   git add --chmod=-x xfoo3 &&
> >  #   test_mode_in_index 100644 xfoo3
> >  #
> > 
> 
> That's strange.  The two changes don't seem to be related at all:
> 
>   diff --git a/t/t3700-add.sh b/t/t3700-add.sh
>   index f3a4b4a913..06e3835efb 100755
>   --- a/t/t3700-add.sh
>   +++ b/t/t3700-add.sh
>   @@ -331,9 +331,8 @@ test_expect_success 'git add --dry-run --ignore-missing 
> of non-existing file out
>   test_i18ncmp expect.err actual.err
>'
>   
>   -test_expect_success 'git add empty string should invoke warning' '
>   -   git add "" 2>output &&
>   -   test_i18ngrep "warning: empty strings" output
>   +test_expect_success 'git add empty string should fail' '
>   +   test_must_fail git add ""
>'
>   
>test_expect_success 'git add --chmod=[+-]x stages correctly' '
>   @@ -355,7 +354,7 @@ test_expect_success POSIXPERM,SYMLINKS 'git add 
> --chmod=+x with symlinks' '
>'
>   
>test_expect_success 'git add --chmod=[+-]x changes index with already 
> added file' '
>   -   rm -f foo3 xfoo3 &&
>   +   git reset --hard &&
>   echo foo >foo3 &&
>   git add foo3 &&
>   git add --chmod=+x foo3 &&
> 
> The only difference I can see being introduced with the first change
> is that the file "output" is gone now.

Yep!  I thought it was strange too.  I spent some time undoing and
redoing the changes to check the problem was reproducible.

> Does it help to add the "rm -f foo3 xfoo3 &&" back, in addition to
> the "git reset --hard"?

Apparently so.  Including only the output from that test:

expecting success:
rm -f foo3 xfoo3 &&
git reset --hard &&
echo foo >foo3 &&
git add foo3 &&
git add --chmod=+x foo3 &&
test_mode_in_index 100755 foo3 &&
echo foo >xfoo3 &&
chmod 755 xfoo3 &&
git add xfoo3 &&
git add --chmod=-x xfoo3 &&
test_mode_in_index 100644 xfoo3

++ rm -f foo3 xfoo3
++ git reset --hard
HEAD is now at 02bfd98 commit all
++ echo foo
++ git add foo3
++ git add --chmod=+x foo3
++ test_mode_in_index 100755 foo3
++ case "$(git ls-files -s "$2")" in
+++ git ls-files -s foo3
++ echo pass
pass
++ echo foo
++ chmod 755 xfoo3
++ git add xfoo3
++ git add --chmod=-x xfoo3
++ test_mode_in_index 100644 xfoo3
++ case "$(git ls-files -s "$2")" in
+++ git ls-files -s xfoo3
++ echo pass
pass
+ test_eval_ret_=0
+ want_trace
+ test t = t
+ test t = t
+ set +x
ok 41 - git add --chmod=[+-]x changes index with already added file

I'm running a bisect

1 2 >

1 - 100 of 140 matches

Mail list logo