Re: [PATCH] for_each_string_list_item(): behave correctly for empty list
On 09/19/2017 02:08 AM, Stefan Beller wrote:
>> I am hoping that this last one is not allowed and we can use the
>> "same condition is checked every time we loop" version that hides
>> the ugliness inside the macro.
>
> By which you are referring to Jonathan's solution posted.
> Maybe we can combine the two solutions (checking for the list
> to not be NULL once, by Jonathan) and using an outer structure
> (SZEDER's solution) by replacing the condition by a for loop,
> roughly (untested):
>
>  #define for_each_string_list_item(item,list) \
> -	for (item = (list)->items; item < (list)->items + (list)->nr; ++item)
> +	for (; list; list = NULL) \
> +	for (item = (list)->items; item < (list)->items + (list)->nr; ++item)
>
> as that would not mingle with any dangling else clause.
> It is also just one statement, such that
>
>	if (bla)
>		for_each_string_list_item(item, list) {
>			baz(item);
>		}
>	else
>		foo;
>
> still works.
>
> Are there downsides to this combined approach?

On the plus side, it's pleasantly devious; I wouldn't have thought of
using a `for` loop for the initial test.

But it doesn't work as written, because (1) we don't need to guard
against `list` being NULL, but rather `list->items`; and (2) we don't
have the liberty to set `list = NULL` (or `list->items = NULL`),
because `list` is owned by the caller and we shouldn't modify it.

The following is a bit closer:

	#define for_each_string_list_item(item,list) \
		for (item = (list)->items; item; item = NULL) \
			for (; item < (list)->items + (list)->nr; ++item)

But I think that also fails, because a callsite that does

	for_each_string_list_item(myitem, mylist)
		if (myitem->util)
			break;

would expect that `myitem` is still set after breaking out of the
loop, whereas the outer `for` loop would reset it to NULL.
If `break` were an expression, we could do something like

	#define for_each_string_list_item(item,list) \
		for (item = (list)->items; item; break) \
			for (; item < (list)->items + (list)->nr; ++item)

So I think we're still left with the suggestions of Jonathan or Gábor,
or the bigger change of initializing `string_list::items` to point at
an empty sentinel array (similar to `strbuf_slopbuf`) rather than NULL.

Personally, I think that Jonathan's approach makes the most sense,
unless somebody wants to jump in and implement a `string_list_slopbuf`.

By the way, I wonder if any open-coded loops over `string_lists` make
the same mistake as the macro?

Michael
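The clobbering described in the thread above is easy to demonstrate with a standalone program. The struct and macro names below are simplified stand-ins for git's `string_list` types, not the real API; the macro is the two-level `for` variant under discussion:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for git's string_list types (not the real API). */
struct item {
	const char *string;
	void *util;
};

struct list {
	struct item *items;
	size_t nr;
};

/*
 * The combined macro under discussion: the outer one-shot `for` skips
 * the body entirely when `items` is NULL, and its "increment" sets the
 * iteration variable to NULL once the inner loop is left -- including
 * when it is left via `break`.
 */
#define for_each_item(it, l) \
	for (it = (l)->items; it; it = NULL) \
		for (; it < (l)->items + (l)->nr; ++it)

/*
 * Visit items until one with a non-NULL util is seen; report how many
 * were visited and where the iteration variable ended up.
 */
static int visit(struct list *l, struct item **after)
{
	struct item *it;
	int n = 0;

	for_each_item(it, l) {
		n++;
		if (it->util)
			break;
	}
	*after = it;	/* always NULL: `break` only leaves the inner
			 * loop, then the outer loop's `it = NULL` runs */
	return n;
}
```

Running `visit()` on a three-item list whose second item has `util` set returns 2, but the caller's pointer comes back NULL instead of pointing at the item that triggered the `break` — exactly the objection raised above. The empty-list case (`items == NULL`) is handled safely: the body never runs.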
Re: [PATCH v2 1/4] mktag: add option which allows the tagger field to be omitted
On Tue, 2017-09-19 at 12:01 +0900, Junio C Hamano wrote:
> Hmph. I cannot shake this nagging feeling that this is probably a
> solution that is overly narrow to a single problem that won't scale
> into the future.
>
> [...snip good point...]
>
> If we drop the "verification" step from the above, that essentially
> becomes an equivalent to "hash-object -t tag -w --stdin".
>
> So I now have to wonder if it may be sufficient to use "hash-object"
> in filter-branch, without doing this "allow malformed data that we
> would not permit if the tag were being created today, only to help
> replaying an old, already broken data" change to "git mktag".
>
> Is there something that makes "hash-object" insufficient (like it
> still does some extra checks we would want to disable and cannot
> work as a replacement for your "--allow-missing-tagger")?

I've done a couple of quick tests and it looks like it will work. I'll
run a few more checks and repost.

Ian.
[PATCH v2 03/21] packed_ref_cache: add a backlink to the associated `packed_ref_store`
It will prove convenient in upcoming patches.

Signed-off-by: Michael Haggerty
---
 refs/packed-backend.c | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index e411501871..a3d9210cb0 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -7,7 +7,15 @@
 #include "../iterator.h"
 #include "../lockfile.h"
 
+struct packed_ref_store;
+
 struct packed_ref_cache {
+	/*
+	 * A back-pointer to the packed_ref_store with which this
+	 * cache is associated:
+	 */
+	struct packed_ref_store *refs;
+
 	struct ref_cache *cache;
 
 	/*
@@ -154,7 +162,7 @@ static const char *parse_ref_line(struct strbuf *line, struct object_id *oid)
 }
 
 /*
- * Read from `packed_refs_file` into a newly-allocated
+ * Read from the `packed-refs` file into a newly-allocated
  * `packed_ref_cache` and return it. The return value will already
  * have its reference count incremented.
  *
@@ -182,7 +190,7 @@ static const char *parse_ref_line(struct strbuf *line, struct object_id *oid)
  * compatibility with older clients, but we do not require it
  * (i.e., "peeled" is a no-op if "fully-peeled" is set).
  */
-static struct packed_ref_cache *read_packed_refs(const char *packed_refs_file)
+static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
 {
 	FILE *f;
 	struct packed_ref_cache *packed_refs = xcalloc(1, sizeof(*packed_refs));
@@ -191,11 +199,12 @@ static struct packed_ref_cache *read_packed_refs(const char *packed_refs_file)
 	enum { PEELED_NONE, PEELED_TAGS, PEELED_FULLY } peeled = PEELED_NONE;
 	struct ref_dir *dir;
 
+	packed_refs->refs = refs;
 	acquire_packed_ref_cache(packed_refs);
 	packed_refs->cache = create_ref_cache(NULL, NULL);
 	packed_refs->cache->root->flag &= ~REF_INCOMPLETE;
 
-	f = fopen(packed_refs_file, "r");
+	f = fopen(refs->path, "r");
 	if (!f) {
 		if (errno == ENOENT) {
 			/*
@@ -205,7 +214,7 @@ static struct packed_ref_cache *read_packed_refs(const char *packed_refs_file)
 			 */
 			return packed_refs;
 		} else {
-			die_errno("couldn't read %s", packed_refs_file);
+			die_errno("couldn't read %s", refs->path);
 		}
 	}
 
@@ -218,7 +227,7 @@ static struct packed_ref_cache *read_packed_refs(const char *packed_refs_file)
 		const char *traits;
 
 		if (!line.len || line.buf[line.len - 1] != '\n')
-			die("unterminated line in %s: %s", packed_refs_file, line.buf);
+			die("unterminated line in %s: %s", refs->path, line.buf);
 
 		if (skip_prefix(line.buf, "# pack-refs with:", &traits)) {
 			if (strstr(traits, " fully-peeled "))
@@ -258,7 +267,7 @@ static struct packed_ref_cache *read_packed_refs(const char *packed_refs_file)
 			last->flag |= REF_KNOWS_PEELED;
 		} else {
 			strbuf_setlen(&line, line.len - 1);
-			die("unexpected line in %s: %s", packed_refs_file, line.buf);
+			die("unexpected line in %s: %s", refs->path, line.buf);
 		}
 	}
 
@@ -293,7 +302,7 @@ static struct packed_ref_cache *get_packed_ref_cache(struct packed_ref_store *re
 	validate_packed_ref_cache(refs);
 
 	if (!refs->cache)
-		refs->cache = read_packed_refs(refs->path);
+		refs->cache = read_packed_refs(refs);
 
 	return refs->cache;
 }
-- 
2.14.1
[PATCH v2 19/21] ref_cache: remove support for storing peeled values
Now that the `packed-refs` backend doesn't use `ref_cache`, there is nobody left who might want to store peeled values of references in `ref_cache`. So remove that feature. Signed-off-by: Michael Haggerty --- refs/packed-backend.c | 9 - refs/ref-cache.c | 42 +- refs/ref-cache.h | 32 ++-- 3 files changed, 11 insertions(+), 72 deletions(-) diff --git a/refs/packed-backend.c b/refs/packed-backend.c index b4a01637a6..b1571872fa 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -2,7 +2,6 @@ #include "../config.h" #include "../refs.h" #include "refs-internal.h" -#include "ref-cache.h" #include "packed-backend.h" #include "../iterator.h" #include "../lockfile.h" @@ -226,6 +225,14 @@ static NORETURN void die_invalid_line(const char *path, } +/* + * This value is set in `base.flags` if the peeled value of the + * current reference is known. In that case, `peeled` contains the + * correct peeled value for the reference, which might be `null_sha1` + * if the reference is not a tag or if it is broken. + */ +#define REF_KNOWS_PEELED 0x40 + /* * An iterator over a packed-refs file that is currently mmapped. */ diff --git a/refs/ref-cache.c b/refs/ref-cache.c index 54dfb5218c..4f850e1b5c 100644 --- a/refs/ref-cache.c +++ b/refs/ref-cache.c @@ -38,7 +38,6 @@ struct ref_entry *create_ref_entry(const char *refname, FLEX_ALLOC_STR(ref, name, refname); oidcpy(&ref->u.value.oid, oid); - oidclr(&ref->u.value.peeled); ref->flag = flag; return ref; } @@ -491,49 +490,10 @@ static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator) } } -enum peel_status peel_entry(struct ref_entry *entry, int repeel) -{ - enum peel_status status; - - if (entry->flag & REF_KNOWS_PEELED) { - if (repeel) { - entry->flag &= ~REF_KNOWS_PEELED; - oidclr(&entry->u.value.peeled); - } else { - return is_null_oid(&entry->u.value.peeled) ? 
- PEEL_NON_TAG : PEEL_PEELED; - } - } - if (entry->flag & REF_ISBROKEN) - return PEEL_BROKEN; - if (entry->flag & REF_ISSYMREF) - return PEEL_IS_SYMREF; - - status = peel_object(entry->u.value.oid.hash, entry->u.value.peeled.hash); - if (status == PEEL_PEELED || status == PEEL_NON_TAG) - entry->flag |= REF_KNOWS_PEELED; - return status; -} - static int cache_ref_iterator_peel(struct ref_iterator *ref_iterator, struct object_id *peeled) { - struct cache_ref_iterator *iter = - (struct cache_ref_iterator *)ref_iterator; - struct cache_ref_iterator_level *level; - struct ref_entry *entry; - - level = &iter->levels[iter->levels_nr - 1]; - - if (level->index == -1) - die("BUG: peel called before advance for cache iterator"); - - entry = level->dir->entries[level->index]; - - if (peel_entry(entry, 0)) - return -1; - oidcpy(peeled, &entry->u.value.peeled); - return 0; + return peel_object(ref_iterator->oid->hash, peeled->hash); } static int cache_ref_iterator_abort(struct ref_iterator *ref_iterator) diff --git a/refs/ref-cache.h b/refs/ref-cache.h index a082bfb06c..eda65e73ed 100644 --- a/refs/ref-cache.h +++ b/refs/ref-cache.h @@ -38,14 +38,6 @@ struct ref_value { * referred to by the last reference in the symlink chain. */ struct object_id oid; - - /* -* If REF_KNOWS_PEELED, then this field holds the peeled value -* of this reference, or null if the reference is known not to -* be peelable. See the documentation for peel_ref() for an -* exact definition of "peelable". -*/ - struct object_id peeled; }; /* @@ -97,21 +89,14 @@ struct ref_dir { * public values; see refs.h. */ -/* - * The field ref_entry->u.value.peeled of this value entry contains - * the correct peeled value for the reference, which might be - * null_sha1 if the reference is not a tag or if it is broken. 
- */ -#define REF_KNOWS_PEELED 0x10 - /* ref_entry represents a directory of references */ -#define REF_DIR 0x20 +#define REF_DIR 0x10 /* * Entry has not yet been read from disk (used only for REF_DIR * entries representing loose references) */ -#define REF_INCOMPLETE 0x40 +#define REF_INCOMPLETE 0x20 /* * A ref_entry represents either a reference or a "subdirectory" of @@ -252,17 +237,4 @@ struct ref_iterator *cache_ref_iterator_begin(struct ref_cache *cache, const char *prefix, int prime_dir); -/* - * Peel the entry (if possible) and return its new peel_status. If - * repeel is true, re-peel the entry even if ther
[PATCH v2 21/21] packed-backend.c: rename a bunch of things and update comments
We've made huge changes to this file, and some of the old names and comments are no longer very fitting. So rename a bunch of things: * `struct packed_ref_cache` → `struct snapshot` * `acquire_packed_ref_cache()` → `acquire_snapshot()` * `release_packed_ref_buffer()` → `clear_snapshot_buffer()` * `release_packed_ref_cache()` → `release_snapshot()` * `clear_packed_ref_cache()` → `clear_snapshot()` * `struct packed_ref_entry` → `struct snapshot_record` * `cmp_packed_ref_entries()` → `cmp_packed_ref_records()` * `cmp_entry_to_refname()` → `cmp_record_to_refname()` * `sort_packed_refs()` → `sort_snapshot()` * `read_packed_refs()` → `create_snapshot()` * `validate_packed_ref_cache()` → `validate_snapshot()` * `get_packed_ref_cache()` → `get_snapshot()` * Renamed local variables and struct members accordingly. Also update a bunch of comments to reflect the renaming and the accumulated changes that the code has undergone. Signed-off-by: Michael Haggerty --- refs/packed-backend.c | 422 +++--- 1 file changed, 232 insertions(+), 190 deletions(-) diff --git a/refs/packed-backend.c b/refs/packed-backend.c index 92b837a237..b9351da843 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -37,10 +37,30 @@ static enum mmap_strategy mmap_strategy = MMAP_OK; struct packed_ref_store; -struct packed_ref_cache { +/* + * A `snapshot` represents one snapshot of a `packed-refs` file. + * + * Normally, this will be a mmapped view of the contents of the + * `packed-refs` file at the time the snapshot was created. However, + * if the `packed-refs` file was not sorted, this might point at heap + * memory holding the contents of the `packed-refs` file with its + * records sorted by refname. + * + * `snapshot` instances are reference counted (via + * `acquire_snapshot()` and `release_snapshot()`). This is to prevent + * an instance from disappearing while an iterator is still iterating + * over it. Instances are garbage collected when their `referrers` + * count goes to zero. 
+ * + * The most recent `snapshot`, if available, is referenced by the + * `packed_ref_store`. Its freshness is checked whenever + * `get_snapshot()` is called; if the existing snapshot is obsolete, a + * new snapshot is taken. + */ +struct snapshot { /* * A back-pointer to the packed_ref_store with which this -* cache is associated: +* snapshot is associated: */ struct packed_ref_store *refs; @@ -61,26 +81,42 @@ struct packed_ref_cache { size_t header_len; /* -* What is the peeled state of this cache? (This is usually -* determined from the header of the "packed-refs" file.) +* What is the peeled state of the `packed-refs` file that +* this snapshot represents? (This is usually determined from +* the file's header.) */ enum { PEELED_NONE, PEELED_TAGS, PEELED_FULLY } peeled; /* -* Count of references to the data structure in this instance, -* including the pointer from files_ref_store::packed if any. -* The data will not be freed as long as the reference count -* is nonzero. +* Count of references to this instance, including the pointer +* from `packed_ref_store::snapshot`, if any. The instance +* will not be freed as long as the reference count is +* nonzero. */ unsigned int referrers; - /* The metadata from when this packed-refs cache was read */ + /* +* The metadata of the `packed-refs` file from which this +* snapshot was created, used to tell if the file has been +* replaced since we read it. +*/ struct stat_validity validity; }; /* - * A container for `packed-refs`-related data. It is not (yet) a - * `ref_store`. + * A `ref_store` representing references stored in a `packed-refs` + * file. It implements the `ref_store` interface, though it has some + * limitations: + * + * - It cannot store symbolic references. + * + * - It cannot store reflogs. + * + * - It does not support reference renaming (though it could). + * + * On the other hand, it can be locked outside of a reference + * transaction. 
In that case, it remains locked even after the + * transaction is done and the new `packed-refs` file is activated. */ struct packed_ref_store { struct ref_store base; @@ -91,10 +127,10 @@ struct packed_ref_store { char *path; /* -* A cache of the values read from the `packed-refs` file, if -* it might still be current; otherwise, NULL. +* A snapshot of the values read from the `packed-refs` file, +* if it might still be current; otherwise, NULL. */ - struct packed_ref_cache *cache; + struct snapshot *snapshot; /* * Lock used for the "packed-refs" file. Note that this (and @@ -111,43 +147,42 @@ struct packed_ref_store { }; /*
[PATCH v2 06/21] read_packed_refs(): only check for a header at the top of the file
This tightens up the parsing a bit; previously, stray header-looking
lines would have been processed.

Signed-off-by: Michael Haggerty
---
 refs/packed-backend.c | 35 ++++++++++++++++++++++++-----------
 1 file changed, 24 insertions(+), 11 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 154abbd83a..141f02b9c8 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -255,11 +255,34 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
 	pos = buf;
 	eof = buf + size;
 
+	/* If the file has a header line, process it: */
+	if (pos < eof && *pos == '#') {
+		const char *traits;
+
+		eol = memchr(pos, '\n', eof - pos);
+		if (!eol)
+			die_unterminated_line(refs->path, pos, eof - pos);
+
+		strbuf_add(&line, pos, eol + 1 - pos);
+
+		if (!skip_prefix(line.buf, "# pack-refs with:", &traits))
+			die_invalid_line(refs->path, pos, eof - pos);
+
+		if (strstr(traits, " fully-peeled "))
+			peeled = PEELED_FULLY;
+		else if (strstr(traits, " peeled "))
+			peeled = PEELED_TAGS;
+		/* perhaps other traits later as well */
+
+		/* The "+ 1" is for the LF character. */
+		pos = eol + 1;
+		strbuf_reset(&line);
+	}
+
 	dir = get_ref_dir(packed_refs->cache->root);
 	while (pos < eof) {
 		struct object_id oid;
 		const char *refname;
-		const char *traits;
 
 		eol = memchr(pos, '\n', eof - pos);
 		if (!eol)
@@ -267,15 +290,6 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
 
 		strbuf_add(&line, pos, eol + 1 - pos);
 
-		if (skip_prefix(line.buf, "# pack-refs with:", &traits)) {
-			if (strstr(traits, " fully-peeled "))
-				peeled = PEELED_FULLY;
-			else if (strstr(traits, " peeled "))
-				peeled = PEELED_TAGS;
-			/* perhaps other traits later as well */
-			goto next_line;
-		}
-
 		refname = parse_ref_line(&line, &oid);
 		if (refname) {
 			int flag = REF_ISPACKED;
@@ -307,7 +321,6 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
 			die_invalid_line(refs->path, line.buf, line.len);
 		}
 
-	next_line:
 		/* The "+ 1" is for the LF character. */
 		pos = eol + 1;
 		strbuf_reset(&line);
-- 
2.14.1
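The structural change in this patch — recognizing a header only as the very first line, so that header-looking lines elsewhere are rejected — can be sketched in isolation. All names below are hypothetical stand-ins, not git's code:

```c
#include <string.h>

/*
 * Standalone sketch of the parsing structure introduced above: the
 * header is accepted only as the very first line of the buffer; a
 * '#' line anywhere else is an error (git would die_invalid_line()).
 * parse_buffer() is a hypothetical helper, not git's API.
 */
static const char HEADER_PREFIX[] = "# pack-refs with:";

/* Return 1 on success, 0 on a malformed or stray header line. */
static int parse_buffer(const char *buf, int *saw_header, int *nrefs)
{
	const char *pos = buf, *eof = buf + strlen(buf);
	const char *eol;

	*saw_header = 0;
	*nrefs = 0;

	/* If the file has a header line, process it: */
	if (pos < eof && *pos == '#') {
		eol = memchr(pos, '\n', eof - pos);
		if (!eol || strncmp(pos, HEADER_PREFIX, strlen(HEADER_PREFIX)))
			return 0;
		*saw_header = 1;
		pos = eol + 1;
	}

	/* In the remaining lines, '#' is no longer special: */
	while (pos < eof) {
		eol = memchr(pos, '\n', eof - pos);
		if (!eol)
			return 0;	/* unterminated line */
		if (*pos == '#')
			return 0;	/* stray header-looking line */
		(*nrefs)++;
		pos = eol + 1;
	}
	return 1;
}
```

A buffer whose header comes first parses cleanly; the same header line appearing after a ref record is rejected, which is the tightening the commit message describes.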
[PATCH v2 17/21] ref_store: implement `refs_peel_ref()` generically
We're about to stop storing packed refs in a `ref_cache`. That means that the only way we have left to optimize `peel_ref()` is by checking whether the reference being peeled is the one currently being iterated over (in `current_ref_iter`), and if so, using `ref_iterator_peel()`. But this can be done generically; it doesn't have to be implemented per-backend. So implement `refs_peel_ref()` in `refs.c` and remove the `peel_ref()` method from the refs API. This removes the last callers of a couple of functions, so delete them. More cleanup to come... Signed-off-by: Michael Haggerty --- refs.c| 18 +- refs/files-backend.c | 38 -- refs/packed-backend.c | 36 refs/refs-internal.h | 3 --- 4 files changed, 17 insertions(+), 78 deletions(-) diff --git a/refs.c b/refs.c index 101c107ee8..c5e6f79c77 100644 --- a/refs.c +++ b/refs.c @@ -1735,7 +1735,23 @@ int refs_pack_refs(struct ref_store *refs, unsigned int flags) int refs_peel_ref(struct ref_store *refs, const char *refname, unsigned char *sha1) { - return refs->be->peel_ref(refs, refname, sha1); + int flag; + unsigned char base[20]; + + if (current_ref_iter && current_ref_iter->refname == refname) { + struct object_id peeled; + + if (ref_iterator_peel(current_ref_iter, &peeled)) + return -1; + hashcpy(sha1, peeled.hash); + return 0; + } + + if (refs_read_ref_full(refs, refname, + RESOLVE_REF_READING, base, &flag)) + return -1; + + return peel_object(base, sha1); } int peel_ref(const char *refname, unsigned char *sha1) diff --git a/refs/files-backend.c b/refs/files-backend.c index 35648c89fc..7d12de88d0 100644 --- a/refs/files-backend.c +++ b/refs/files-backend.c @@ -655,43 +655,6 @@ static int lock_raw_ref(struct files_ref_store *refs, return ret; } -static int files_peel_ref(struct ref_store *ref_store, - const char *refname, unsigned char *sha1) -{ - struct files_ref_store *refs = - files_downcast(ref_store, REF_STORE_READ | REF_STORE_ODB, - "peel_ref"); - int flag; - unsigned char base[20]; - - if (current_ref_iter && 
current_ref_iter->refname == refname) { - struct object_id peeled; - - if (ref_iterator_peel(current_ref_iter, &peeled)) - return -1; - hashcpy(sha1, peeled.hash); - return 0; - } - - if (refs_read_ref_full(ref_store, refname, - RESOLVE_REF_READING, base, &flag)) - return -1; - - /* -* If the reference is packed, read its ref_entry from the -* cache in the hope that we already know its peeled value. -* We only try this optimization on packed references because -* (a) forcing the filling of the loose reference cache could -* be expensive and (b) loose references anyway usually do not -* have REF_KNOWS_PEELED. -*/ - if (flag & REF_ISPACKED && - !refs_peel_ref(refs->packed_ref_store, refname, sha1)) - return 0; - - return peel_object(base, sha1); -} - struct files_ref_iterator { struct ref_iterator base; @@ -3012,7 +2975,6 @@ struct ref_storage_be refs_be_files = { files_initial_transaction_commit, files_pack_refs, - files_peel_ref, files_create_symref, files_delete_refs, files_rename_ref, diff --git a/refs/packed-backend.c b/refs/packed-backend.c index 2cf2f7f73d..6a930a440c 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -849,26 +849,6 @@ static struct packed_ref_cache *get_packed_ref_cache(struct packed_ref_store *re return refs->cache; } -static struct ref_dir *get_packed_ref_dir(struct packed_ref_cache *packed_ref_cache) -{ - return get_ref_dir(packed_ref_cache->cache->root); -} - -static struct ref_dir *get_packed_refs(struct packed_ref_store *refs) -{ - return get_packed_ref_dir(get_packed_ref_cache(refs)); -} - -/* - * Return the ref_entry for the given refname from the packed - * references. If it does not exist, return NULL. 
- */ -static struct ref_entry *get_packed_ref(struct packed_ref_store *refs, - const char *refname) -{ - return find_ref_entry(get_packed_refs(refs), refname); -} - static int packed_read_raw_ref(struct ref_store *ref_store, const char *refname, unsigned char *sha1, struct strbuf *referent, unsigned int *type) @@ -895,21 +875,6 @@ static int packed_read_raw_ref(struct ref_store *ref_store, return 0; } -static int packed_peel_ref(struct ref_store *ref_store, -
[PATCH v2 01/21] ref_iterator: keep track of whether the iterator output is ordered
References are iterated over in order by refname, but reflogs are not. Some consumers of reference iteration care about the difference. Teach each `ref_iterator` to keep track of whether its output is ordered. `overlay_ref_iterator` is one of the picky consumers. Add a sanity check in `overlay_ref_iterator_begin()` to verify that its inputs are ordered. Signed-off-by: Michael Haggerty --- refs.c| 4 refs/files-backend.c | 16 +--- refs/iterator.c | 15 ++- refs/packed-backend.c | 2 +- refs/ref-cache.c | 2 +- refs/ref-cache.h | 3 ++- refs/refs-internal.h | 23 +++ 7 files changed, 46 insertions(+), 19 deletions(-) diff --git a/refs.c b/refs.c index b0106b8162..101c107ee8 100644 --- a/refs.c +++ b/refs.c @@ -1309,6 +1309,10 @@ struct ref_iterator *refs_ref_iterator_begin( if (trim) iter = prefix_ref_iterator_begin(iter, "", trim); + /* Sanity check for subclasses: */ + if (!iter->ordered) + BUG("reference iterator is not ordered"); + return iter; } diff --git a/refs/files-backend.c b/refs/files-backend.c index 961424a4ea..35648c89fc 100644 --- a/refs/files-backend.c +++ b/refs/files-backend.c @@ -762,7 +762,7 @@ static struct ref_iterator *files_ref_iterator_begin( const char *prefix, unsigned int flags) { struct files_ref_store *refs; - struct ref_iterator *loose_iter, *packed_iter; + struct ref_iterator *loose_iter, *packed_iter, *overlay_iter; struct files_ref_iterator *iter; struct ref_iterator *ref_iterator; unsigned int required_flags = REF_STORE_READ; @@ -772,10 +772,6 @@ static struct ref_iterator *files_ref_iterator_begin( refs = files_downcast(ref_store, required_flags, "ref_iterator_begin"); - iter = xcalloc(1, sizeof(*iter)); - ref_iterator = &iter->base; - base_ref_iterator_init(ref_iterator, &files_ref_iterator_vtable); - /* * We must make sure that all loose refs are read before * accessing the packed-refs file; this avoids a race @@ -811,7 +807,13 @@ static struct ref_iterator *files_ref_iterator_begin( refs->packed_ref_store, prefix, 0, 
DO_FOR_EACH_INCLUDE_BROKEN); - iter->iter0 = overlay_ref_iterator_begin(loose_iter, packed_iter); + overlay_iter = overlay_ref_iterator_begin(loose_iter, packed_iter); + + iter = xcalloc(1, sizeof(*iter)); + ref_iterator = &iter->base; + base_ref_iterator_init(ref_iterator, &files_ref_iterator_vtable, + overlay_iter->ordered); + iter->iter0 = overlay_iter; iter->flags = flags; return ref_iterator; @@ -2084,7 +2086,7 @@ static struct ref_iterator *files_reflog_iterator_begin(struct ref_store *ref_st struct ref_iterator *ref_iterator = &iter->base; struct strbuf sb = STRBUF_INIT; - base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable); + base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable, 0); files_reflog_path(refs, &sb, NULL); iter->dir_iterator = dir_iterator_begin(sb.buf); iter->ref_store = ref_store; diff --git a/refs/iterator.c b/refs/iterator.c index 4cf449ef66..c475360f0a 100644 --- a/refs/iterator.c +++ b/refs/iterator.c @@ -25,9 +25,11 @@ int ref_iterator_abort(struct ref_iterator *ref_iterator) } void base_ref_iterator_init(struct ref_iterator *iter, - struct ref_iterator_vtable *vtable) + struct ref_iterator_vtable *vtable, + int ordered) { iter->vtable = vtable; + iter->ordered = !!ordered; iter->refname = NULL; iter->oid = NULL; iter->flags = 0; @@ -72,7 +74,7 @@ struct ref_iterator *empty_ref_iterator_begin(void) struct empty_ref_iterator *iter = xcalloc(1, sizeof(*iter)); struct ref_iterator *ref_iterator = &iter->base; - base_ref_iterator_init(ref_iterator, &empty_ref_iterator_vtable); + base_ref_iterator_init(ref_iterator, &empty_ref_iterator_vtable, 1); return ref_iterator; } @@ -205,6 +207,7 @@ static struct ref_iterator_vtable merge_ref_iterator_vtable = { }; struct ref_iterator *merge_ref_iterator_begin( + int ordered, struct ref_iterator *iter0, struct ref_iterator *iter1, ref_iterator_select_fn *select, void *cb_data) { @@ -219,7 +222,7 @@ struct ref_iterator *merge_ref_iterator_begin( * references through only if 
they exist in both iterators. */ - base_ref_iterator_init(ref_iterator, &merge_ref_iterator_vtable); + base_ref_iterator_init(ref_iterator, &merge_ref_iterator_vtable, ordered); iter->iter0 = iter0; iter->iter1 = iter1; iter->select = select; @@ -268,9 +271,11 @@ struct ref_itera
[PATCH v2 07/21] read_packed_refs(): make parsing of the header line more robust
The old code parsed the traits in the `packed-refs` header by looking
for the string " trait " (i.e., the name of the trait with a space on
either side) in the header line. This is fragile, because if any other
implementation of Git forgets to write the trailing space, the last
trait would silently be ignored (and the error might never be
noticed).

So instead, use `string_list_split_in_place()` to split the traits
into tokens, then use `unsorted_string_list_has_string()` to look for
the tokens we are interested in. This means that we can read the
traits correctly even if the header line is missing a trailing space
(or indeed, if it is missing the space after the colon, or if it has
multiple spaces somewhere).

However, older Git clients (and perhaps other Git implementations)
still require the surrounding spaces, so we still have to output the
header with a trailing space.

Signed-off-by: Michael Haggerty
---
 refs/packed-backend.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 141f02b9c8..a45e3ff92f 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -257,25 +257,30 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
 
 	/* If the file has a header line, process it: */
 	if (pos < eof && *pos == '#') {
-		const char *traits;
+		char *p;
+		struct string_list traits = STRING_LIST_INIT_NODUP;
 
 		eol = memchr(pos, '\n', eof - pos);
 		if (!eol)
			die_unterminated_line(refs->path, pos, eof - pos);
 
-		strbuf_add(&line, pos, eol + 1 - pos);
+		strbuf_add(&line, pos, eol - pos);
 
-		if (!skip_prefix(line.buf, "# pack-refs with:", &traits))
+		if (!skip_prefix(line.buf, "# pack-refs with:", (const char **)&p))
 			die_invalid_line(refs->path, pos, eof - pos);
 
-		if (strstr(traits, " fully-peeled "))
+		string_list_split_in_place(&traits, p, ' ', -1);
+
+		if (unsorted_string_list_has_string(&traits, "fully-peeled"))
 			peeled = PEELED_FULLY;
-		else if (strstr(traits, " peeled "))
+		else if (unsorted_string_list_has_string(&traits, "peeled"))
 			peeled = PEELED_TAGS;
 		/* perhaps other traits later as well */
 
 		/* The "+ 1" is for the LF character. */
 		pos = eol + 1;
+
+		string_list_clear(&traits, 0);
 		strbuf_reset(&line);
 	}
 
@@ -610,7 +615,11 @@ int packed_refs_is_locked(struct ref_store *ref_store)
 
 /*
  * The packed-refs header line that we write out. Perhaps other
- * traits will be added later. The trailing space is required.
+ * traits will be added later.
+ *
+ * Note that earlier versions of Git used to parse these traits by
+ * looking for " trait " in the line. For this reason, the space after
+ * the colon and the trailing space are required.
  */
 static const char PACKED_REFS_HEADER[] =
 	"# pack-refs with: peeled fully-peeled \n";
-- 
2.14.1
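The robustness gain can be seen in a standalone comparison of the two approaches. `has_trait()` below is a hypothetical stand-in that tokenizes with plain `strtok()` on a local copy; git itself uses `string_list_split_in_place()` plus `unsorted_string_list_has_string()`:

```c
#include <string.h>

/*
 * Hypothetical stand-in for the new parsing approach: split the
 * traits portion of the header on spaces and compare whole tokens, so
 * no surrounding spaces are required.  Not git's code -- strtok() on
 * a bounded local copy substitutes for string_list_split_in_place().
 */
static int has_trait(const char *traits, const char *want)
{
	char buf[256];
	char *tok;

	strncpy(buf, traits, sizeof(buf) - 1);
	buf[sizeof(buf) - 1] = '\0';

	for (tok = strtok(buf, " "); tok; tok = strtok(NULL, " "))
		if (!strcmp(tok, want))
			return 1;
	return 0;
}
```

With the old `strstr(traits, " fully-peeled ")` check, a header that lacks the trailing space silently drops its last trait; the token-based check still finds it.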
[PATCH v2 12/21] packed-backend.c: reorder some definitions
No code has been changed. This will make subsequent patches more
self-contained.

Signed-off-by: Michael Haggerty
---
 refs/packed-backend.c | 48 ++++++++++++++++++++++++------------------------
 1 file changed, 24 insertions(+), 24 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 724c88631d..0fe41a7203 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -36,30 +36,6 @@ struct packed_ref_cache {
 	struct stat_validity validity;
 };
 
-/*
- * Increment the reference count of *packed_refs.
- */
-static void acquire_packed_ref_cache(struct packed_ref_cache *packed_refs)
-{
-	packed_refs->referrers++;
-}
-
-/*
- * Decrease the reference count of *packed_refs. If it goes to zero,
- * free *packed_refs and return true; otherwise return false.
- */
-static int release_packed_ref_cache(struct packed_ref_cache *packed_refs)
-{
-	if (!--packed_refs->referrers) {
-		free_ref_cache(packed_refs->cache);
-		stat_validity_clear(&packed_refs->validity);
-		free(packed_refs);
-		return 1;
-	} else {
-		return 0;
-	}
-}
-
 /*
  * A container for `packed-refs`-related data. It is not (yet) a
  * `ref_store`.
@@ -92,6 +68,30 @@ struct packed_ref_store {
 	struct tempfile tempfile;
 };
 
+/*
+ * Increment the reference count of *packed_refs.
+ */
+static void acquire_packed_ref_cache(struct packed_ref_cache *packed_refs)
+{
+	packed_refs->referrers++;
+}
+
+/*
+ * Decrease the reference count of *packed_refs. If it goes to zero,
+ * free *packed_refs and return true; otherwise return false.
+ */
+static int release_packed_ref_cache(struct packed_ref_cache *packed_refs)
+{
+	if (!--packed_refs->referrers) {
+		free_ref_cache(packed_refs->cache);
+		stat_validity_clear(&packed_refs->validity);
+		free(packed_refs);
+		return 1;
+	} else {
+		return 0;
+	}
+}
+
 struct ref_store *packed_ref_store_create(const char *path,
					  unsigned int store_flags)
 {
-- 
2.14.1
[PATCH v2 20/21] mmapped_ref_iterator: inline into `packed_ref_iterator`
Since `packed_ref_iterator` is now delegating to `mmapped_ref_iterator` rather than `cache_ref_iterator` to do the heavy lifting, there is no need to keep the two iterators separate. So "inline" `mmapped_ref_iterator` into `packed_ref_iterator`. This removes a bunch of boilerplate. Signed-off-by: Michael Haggerty --- refs/packed-backend.c | 284 -- 1 file changed, 114 insertions(+), 170 deletions(-) diff --git a/refs/packed-backend.c b/refs/packed-backend.c index b1571872fa..92b837a237 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -225,157 +225,6 @@ static NORETURN void die_invalid_line(const char *path, } -/* - * This value is set in `base.flags` if the peeled value of the - * current reference is known. In that case, `peeled` contains the - * correct peeled value for the reference, which might be `null_sha1` - * if the reference is not a tag or if it is broken. - */ -#define REF_KNOWS_PEELED 0x40 - -/* - * An iterator over a packed-refs file that is currently mmapped. 
- */ -struct mmapped_ref_iterator { - struct ref_iterator base; - - struct packed_ref_cache *packed_refs; - - /* The current position in the mmapped file: */ - const char *pos; - - /* The end of the mmapped file: */ - const char *eof; - - struct object_id oid, peeled; - - struct strbuf refname_buf; -}; - -static int mmapped_ref_iterator_advance(struct ref_iterator *ref_iterator) -{ - struct mmapped_ref_iterator *iter = - (struct mmapped_ref_iterator *)ref_iterator; - const char *p = iter->pos, *eol; - - strbuf_reset(&iter->refname_buf); - - if (iter->pos == iter->eof) - return ref_iterator_abort(ref_iterator); - - iter->base.flags = REF_ISPACKED; - - if (iter->eof - p < GIT_SHA1_HEXSZ + 2 || - parse_oid_hex(p, &iter->oid, &p) || - !isspace(*p++)) - die_invalid_line(iter->packed_refs->refs->path, -iter->pos, iter->eof - iter->pos); - - eol = memchr(p, '\n', iter->eof - p); - if (!eol) - die_unterminated_line(iter->packed_refs->refs->path, - iter->pos, iter->eof - iter->pos); - - strbuf_add(&iter->refname_buf, p, eol - p); - iter->base.refname = iter->refname_buf.buf; - - if (check_refname_format(iter->base.refname, REFNAME_ALLOW_ONELEVEL)) { - if (!refname_is_safe(iter->base.refname)) - die("packed refname is dangerous: %s", - iter->base.refname); - oidclr(&iter->oid); - iter->base.flags |= REF_BAD_NAME | REF_ISBROKEN; - } - if (iter->packed_refs->peeled == PEELED_FULLY || - (iter->packed_refs->peeled == PEELED_TAGS && -starts_with(iter->base.refname, "refs/tags/"))) - iter->base.flags |= REF_KNOWS_PEELED; - - iter->pos = eol + 1; - - if (iter->pos < iter->eof && *iter->pos == '^') { - p = iter->pos + 1; - if (iter->eof - p < GIT_SHA1_HEXSZ + 1 || - parse_oid_hex(p, &iter->peeled, &p) || - *p++ != '\n') - die_invalid_line(iter->packed_refs->refs->path, -iter->pos, iter->eof - iter->pos); - iter->pos = p; - - /* -* Regardless of what the file header said, we -* definitely know the value of *this* reference. 
But -* we suppress it if the reference is broken: -*/ - if ((iter->base.flags & REF_ISBROKEN)) { - oidclr(&iter->peeled); - iter->base.flags &= ~REF_KNOWS_PEELED; - } else { - iter->base.flags |= REF_KNOWS_PEELED; - } - } else { - oidclr(&iter->peeled); - } - - return ITER_OK; -} - -static int mmapped_ref_iterator_peel(struct ref_iterator *ref_iterator, - struct object_id *peeled) -{ - struct mmapped_ref_iterator *iter = - (struct mmapped_ref_iterator *)ref_iterator; - - if ((iter->base.flags & REF_KNOWS_PEELED)) { - oidcpy(peeled, &iter->peeled); - return is_null_oid(&iter->peeled) ? -1 : 0; - } else if ((iter->base.flags & (REF_ISBROKEN | REF_ISSYMREF))) { - return -1; - } else { - return !!peel_object(iter->oid.hash, peeled->hash); - } -} - -static int mmapped_ref_iterator_abort(struct ref_iterator *ref_iterator) -{ - struct mmapped_ref_iterator *iter = - (struct mmapped_ref_iterator *)ref_iterator; - - release_packed_ref_cache(iter->packed_refs); - strbuf_release(&iter->refname_buf); - base_ref_iterator_free(ref_iterator); - return IT
[PATCH v2 09/21] packed_ref_cache: remember the file-wide peeling state
Rather than store the peeling state (i.e., the one defined by traits in the `packed-refs` file header line) in a local variable in `read_packed_refs()`, store it permanently in `packed_ref_cache`. This will be needed when we stop reading all packed refs at once. Signed-off-by: Michael Haggerty --- refs/packed-backend.c | 17 - 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/refs/packed-backend.c b/refs/packed-backend.c index 2b80f244c8..ae276f3445 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -18,6 +18,12 @@ struct packed_ref_cache { struct ref_cache *cache; + /* +* What is the peeled state of this cache? (This is usually +* determined from the header of the "packed-refs" file.) +*/ + enum { PEELED_NONE, PEELED_TAGS, PEELED_FULLY } peeled; + /* * Count of references to the data structure in this instance, * including the pointer from files_ref_store::packed if any. @@ -195,13 +201,13 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs) char *buf; const char *pos, *eol, *eof; struct strbuf tmp = STRBUF_INIT; - enum { PEELED_NONE, PEELED_TAGS, PEELED_FULLY } peeled = PEELED_NONE; struct ref_dir *dir; packed_refs->refs = refs; acquire_packed_ref_cache(packed_refs); packed_refs->cache = create_ref_cache(NULL, NULL); packed_refs->cache->root->flag &= ~REF_INCOMPLETE; + packed_refs->peeled = PEELED_NONE; fd = open(refs->path, O_RDONLY); if (fd < 0) { @@ -244,9 +250,9 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs) string_list_split_in_place(&traits, p, ' ', -1); if (unsorted_string_list_has_string(&traits, "fully-peeled")) - peeled = PEELED_FULLY; + packed_refs->peeled = PEELED_FULLY; else if (unsorted_string_list_has_string(&traits, "peeled")) - peeled = PEELED_TAGS; + packed_refs->peeled = PEELED_TAGS; /* perhaps other traits later as well */ /* The "+ 1" is for the LF character. 
*/ @@ -282,8 +288,9 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs) oidclr(&oid); flag |= REF_BAD_NAME | REF_ISBROKEN; } - if (peeled == PEELED_FULLY || - (peeled == PEELED_TAGS && starts_with(refname, "refs/tags/"))) + if (packed_refs->peeled == PEELED_FULLY || + (packed_refs->peeled == PEELED_TAGS && +starts_with(refname, "refs/tags/"))) flag |= REF_KNOWS_PEELED; entry = create_ref_entry(refname, &oid, flag); add_ref_entry(dir, entry); -- 2.14.1
[PATCH v2 16/21] packed_read_raw_ref(): read the reference from the mmapped buffer
Instead of reading the reference from the `ref_cache`, read it directly from the mmapped buffer. Signed-off-by: Michael Haggerty --- refs/packed-backend.c | 14 +- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/refs/packed-backend.c b/refs/packed-backend.c index 53d0b7e3d6..2cf2f7f73d 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -875,18 +875,22 @@ static int packed_read_raw_ref(struct ref_store *ref_store, { struct packed_ref_store *refs = packed_downcast(ref_store, REF_STORE_READ, "read_raw_ref"); - - struct ref_entry *entry; + struct packed_ref_cache *packed_refs = get_packed_ref_cache(refs); + const char *rec; *type = 0; - entry = get_packed_ref(refs, refname); - if (!entry) { + rec = find_reference_location(packed_refs, refname, 1); + + if (!rec) { + /* refname is not a packed reference. */ errno = ENOENT; return -1; } - hashcpy(sha1, entry->u.value.oid.hash); + if (get_sha1_hex(rec, sha1)) + die_invalid_line(refs->path, rec, packed_refs->eof - rec); + *type = REF_ISPACKED; return 0; } -- 2.14.1
[PATCH v2 11/21] mmapped_ref_iterator_advance(): no peeled value for broken refs
If a reference is broken, suppress its peeled value. Signed-off-by: Michael Haggerty --- refs/packed-backend.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/refs/packed-backend.c b/refs/packed-backend.c index 312116a99d..724c88631d 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -234,9 +234,15 @@ static int mmapped_ref_iterator_advance(struct ref_iterator *ref_iterator) /* * Regardless of what the file header said, we -* definitely know the value of *this* reference: +* definitely know the value of *this* reference. But +* we suppress it if the reference is broken: */ - iter->base.flags |= REF_KNOWS_PEELED; + if ((iter->base.flags & REF_ISBROKEN)) { + oidclr(&iter->peeled); + iter->base.flags &= ~REF_KNOWS_PEELED; + } else { + iter->base.flags |= REF_KNOWS_PEELED; + } } else { oidclr(&iter->peeled); } -- 2.14.1
[PATCH v2 00/21] Read `packed-refs` using mmap()
This is v2 of a patch series that changes the reading and caching of the `packed-refs` file to use `mmap()`. Thanks to Junio, Stefan, and Johannes for their comments about v1 [1].

The main change since v1 is to accommodate Windows, which doesn't let you replace a file using `rename()` if the file is currently mmapped. This is unfortunate, because it means that Windows will never get the O(N) → O(lg N) improvement for reading single references that more capable systems can now enjoy. The background was discussed on the mailing list [2]. The bottom line is that on Windows, keeping the `packed-refs` file mmapped would be tantamount to holding a reader lock on that file, preventing anybody (even unrelated processes) from changing the `packed-refs` file while it is mmapped. This is even worse than the situation for packfiles (which is solved using `close_all_packs()`), because a packfile, once created, never needs to be replaced: every packfile has a filename that is determined from its contents. The worst that can happen if a packfile is locked is that another process cannot remove it, but that is not critical for correctness. The `packed-refs` file, on the other hand, always has the same filename and needs to be overwritten for correctness.

So the approach taken here is to introduce a new compile-time option, `MMAP_PREVENTS_DELETE`. When this option is set, the `packed-refs` file is read quickly into memory and then closed. Even in that case, though, this branch brings significant performance benefits, because instead of parsing the whole file and storing it in lots of little objects in a `ref_cache` (which also involves a lot of small memory allocations), we copy the verbatim contents of the file into memory. Then we use the same binary search techniques to find any references that we need to read, just as we would do if the file were memory mapped.
This means that we only have to fully parse the references that we are interested in, and hardly have to allocate any additional memory.

I did some more careful benchmarks of this code vs. Git 2.14.1 on a repository that is not quite as pathological. The test repo has 110k references that are fully packed in a `packed-refs` file that has the `sorted` trait. The current version is compiled three ways:

* `NO_MMAP=YesPlease`: prevents all use of `mmap()`. This variant is O(N) when reading a single reference.

* `MMAP_PREVENTS_DELETE=YesPlease`: uses mmap for the initial read, but quickly copies the contents to heap-allocated memory and munmaps right away. This variant is also O(N) when reading a single reference.

* default (mmap enabled): the `packed-refs` file is kept mmapped for as long as it is in use.

The commands that I timed were as follows:

    # for-each-ref, warm cache:
    $ git -C lots-of-refs for-each-ref --format="%(objectname) %(refname)" >/dev/null

    # rev-parse, warm cache (this command was run 10 times then the total
    # time divided by 10):
    $ git -C lots-of-refs rev-parse --verify refs/remotes/origin/pr/38733

    # rev-parse, cold cache (but git binary warm):
    $ sync ; sudo sh -c 'echo 3 >/proc/sys/vm/drop_caches'; git rev-parse v1.0.0; time git -C lots-of-refs rev-parse --verify refs/remotes/origin/pr/38733

(Note that the `rev-parse` commands involve a handful of reference lookups as the argument is DWIMmed.)

Results:

                                    for-each-ref   rev-parse    rev-parse
                                    warm cache     warm cache   cold cache
    ------------------------------------------------------------------------
    v2.14.1                         92 ms          23.7 ms      30 ms
    NO_MMAP=YesPlease               83 ms           3.4 ms      10 ms
    MMAP_PREVENTS_DELETE=YesPlease  82 ms           3.5 ms      11 ms
    default (mmap enabled)          81 ms           0.8 ms       6 ms

So this branch is a little bit faster at iterating over all references, but it really has big advantages when looking up single references. The advantage is smaller if `NO_MMAP` or `MMAP_PREVENTS_DELETE` is set, but is still quite significant.

This branch is also available from my fork on GitHub [3] as branch `mmap-packed-refs`.
My main uncertainties are:

1. Does this code actually work on Windows?

2. Did I implement the new compile-time option correctly? (I just cargo-culted some existing options.) Is there some existing option that I could piggy-back off of instead of adding a new one?

3. Is a compile-time option sufficient, or would the `mmap()` option need to be configurable at runtime, or even tested at repository create time as is done for some other filesystem properties in `create_default_files()`?

Michael

[1] https://public-inbox.org/git/cover.1505319366.git.mhag...@alum.mit.edu/
[2] https://public-inbox.org/git/alpine.DEB.2.21.1.1709142101560.4132@virtualbox/
    https://public-inbox.org/git/alpine.DEB.2.21.1.1709150012550.219280@virtualbox/
[3] https://github.com/mhagger/git

Jeff King (1): pr
[PATCH v2 02/21] prefix_ref_iterator: break when we leave the prefix
From: Jeff King If the underlying iterator is ordered, then `prefix_ref_iterator` can stop as soon as it sees a refname that comes after the prefix. This will rarely make a big difference now, because `ref_cache_iterator` only iterates over the directory containing the prefix (and usually the prefix will span a whole directory anyway). But if *hint, hint* a future reference backend doesn't itself know where to stop the iteration, then this optimization will be a big win. Note that there is no guarantee that the underlying iterator doesn't include output preceding the prefix, so we have to skip over any unwanted references before we get to the ones that we want. Signed-off-by: Jeff King Signed-off-by: Michael Haggerty --- refs/iterator.c | 32 +++- 1 file changed, 31 insertions(+), 1 deletion(-) diff --git a/refs/iterator.c b/refs/iterator.c index c475360f0a..bd35da4e62 100644 --- a/refs/iterator.c +++ b/refs/iterator.c @@ -287,6 +287,20 @@ struct prefix_ref_iterator { int trim; }; +/* Return -1, 0, 1 if refname is before, inside, or after the prefix. */ +static int compare_prefix(const char *refname, const char *prefix) +{ + while (*prefix) { + if (*refname != *prefix) + return ((unsigned char)*refname < (unsigned char)*prefix) ? 
-1 : +1; + + refname++; + prefix++; + } + + return 0; +} + static int prefix_ref_iterator_advance(struct ref_iterator *ref_iterator) { struct prefix_ref_iterator *iter = @@ -294,9 +308,25 @@ static int prefix_ref_iterator_advance(struct ref_iterator *ref_iterator) int ok; while ((ok = ref_iterator_advance(iter->iter0)) == ITER_OK) { - if (!starts_with(iter->iter0->refname, iter->prefix)) + int cmp = compare_prefix(iter->iter0->refname, iter->prefix); + + if (cmp < 0) continue; + if (cmp > 0) { + /* +* If the source iterator is ordered, then we +* can stop the iteration as soon as we see a +* refname that comes after the prefix: +*/ + if (iter->iter0->ordered) { + ok = ref_iterator_abort(iter->iter0); + break; + } else { + continue; + } + } + if (iter->trim) { /* * It is nonsense to trim off characters that -- 2.14.1
[PATCH v2 15/21] packed_ref_iterator_begin(): iterate using `mmapped_ref_iterator`
Now that we have an efficient way to iterate, in order, over the mmapped contents of the `packed-refs` file, we can use that directly to implement reference iteration for the `packed_ref_store`, rather than iterating over the `ref_cache`. This is the next step towards getting rid of the `ref_cache` entirely. Signed-off-by: Michael Haggerty --- refs/packed-backend.c | 109 -- 1 file changed, 106 insertions(+), 3 deletions(-) diff --git a/refs/packed-backend.c b/refs/packed-backend.c index f336690f16..53d0b7e3d6 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -397,6 +397,27 @@ static int cmp_packed_ref_entries(const void *v1, const void *v2) } } +/* + * Compare a packed-refs record pointed to by `rec` to the specified + * NUL-terminated refname. + */ +static int cmp_entry_to_refname(const char *rec, const char *refname) +{ + const char *r1 = rec + GIT_SHA1_HEXSZ + 1; + const char *r2 = refname; + + while (1) { + if (*r1 == '\n') + return *r2 ? -1 : 0; + if (!*r2) + return 1; + if (*r1 != *r2) + return (unsigned char)*r1 < (unsigned char)*r2 ? -1 : +1; + r1++; + r2++; + } +} + /* * `packed_refs->buf` is not known to be sorted. Check whether it is, * and if not, sort it into new memory and munmap/free the old @@ -503,6 +524,17 @@ static const char *find_start_of_record(const char *buf, const char *p) return p; } +/* + * Return a pointer to the start of the record following the record + * that contains `*p`. If none is found before `end`, return `end`. + */ +static const char *find_end_of_record(const char *p, const char *end) +{ + while (++p < end && (p[-1] != '\n' || p[0] == '^')) + ; + return p; +} + /* * We want to be able to compare mmapped reference records quickly, * without totally parsing them. We can do so because the records are @@ -591,6 +623,65 @@ static int load_contents(struct packed_ref_cache *packed_refs) return 1; } +/* + * Find the place in `cache->buf` where the start of the record for + * `refname` starts. 
If `mustexist` is true and the reference doesn't
+ * exist, then return NULL. If `mustexist` is false and the reference
+ * doesn't exist, then return the point where that reference would be
+ * inserted. In the latter mode, `refname` doesn't have to be a proper
+ * reference name; for example, one could search for "refs/replace/"
+ * to find the start of any replace references.
+ *
+ * The record is sought using a binary search, so `cache->buf` must be
+ * sorted.
+ */
+static const char *find_reference_location(struct packed_ref_cache *cache,
+					   const char *refname, int mustexist)
+{
+	/*
+	 * This is not *quite* a garden-variety binary search, because
+	 * the data we're searching is made up of records, and we
+	 * always need to find the beginning of a record to do a
+	 * comparison. A "record" here is one line for the reference
+	 * itself and zero or one peel lines that start with '^'. Our
+	 * loop invariant is described in the next two comments.
+	 */
+
+	/*
+	 * A pointer to the character at the start of a record whose
+	 * preceding records all have reference names that come
+	 * *before* `refname`.
+	 */
+	const char *lo = cache->buf + cache->header_len;
+
+	/*
+	 * A pointer to the first character of a record whose
+	 * reference name comes *after* `refname`.
+	 */
+	const char *hi = cache->eof;
+
+	while (lo < hi) {
+		const char *mid, *rec;
+		int cmp;
+
+		mid = lo + (hi - lo) / 2;
+		rec = find_start_of_record(lo, mid);
+		cmp = cmp_entry_to_refname(rec, refname);
+		if (cmp < 0) {
+			lo = find_end_of_record(mid, hi);
+		} else if (cmp > 0) {
+			hi = rec;
+		} else {
+			return rec;
+		}
+	}
+
+	if (mustexist)
+		return NULL;
+	else
+		return lo;
+}
+
 /*
  * Read from the `packed-refs` file into a newly-allocated
  * `packed_ref_cache` and return it.
The return value will already @@ -887,6 +978,8 @@ static struct ref_iterator *packed_ref_iterator_begin( const char *prefix, unsigned int flags) { struct packed_ref_store *refs; + struct packed_ref_cache *packed_refs; + const char *start; struct packed_ref_iterator *iter; struct ref_iterator *ref_iterator; unsigned int required_flags = REF_STORE_READ; @@ -904,13 +997,23 @@ static struct ref_iterator *packed_ref_iterator_begin( * the packed-ref cache is up to date with what is on dis
[PATCH v2 10/21] mmapped_ref_iterator: add iterator over a packed-refs file
Add a new `mmapped_ref_iterator`, which can iterate over the references in an mmapped `packed-refs` file directly. Use this iterator from `read_packed_refs()` to fill the packed refs cache. Note that we are not yet willing to promise that the new iterator generates its output in order. That doesn't matter for now, because the packed refs cache doesn't care what order it is filled. This change adds a lot of boilerplate without providing any obvious benefits. The benefits will come soon, when we get rid of the `ref_cache` for packed references altogether. Signed-off-by: Michael Haggerty --- refs/packed-backend.c | 207 -- 1 file changed, 152 insertions(+), 55 deletions(-) diff --git a/refs/packed-backend.c b/refs/packed-backend.c index ae276f3445..312116a99d 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -163,6 +163,141 @@ static NORETURN void die_invalid_line(const char *path, } +/* + * An iterator over a packed-refs file that is currently mmapped. + */ +struct mmapped_ref_iterator { + struct ref_iterator base; + + struct packed_ref_cache *packed_refs; + + /* The current position in the mmapped file: */ + const char *pos; + + /* The end of the mmapped file: */ + const char *eof; + + struct object_id oid, peeled; + + struct strbuf refname_buf; +}; + +static int mmapped_ref_iterator_advance(struct ref_iterator *ref_iterator) +{ + struct mmapped_ref_iterator *iter = + (struct mmapped_ref_iterator *)ref_iterator; + const char *p = iter->pos, *eol; + + strbuf_reset(&iter->refname_buf); + + if (iter->pos == iter->eof) + return ref_iterator_abort(ref_iterator); + + iter->base.flags = REF_ISPACKED; + + if (iter->eof - p < GIT_SHA1_HEXSZ + 2 || + parse_oid_hex(p, &iter->oid, &p) || + !isspace(*p++)) + die_invalid_line(iter->packed_refs->refs->path, +iter->pos, iter->eof - iter->pos); + + eol = memchr(p, '\n', iter->eof - p); + if (!eol) + die_unterminated_line(iter->packed_refs->refs->path, + iter->pos, iter->eof - iter->pos); + + 
strbuf_add(&iter->refname_buf, p, eol - p); + iter->base.refname = iter->refname_buf.buf; + + if (check_refname_format(iter->base.refname, REFNAME_ALLOW_ONELEVEL)) { + if (!refname_is_safe(iter->base.refname)) + die("packed refname is dangerous: %s", + iter->base.refname); + oidclr(&iter->oid); + iter->base.flags |= REF_BAD_NAME | REF_ISBROKEN; + } + if (iter->packed_refs->peeled == PEELED_FULLY || + (iter->packed_refs->peeled == PEELED_TAGS && +starts_with(iter->base.refname, "refs/tags/"))) + iter->base.flags |= REF_KNOWS_PEELED; + + iter->pos = eol + 1; + + if (iter->pos < iter->eof && *iter->pos == '^') { + p = iter->pos + 1; + if (iter->eof - p < GIT_SHA1_HEXSZ + 1 || + parse_oid_hex(p, &iter->peeled, &p) || + *p++ != '\n') + die_invalid_line(iter->packed_refs->refs->path, +iter->pos, iter->eof - iter->pos); + iter->pos = p; + + /* +* Regardless of what the file header said, we +* definitely know the value of *this* reference: +*/ + iter->base.flags |= REF_KNOWS_PEELED; + } else { + oidclr(&iter->peeled); + } + + return ITER_OK; +} + +static int mmapped_ref_iterator_peel(struct ref_iterator *ref_iterator, + struct object_id *peeled) +{ + struct mmapped_ref_iterator *iter = + (struct mmapped_ref_iterator *)ref_iterator; + + if ((iter->base.flags & REF_KNOWS_PEELED)) { + oidcpy(peeled, &iter->peeled); + return is_null_oid(&iter->peeled) ? 
-1 : 0; + } else if ((iter->base.flags & (REF_ISBROKEN | REF_ISSYMREF))) { + return -1; + } else { + return !!peel_object(iter->oid.hash, peeled->hash); + } +} + +static int mmapped_ref_iterator_abort(struct ref_iterator *ref_iterator) +{ + struct mmapped_ref_iterator *iter = + (struct mmapped_ref_iterator *)ref_iterator; + + release_packed_ref_cache(iter->packed_refs); + strbuf_release(&iter->refname_buf); + base_ref_iterator_free(ref_iterator); + return ITER_DONE; +} + +static struct ref_iterator_vtable mmapped_ref_iterator_vtable = { + mmapped_ref_iterator_advance, + mmapped_ref_iterator_peel, + mmapped_ref_iterator_abort +}; + +struct ref_iterator *mmapped_ref_iterator_begin( + const char *packed_refs_file, +
[PATCH v2 18/21] packed_ref_store: get rid of the `ref_cache` entirely
Now that everything has been changed to read what it needs directly out of the `packed-refs` file, `packed_ref_store` doesn't need to maintain a `ref_cache` at all. So get rid of it. First of all, this will save a lot of memory and lots of little allocations. Instead of needing to store complicated parsed data structures in memory, we just mmap the file (potentially sharing memory with other processes) and parse only what we need. Moreover, since the mmapped access to the file reads only the parts of the file that it needs, this might save reading all of the data from disk at all (at least if the file starts out sorted). Signed-off-by: Michael Haggerty --- refs/packed-backend.c | 29 ++--- 1 file changed, 2 insertions(+), 27 deletions(-) diff --git a/refs/packed-backend.c b/refs/packed-backend.c index 6a930a440c..b4a01637a6 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -45,8 +45,6 @@ struct packed_ref_cache { */ struct packed_ref_store *refs; - struct ref_cache *cache; - /* Is the `packed-refs` file currently mmapped? 
*/ int mmapped; @@ -148,7 +146,6 @@ static void release_packed_ref_buffer(struct packed_ref_cache *packed_refs) static int release_packed_ref_cache(struct packed_ref_cache *packed_refs) { if (!--packed_refs->referrers) { - free_ref_cache(packed_refs->cache); stat_validity_clear(&packed_refs->validity); release_packed_ref_buffer(packed_refs); free(packed_refs); @@ -718,15 +715,10 @@ static const char *find_reference_location(struct packed_ref_cache *cache, static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs) { struct packed_ref_cache *packed_refs = xcalloc(1, sizeof(*packed_refs)); - struct ref_dir *dir; - struct ref_iterator *iter; int sorted = 0; - int ok; packed_refs->refs = refs; acquire_packed_ref_cache(packed_refs); - packed_refs->cache = create_ref_cache(NULL, NULL); - packed_refs->cache->root->flag &= ~REF_INCOMPLETE; packed_refs->peeled = PEELED_NONE; if (!load_contents(packed_refs)) @@ -799,23 +791,6 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs) packed_refs->eof = buf_copy + size; } - dir = get_ref_dir(packed_refs->cache->root); - iter = mmapped_ref_iterator_begin( - packed_refs, - packed_refs->buf + packed_refs->header_len, - packed_refs->eof); - while ((ok = ref_iterator_advance(iter)) == ITER_OK) { - struct ref_entry *entry = - create_ref_entry(iter->refname, iter->oid, iter->flags); - - if ((iter->flags & REF_KNOWS_PEELED)) - ref_iterator_peel(iter, &entry->u.value.peeled); - add_ref_entry(dir, entry); - } - - if (ok != ITER_DONE) - die("error reading packed-refs file %s", refs->path); - return packed_refs; } @@ -974,8 +949,8 @@ static struct ref_iterator *packed_ref_iterator_begin( else start = packed_refs->buf + packed_refs->header_len; - iter->iter0 = mmapped_ref_iterator_begin( - packed_refs, start, packed_refs->eof); + iter->iter0 = mmapped_ref_iterator_begin(packed_refs, +start, packed_refs->eof); iter->flags = flags; -- 2.14.1
[PATCH v2 04/21] die_unterminated_line(), die_invalid_line(): new functions
Extract some helper functions for reporting errors. While we're at it, prevent them from spewing unlimited output to the terminal. These functions will soon have more callers. These functions accept the problematic line as a `(ptr, len)` pair rather than a NUL-terminated string, and `die_invalid_line()` checks for an EOL itself, because these calling conventions will be convenient for future callers. (Efficiency is not a concern here because these functions are only ever called if the `packed-refs` file is corrupt.) Signed-off-by: Michael Haggerty --- refs/packed-backend.c | 28 +--- 1 file changed, 25 insertions(+), 3 deletions(-) diff --git a/refs/packed-backend.c b/refs/packed-backend.c index a3d9210cb0..5c50c223ef 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -161,6 +161,29 @@ static const char *parse_ref_line(struct strbuf *line, struct object_id *oid) return ref; } +static NORETURN void die_unterminated_line(const char *path, + const char *p, size_t len) +{ + if (len < 80) + die("unterminated line in %s: %.*s", path, (int)len, p); + else + die("unterminated line in %s: %.75s...", path, p); +} + +static NORETURN void die_invalid_line(const char *path, + const char *p, size_t len) +{ + const char *eol = memchr(p, '\n', len); + + if (!eol) + die_unterminated_line(path, p, len); + else if (eol - p < 80) + die("unexpected line in %s: %.*s", path, (int)(eol - p), p); + else + die("unexpected line in %s: %.75s...", path, p); + +} + /* * Read from the `packed-refs` file into a newly-allocated * `packed_ref_cache` and return it. 
The return value will already @@ -227,7 +250,7 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs) const char *traits; if (!line.len || line.buf[line.len - 1] != '\n') - die("unterminated line in %s: %s", refs->path, line.buf); + die_unterminated_line(refs->path, line.buf, line.len); if (skip_prefix(line.buf, "# pack-refs with:", &traits)) { if (strstr(traits, " fully-peeled ")) @@ -266,8 +289,7 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs) */ last->flag |= REF_KNOWS_PEELED; } else { - strbuf_setlen(&line, line.len - 1); - die("unexpected line in %s: %s", refs->path, line.buf); + die_invalid_line(refs->path, line.buf, line.len); } } -- 2.14.1
[PATCH v2 08/21] read_packed_refs(): read references with minimal copying
Instead of copying data from the `packed-refs` file one line at a time and then processing it, process the data in place as much as possible. Also, instead of processing one line per iteration of the main loop, process a reference line plus its corresponding peeled line (if present) together. Note that this change slightly tightens up the parsing of the `packed-refs` file. Previously, the parser would have accepted multiple "peeled" lines for a single reference (ignoring all but the last one). Now it would reject that. Signed-off-by: Michael Haggerty --- refs/packed-backend.c | 101 -- 1 file changed, 40 insertions(+), 61 deletions(-) diff --git a/refs/packed-backend.c b/refs/packed-backend.c index a45e3ff92f..2b80f244c8 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -134,33 +134,6 @@ static void clear_packed_ref_cache(struct packed_ref_store *refs) } } -/* The length of a peeled reference line in packed-refs, including EOL: */ -#define PEELED_LINE_LENGTH 42 - -/* - * Parse one line from a packed-refs file. Write the SHA1 to sha1. - * Return a pointer to the refname within the line (null-terminated), - * or NULL if there was a problem.
- */ -static const char *parse_ref_line(struct strbuf *line, struct object_id *oid) -{ - const char *ref; - - if (parse_oid_hex(line->buf, oid, &ref) < 0) - return NULL; - if (!isspace(*ref++)) - return NULL; - - if (isspace(*ref)) - return NULL; - - if (line->buf[line->len - 1] != '\n') - return NULL; - line->buf[--line->len] = 0; - - return ref; -} - static NORETURN void die_unterminated_line(const char *path, const char *p, size_t len) { @@ -221,8 +194,7 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs) size_t size; char *buf; const char *pos, *eol, *eof; - struct ref_entry *last = NULL; - struct strbuf line = STRBUF_INIT; + struct strbuf tmp = STRBUF_INIT; enum { PEELED_NONE, PEELED_TAGS, PEELED_FULLY } peeled = PEELED_NONE; struct ref_dir *dir; @@ -264,9 +236,9 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs) if (!eol) die_unterminated_line(refs->path, pos, eof - pos); - strbuf_add(&line, pos, eol - pos); + strbuf_add(&tmp, pos, eol - pos); - if (!skip_prefix(line.buf, "# pack-refs with:", (const char **)&p)) + if (!skip_prefix(tmp.buf, "# pack-refs with:", (const char **)&p)) die_invalid_line(refs->path, pos, eof - pos); string_list_split_in_place(&traits, p, ' ', -1); @@ -281,61 +253,68 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs) pos = eol + 1; string_list_clear(&traits, 0); - strbuf_reset(&line); + strbuf_reset(&tmp); } dir = get_ref_dir(packed_refs->cache->root); while (pos < eof) { + const char *p = pos; struct object_id oid; const char *refname; + int flag = REF_ISPACKED; + struct ref_entry *entry = NULL; - eol = memchr(pos, '\n', eof - pos); + if (eof - pos < GIT_SHA1_HEXSZ + 2 || + parse_oid_hex(p, &oid, &p) || + !isspace(*p++)) + die_invalid_line(refs->path, pos, eof - pos); + + eol = memchr(p, '\n', eof - p); if (!eol) die_unterminated_line(refs->path, pos, eof - pos); - strbuf_add(&line, pos, eol + 1 - pos); + strbuf_add(&tmp, p, eol - p); + 
refname = tmp.buf; + + if (check_refname_format(refname, REFNAME_ALLOW_ONELEVEL)) { + if (!refname_is_safe(refname)) + die("packed refname is dangerous: %s", refname); + oidclr(&oid); + flag |= REF_BAD_NAME | REF_ISBROKEN; + } + if (peeled == PEELED_FULLY || + (peeled == PEELED_TAGS && starts_with(refname, "refs/tags/"))) + flag |= REF_KNOWS_PEELED; + entry = create_ref_entry(refname, &oid, flag); + add_ref_entry(dir, entry); - refname = parse_ref_line(&line, &oid); - if (refname) { - int flag = REF_ISPACKED; + pos = eol + 1; + + if (pos < eof && *pos == '^') { + p = pos + 1; + if (eof - p < GIT_SHA1_HEXSZ + 1 || + parse_oid_hex(p, &entry->u.value.peeled, &p) || + *p++ != '\n')
[PATCH v2 05/21] read_packed_refs(): use mmap to read the `packed-refs` file
It's still done in a pretty stupid way, involving more data copying than necessary. That will improve in future commits. Signed-off-by: Michael Haggerty --- refs/packed-backend.c | 42 -- 1 file changed, 32 insertions(+), 10 deletions(-) diff --git a/refs/packed-backend.c b/refs/packed-backend.c index 5c50c223ef..154abbd83a 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -215,8 +215,12 @@ static NORETURN void die_invalid_line(const char *path, */ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs) { - FILE *f; struct packed_ref_cache *packed_refs = xcalloc(1, sizeof(*packed_refs)); + int fd; + struct stat st; + size_t size; + char *buf; + const char *pos, *eol, *eof; struct ref_entry *last = NULL; struct strbuf line = STRBUF_INIT; enum { PEELED_NONE, PEELED_TAGS, PEELED_FULLY } peeled = PEELED_NONE; @@ -227,8 +231,8 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs) packed_refs->cache = create_ref_cache(NULL, NULL); packed_refs->cache->root->flag &= ~REF_INCOMPLETE; - f = fopen(refs->path, "r"); - if (!f) { + fd = open(refs->path, O_RDONLY); + if (fd < 0) { if (errno == ENOENT) { /* * This is OK; it just means that no @@ -241,16 +245,27 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs) } } - stat_validity_update(&packed_refs->validity, fileno(f)); + stat_validity_update(&packed_refs->validity, fd); + + if (fstat(fd, &st) < 0) + die_errno("couldn't stat %s", refs->path); + + size = xsize_t(st.st_size); + buf = xmmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0); + pos = buf; + eof = buf + size; dir = get_ref_dir(packed_refs->cache->root); - while (strbuf_getwholeline(&line, f, '\n') != EOF) { + while (pos < eof) { struct object_id oid; const char *refname; const char *traits; - if (!line.len || line.buf[line.len - 1] != '\n') - die_unterminated_line(refs->path, line.buf, line.len); + eol = memchr(pos, '\n', eof - pos); + if (!eol) + 
die_unterminated_line(refs->path, pos, eof - pos); + + strbuf_add(&line, pos, eol + 1 - pos); if (skip_prefix(line.buf, "# pack-refs with:", &traits)) { if (strstr(traits, " fully-peeled ")) @@ -258,7 +273,7 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs) else if (strstr(traits, " peeled ")) peeled = PEELED_TAGS; /* perhaps other traits later as well */ - continue; + goto next_line; } refname = parse_ref_line(&line, &oid); @@ -291,11 +306,18 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs) } else { die_invalid_line(refs->path, line.buf, line.len); } + + next_line: + /* The "+ 1" is for the LF character. */ + pos = eol + 1; + strbuf_reset(&line); } - fclose(f); - strbuf_release(&line); + if (munmap(buf, size)) + die_errno("error ummapping packed-refs file"); + close(fd); + strbuf_release(&line); return packed_refs; } -- 2.14.1
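The open/fstat/mmap/scan shape that this patch switches `read_packed_refs()` to can be sketched in isolation. This toy uses plain POSIX calls where git would use its `xmmap()`/`die_errno()` helpers, and merely counts lines instead of building a ref cache:

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Count newline-terminated lines in a file by mmapping it and
 * scanning with memchr(), the same pos/eof/eol walk as the patch.
 * Returns -1 on error, otherwise the number of complete lines.
 */
static long count_lines_mmap(const char *path)
{
    int fd = open(path, O_RDONLY);
    struct stat st;
    char *buf;
    const char *pos, *eof, *eol;
    long n = 0;

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {
        close(fd);
        return 0;           /* mmap of an empty file would fail */
    }
    buf = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (buf == MAP_FAILED) {
        close(fd);
        return -1;
    }
    pos = buf;
    eof = buf + st.st_size;
    while (pos < eof) {
        eol = memchr(pos, '\n', eof - pos);
        if (!eol)
            break;          /* unterminated final line: stop */
        n++;
        pos = eol + 1;      /* the "+ 1" skips the LF */
    }
    munmap(buf, st.st_size);
    close(fd);
    return n;
}
```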
[PATCH v2 13/21] packed_ref_cache: keep the `packed-refs` file mmapped if possible
Keep a copy of the `packed-refs` file contents in memory for as long as a `packed_ref_cache` object is in use: * If the system allows it, keep the `packed-refs` file mmapped. * If not (either because the system doesn't support `mmap()` at all, or because a file that is currently mmapped cannot be replaced via `rename()`), then make a copy of the file's contents in heap-allocated space, and keep that around instead. We base the choice of behavior on a new build-time switch, `MMAP_PREVENTS_DELETE`. By default, this switch is set for Windows variants. This whole change is still pointless, because we only read the `packed-refs` file contents immediately after instantiating the `packed_ref_cache`. But that will soon change. Signed-off-by: Michael Haggerty --- Makefile | 10 +++ config.mak.uname | 3 + refs/packed-backend.c | 184 ++ 3 files changed, 155 insertions(+), 42 deletions(-) diff --git a/Makefile b/Makefile index f2bb7f2f63..7a49f99c4f 100644 --- a/Makefile +++ b/Makefile @@ -200,6 +200,9 @@ all:: # # Define NO_MMAP if you want to avoid mmap. # +# Define MMAP_PREVENTS_DELETE if a file that is currently mmapped cannot be +# deleted or cannot be replaced using rename(). +# # Define NO_SYS_POLL_H if you don't have sys/poll.h. # # Define NO_POLL if you do not have or don't want to use poll(). 
@@ -1383,6 +1386,13 @@ else COMPAT_OBJS += compat/win32mmap.o endif endif +ifdef MMAP_PREVENTS_DELETE + BASIC_CFLAGS += -DMMAP_PREVENTS_DELETE +else + ifdef USE_WIN32_MMAP + BASIC_CFLAGS += -DMMAP_PREVENTS_DELETE + endif +endif ifdef OBJECT_CREATION_USES_RENAMES COMPAT_CFLAGS += -DOBJECT_CREATION_MODE=1 endif diff --git a/config.mak.uname b/config.mak.uname index 6604b130f8..685a80d138 100644 --- a/config.mak.uname +++ b/config.mak.uname @@ -184,6 +184,7 @@ ifeq ($(uname_O),Cygwin) UNRELIABLE_FSTAT = UnfortunatelyYes SPARSE_FLAGS = -isystem /usr/include/w32api -Wno-one-bit-signed-bitfield OBJECT_CREATION_USES_RENAMES = UnfortunatelyNeedsTo + MMAP_PREVENTS_DELETE = UnfortunatelyYes COMPAT_OBJS += compat/cygwin.o FREAD_READS_DIRECTORIES = UnfortunatelyYes endif @@ -353,6 +354,7 @@ ifeq ($(uname_S),Windows) NO_ST_BLOCKS_IN_STRUCT_STAT = YesPlease NO_NSEC = YesPlease USE_WIN32_MMAP = YesPlease + MMAP_PREVENTS_DELETE = UnfortunatelyYes # USE_NED_ALLOCATOR = YesPlease UNRELIABLE_FSTAT = UnfortunatelyYes OBJECT_CREATION_USES_RENAMES = UnfortunatelyNeedsTo @@ -501,6 +503,7 @@ ifneq (,$(findstring MINGW,$(uname_S))) NO_ST_BLOCKS_IN_STRUCT_STAT = YesPlease NO_NSEC = YesPlease USE_WIN32_MMAP = YesPlease + MMAP_PREVENTS_DELETE = UnfortunatelyYes USE_NED_ALLOCATOR = YesPlease UNRELIABLE_FSTAT = UnfortunatelyYes OBJECT_CREATION_USES_RENAMES = UnfortunatelyNeedsTo diff --git a/refs/packed-backend.c b/refs/packed-backend.c index 0fe41a7203..4981516f1e 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -7,6 +7,35 @@ #include "../iterator.h" #include "../lockfile.h" +enum mmap_strategy { + /* +* Don't use mmap() at all for reading `packed-refs`. +*/ + MMAP_NONE, + + /* +* Can use mmap() for reading `packed-refs`, but the file must +* not remain mmapped. This is the usual option on Windows, +* where you cannot rename a new version of a file onto a file +* that is currently mmapped. 
+*/ + MMAP_TEMPORARY, + + /* +* It is OK to leave the `packed-refs` file mmapped while +* arbitrary other code is running. +*/ + MMAP_OK +}; + +#if defined(NO_MMAP) +static enum mmap_strategy mmap_strategy = MMAP_NONE; +#elif defined(MMAP_PREVENTS_DELETE) +static enum mmap_strategy mmap_strategy = MMAP_TEMPORARY; +#else +static enum mmap_strategy mmap_strategy = MMAP_OK; +#endif + struct packed_ref_store; struct packed_ref_cache { @@ -18,6 +47,21 @@ struct packed_ref_cache { struct ref_cache *cache; + /* Is the `packed-refs` file currently mmapped? */ + int mmapped; + + /* +* The contents of the `packed-refs` file. If the file is +* mmapped, this points at the mmapped contents of the file. +* If not, this points at heap-allocated memory containing the +* contents. If there were no contents (e.g., because the file +* didn't exist), `buf` and `eof` are both NULL. +*/ + char *buf, *eof; + + /* The size of the header line, if any; otherwise, 0: */ + size_t header_len; + /* * What is the peeled state of this cache? (This is usually * determined from the header of the "packed-refs" file.) @@ -76,6 +120,26 @@ static void acquire_packed_ref_cache(struct packed_ref_cache *packed_refs) packed_refs->
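The strategy selection plus heap-copy fallback described above can be sketched roughly as follows; `NO_MMAP`/`MMAP_PREVENTS_DELETE` are treated as hypothetical compile-time defines here, and the struct is a simplified stand-in for git's `packed_ref_cache`, not the real thing:

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

enum mmap_strategy { MMAP_NONE, MMAP_TEMPORARY, MMAP_OK };

/* Compile-time selection mirroring the patch's #if ladder: */
#if defined(NO_MMAP)
static const enum mmap_strategy strategy = MMAP_NONE;
#elif defined(MMAP_PREVENTS_DELETE)
static const enum mmap_strategy strategy = MMAP_TEMPORARY;
#else
static const enum mmap_strategy strategy = MMAP_OK;
#endif

struct contents {
    char *buf;      /* file contents (mmapped or heap-allocated) */
    size_t size;
    int mmapped;    /* nonzero if buf must be munmap()ed */
};

/*
 * Load a file per the strategy: keep it mmapped only under MMAP_OK;
 * otherwise copy it into heap memory, so the file itself can be
 * replaced via rename() while we hold the contents.
 */
static int load_contents(const char *path, struct contents *c)
{
    int fd = open(path, O_RDONLY);
    struct stat st;

    c->buf = NULL;
    c->size = 0;
    c->mmapped = 0;
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {
        close(fd);
        return 0;           /* no contents: buf stays NULL */
    }
    c->size = st.st_size;
    if (strategy == MMAP_OK) {
        c->buf = mmap(NULL, c->size, PROT_READ, MAP_PRIVATE, fd, 0);
        c->mmapped = (c->buf != MAP_FAILED);
        if (!c->mmapped)
            c->buf = NULL;
    }
    if (!c->buf) {          /* MMAP_NONE/MMAP_TEMPORARY, or mmap failed */
        ssize_t got;
        c->buf = malloc(c->size);
        got = c->buf ? read(fd, c->buf, c->size) : -1;
        if (got != (ssize_t)c->size) {
            free(c->buf);
            c->buf = NULL;
        }
    }
    close(fd);
    return c->buf ? 0 : -1;
}
```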
[PATCH v2 14/21] read_packed_refs(): ensure that references are ordered when read
It doesn't actually matter now, because the references are only iterated over to fill the associated `ref_cache`, which itself puts them in the correct order. But we want to get rid of the `ref_cache`, so we want to be able to iterate directly over the `packed-refs` buffer, and then the iteration will need to be ordered correctly. In fact, we already write the `packed-refs` file sorted, but it is possible that other Git clients don't get it right. So let's not assume that a `packed-refs` file is sorted unless it is explicitly declared to be so via a `sorted` trait in its header line. If it is *not* declared to be sorted, then scan quickly through the file to check. If it is found to be out of order, then sort the records into a new memory-only copy. This checking and sorting is done quickly, without parsing the full file contents. However, it needs a little bit of care to avoid reading past the end of the buffer even if the `packed-refs` file is corrupt. Since *we* always write the file correctly sorted, include that trait when we write or rewrite a `packed-refs` file. This means that the scan described in the previous paragraph should only have to be done for `packed-refs` files that were written by older versions of the Git command-line client, or by other clients that haven't yet learned to write the `sorted` trait. If `packed-refs` was already sorted, then (if the system allows it) we can use the mmapped file contents directly. But if the system doesn't allow a file that is currently mmapped to be replaced using `rename()`, then it would be bad for us to keep the file mmapped for any longer than necessary. So, on such systems, always make a copy of the file contents, either as part of the sorting process, or afterwards. 
Signed-off-by: Michael Haggerty --- refs/packed-backend.c | 223 +++--- 1 file changed, 212 insertions(+), 11 deletions(-) diff --git a/refs/packed-backend.c b/refs/packed-backend.c index 4981516f1e..f336690f16 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -51,11 +51,12 @@ struct packed_ref_cache { int mmapped; /* -* The contents of the `packed-refs` file. If the file is -* mmapped, this points at the mmapped contents of the file. -* If not, this points at heap-allocated memory containing the -* contents. If there were no contents (e.g., because the file -* didn't exist), `buf` and `eof` are both NULL. +* The contents of the `packed-refs` file. If the file was +* already sorted, this points at the mmapped contents of the +* file. If not, this points at heap-allocated memory +* containing the contents, sorted. If there were no contents +* (e.g., because the file didn't exist), `buf` and `eof` are +* both NULL. */ char *buf, *eof; @@ -358,7 +359,7 @@ struct ref_iterator *mmapped_ref_iterator_begin( if (!packed_refs->buf) return empty_ref_iterator_begin(); - base_ref_iterator_init(ref_iterator, &mmapped_ref_iterator_vtable, 0); + base_ref_iterator_init(ref_iterator, &mmapped_ref_iterator_vtable, 1); iter->packed_refs = packed_refs; acquire_packed_ref_cache(iter->packed_refs); @@ -371,6 +372,170 @@ struct ref_iterator *mmapped_ref_iterator_begin( return ref_iterator; } +struct packed_ref_entry { + const char *start; + size_t len; +}; + +static int cmp_packed_ref_entries(const void *v1, const void *v2) +{ + const struct packed_ref_entry *e1 = v1, *e2 = v2; + const char *r1 = e1->start + GIT_SHA1_HEXSZ + 1; + const char *r2 = e2->start + GIT_SHA1_HEXSZ + 1; + + while (1) { + if (*r1 == '\n') + return *r2 == '\n' ? 0 : -1; + if (*r1 != *r2) { + if (*r2 == '\n') + return 1; + else + return (unsigned char)*r1 < (unsigned char)*r2 ? -1 : +1; + } + r1++; + r2++; + } +} + +/* + * `packed_refs->buf` is not known to be sorted. 
Check whether it is, + * and if not, sort it into new memory and munmap/free the old + * storage. + */ +static void sort_packed_refs(struct packed_ref_cache *packed_refs) +{ + struct packed_ref_entry *entries = NULL; + size_t alloc = 0, nr = 0; + int sorted = 1; + const char *pos, *eof, *eol; + size_t len, i; + char *new_buffer, *dst; + + pos = packed_refs->buf + packed_refs->header_len; + eof = packed_refs->eof; + len = eof - pos; + + if (!len) + return; + + /* +* Initialize entries based on a crude estimate of the number +* of references in the file (we'll grow it below if needed): +*/ + ALLOC_GROW(entries, len / 80 + 20, alloc); + + while (pos < eof) { +
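The newline-terminated comparison in `cmp_packed_ref_entries()` can be exercised on its own. Here is a self-contained version (with `HEXSZ` standing in for `GIT_SHA1_HEXSZ`) driven through `qsort()`; note how `'\n'` is made to sort before every refname byte, so a strict prefix sorts first:

```c
#include <stdlib.h>
#include <string.h>

#define HEXSZ 40  /* stand-in for GIT_SHA1_HEXSZ */

struct packed_ref_entry {
    const char *start;  /* start of a "<oid> <refname>\n" record */
    size_t len;
};

/*
 * Compare two records by refname, where each refname begins
 * HEXSZ + 1 bytes into the record and ends at '\n' -- the same
 * comparison loop as the patch's cmp_packed_ref_entries().
 */
static int cmp_entries(const void *v1, const void *v2)
{
    const struct packed_ref_entry *e1 = v1, *e2 = v2;
    const char *r1 = e1->start + HEXSZ + 1;
    const char *r2 = e2->start + HEXSZ + 1;

    while (1) {
        if (*r1 == '\n')
            return *r2 == '\n' ? 0 : -1;
        if (*r1 != *r2) {
            if (*r2 == '\n')
                return 1;
            return (unsigned char)*r1 < (unsigned char)*r2 ? -1 : +1;
        }
        r1++;
        r2++;
    }
}
```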
Re: RFC: Design and code of partial clones (now, missing commits and trees OK)
Jonathan Tan writes: > For those interested in partial clones and/or missing objects in repos, > I've updated my original partialclone patches to not require an explicit > list of promises. Fetch/clone still only permits exclusion of blobs, but > the infrastructure is there for a local repo to support missing trees > and commits as well. > ... > Demo > > > Obtain a repository. > > $ make prefix=$HOME/local install > $ cd $HOME/tmp > $ git clone https://github.com/git/git > > Make it advertise the new feature and allow requests for arbitrary blobs. > > $ git -C git config uploadpack.advertiseblobmaxbytes 1 > $ git -C git config uploadpack.allowanysha1inwant 1 > > Perform the partial clone and check that it is indeed smaller. Specify > "file://" in order to test the partial clone mechanism. (If not, Git will > perform a local clone, which unselectively copies every object.) > > $ git clone --blob-max-bytes=10 "file://$(pwd)/git" git2 > $ git clone "file://$(pwd)/git" git3 > $ du -sh git2 git3 > 116M git2 > 129M git3 > > Observe that the new repo is automatically configured to fetch missing objects > from the original repo. Subsequent fetches will also be partial. > > $ cat git2/.git/config > [core] > repositoryformatversion = 1 > filemode = true > bare = false > logallrefupdates = true > [remote "origin"] > url = [snip] > fetch = +refs/heads/*:refs/remotes/origin/* > blobmaxbytes = 10 > [extensions] > partialclone = origin > [branch "master"] > remote = origin > merge = refs/heads/master The above sequence of events makes quite a lot of sense. And the following description of how it is designed (snipped) is clear enough (at least to me) to allow me to say that I quite like it.
Re: [OUTREACHY] pack: make packed_git_mru global a value instead of a pointer
On Mon, Sep 18, 2017 at 04:17:24PM -0700, Jonathan Nieder wrote: > phionah bugosi wrote: > > > Just to re-echo a previous change requested before in one of the mail > > threads, we currently have two global variables declared: > > > > struct mru packed_git_mru_storage; > > struct mru *packed_git_mru = &packed_git_mru_storage; > > > > We normally use pointers in C to point or refer to the same location > > or space in memory from multiple places. That means in situations > > where we will have the pointer be assigned to in many places in our > > code. But in this case, we do not have any other logic referring or > > assigning to the pointer packed_git_mru. In simple words we actually > > don't need a pointer in this case. > > > > Therefore this patch makes packed_git_mru global a value instead of > > a pointer and makes use of list.h > > > > Signed-off-by: phionah bugosi > > --- > > builtin/pack-objects.c | 5 +++-- > > cache.h| 7 --- > > list.h | 6 ++ > > packfile.c | 12 ++-- > > 4 files changed, 15 insertions(+), 15 deletions(-) > > *puzzled* This appears to already be in "pu", with a different author. > Did you independently make the same change? Or are you asking for a > progress report on that change, and just phrasing it in a confusing > way? I pointed Phionah at your #leftoverbits comment in: https://public-inbox.org/git/20170912172839.gb144...@aiede.mtv.corp.google.com/ about moving packed_git_mru to list.h. But I'm afraid I wasn't very clear in giving further guidance. The goal is to build on _top_ of the patch in that message, and convert the doubly-linked list implementation used in "struct mru" to use the shared code in list.h. That should mostly involve touching mru.c and mru.h, though I think we'd need to tweak the name of the "next" pointer during the traversal, too. -Peff
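For readers unfamiliar with list.h-style intrusive lists, the conversion Peff describes boils down to embedding the link pair in the entry and splicing on every access. A rough sketch with a local re-implementation of the list primitives (this is not git's actual `list.h`, and `struct pack` is a hypothetical stand-in for `packed_git`):

```c
#include <stddef.h>

/* Minimal intrusive doubly-linked list, list.h style: */
struct list_head { struct list_head *next, *prev; };

#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_del(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
}

static void list_add(struct list_head *e, struct list_head *h)
{
    /* insert e right after h (i.e. at the front of the list) */
    e->next = h->next;
    e->prev = h;
    h->next->prev = e;
    h->next = e;
}

/* An MRU-tracked packfile stand-in: the node lives in the entry. */
struct pack {
    int id;
    struct list_head mru;
};

/* Mark a pack most-recently-used: unlink it and re-add at the front. */
static void mru_mark(struct list_head *head, struct pack *p)
{
    list_del(&p->mru);
    list_add(&p->mru, head);
}
```

Traversal then walks `head->next` and recovers the entry with `list_entry()`, which is the renaming of the "next" pointer Peff alludes to.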
Re: [PATCH 2/2] read_info_alternates: warn on non-trivial errors
On Mon, Sep 18, 2017 at 07:46:24PM -0700, Jonathan Nieder wrote: > Jeff King wrote: > > > When we fail to open $GIT_DIR/info/alternates, we silently > > assume there are no alternates. This is the right thing to > > do for ENOENT, but not for other errors. > > > > A hard error is probably overkill here. If we fail to read > > an alternates file then either we'll complete our operation > > anyway, or we'll fail to find some needed object. Either > > way, a warning is a good idea. And we already have a helper > > function to handle this pattern; let's just call > > warn_on_fopen_error(). > > I think I prefer a hard error. What kind of cases are you imagining > where it would be better to warn? > > E.g. for EIO, erroring out so that the user can try again seems better > than hoping that the application will be able to cope with the more > subtle error that comes from discovering some objects are missing. > > For EACCES, I can see how it makes sense to warn and move on, but no > other errors like that are occurring to me. > > Thoughts? Yes, EACCES is exactly the example that came to mind. But most importantly, we don't know at this point whether the alternate is even relevant to the current operation. So calling die() here would make some cases fail that would otherwise succeed. That's usually not a big deal, but we've had cases where being lazy about dying helps people use git itself to help repair the case. Admittedly most of those chicken-and-egg problems are centered around git-config (e.g., using "git config --edit" to repair broken config), so it's perhaps less important here. But it seems like a reasonable principle to follow in general. If there's a compelling reason to die hard, I'd reconsider it. But I can't really think of one (if we were _writing_ a new alternates file and encountered a read error of the old data we're copying, then I think it would be very important to bail before claiming a successful write. But that's not what's happening here). -Peff
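The errno-dependent behavior being debated can be modeled as a tiny stand-in for `warn_on_fopen_error()` (the function name and return convention here are illustrative, not git's): stay silent for the "file simply isn't there" errors, warn but keep going for everything else.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Returns 1 if a warning was printed, 0 if the error was silently
 * ignorable (ENOENT/ENOTDIR: no alternates file is fine).  Either
 * way the caller continues; nothing here calls die().
 */
static int warn_on_fopen_error_sketch(const char *path, int saved_errno)
{
    if (saved_errno == ENOENT || saved_errno == ENOTDIR)
        return 0;
    fprintf(stderr, "warning: unable to access '%s': %s\n",
            path, strerror(saved_errno));
    return 1;
}
```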
Re: [PATCH 1/2] read_info_alternates: read contents into strbuf
On Mon, Sep 18, 2017 at 07:42:53PM -0700, Jonathan Nieder wrote: > Jeff King wrote: > > > Reported-by: Michael Haggerty > > Signed-off-by: Jeff King > > --- > > sha1_file.c | 29 + > > 1 file changed, 9 insertions(+), 20 deletions(-) > > Thanks for tracking it down. To be fair, Michael did most of the work in identifying and bisecting the bug. He even wrote a very similar patch in parallel; I just swooped in at the end. > > path = xstrfmt("%s/info/alternates", relative_base); > > - fd = git_open(path); > > - free(path); > > - if (fd < 0) > > - return; > > - if (fstat(fd, &st) || (st.st_size == 0)) { > > - close(fd); > > + if (strbuf_read_file(&buf, path, 1024) < 0) { > > + free(path); > > return; > > strbuf_read_file is careful to release buf on failure, so this doesn't > leak (but it's a bit subtle). Yep. I didn't think it was worth calling out with a comment since the "don't allocate on failure" pattern is common to most of the strbuf functions. > What happened to the !st.st_size case? Is it eliminated for > simplicity? Good question, and the answer is yes. Obviously we can bail early on an empty file, but I don't think there's any reason to complicate the code with it (the original seems to come from d5a63b9983 (Alternate object pool mechanism updates., 2005-08-14), where it avoided a corner case that has long since been deleted). -Peff
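The "nothing left allocated on failure" property Jonathan calls subtle can be sketched without the strbuf API at all; this hypothetical helper mimics the contract of `strbuf_read_file()` with plain malloc/realloc (names and the 1024-byte hint are illustrative only):

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Read a whole file into a malloc'd, NUL-terminated buffer,
 * returning its length or -1.  Like strbuf_read_file(), nothing is
 * left allocated on failure, so callers can simply `return`.
 */
static long read_file_buf(const char *path, char **out)
{
    FILE *f = fopen(path, "r");
    char *buf = NULL;
    size_t len = 0, alloc = 0;

    *out = NULL;
    if (!f)
        return -1;
    for (;;) {
        size_t got;
        if (len + 1024 + 1 > alloc) {
            char *tmp = realloc(buf, alloc = len + 1024 + 1);
            if (!tmp) {
                free(buf);          /* release on failure */
                fclose(f);
                return -1;
            }
            buf = tmp;
        }
        got = fread(buf + len, 1, 1024, f);
        len += got;
        if (got < 1024)
            break;
    }
    if (ferror(f)) {
        free(buf);                  /* release on failure */
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[len] = '\0';
    *out = buf;
    return (long)len;
}
```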
Re: [PATCH 0/2] fix read past end of array in alternates files
On Mon, Sep 18, 2017 at 07:36:03PM -0700, Jonathan Nieder wrote: > Jeff King wrote: > > > This series fixes a regression in v2.11.1 where we might read past the > > end of an mmap'd buffer. It was introduced in cf3c635210, > > The above information is super helpful. Can it go in one of the commit > messages? Er, didn't I? > > base the patch on there, for a few reasons: > > > > 1. There's a trivial conflict when merging up (because of > > git_open_noatime() becoming just git_open() in the interim). > > > > 2. The reproduction advice relies on our SANITIZE Makefile knob, which > > didn't exist back then. > > > > 3. The second patch does not apply there because we don't have > > warn_on_fopen_errors(). Though admittedly it could be applied > > separately after merging up; it's just a clean-up on top. > > Even this part could go in a commit message, but it's fine for it not > to. IMHO this kind of meta information doesn't belong in the commit message. It's useful to the maintainer to know where to apply the patch, but I don't think it helps somebody who is reading "git log" output. -Peff
Re: [PATCH v2] Improve performance of git status --ignored
Jameson Miller writes: > Improve the performance of the directory listing logic when it wants to list > non-empty ignored directories. In order to show non-empty ignored directories, > the existing logic will recursively iterate through all contents of an ignored > directory. This change introduces the optimization to stop iterating through > the contents once it finds the first file. Wow, such an obviously correct optimization. Very nicely explained, too. > This can have a significant > improvement in 'git status --ignored' performance in repositories with a large > number of files in ignored directories. > > For an example of the performance difference on an example repository with > 196,000 files in 400 ignored directories: > > | Command| Time (s) | > | -- | - | > | git status | 1.2 | > | git status --ignored (old) | 3.9 | > | git status --ignored (new) | 1.4 | > > Signed-off-by: Jameson Miller > --- I wish all the contributions I have to accept are as nicely done as this one ;-) Thanks. > dir.c | 47 +-- > 1 file changed, 41 insertions(+), 6 deletions(-) > > diff --git a/dir.c b/dir.c > index 1c55dc3..1d17b80 100644 > --- a/dir.c > +++ b/dir.c > @@ -49,7 +49,7 @@ struct cached_dir { > static enum path_treatment read_directory_recursive(struct dir_struct *dir, > struct index_state *istate, const char *path, int len, > struct untracked_cache_dir *untracked, > - int check_only, const struct pathspec *pathspec); > + int check_only, int stop_at_first_file, const struct pathspec > *pathspec); We might want to make check_only and stop_at_first_file into a single "unsigned flags" used as a collection of bits, but we can wait until we start feeling the urge to add the third boolean parameter to this function (at which point I'd probably demand a preliminary clean-up to merge these two into a single flags word before adding the third one as a new bit in that word). 
> @@ -1404,8 +1404,13 @@ static enum path_treatment treat_directory(struct > dir_struct *dir, > > untracked = lookup_untracked(dir->untracked, untracked, >dirname + baselen, len - baselen); > + > + /* > + * If this is an excluded directory, then we only need to check if > + * the directory contains any files. > + */ > return read_directory_recursive(dir, istate, dirname, len, > - untracked, 1, pathspec); > + untracked, 1, exclude, pathspec); Nicely explained in the in-code comment. I'd assume that you want your microsoft e-mail address used on the signed-off-by line to appear as the author, so I'll tweak this a bit to make it so (otherwise, your 8...@gmail.com would become the author). Thanks.
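Junio's suggested clean-up, folding the two booleans into one flags word, might look roughly like this toy (the flag names and the scan function are hypothetical, not git's; `entries[]` marks files with 1 so the short-circuit that the patch adds is visible):

```c
#include <stddef.h>

#define DIR_CHECK_ONLY          (1u << 0)   /* hypothetical flag names */
#define DIR_STOP_AT_FIRST_FILE  (1u << 1)

enum path_treatment { path_none, path_untracked };

/*
 * Toy stand-in for the recursive scan: with DIR_STOP_AT_FIRST_FILE,
 * return as soon as any file is seen instead of visiting every
 * entry.  *visited counts entries examined, to show the saving.
 */
static enum path_treatment scan(const int *entries, size_t n,
                                unsigned flags, size_t *visited)
{
    enum path_treatment state = path_none;
    size_t i;

    *visited = 0;
    for (i = 0; i < n; i++) {
        (*visited)++;
        if (entries[i]) {
            state = path_untracked;
            if (flags & DIR_STOP_AT_FIRST_FILE)
                return state;   /* one file is enough to decide */
        }
    }
    return state;
}
```

A third mode would then cost only a new bit, not another parameter at every call site.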
Re: [PATCH] rev-parse: add --is-shallow-repository
Jonathan Nieder writes: >> test_expect_success 'showing the superproject correctly' ' > > With the two tweaks mentioned above, > Reviewed-by: Jonathan Nieder I agree with the fixes to the test titles suggested, so I'll queue the patch with the fixes squashed in. Hearing "yeah, the titles were copy-pasted without adjusting, thanks for fixing, Jonathan!" sent by Øystein would be super nice. Thanks, both.
Re: [PATCH] t/README: fix typo and grammatically improve a sentence
Kaartic Sivaraam writes: > Signed-off-by: Kaartic Sivaraam > --- > t/README | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/t/README b/t/README > index 2f95860369751..4b079e4494d93 100644 > --- a/t/README > +++ b/t/README > @@ -265,12 +265,12 @@ or: > > $ sh ./t9200-git-cvsexport-commit.sh --run='-3 21' > > -As noted above, the test set is built going though items left to > -right, so this: > +As noted above, the test set is built by going through the items > +from left to right, so this: > > $ sh ./t9200-git-cvsexport-commit.sh --run='1-4 !3' > > -will run tests 1, 2, and 4. Items that comes later have higher > +will run tests 1, 2, and 4. Items that come later have higher > precedence. It means that this: > > $ sh ./t9200-git-cvsexport-commit.sh --run='!3 1-4' > > -- > https://github.com/git/git/pull/404 Both of these look to me obvious improvements. I'll queue them unless other people object. Thanks.
Re: [RFC PATCH 0/2] Add named reference to latest push cert
Junio C Hamano writes: > But the point is that we do not want such an overhead in core, as > all of that is a useless waste of the cycle for a site that wants to > store the push certificate away outside of the repository itself. By this I do not mean that cert blobs shouldn't become part of the in-repository record in _all_ installations that receive signed push certificates. An easy to reuse example shipped together with our source would be a good thing, and an easy to enable sample hook may even be better. It's just that I do not want to have any "solution" in the core part that everybody that wants to accept push certs must run, if the "solution" is not something we can endorse as the best current practice. And I do not yet see how anything along the lines of the patch that started this thread, or its extension by wrapping them with a commit chain, would become that "best current practice". A sample hook, on the other hand, is simpler for people to experiment with, and we might even come up with a universally useful best current practice out of such an experiment, though.
Re: [PATCH v2 1/4] mktag: add option which allows the tagger field to be omitted
Ian Campbell writes: > This can be useful e.g. in `filter-branch` when rewriting tags produced by > older versions of Git, such as v2.6.12-rc2..v2.6.13-rc3 in the Linux kernel > source tree: > > $ git cat-file tag v2.6.12-rc2 > object 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 > type commit > tag v2.6.12-rc2 > > Linux v2.6.12-rc2 release > -BEGIN PGP SIGNATURE- > Version: GnuPG v1.2.4 (GNU/Linux) > > iD8DBQBCbW8ZF3YsRnbiHLsRAgFRAKCq/TkuDaEombFABkPqYgGCgWN2lQCcC0qc > wznDbFU45A54dZC8RZ5JxyE= > =ESRP > -END PGP SIGNATURE- > > $ git cat-file tag v2.6.12-rc2 | git mktag > error: char76: could not find "tagger " > fatal: invalid tag signature file > $ git cat-file tag v2.6.12-rc2 | git mktag --allow-missing-tagger > 9e734775f7c22d2f89943ad6c745571f1930105f > > To that end, pass the new option to `mktag` in `filter-branch`. Hmph. I cannot shake this nagging feeling that this is probably a solution that is overly narrow to a single problem that won't scale into the future. As there is no guarantee that "missing-tagger" will stay the only kind of broken tag object we'd produce in one version, only to notice our mistake and forbid it in another version, the approach of adding an '--allow-missing-tagger' option would imply that we'd end up adding more and more such options, and filter-branch will need to use all these '--allow-this-breakage' options we would ever add. Even though I fully agree with the problem you are trying to solve (i.e. we want to be able to replay an old history without our tool rejecting the data we have), it was my first reaction when I read this patch. IOW, my first reaction was "perhaps a single option '--allow-broken' to cover the currently known and any future shapes of malformed tag data is more appropriate". But then, if we look at the body of cmd_mktag(), it looks like this: int cmd_mktag(...)
{ read input into strbuf buf; call verify_tag on buf to sanity check; call write_sha1_file() on the contents of buf as a tag; report the object name; } If we drop the "verification" step from the above, that essentially becomes equivalent to "hash-object -t tag -w --stdin". So I now have to wonder if it may be sufficient to use "hash-object" in filter-branch, without doing this "allow malformed data that we would not permit if the tag were being created today, only to help replaying an old, already broken data" change to "git mktag". Is there something that makes "hash-object" insufficient (like it still does some extra checks we would want to disable and cannot work as a replacement for your "--allow-missing-tagger")? Thanks. > Signed-off-by: Ian Campbell > --- > Documentation/git-mktag.txt | 9 +++- > builtin/mktag.c | 100 > +--- > git-filter-branch.sh| 2 +- > t/t3800-mktag.sh| 33 ++- > 4 files changed, 98 insertions(+), 46 deletions(-) > > diff --git a/Documentation/git-mktag.txt b/Documentation/git-mktag.txt > index fa6a75612..c720c7419 100644 > --- a/Documentation/git-mktag.txt > +++ b/Documentation/git-mktag.txt > @@ -9,7 +9,7 @@ git-mktag - Creates a tag object > SYNOPSIS > > [verse] > -'git mktag' > +'git mktag' [--allow-missing-tagger] > > DESCRIPTION > --- > @@ -34,6 +34,13 @@ exists, is separated by a blank line from the header. The > message part may contain a signature that Git itself doesn't > care about, but that can be verified with gpg. > > +OPTIONS > +--- > +--allow-missing-tagger:: > + Allow the `tagger` line in the header to be omitted. This is > + rarely desirable but may be useful in recreating tags created > + by older Git.
> + > GIT > --- > Part of the linkgit:git[1] suite > diff --git a/builtin/mktag.c b/builtin/mktag.c > index 031b750f0..0f5dae8d5 100644 > --- a/builtin/mktag.c > +++ b/builtin/mktag.c > @@ -1,4 +1,5 @@ > #include "builtin.h" > +#include "parse-options.h" > #include "tag.h" > > /* > @@ -15,6 +16,8 @@ > * the shortest possible tagger-line. > */ > > +static int allow_missing_tagger; > + > /* > * We refuse to tag something we can't verify. Just because. > */ > @@ -41,8 +44,9 @@ static int verify_tag(char *buffer, unsigned long size) > unsigned char sha1[20]; > const char *object, *type_line, *tag_line, *tagger_line, *lb, *rb; > size_t len; > + const unsigned long min_size = allow_missing_tagger ? 71 : 84; > > - if (size < 84) > + if (size < min_size) > return error("wanna fool me ? you obviously got the size wrong > !"); > > buffer[size] = 0; > @@ -98,46 +102,46 @@ static int verify_tag(char *buffer, unsigned long size) > /* Verify the tagger line */ > tagger_line = tag_line; > > -
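The specific check that `--allow-missing-tagger` would relax can be modeled in isolation. This sketch only looks for a `tagger ` line in the header (which ends at the first empty line) and is not mktag's actual `verify_tag()`:

```c
#include <string.h>

/*
 * Return 0 if the tag buffer's header contains a "tagger " line,
 * or if allow_missing_tagger permits its absence; -1 otherwise.
 */
static int check_tagger(const char *buf, int allow_missing_tagger)
{
    const char *line = buf;

    while (*line && *line != '\n') {    /* '\n' at line start ends header */
        if (!strncmp(line, "tagger ", 7))
            return 0;                   /* found a tagger line */
        line = strchr(line, '\n');
        if (!line)
            break;                      /* header ran off the buffer */
        line++;                         /* advance to the next line */
    }
    return allow_missing_tagger ? 0 : -1;
}
```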
Re: [PATCH 2/2] t9010-*.sh: skip all tests if the PIPE prereq is missing
Ramsay Jones wrote: > Every test in this file, except one, is marked with the PIPE prereq. > However, that lone test ('set up svn repo') only performs some setup > work and checks whether the following test should be executed (by > setting an additional SVNREPO prerequisite). Since the following test > also requires the PIPE prerequisite, performing the setup test, when the > PIPE prerequisite is missing, is simply wasted effort. Use the skip-all > test facility to skip all tests when the PIPE prerequisite is missing. > > Signed-off-by: Ramsay Jones > --- > t/t9010-svn-fe.sh | 55 > --- > 1 file changed, 28 insertions(+), 27 deletions(-) It was always the intention that this test script eventually be able to run on Windows, but it was not meant to be. It would need to use a socket or something for backflow to work on Windows. Reviewed-by: Jonathan Nieder
Re: [PATCH 1/2] test-lib: use more compact expression in PIPE prerequisite
Ramsay Jones wrote: > Signed-off-by: Ramsay Jones > --- > t/test-lib.sh | 10 ++ > 1 file changed, 2 insertions(+), 8 deletions(-) Reviewed-by: Jonathan Nieder
Re: [PATCH 2/2] read_info_alternates: warn on non-trivial errors
Jeff King wrote: > When we fail to open $GIT_DIR/info/alternates, we silently > assume there are no alternates. This is the right thing to > do for ENOENT, but not for other errors. > > A hard error is probably overkill here. If we fail to read > an alternates file then either we'll complete our operation > anyway, or we'll fail to find some needed object. Either > way, a warning is a good idea. And we already have a helper > function to handle this pattern; let's just call > warn_on_fopen_error(). I think I prefer a hard error. What kind of cases are you imagining where it would be better to warn? E.g. for EIO, erroring out so that the user can try again seems better than hoping that the application will be able to cope with the more subtle error that comes from discovering some objects are missing. For EACCES, I can see how it makes sense to warn and move on, but no other errors like that are occurring to me. Thoughts? Thanks, Jonathan > Note that technically the errno from strbuf_read_file() > might be from a read() error, not open(). But since read() > would never return ENOENT or ENOTDIR, and since it produces > a generic "unable to access" error, it's suitable for > handling errors from either.
Re: [PATCH 1/2] read_info_alternates: read contents into strbuf
Jeff King wrote: > Reported-by: Michael Haggerty > Signed-off-by: Jeff King > --- > sha1_file.c | 29 + > 1 file changed, 9 insertions(+), 20 deletions(-) Thanks for tracking it down. Reviewed-by: Jonathan Nieder [...] > +++ b/sha1_file.c [...] > @@ -427,28 +427,17 @@ static void link_alt_odb_entries(const char *alt, int > len, int sep, > > static void read_info_alternates(const char * relative_base, int depth) > { > - char *map; > - size_t mapsz; > - struct stat st; > char *path; > - int fd; > + struct strbuf buf = STRBUF_INIT; > > path = xstrfmt("%s/info/alternates", relative_base); > - fd = git_open(path); > - free(path); > - if (fd < 0) > - return; > - if (fstat(fd, &st) || (st.st_size == 0)) { > - close(fd); > + if (strbuf_read_file(&buf, path, 1024) < 0) { > + free(path); > return; strbuf_read_file is careful to release buf on failure, so this doesn't leak (but it's a bit subtle). What happened to the !st.st_size case? Is it eliminated for simplicity? The rest looks good. Thanks, Jonathan
Re: [PATCH 0/2] fix read past end of array in alternates files
Jeff King wrote: > This series fixes a regression in v2.11.1 where we might read past the > end of an mmap'd buffer. It was introduced in cf3c635210, The above information is super helpful. Can it go in one of the commit messages? > but I didn't > base the patch on there, for a few reasons: > > 1. There's a trivial conflict when merging up (because of > git_open_noatime() becoming just git_open() in the interim). > > 2. The reproduction advice relies on our SANITIZE Makefile knob, which > didn't exist back then. > > 3. The second patch does not apply there because we don't have > warn_on_fopen_errors(). Though admittedly it could be applied > separately after merging up; it's just a clean-up on top. Even this part could go in a commit message, but it's fine for it not to. Thanks, Jonathan
Re: [PATCH] rev-parse: add --is-shallow-repository
Hi, Øystein Walle wrote: > Running `git fetch --unshallow` on a repo that is not in fact shallow > produces a fatal error message. Hm, can you say more about the context? From a certain point of view, it might make sense for that command to succeed instead: if the repo is already unshallow, then why should "fetch --unshallow" complain instead of declaring victory? > Add a helper to rev-parse that scripters > can use to determine whether a repo is shallow or not. > > Signed-off-by: Øystein Walle > --- > Documentation/git-rev-parse.txt | 3 +++ > builtin/rev-parse.c | 5 + > t/t1500-rev-parse.sh| 15 +++ > 3 files changed, 23 insertions(+) Regardless, this new rev-parse --is-shallow helper looks like a good feature. [...] > --- a/builtin/rev-parse.c > +++ b/builtin/rev-parse.c > @@ -868,6 +868,11 @@ int cmd_rev_parse(int argc, const char **argv, const > char *prefix) > : "false"); > continue; > } > + if (!strcmp(arg, "--is-shallow-repository")) { > + printf("%s\n", is_repository_shallow() ? "true" > + : "false"); > + continue; > + } The implementation is straightforward and correct. [...] > --- a/t/t1500-rev-parse.sh > +++ b/t/t1500-rev-parse.sh Thanks for writing tests. \o/ > @@ -116,6 +116,21 @@ test_expect_success 'git-path inside sub-dir' ' > test_cmp expect actual > ' > > +test_expect_success 'git-path shallow repository' ' What does git-path mean here? I wonder if it's a copy/paste error. Did you mean something like test_expect_success 'rev-parse --is-shallow-repository in shallow repo' ' ? > + test_commit test_commit && > + echo true >expect && > + git clone --depth 1 --no-local . shallow && > + test_when_finished "rm -rf shallow" && > + git -C shallow rev-parse --is-shallow-repository >actual && > + test_cmp expect actual > +' > + > +test_expect_success 'git-path notshallow repository' ' Likewise: should this be test_expect_success 'rev-parse --is-shallow-repository in non-shallow repo' ' ?
> + echo false >expect && > + git rev-parse --is-shallow-repository >actual && > + test_cmp expect actual > +' > + > test_expect_success 'showing the superproject correctly' ' With the two tweaks mentioned above, Reviewed-by: Jonathan Nieder Thanks.
Re: What's cooking in git.git (Aug 2017, #05; Tue, 22)
Johannes Schindelin writes: >> Do you have a concrete suggestion to make these individual entries more >> helpful for people who may want to go back to the original thread in doing >> so? In-reply-to: or References: fields of the "What's cooking" report >> would not help. I often have the message IDs that made/helped me make >> these individual comments in the description; they alone would not react >> to mouse clicks, though. > > Oh gawd, not even more stuff piled onto the mail format. Please stop. > ... > It probably tries to serve too many purposes at the same time, and thereby > none. Well, this was started as my attempt to give a public service that shows a summary of what is happening in the entire integration tree, as there was nothing like that before (and going to github.com and looking at 'pu' branch would not give you an easy overview). As many people contribute many topics to the project, complaining that it talks about too many topics would not get you anywhere. If you find the "What's cooking" report not serving your needs, and if no one else finds it serving his or her needs, then I can stop sending these out, of course, but I am not getting the impression that we are at that point, at least not yet.
Re: What's cooking in git.git (Sep 2017, #03; Fri, 15)
Lars Schneider writes: > SZEDER noticed a bug in this series that I was about to fix: > https://public-inbox.org/git/3b175d35-5b1c-43cd-a7e9-85693335b...@gmail.com/ > > I assume at this point a new patch is better than a re-roll, right? Thanks, yes indeed.
Re: [RFC PATCH 0/2] Add named reference to latest push cert
Santiago Torres writes: > - *if there is a hook* the blob is computed, but it is up to the > hook itself to store it *somewhere*. This makes me feel like it's > somewhat of a useless waste of computation if the hook is not > meant to handle it anyway (which is just a post-receive hook). I > find it rather weird that --signed is a builtin flag, and is > handled on the server side only partially (just my two cents). The way it was envisioned to be used is that the repository meant to be protected by collected push certs may not be trusted as the permanent store for push certs by all hosting sites. The hook may be told the name of a blob to read its contents and is expected to store it away to somewhere else. The only reason why we use blob is because creating a blob in response to pushes being executed in parallel will result in different blobs unless there is hash collision. Instead of us having to come up with and use a different mechanism to create a unique temporary filename and feed that to hook, reusing blob as such was the simplest. > I understand the concurrency concerns, so I agree with stefan's > solution, although I don't know how big of a deal it would be, if git > already supports --atomic pushes (admittedly, I haven't checked if there > are any guards in place for someone who pushes millions of refs > atomically). It'd be completely understandable to have a setting to > disable handling of --signed pushes and, ideally, a way to warn the user > of this feature being disabled if --signed is sent. I do not think atomic helps at all, when one atomic push updates branch A while another atomic push updates branch B. They can still go in parallel, and their certificates must both be stored. You can somehow serialize them and create a single strand of pearls to advance a single ref, or you can let both fork two histories to store the push certs from these two pushes and have somebody create a merge commit to join the history.
But the point is that we do not want such an overhead in core, as all of that is a useless waste of cycles for a site that wants to store the push certificate away outside of the repository itself.
Re: [PATCH 1/3] sha1_name: Create perf test for find_unique_abbrev()
Derrick Stolee writes: >> But I do not think we want this "clever" optimization that involves >> 'n' in the first place. >>> + while (count++ < 10) { >>> + for (i = 0; i < n; i++) >>> + ((unsigned int*)oid.hash)[i] = hash_base; >> >> Does it make sense to assume that uint is always 4-byte (so this >> code won't work if it is 8-byte on your platform) and doing this is >> faster than using platform-optimized memcpy()? > > I'm not sure what you mean by using memcpy to improve this, because > it would require calling memcpy in the inner loop, such as > > for (i = 0; i < n; i++) > memcpy(oid.hash + i * sizeof(unsigned), &hash_base, > sizeof(unsigned)); Sorry, I left it without saying as I thought it was obvious, but what I meant was to use a whole "struct oid", not just a single unsigned (repeated), as the hash that is tested. If you have an array of object names you use in the test, then for (count = 0; count < limit; count++) { hashcpy(&oid.hash, &samples[count]); ... do the probing ... } > First, this doesn't just measure the time it takes to determine non-existence, Sorry, my phrasing was indeed misleading. I do not know if the time we spend to see if we have or do not have the object is the largest cycle spender in these codepaths (and even if it were, I do not think that is what you are trying to optimize in these patches anyway). But if I recall correctly, the ways we come up with the unique abbreviation for an object that exists and an object that does not exist are different? And because most of the time (think: "git log -p" output) we would be finding abbreviation for objects that we do have, benchmarking and tweaking the code that comes up with an object that does not exist is not optimizing for the right case. Back when I wrote that initial response, I didn't check how different the code was between the two cases, but now I did.
It seems that in both cases we start from the shortest-allowed length and then extend the same way, and the only difference between these two cases is that we return immediately when our candidate name is long enough not to match any existing object when the full name refers to an object we do not have, while we return only when the ambiguity is resolved. I _think_ these amount to the same computation (i.e. when an object with the full name we have exists, the amount of computation we need to come up with its unique abbreviation is the same as the computation we need to find the unique abbreviation for the same name in another repository that has identical set of objects, minus that single object), so from that point of view, throwing random hashes, most of which would not name any existing object, and measuring how much time it takes to run get_short_oid() to compute the minimum length of the unique prefix should be sufficient. Thanks.
Re: [PATCH] describe: teach --match to handle branches and remotes
On Mon, Sep 18, 2017 at 4:52 PM, Junio C Hamano wrote: > Max Kirillov writes: > >> --match :: >> Only consider tags matching the given `glob(7)` pattern, >> - excluding the "refs/tags/" prefix. This can be used to avoid >> - leaking private tags from the repository. If given multiple times, a >> - list of patterns will be accumulated, and tags matching any of the >> - patterns will be considered. Use `--no-match` to clear and reset the >> - list of patterns. >> + excluding the "refs/tags/" prefix. If used with `--all`, it also >> + considers local branches and remote-tracking references matching the >> + pattern, excluding respectively "refs/heads/" and "refs/remotes/" >> + prefix; references of other types are never considered. If given >> + multiple times, a list of patterns will be accumulated, and tags >> + matching any of the patterns will be considered. Use `--no-match` to >> + clear and reset the list of patterns. >> >> --exclude :: >> Do not consider tags matching the given `glob(7)` pattern, excluding >> - the "refs/tags/" prefix. This can be used to narrow the tag space and >> - find only tags matching some meaningful criteria. If given multiple >> - times, a list of patterns will be accumulated and tags matching any >> - of the patterns will be excluded. When combined with --match a tag will >> - be considered when it matches at least one --match pattern and does not >> + the "refs/tags/" prefix. If used with `--all`, it also does not >> consider >> + local branches and remote-tracking references matching the pattern, >> + excluding respectively "refs/heads/" and "refs/remotes/" prefix; >> + references of other types are never considered. If given multiple >> times, >> + a list of patterns will be accumulated and tags matching any of the >> + patterns will be excluded. When combined with --match a tag will be >> + considered when it matches at least one --match pattern and does not >> match any of the --exclude patterns. 
Use `--no-exclude` to clear and >> reset the list of patterns. > > OK, I find this written clearly enough. > >> diff --git a/builtin/describe.c b/builtin/describe.c >> index 94ff2fba0b..2a2e998063 100644 >> --- a/builtin/describe.c >> +++ b/builtin/describe.c >> @@ -124,6 +124,22 @@ static void add_to_known_names(const char *path, >> } >> } >> >> +/* Drops prefix. Returns NULL if the path is not expected with current >> settings. */ >> +static const char *get_path_to_match(int is_tag, int all, const char *path) >> +{ >> + if (is_tag) >> + return path + 10; > > This is a faithful conversion of the existing code that wants to > behave the same as original, but a bit more on this later. > >> + else if (all) { >> + if (starts_with(path, "refs/heads/")) >> + return path + 11; /* "refs/heads/..." */ >> + else if (starts_with(path, "refs/remotes/")) >> + return path + 13; /* "refs/remotes/..." */ >> + else >> + return 0; > > I think you can use skip_prefix() to avoid counting the length of > the prefix yourself, i.e. > > else if (all) { > const char *body; > > if (skip_prefix(path, "refs/heads/", &body)) > return body; > else if (skip_prefix(path, "refs/remotes/", &body)) > ... > } > > Whether you do the above or not, the last one that returns 0 should > return NULL (to the language it is the same thing, but to humans, we > write NULL when it is the null pointer, not the number 0). > >> + } else > > return NULL; >> +} > > Perhaps the whole thing may want to be a bit more simplified, like: > > static const char *skip_ref_prefix(const char *path, int all) > { > const char *prefix[] = { > "refs/tags/", "refs/heads/", "refs/remotes/" > }; > const char *body; > int cnt; > int bound = all ? ARRAY_SIZE(prefix) : 1; > I found the implicit use of "bound = 1" to mean "we only care about tags" a bit weird here. I guess it's not really that big a deal overall, and this is definitely cleaner than the original implementation.
> for (cnt = 0; cnt < bound; cnt++) > if (skip_prefix(path, prefix[cnt], &body)) > return body; > return NULL; > } > > The hardcoded +10 for "is_tag" case assumes that nothing other than > "refs/tags/something" would ever be used to call into this function > when is_tag is true, and that may well be true in the current code > and have been so ever since the original code was written, but it > still smells like an invitation for future bugs. > > I dunno. > >> + >> static int get_name(const ch
Re: [PATCH 1/2] read_info_alternates: read contents into strbuf
Jeff King writes: > We could also just make a NUL-terminated copy of the input > bytes and operate on that. But since all but one caller > already is passing a string, instead let's just fix that > caller to provide NUL-terminated input in the first place, > by swapping out mmap for strbuf_read_file(). > ... > Let's also drop the "len" parameter entirely from > link_alt_odb_entries(), since it's completely ignored. That > will avoid any new callers re-introducing a similar bug. Both design decisions make perfect sense to me. > sha1_file.c | 29 + > 1 file changed, 9 insertions(+), 20 deletions(-) And diffstat also agrees that it is a good change ;-)
Re: [PATCH] for_each_string_list_item(): behave correctly for empty list
> I am hoping that this last one is not allowed and we can use the > "same condition is checked every time we loop" version that hides > the ugliness inside the macro. By which you are referring to Jonathan's solution posted. Maybe we can combine the two solutions (checking for the list to not be NULL once, by Jonathan) and using an outer structure (SZEDER's solution) by replacing the condition by a for loop, roughly (untested): #define for_each_string_list_item(item,list) \ - for (item = (list)->items; item < (list)->items + (list)->nr; ++item) +for (; list; list = NULL) +for (item = (list)->items; item < (list)->items + (list)->nr; ++item) as that would not mingle with any dangling else clause. It is also just one statement, such that if (bla) for_each_string_list_item { baz(item); } else foo; still works. Are there downsides to this combined approach?
Re: [PATCH] describe: teach --match to handle branches and remotes
Max Kirillov writes: > --match :: > Only consider tags matching the given `glob(7)` pattern, > - excluding the "refs/tags/" prefix. This can be used to avoid > - leaking private tags from the repository. If given multiple times, a > - list of patterns will be accumulated, and tags matching any of the > - patterns will be considered. Use `--no-match` to clear and reset the > - list of patterns. > + excluding the "refs/tags/" prefix. If used with `--all`, it also > + considers local branches and remote-tracking references matching the > + pattern, excluding respectively "refs/heads/" and "refs/remotes/" > + prefix; references of other types are never considered. If given > + multiple times, a list of patterns will be accumulated, and tags > + matching any of the patterns will be considered. Use `--no-match` to > + clear and reset the list of patterns. > > --exclude :: > Do not consider tags matching the given `glob(7)` pattern, excluding > - the "refs/tags/" prefix. This can be used to narrow the tag space and > - find only tags matching some meaningful criteria. If given multiple > - times, a list of patterns will be accumulated and tags matching any > - of the patterns will be excluded. When combined with --match a tag will > - be considered when it matches at least one --match pattern and does not > + the "refs/tags/" prefix. If used with `--all`, it also does not consider > + local branches and remote-tracking references matching the pattern, > + excluding respectively "refs/heads/" and "refs/remotes/" prefix; > + references of other types are never considered. If given multiple times, > + a list of patterns will be accumulated and tags matching any of the > + patterns will be excluded. When combined with --match a tag will be > + considered when it matches at least one --match pattern and does not > match any of the --exclude patterns. Use `--no-exclude` to clear and > reset the list of patterns. OK, I find this written clearly enough. 
> diff --git a/builtin/describe.c b/builtin/describe.c > index 94ff2fba0b..2a2e998063 100644 > --- a/builtin/describe.c > +++ b/builtin/describe.c > @@ -124,6 +124,22 @@ static void add_to_known_names(const char *path, > } > } > > +/* Drops prefix. Returns NULL if the path is not expected with current > settings. */ > +static const char *get_path_to_match(int is_tag, int all, const char *path) > +{ > + if (is_tag) > + return path + 10; This is a faithful conversion of the existing code that wants to behave the same as original, but a bit more on this later. > + else if (all) { > + if (starts_with(path, "refs/heads/")) > + return path + 11; /* "refs/heads/..." */ > + else if (starts_with(path, "refs/remotes/")) > + return path + 13; /* "refs/remotes/..." */ > + else > + return 0; I think you can use skip_prefix() to avoid counting the length of the prefix yourself, i.e. else if (all) { const char *body; if (skip_prefix(path, "refs/heads/", &body)) return body; else if (skip_prefix(path, "refs/remotes/", &body)) ... } Whether you do the above or not, the last one that returns 0 should return NULL (to the language it is the same thing, but to humans, we write NULL when it is the null pointer, not the number 0). > + } else > + return NULL; > +} Perhaps the whole thing may want to be a bit more simplified, like: static const char *skip_ref_prefix(const char *path, int all) { const char *prefix[] = { "refs/tags/", "refs/heads/", "refs/remotes/" }; const char *body; int cnt; int bound = all ? ARRAY_SIZE(prefix) : 1; for (cnt = 0; cnt < bound; cnt++) if (skip_prefix(path, prefix[cnt], &body)) return body; return NULL; } The hardcoded +10 for "is_tag" case assumes that nothing other than "refs/tags/something" would ever be used to call into this function when is_tag is true, and that may well be true in the current code and have been so ever since the original code was written, but it still smells like an invitation for future bugs. I dunno.
> + > static int get_name(const char *path, const struct object_id *oid, int flag, > void *cb_data) > { > int is_tag = starts_with(path, "refs/tags/"); > @@ -140,12 +156,13 @@ static int get_name(const char *path, const struct > object_id *oid, int flag, voi >*/ > if (exclude_patterns.nr) { > struct string_list_item *item; > + const char *path_to_match = get_path_to_match(is_tag, all, > path); > +
Re: [OUTREACHY] pack: make packed_git_mru global a value instead of a pointer
Hi, phionah bugosi wrote: > Just to re-echo a change requested before in one of the mail > threads, we currently have two global variables declared: > > struct mru packed_git_mru_storage; > struct mru *packed_git_mru = &packed_git_mru_storage; > > We normally use pointers in C to point or refer to the same location > or space in memory from multiple places. That means in situations > where we will have the pointer be assigned to in many places in our > code. But in this case, we do not have any other logic referring or > assigning to the pointer packed_git_mru. In simple words we actually > don't need a pointer in this case. > > Therefore this patch makes packed_git_mru global a value instead of > a pointer and makes use of list.h > > Signed-off-by: phionah bugosi > --- > builtin/pack-objects.c | 5 +++-- > cache.h| 7 --- > list.h | 6 ++ > packfile.c | 12 ++-- > 4 files changed, 15 insertions(+), 15 deletions(-) *puzzled* This appears to already be in "pu", with a different author. Did you independently make the same change? Or are you asking for a progress report on that change, and just phrasing it in a confusing way? Confused, Jonathan
Re: RFC v3: Another proposed hash function transition plan
Hi, Gilles Van Assche wrote: > Hi Johannes, >> SHA-256 got much more cryptanalysis than SHA3-256 […]. > > I do not think this is true. Keccak/SHA-3 actually got (and is still > getting) a lot of cryptanalysis, with papers published at renowned > crypto conferences [1]. > > Keccak/SHA-3 is recognized to have a significant safety margin. E.g., > one can cut the number of rounds in half (as in Keyak or KangarooTwelve) > and still get a very strong function. I don't think we could say the > same for SHA-256 or SHA-512… I just wanted to thank you for paying attention to this conversation and weighing in. Most of the regulars in the git project are not crypto experts. This kind of extra information (and e.g. [2]) is very useful to us. Thanks, Jonathan > Kind regards, > Gilles, for the Keccak team > > [1] https://keccak.team/third_party.html [2] https://public-inbox.org/git/91a34c5b-7844-3db2-cf29-411df5bcf...@noekeon.org/
Re: RFC v3: Another proposed hash function transition plan
Hi Gilles, On Mon, 18 Sep 2017, Gilles Van Assche wrote: > > SHA-256 got much more cryptanalysis than SHA3-256 […]. > > I do not think this is true. Please read what I said again: SHA-256 got much more cryptanalysis than SHA3-256. I never said that SHA3-256 got little cryptanalysis. Personally, I think that SHA3-256 got a ton more cryptanalysis than SHA-1, and that SHA-256 *still* got more cryptanalysis. But my opinion does not count, really. However, the two experts I pestered with questions over questions left me with that strong impression, and their opinion does count. > Keccak/SHA-3 actually got (and is still getting) a lot of cryptanalysis, > with papers published at renowned crypto conferences [1]. > > Keccak/SHA-3 is recognized to have a significant safety margin. E.g., > one can cut the number of rounds in half (as in Keyak or KangarooTwelve) > and still get a very strong function. I don't think we could say the > same for SHA-256 or SHA-512… Again, I do not want to criticize SHA3/Keccak. Personally, I have a lot of respect for Keccak. I also have a lot of respect for everybody who scrutinized the SHA2 family of algorithms. I also respect the fact that there are more implementations of SHA-256, and thanks to everybody seeming to demand SHA-256 checksums instead of SHA-1 or MD5 for downloads, bugs in those implementations are probably discovered relatively quickly, and I also cannot ignore the prospect of hardware support for SHA-256. In any case, having SHA3 as a fallback in case SHA-256 gets broken seems like a very good safety net to me. Ciao, Johannes
Re: [PATCH 2/8] protocol: introduce protocol extension mechanisms
>> instead compared to the proposed configuration above. >> (Even better yet, then people could play around with "v1 only" >> and see how it falls apart on old servers) > > Except we can't start with an explicit whitelist because we must > fall back to v0 if v1 isn't supported; otherwise we would break people. > > That is unless we have the semantics of: If not configured v0 will be > used, otherwise only use the configured protocol versions. > Good point. Thinking about this, how about: If not configured, we do as we want. (i.e. Git has full control over its decision-making process, which for now is "favor v0 over v1 as we are experimenting with v1". This strategy may change in the future to "prefer highest version number that both client and server can speak".) If it is configured, "use highest configured number from the given set". ?
Re: [PATCH v2] add test for bug in git-mv for recursive submodules
>> Took a little while but here is a more clean patch creating individual >> submodules for the nesting. >> >> Cheers Heiko Thanks for writing this test! > > Thanks. Stefan, does this look good to you now? Yes, though there are nits below. > It is not quite clear which step is expected to fail with the > current code by reading the test or the proposed log message. Does > "mv" refuse to work and we do not get to run "status", or does > "status" report a failure, or do we fail well before that? git-mv failing seems like a new possibility without incurring another process spawn with the new repository object. (Though then we could also just fix the recursed submodule) > The log message that only says "This does not work when ..." is not > helpful in figuring it out, either. Something like "This does not > work and fails to update the paths for its configurations" or > whatever that describes "what actually happens" (in contrast to > "what ought to happen", which you described clearly) should be > there. > > Description on how you happened to have discovered the issue feels a > lot less relevant compared to that, and it is totally useless if it > is unclear what the issue is in the first place. > >> t/t7001-mv.sh | 25 + >> 1 file changed, 25 insertions(+) >> >> diff --git a/t/t7001-mv.sh b/t/t7001-mv.sh >> index e365d1ff77..cbc5fb37fe 100755 >> --- a/t/t7001-mv.sh >> +++ b/t/t7001-mv.sh >> @@ -491,4 +491,29 @@ test_expect_success 'moving a submodule in nested >> directories' ' >> test_cmp actual expect >> ' >> >> +test_expect_failure 'moving nested submodules' ' >> + git commit -am "cleanup commit" && >> + mkdir sub_nested_nested && >> + (cd sub_nested_nested && We seem to have different styles for nested shell. I prefer outside command && ( first nested command here && ... as that aligns indentation to the nesting level. 
I have seen the style you use a lot in the test suite, and we do not have a guideline in Documentation/CodingGuidelines, so I do not complain too loudly. ;) >> + touch nested_level2 && >> + git init && >> + git add . && >> + git commit -m "nested level 2" >> + ) && >> + mkdir sub_nested && >> + (cd sub_nested && >> + touch nested_level1 && >> + git init && >> + git add . && >> + git commit -m "nested level 1" >> + git submodule add ../sub_nested_nested && >> + git commit -m "add nested level 2" >> + ) && >> + git submodule add ./sub_nested nested_move && >> + git commit -m "add nested_move" && >> + git submodule update --init --recursive && So far a nice setup! >> + git mv nested_move sub_nested_moved && This is the offending command that produces the bug, as it will break most subsequent commands, such as >> + git status git-status is one of the basic commands. Without status to function, I think it is hard to recover your repo without a lot of in-depth knowledge of Git (submodules). I wonder if git-status should complain more gracefully and fall back to one of --ignore-submodules={dirty, all}, that actually still works. Maybe we could introduce a new default mode for this flag, that is "none-except-on-error", though this sounds as if we're fixing symptoms instead of the root cause.
Re: [PATCH 2/8] protocol: introduce protocol extension mechanisms
On 09/18, Stefan Beller wrote: > >> From a users POV this may be frustrating as I would imagine that > >> people want to run > >> > >> git config --global protocol.version 2 > >> > >> to try it out and then realize that some of their hosts do not speak > >> 2, so they have to actually configure it per repo/remote. > > > > The point would be to be able to set this globally, not per-repo. Even > > if a repo doesn't speak v2 then it should be able to gracefully degrade > > to v1 without the user having to do anything. The reason why there is > > this escape hatch is if doing the protocol negotiation out of band > > causing issues with communicating with a server that it can be shut off. > > In the current situation it is easy to assume that if v1 (and not v0) > is configured, that the users intent is "to try out v1 and fallback > gracefully to v0". > > But this will change over time in the future! > > Eventually people will have the desire to say: > "Use version N+1, but never version N", because N has > performance or security issues; the user might not want > to bother to try N or even actively want to be affirmed that > Git will never use version N. > > In this future we need a mechanism, that either contains a > white list or black list of protocols. To keep it simple (I assume > there won't be many protocol versions), a single white list will do. > > However transitioning from the currently proposed "try the new > configured thing and fallback to whatever" to "this is the exact list > of options that Git will try for you" will be hard, as we may break people > if we do not unconditionally fall back to v0. > > That is why I propose to start with an explicit white list as then we > do not have to have a transition plan or otherwise work around the > issue. Also it doesn't hurt now to use > > git config --global protocol.version v1,v0 > > instead compared to the proposed configuration above. 
> (Even better yet, then people could play around with "v1 only" > and see how it falls apart on old servers) Except we can't start with an explicit whitelist because we must fall back to v0 if v1 isn't supported; otherwise we would break people. That is unless we have the semantics of: If not configured v0 will be used, otherwise only use the configured protocol versions. -- Brandon Williams
Re: [PATCH 2/8] protocol: introduce protocol extension mechanisms
>> From a users POV this may be frustrating as I would imagine that >> people want to run >> >> git config --global protocol.version 2 >> >> to try it out and then realize that some of their hosts do not speak >> 2, so they have to actually configure it per repo/remote. > > The point would be to be able to set this globally, not per-repo. Even > if a repo doesn't speak v2 then it should be able to gracefully degrade > to v1 without the user having to do anything. The reason why there is > this escape hatch is if doing the protocol negotiation out of band > causing issues with communicating with a server that it can be shut off. In the current situation it is easy to assume that if v1 (and not v0) is configured, that the users intent is "to try out v1 and fallback gracefully to v0". But this will change over time in the future! Eventually people will have the desire to say: "Use version N+1, but never version N", because N has performance or security issues; the user might not want to bother to try N or even actively want to be affirmed that Git will never use version N. In this future we need a mechanism, that either contains a white list or black list of protocols. To keep it simple (I assume there won't be many protocol versions), a single white list will do. However transitioning from the currently proposed "try the new configured thing and fallback to whatever" to "this is the exact list of options that Git will try for you" will be hard, as we may break people if we do not unconditionally fall back to v0. That is why I propose to start with an explicit white list as then we do not have to have a transition plan or otherwise work around the issue. Also it doesn't hurt now to use git config --global protocol.version v1,v0 instead compared to the proposed configuration above. (Even better yet, then people could play around with "v1 only" and see how it falls apart on old servers)
RE: [PATCH v6 12/12] fsmonitor: add a performance test
> -Original Message-
> From: Johannes Schindelin [mailto:johannes.schinde...@gmx.de]
> Sent: Monday, September 18, 2017 10:25 AM
> To: Ben Peart
> Cc: david.tur...@twosigma.com; ava...@gmail.com;
> christian.cou...@gmail.com; git@vger.kernel.org; gits...@pobox.com;
> pclo...@gmail.com; p...@peff.net
> Subject: Re: [PATCH v6 12/12] fsmonitor: add a performance test
>
> Hi Ben,
>
> sorry for not catching this earlier:
>
> On Fri, 15 Sep 2017, Ben Peart wrote:
>
> > [...]
> > +
> > +int cmd_dropcaches(void)
> > +{
> > +	HANDLE hProcess = GetCurrentProcess();
> > +	HANDLE hToken;
> > +	int status;
> > +
> > +	if (!OpenProcessToken(hProcess, TOKEN_QUERY | TOKEN_ADJUST_PRIVILEGES, &hToken))
> > +		return error("Can't open current process token");
> > +
> > +	if (!GetPrivilege(hToken, "SeProfileSingleProcessPrivilege", 1))
> > +		return error("Can't get SeProfileSingleProcessPrivilege");
> > +
> > +	CloseHandle(hToken);
> > +
> > +	HMODULE ntdll = LoadLibrary("ntdll.dll");
>
> Git's source code still tries to abide by C90, and for simplicity's sake, this
> extends to the Windows-specific part. Therefore, the `ntdll` variable needs to
> be declared at the beginning of the function (I do agree that it makes for
> better code to reduce the scope of variables, but C90 simply did not allow
> variables to be declared in the middle of functions).

Thanks Johannes, I'll fix that.
>
> I wanted to send a patch addressing this in the obvious way, but then I
> encountered these lines:
>
> > +	DWORD(WINAPI *NtSetSystemInformation)(INT, PVOID, ULONG) =
> > +		(DWORD(WINAPI *)(INT, PVOID, ULONG))GetProcAddress(ntdll, "NtSetSystemInformation");
> > +	if (!NtSetSystemInformation)
> > +		return error("Can't get function addresses, wrong Windows version?");
>
> It turns out that we have seen this plenty of times in Git for Windows'
> fork, so much so that we came up with a nice helper to make this all a bit
> more robust and a bit more obvious, too: the DECLARE_PROC_ADDR and
> INIT_PROC_ADDR helpers in compat/win32/lazyload.h.
>
> Maybe this would be the perfect excuse to integrate this patch into upstream
> Git?

This patch is pretty hefty already. How about you push this capability upstream and I take advantage of it in a later patch? :)

> This would be the patch (you can also cherry-pick it from
> 25c4dc3a73352e72e995594cf1b4afa46e93d040 in
> https://github.com/dscho/git):
>
> -- snip --
> From 25c4dc3a73352e72e995594cf1b4afa46e93d040 Mon Sep 17 00:00:00 2001
> From: Johannes Schindelin
> Date: Tue, 10 Jan 2017 23:14:20 +0100
> Subject: [PATCH] Win32: simplify loading of DLL functions
>
> Dynamic loading of DLL functions is duplicated in several places in Git for
> Windows' source code.
>
> This patch adds a pair of macros to simplify the process: the
> DECLARE_PROC_ADDR(<dll>, <return-type>, <function-name>, ...) macro to be
> used at the beginning of a code block, and the INIT_PROC_ADDR(<function-name>)
> macro to call before using the declared function. The return value of the
> INIT_PROC_ADDR() call has to be checked; if it is NULL, the function was not
> found in the specified DLL.
>
> Example:
>
> 	DECLARE_PROC_ADDR(kernel32.dll, BOOL, CreateHardLinkW,
> 			LPCWSTR, LPCWSTR, LPSECURITY_ATTRIBUTES);
>
> 	if (!INIT_PROC_ADDR(CreateHardLinkW))
> 		return error("Could not find CreateHardLinkW() function");
>
> 	if (!CreateHardLinkW(source, target, NULL))
> 		return error("could not create hardlink from %S to %S",
> 			source, target);
> 	return 0;
>
> Signed-off-by: Karsten Blees
> Signed-off-by: Johannes Schindelin
> ---
> compat/win32/lazyload.h | 44 ++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 44 insertions(+)
> create mode 100644 compat/win32/lazyload.h
>
> diff --git a/compat/win32/lazyload.h b/compat/win32/lazyload.h
> new file mode 100644
> index 000..91c10dad2fb
> --- /dev/null
> +++ b/compat/win32/lazyload.h
> @@ -0,0 +1,44 @@
> +#ifndef LAZYLOAD_H
> +#define LAZYLOAD_H
> +
> +/* simplify loading of DLL functions */
> +
> +struct proc_addr {
> +	const char *const dll;
> +	const char *const function;
> +	FARPROC pfunction;
> +	unsigned initialized : 1;
> +};
> +
> +/* Declares a function to be loaded dynamically from a DLL. */
> +#define DECLARE_PROC_ADDR(dll, rettype, function, ...) \
> +	static struct proc_addr proc_addr_##function = \
> +	{ #dll, #function, NULL, 0 }; \
> +	static rettype (WINAPI *function)(__VA_ARGS__)
> +
> +/*
> + * Loads a function from a DLL (once-only).
> + * Returns non-NULL function pointer on success.
> + * Returns
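The once-only caching that INIT_PROC_ADDR performs can be illustrated in portable C. In this sketch a table lookup stands in for `LoadLibrary`/`GetProcAddress`, since the Windows APIs are not available off-platform; apart from the `proc_addr` bookkeeping mirrored from lazyload.h, all names here are my own:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mirrors lazyload.h's bookkeeping: dll/function names, a cached
 * pointer, and an "initialized" bit so the lookup runs at most once. */
struct proc_addr {
	const char *dll;
	const char *function;
	void (*pfunction)(void);
	unsigned initialized : 1;
};

static int lookups_performed; /* instrumentation for the sketch */

static void stub_beep(void) { }

/* Stand-in for GetProcAddress(LoadLibrary(dll), function). */
static void (*lookup_symbol(const char *dll, const char *function))(void)
{
	(void)dll;
	lookups_performed++;
	if (!strcmp(function, "Beep"))
		return stub_beep;
	return NULL;
}

/* The INIT_PROC_ADDR idea: resolve on first use, then serve the cache.
 * A NULL result means the function was not found and must be checked. */
static void (*init_proc_addr(struct proc_addr *proc))(void)
{
	if (!proc->initialized) {
		proc->initialized = 1;
		proc->pfunction = lookup_symbol(proc->dll, proc->function);
	}
	return proc->pfunction;
}
```

Note that a failed lookup is cached too: the `initialized` bit, not the pointer, records whether resolution has been attempted.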
Re: [RFC PATCH 0/2] Add named reference to latest push cert
On Mon, Sep 18, 2017 at 7:22 AM, Santiago Torres wrote:
> Hello, everyone.
>
> Sorry for being late in this thread, I was making sure I didn't say
> anything outrageously wrong.
>
>> That's Stefan; I wouldn't have suggested any approach that uses the
>> blob whose sole purpose is to serve as a temporary storage area to
>> pass the information to the hook as part of the permanent record.

I put out a vague design that seemed to have more advantages in my mind at the time of writing. :)

> Hmm, as far as I understand *this* is the status quo. We get an
> ephemeral sha1/oid only if you have a hook attached. Otherwise, you will
> never see the object at all, even if you have --signed set.
>
> I suggested preparing this RFC/Patch because of the following reasons
> (please let me know if my understanding of any of this is wrong...):
>
> - I find it weird that the cli allows for a --signed push and
>   nowhere in receive-pack's feedback are you told whether the push
>   certificate is computed/stored/handled at all. I think that, at the
>   latest, receive-pack should let the user know whether the signed
>   push succeeded, or if there is no hook attached to handle it.

How would a user benefit from it? (Are there cases where the organisation deliberately wants the user not to know what happened to their push cert? Do we care about these cases?)

> - *if there is a hook* the blob is computed, but it is up to the
>   hook itself to store it *somewhere*. This makes me feel like it's
>   somewhat of a wasted computation if the hook is not meant to
>   handle it anyway (which is just a post-receive hook). I find it
>   rather weird that --signed is a builtin flag, and is handled on
>   the server side only partially (just my two cents).

I agree, but many features in Git start out small and only partially implemented.

> - To me, the way push certificates are handled now feels like having
>   git commit -S produce a detached signature that you have to
>   embed somehow in the resulting commit object.
> (As a matter of fact, many points on [1] seem to back this notion,
> and even recall some drawbacks of push certificates as they are
> handled today.)
>
> I understand the concurrency concerns, so I agree with Stefan's
> solution, although I don't know how big of a deal it would be, if git
> already supports --atomic pushes (admittedly, I haven't checked if there
> are any guards in place for someone who pushes millions of refs
> atomically). It'd be completely understandable to have a setting to
> disable handling of --signed pushes and, ideally, a way to warn the user
> of this feature being disabled if --signed is sent.

That makes sense.
Re: [PATCH] pack: make packed_git_mru global a value instead of a pointer
On Sun, Sep 17, 2017 at 1:44 PM, Christian Couder wrote:
> On Sun, Sep 17, 2017 at 10:42 PM, Christian Couder wrote:
>> On Sun, Sep 17, 2017 at 10:17 PM, phionah bugosi wrote:
>>> Signed-off-by: phionah bugosi
>>> ---
>>> builtin/pack-objects.c |  5 +++--
>>> cache.h                |  7 -------
>>> list.h                 |  6 ++++++
>>> packfile.c             | 12 ++++++------
>>> 4 files changed, 15 insertions(+), 15 deletions(-)
>>>
>>> This patch makes packed_git_mru global a value instead of a pointer and
>>> makes use of list.h
>>
>> Please explain _why_ you are doing that (why is the resulting code better).
>
> Also the explanations should be in the commit message, that is above
> your "Signed-off-by: ..." line.

A similar patch exists at origin/jn/per-repo-object-store-fixes^^ (I haven't checked if there are differences and, if so, which patch is better; all I am noting here is that there has been work very similar to this one a couple of days prior.)
[PATCH v2] Improve performance of git status --ignored
Improve the performance of the directory listing logic when it wants to list non-empty ignored directories. In order to show non-empty ignored directories, the existing logic will recursively iterate through all contents of an ignored directory. This change introduces the optimization to stop iterating through the contents once it finds the first file. This can have a significant improvement in 'git status --ignored' performance in repositories with a large number of files in ignored directories. For an example of the performance difference on an example repository with 196,000 files in 400 ignored directories: | Command| Time (s) | | -- | - | | git status | 1.2 | | git status --ignored (old) | 3.9 | | git status --ignored (new) | 1.4 | Signed-off-by: Jameson Miller --- dir.c | 47 +-- 1 file changed, 41 insertions(+), 6 deletions(-) diff --git a/dir.c b/dir.c index 1c55dc3..1d17b80 100644 --- a/dir.c +++ b/dir.c @@ -49,7 +49,7 @@ struct cached_dir { static enum path_treatment read_directory_recursive(struct dir_struct *dir, struct index_state *istate, const char *path, int len, struct untracked_cache_dir *untracked, - int check_only, const struct pathspec *pathspec); + int check_only, int stop_at_first_file, const struct pathspec *pathspec); static int get_dtype(struct dirent *de, struct index_state *istate, const char *path, int len); @@ -1404,8 +1404,13 @@ static enum path_treatment treat_directory(struct dir_struct *dir, untracked = lookup_untracked(dir->untracked, untracked, dirname + baselen, len - baselen); + + /* +* If this is an excluded directory, then we only need to check if +* the directory contains any files. +*/ return read_directory_recursive(dir, istate, dirname, len, - untracked, 1, pathspec); + untracked, 1, exclude, pathspec); } /* @@ -1633,7 +1638,7 @@ static enum path_treatment treat_path_fast(struct dir_struct *dir, * with check_only set. 
*/ return read_directory_recursive(dir, istate, path->buf, path->len, - cdir->ucd, 1, pathspec); + cdir->ucd, 1, 0, pathspec); /* * We get path_recurse in the first run when * directory_exists_in_index() returns index_nonexistent. We @@ -1793,12 +1798,20 @@ static void close_cached_dir(struct cached_dir *cdir) * Also, we ignore the name ".git" (even if it is not a directory). * That likely will not change. * + * If 'stop_at_first_file' is specified, 'path_excluded' is returned + * to signal that a file was found. This is the least significant value that + * indicates that a file was encountered that does not depend on the order of + * whether an untracked or exluded path was encountered first. + * * Returns the most significant path_treatment value encountered in the scan. + * If 'stop_at_first_file' is specified, `path_excluded` is the most + * significant path_treatment value that will be returned. */ + static enum path_treatment read_directory_recursive(struct dir_struct *dir, struct index_state *istate, const char *base, int baselen, struct untracked_cache_dir *untracked, int check_only, - const struct pathspec *pathspec) + int stop_at_first_file, const struct pathspec *pathspec) { struct cached_dir cdir; enum path_treatment state, subdir_state, dir_state = path_none; @@ -1832,12 +1845,34 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir, subdir_state = read_directory_recursive(dir, istate, path.buf, path.len, ud, -check_only, pathspec); +check_only, stop_at_first_file, pathspec); if (subdir_state > dir_state) dir_state = subdir_state; } if (check_only) { + if (stop_at_first_file) { + /* +* If stopping at first file, then +* signal that a file was found by +* returning `path_excluded`. This is +* to return a consistent value +* regardless of whether an ignored or +* excluded file happened to be +* encountered 1st. +* +
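The effect of `stop_at_first_file` can be sketched with a small in-memory tree. The enum ordering mimics dir.c's "more significant" ranking; everything else (the node struct, the visit counter) is invented for the demo and is not Git code:

```c
#include <assert.h>
#include <stddef.h>

/* Ordered as in dir.c: higher values are "more significant". */
enum path_treatment {
	path_none = 0,
	path_recurse,
	path_excluded,
	path_untracked
};

struct tnode {
	int is_file;
	enum path_treatment treatment;  /* only meaningful for files */
	struct tnode *children[8];      /* NULL-terminated */
};

static int nodes_visited;

/*
 * Walk the tree and return the most significant treatment seen.
 * With stop_at_first_file set, return path_excluded as soon as any
 * file is found: the caller only asked "is this ignored directory
 * non-empty?", so the scan can be cut short, and capping the answer
 * at path_excluded keeps the result independent of whether an
 * untracked or an excluded entry happened to be encountered first.
 */
static enum path_treatment scan(const struct tnode *dir,
				int stop_at_first_file)
{
	enum path_treatment state = path_none;
	int i;

	for (i = 0; dir->children[i]; i++) {
		const struct tnode *c = dir->children[i];
		enum path_treatment sub;

		nodes_visited++;
		sub = c->is_file ? c->treatment
				 : scan(c, stop_at_first_file);
		if (stop_at_first_file && sub > path_none)
			return path_excluded; /* early exit */
		if (sub > state)
			state = sub;
	}
	return state;
}
```

The full scan visits every entry; the early-exit scan returns after the first file, which is where the reported speedup for large ignored directories comes from.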
[PATCH v2] Improve performance of git status --ignored
This is the second version of my patches to improve handling of ignored files I have decided to break the original patch series into two parts: 1) Perf improvements to handling ignored directories 2) Expose extra options to control which ignored files are displayed by git status. This patch will address #1, and I will follow up with another patch for #2. This patch improves the performance of 'git status --ignored' by cutting out unnecessary work when determining if an ignored directory is empty or not. The current logic will recursively enumerate through all contents of an ignored directory to determine whether it is empty or not. The new logic will return after the 1st file is encountered. This can result in a significant speedup in work dirs with a lot of files in ignored directories. As an example of the performance improvement, here is a representative example of a repository with ~190,000 files in ~400 ignored directories: | Command| Time (s) | | -- | | | git status |1.2 | | git status --ignored (old) |3.9 | | git status --ignored (new) |1.4 | Jameson Miller (1): Improve performance of git status --ignored dir.c | 47 +-- 1 file changed, 41 insertions(+), 6 deletions(-) -- 2.7.4
[OUTREACHY] pack: make packed_git_mru global a value instead of a pointer
Just to reecho a previous change requested before in one of the mail threads, we currently have two global variables declared: struct mru packed_git_mru_storage; struct mru *packed_git_mru = &packed_git_mru_storage; We normally use pointers in C to point or refer to the same location or space in memory from multiple places. That means in situations where we will have the pointer be assigned to in many places in our code. But in this case, we do not have any other logic refering or assigning to the pointer packed_git_mru. In simple words we actually dont need a pointer in this case. Therefore this patch makes packed_git_mru global a value instead of a pointer and makes use of list.h Signed-off-by: phionah bugosi --- builtin/pack-objects.c | 5 +++-- cache.h| 7 --- list.h | 6 ++ packfile.c | 12 ++-- 4 files changed, 15 insertions(+), 15 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index a57b4f0..189123f 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -1,5 +1,6 @@ #include "builtin.h" #include "cache.h" +#include "list.h" #include "config.h" #include "attr.h" #include "object.h" @@ -1012,7 +1013,7 @@ static int want_object_in_pack(const unsigned char *sha1, return want; } - for (entry = packed_git_mru->head; entry; entry = entry->next) { + for (entry = packed_git_mru.head; entry; entry = entry->next) { struct packed_git *p = entry->item; off_t offset; @@ -1030,7 +1031,7 @@ static int want_object_in_pack(const unsigned char *sha1, } want = want_found_object(exclude, p); if (!exclude && want > 0) - mru_mark(packed_git_mru, entry); + mru_mark(&packed_git_mru, entry); if (want != -1) return want; } diff --git a/cache.h b/cache.h index a916bc7..c8d7086 100644 --- a/cache.h +++ b/cache.h @@ -1585,13 +1585,6 @@ extern struct packed_git { char pack_name[FLEX_ARRAY]; /* more */ } *packed_git; -/* - * A most-recently-used ordered version of the packed_git list, which can - * be iterated instead of packed_git (and marked via 
mru_mark). - */ -struct mru; -extern struct mru *packed_git_mru; - struct pack_entry { off_t offset; unsigned char sha1[20]; diff --git a/list.h b/list.h index a226a87..525703b 100644 --- a/list.h +++ b/list.h @@ -26,6 +26,12 @@ #define LIST_H 1 /* + * A most-recently-used ordered version of the packed_git list, which can + * be iterated instead of packed_git (and marked via mru_mark). + */ +extern struct mru packed_git_mru + +/* * The definitions of this file are adopted from those which can be * found in the Linux kernel headers to enable people familiar with the * latter find their way in these sources as well. diff --git a/packfile.c b/packfile.c index f86fa05..61a61aa 100644 --- a/packfile.c +++ b/packfile.c @@ -1,4 +1,5 @@ #include "cache.h" +#include "list.h" #include "mru.h" #include "pack.h" #include "dir.h" @@ -41,8 +42,7 @@ static size_t peak_pack_mapped; static size_t pack_mapped; struct packed_git *packed_git; -static struct mru packed_git_mru_storage; -struct mru *packed_git_mru = &packed_git_mru_storage; +struct mru packed_git_mru; #define SZ_FMT PRIuMAX static inline uintmax_t sz_fmt(size_t s) { return s; } @@ -861,9 +861,9 @@ static void prepare_packed_git_mru(void) { struct packed_git *p; - mru_clear(packed_git_mru); + mru_clear(&packed_git_mru); for (p = packed_git; p; p = p->next) - mru_append(packed_git_mru, p); + mru_append(&packed_git_mru, p); } static int prepare_packed_git_run_once = 0; @@ -1832,9 +1832,9 @@ int find_pack_entry(const unsigned char *sha1, struct pack_entry *e) if (!packed_git) return 0; - for (p = packed_git_mru->head; p; p = p->next) { + for (p = packed_git_mru.head; p; p = p->next) { if (fill_pack_entry(sha1, e, p->item)) { - mru_mark(packed_git_mru, p); + mru_mark(&packed_git_mru, p); return 1; } } -- 2.7.4
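The change the patch argues for boils down to the following pattern, shown here with a toy MRU-style list; none of these helpers are Git's actual mru API, they just illustrate the value-vs-pointer choice:

```c
#include <assert.h>
#include <stddef.h>

struct mru_entry {
	int item;
	struct mru_entry *next;
};

struct mru {
	struct mru_entry *head;
};

/*
 * Before: two names for one object, with the pointer never reassigned:
 *     static struct mru packed_git_mru_storage;
 *     struct mru *packed_git_mru = &packed_git_mru_storage;
 * After: a single global value; callers simply pass &packed_git_mru
 * whenever a function needs a pointer.
 */
static struct mru packed_git_mru;

static struct mru_entry pool[8]; /* toy allocator for the sketch */
static int pool_used;

static void mru_append(struct mru *mru, int item)
{
	struct mru_entry *e = &pool[pool_used++];

	e->item = item;
	e->next = mru->head;
	mru->head = e;
}

static int mru_first(const struct mru *mru)
{
	return mru->head ? mru->head->item : -1;
}
```

Since nothing ever points `packed_git_mru` at a different object, the pointer indirection buys nothing; dropping it removes one global and one level of indirection at every use site.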
[PATCH] rev-parse: add --is-shallow-repository
Running `git fetch --unshallow` on a repo that is not in fact shallow produces a fatal error message. Add a helper to rev-parse that scripters can use to determine whether a repo is shallow or not. Signed-off-by: Øystein Walle --- Documentation/git-rev-parse.txt | 3 +++ builtin/rev-parse.c | 5 +++++ t/t1500-rev-parse.sh | 15 +++++++++++++++ 3 files changed, 23 insertions(+) diff --git a/Documentation/git-rev-parse.txt b/Documentation/git-rev-parse.txt index b1293f24b..0917b8207 100644 --- a/Documentation/git-rev-parse.txt +++ b/Documentation/git-rev-parse.txt @@ -235,6 +235,9 @@ print a message to stderr and exit with nonzero status. --is-bare-repository:: When the repository is bare print "true", otherwise "false". +--is-shallow-repository:: + When the repository is shallow print "true", otherwise "false". + --resolve-git-dir <path>:: Check if <path> is a valid repository or a gitfile that points at a valid repository, and print the location of the diff --git a/builtin/rev-parse.c b/builtin/rev-parse.c index 2bd28d3c0..c923207f2 100644 --- a/builtin/rev-parse.c +++ b/builtin/rev-parse.c @@ -868,6 +868,11 @@ int cmd_rev_parse(int argc, const char **argv, const char *prefix) : "false"); continue; } + if (!strcmp(arg, "--is-shallow-repository")) { + printf("%s\n", is_repository_shallow() ? "true" + : "false"); + continue; + } if (!strcmp(arg, "--shared-index-path")) { if (read_cache() < 0) die(_("Could not read the index")); diff --git a/t/t1500-rev-parse.sh b/t/t1500-rev-parse.sh index 03d3c7f6d..9d3433a30 100755 --- a/t/t1500-rev-parse.sh +++ b/t/t1500-rev-parse.sh @@ -116,6 +116,21 @@ test_expect_success 'git-path inside sub-dir' ' test_cmp expect actual ' +test_expect_success 'git-path shallow repository' ' + test_commit test_commit && + echo true >expect && + git clone --depth 1 --no-local . 
shallow && + test_when_finished "rm -rf shallow" && + git -C shallow rev-parse --is-shallow-repository >actual && + test_cmp expect actual +' + +test_expect_success 'git-path notshallow repository' ' + echo false >expect && + git rev-parse --is-shallow-repository >actual && + test_cmp expect actual +' + test_expect_success 'showing the superproject correctly' ' git rev-parse --show-superproject-working-tree >out && test_must_be_empty out && -- 2.11.0.485.g4e59582
Re: [PATCH 2/8] protocol: introduce protocol extention mechanisms
On 09/13, Stefan Beller wrote: > On Wed, Sep 13, 2017 at 2:54 PM, Brandon Williams wrote: > > Create protocol.{c,h} and provide functions which future servers and > > clients can use to determine which protocol to use or is being used. > > > > Also introduce the 'GIT_PROTOCOL' environment variable which will be > > used to communicate a colon separated list of keys with optional values > > to a server. Unknown keys and values must be tolerated. This mechanism > > is used to communicate which version of the wire protocol a client would > > like to use with a server. > > > > Signed-off-by: Brandon Williams > > --- > > Documentation/config.txt | 16 +++ > > Documentation/git.txt| 5 > > Makefile | 1 + > > cache.h | 7 + > > protocol.c | 72 > > > > protocol.h | 15 ++ > > 6 files changed, 116 insertions(+) > > create mode 100644 protocol.c > > create mode 100644 protocol.h > > > > diff --git a/Documentation/config.txt b/Documentation/config.txt > > index dc4e3f58a..d5b28a32c 100644 > > --- a/Documentation/config.txt > > +++ b/Documentation/config.txt > > @@ -2517,6 +2517,22 @@ The protocol names currently used by git are: > > `hg` to allow the `git-remote-hg` helper) > > -- > > > > +protocol.version:: > > It would be cool to set a set of versions that are good. (I am not sure > if that can be deferred to a later patch.) > > Consider we'd have versions 0,1,2,3,4 in the future: > In an ideal world the client and server would talk using v4 > as it is the most advanced protocol, right? > Maybe a security/performance issue is found on the server side > with say protocol v3. Still no big deal as we are speaking v4. > But then consider an issue is found on the client side with v4. > Then the client would happily talk 0..3 while the server would > love to talk using 0,1,2,4. 
> > The way I think about protocol version negotiation is that > both parties involved have a set of versions that they tolerate > to talk (they might understand more than the tolerated set, but the > user forbade some), and the goal of the negotiation is to find > the highest version number that is part of both the server set > and client set. So quite naturally with this line of thinking the > configuration is to configure a set of versions, which is what > I propose here. Maybe even in the wire format, separated > with colons? I'm sure it wouldn't take too much to change this to be a multi-valued config. Because after this series there is just v0 and v1 I didn't think through this case too much. If others agree then I can go ahead and make it so in a reroll. > > > + If set, clients will attempt to communicate with a server using > > + the specified protocol version. If unset, no attempt will be > > + made by the client to communicate using a particular protocol > > + version, this results in protocol version 0 being used. > > This sounds as if we're going to be really shy at first and only > users that care will try out new versions at their own risk. > From a users POV this may be frustrating as I would imagine that > people want to run > > git config --global protocol.version 2 > > to try it out and then realize that some of their hosts do not speak > 2, so they have to actually configure it per repo/remote. The point would be to be able to set this globally, not per-repo. Even if a repo doesn't speak v2 then it should be able to gracefully degrade to v1 without the user having to do anything. The reason why there is this escape hatch is if doing the protocol negotiation out of band causing issues with communicating with a server that it can be shut off. > > + Supported versions: > > > +* `0` - the original wire protocol. > > In the future this may be misleading as it doesn't specify the date of > when it was original. e.g. 
are capabilities already supported in "original"? > > Maybe phrase it as "wire protocol as of v2.14" ? (Though this sounds > as if new capabilities added in the future are not allowed) Yeah I can see how this could be misleading, though I'm not sure how best to word it to avoid that. > > > > + > > +extern enum protocol_version parse_protocol_version(const char *value); > > +extern enum protocol_version get_protocol_version_config(void); > > +extern enum protocol_version determine_protocol_version_server(void); > > +extern enum protocol_version determine_protocol_version_client(const char > > *server_response); > > Here is a good place to have some comments. -- Brandon Williams
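The negotiation model Stefan describes, where each side tolerates a set of versions and the goal is the highest member of the intersection, is easy to state in code. This is a sketch of the idea only; Git's actual negotiation is not structured like this:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the highest version present in both the client's and the
 * server's tolerated sets, or -1 if the sets do not intersect.
 */
static int negotiate_version(const int *client, size_t nc,
			     const int *server, size_t ns)
{
	int best = -1;
	size_t i, j;

	for (i = 0; i < nc; i++)
		for (j = 0; j < ns; j++)
			if (client[i] == server[j] && client[i] > best)
				best = client[i];
	return best;
}
```

In Stefan's hypothetical, a client tolerating {0,1,2,3} and a server tolerating {0,1,2,4} would settle on 2, even though each side individually knows a higher version.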
Re: [PATCH 3/8] daemon: recognize hidden request arguments
On 09/13, Stefan Beller wrote: > On Wed, Sep 13, 2017 at 2:54 PM, Brandon Williams wrote: > > A normal request to git-daemon is structured as > > "command path/to/repo\0host=..\0" and due to a bug in an old version of > > git-daemon 73bb33a94 (daemon: Strictly parse the "extra arg" part of the > > command, 2009-06-04) we aren't able to place any extra args (separated > > by NULs) besides the host. > > > > In order to get around this limitation teach git-daemon to recognize > > additional request arguments hidden behind a second NUL byte. Requests > > can then be structured like: > > "command path/to/repo\0host=..\0\0version=1\0key=value\0". git-daemon > > can then parse out the extra arguments and set 'GIT_PROTOCOL' > > accordingly. > > > > Signed-off-by: Brandon Williams > > --- > > daemon.c | 71 > > +++- > > 1 file changed, 61 insertions(+), 10 deletions(-) > > > > diff --git a/daemon.c b/daemon.c > > index 30747075f..250dbf82c 100644 > > --- a/daemon.c > > +++ b/daemon.c > > @@ -282,7 +282,7 @@ static const char *path_ok(const char *directory, > > struct hostinfo *hi) > > return NULL;/* Fallthrough. 
Deny by default */ > > } > > > > -typedef int (*daemon_service_fn)(void); > > +typedef int (*daemon_service_fn)(const struct argv_array *env); > > struct daemon_service { > > const char *name; > > const char *config_name; > > @@ -363,7 +363,7 @@ static int run_access_hook(struct daemon_service > > *service, const char *dir, > > } > > > > static int run_service(const char *dir, struct daemon_service *service, > > - struct hostinfo *hi) > > + struct hostinfo *hi, const struct argv_array *env) > > { > > const char *path; > > int enabled = service->enabled; > > @@ -422,7 +422,7 @@ static int run_service(const char *dir, struct > > daemon_service *service, > > */ > > signal(SIGTERM, SIG_IGN); > > > > - return service->fn(); > > + return service->fn(env); > > } > > > > static void copy_to_log(int fd) > > @@ -462,25 +462,34 @@ static int run_service_command(struct child_process > > *cld) > > return finish_command(cld); > > } > > > > -static int upload_pack(void) > > +static int upload_pack(const struct argv_array *env) > > { > > struct child_process cld = CHILD_PROCESS_INIT; > > argv_array_pushl(&cld.args, "upload-pack", "--strict", NULL); > > argv_array_pushf(&cld.args, "--timeout=%u", timeout); > > + > > + argv_array_pushv(&cld.env_array, env->argv); > > + > > return run_service_command(&cld); > > } > > > > -static int upload_archive(void) > > +static int upload_archive(const struct argv_array *env) > > { > > struct child_process cld = CHILD_PROCESS_INIT; > > argv_array_push(&cld.args, "upload-archive"); > > + > > + argv_array_pushv(&cld.env_array, env->argv); > > + > > return run_service_command(&cld); > > } > > > > -static int receive_pack(void) > > +static int receive_pack(const struct argv_array *env) > > { > > struct child_process cld = CHILD_PROCESS_INIT; > > argv_array_push(&cld.args, "receive-pack"); > > + > > + argv_array_pushv(&cld.env_array, env->argv); > > + > > return run_service_command(&cld); > > } > > > > @@ -574,7 +583,7 @@ static void 
canonicalize_client(struct strbuf *out, > > const char *in) > > /* > > * Read the host as supplied by the client connection. > > */ > > -static void parse_host_arg(struct hostinfo *hi, char *extra_args, int > > buflen) > > +static char *parse_host_arg(struct hostinfo *hi, char *extra_args, int > > buflen) > > { > > char *val; > > int vallen; > > @@ -602,6 +611,39 @@ static void parse_host_arg(struct hostinfo *hi, char > > *extra_args, int buflen) > > if (extra_args < end && *extra_args) > > die("Invalid request"); > > } > > + > > + return extra_args; > > +} > > + > > +static void parse_extra_args(struct argv_array *env, const char > > *extra_args, > > +int buflen) > > +{ > > + const char *end = extra_args + buflen; > > + struct strbuf git_protocol = STRBUF_INIT; > > + > > + for (; extra_args < end; extra_args += strlen(extra_args) + 1) { > > + const char *arg = extra_args; > > + > > + /* > > +* Parse the extra arguments, adding most to 'git_protocol' > > +* which will be used to set the 'GIT_PROTOCOL' envvar in > > the > > +* service that will be run. > > +* > > +* If there ends up being a particular arg in the future > > that > > +* git-daemon needs to parse specificly (like the 'host' > > arg) > > +* then it can be parsed here and not added to > > 'git_protocol'. > > +*/ >
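The request layout the patch describes, regular arguments, then a second NUL, then the hidden extra arguments, can be parsed with a sketch like this. The function and variable names are mine, not daemon.c's:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Scan a git-daemon request of the form
 *     "command path\0host=..\0\0version=1\0key=value\0"
 * and collect pointers to the arguments hidden behind the second NUL,
 * i.e. everything after the first empty string. Returns the number of
 * extra args found, at most 'max'.
 */
static int parse_hidden_args(const char *buf, size_t len,
			     const char **out, int max)
{
	const char *end = buf + len;
	int seen_empty = 0, n = 0;

	while (buf < end && n < max) {
		if (!*buf)
			seen_empty = 1; /* the second NUL appears as "" */
		else if (seen_empty)
			out[n++] = buf;
		buf += strlen(buf) + 1;
	}
	return n;
}
```

An old git-daemon that strictly parses only "command\0host\0" never walks past the first empty string, which is exactly why hiding the new arguments behind a second NUL is backward-compatible.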
Re: [PATCH 2/4] push, fetch: error out for submodule entries not pointing to commits
On 09/12, Jonathan Nieder wrote: > From: Stefan Beller > > The check_has_commit helper uses resolves a submodule entry to a > commit, when validating its existence. As a side effect this means > tolerates a submodule entry pointing to a tag, which is not a valid > submodule entry that git commands would know how to cope with. > > Tighten the check to require an actual commit, not a tag pointing to a > commit. > > Also improve the error handling when a submodule entry points to > non-commit (e.g., a blob) to error out instead of warning and > pretending the pointed to object doesn't exist. > > Signed-off-by: Stefan Beller > Signed-off-by: Jonathan Nieder Looks good! > --- > submodule.c| 33 + > t/t5531-deep-submodule-push.sh | 10 ++ > 2 files changed, 35 insertions(+), 8 deletions(-) > > diff --git a/submodule.c b/submodule.c > index 3cea8221e0..e0da55920d 100644 > --- a/submodule.c > +++ b/submodule.c > @@ -767,19 +767,36 @@ static int append_oid_to_argv(const struct object_id > *oid, void *data) > return 0; > } > > +struct has_commit_data { > + int result; > + const char *path; > +}; > + > static int check_has_commit(const struct object_id *oid, void *data) > { > - int *has_commit = data; > + struct has_commit_data *cb = data; > > - if (!lookup_commit_reference(oid)) > - *has_commit = 0; > + enum object_type type = sha1_object_info(oid->hash, NULL); > > - return 0; > + switch (type) { > + case OBJ_COMMIT: > + return 0; > + case OBJ_BAD: > + /* > + * Object is missing or invalid. If invalid, an error message > + * has already been printed. > + */ > + cb->result = 0; > + return 0; > + default: > + die(_("submodule entry '%s' (%s) is a %s, not a commit"), > + cb->path, oid_to_hex(oid), typename(type)); > + } > } > > static int submodule_has_commits(const char *path, struct oid_array *commits) > { > - int has_commit = 1; > + struct has_commit_data has_commit = { 1, path }; > > /* >* Perform a cheap, but incorrect check for the existence of 'commits'. 
> @@ -795,7 +812,7 @@ static int submodule_has_commits(const char *path, struct > oid_array *commits) > > oid_array_for_each_unique(commits, check_has_commit, &has_commit); > > - if (has_commit) { > + if (has_commit.result) { > /* >* Even if the submodule is checked out and the commit is >* present, make sure it exists in the submodule's object store > @@ -814,12 +831,12 @@ static int submodule_has_commits(const char *path, > struct oid_array *commits) > cp.dir = path; > > if (capture_command(&cp, &out, GIT_MAX_HEXSZ + 1) || out.len) > - has_commit = 0; > + has_commit.result = 0; > > strbuf_release(&out); > } > > - return has_commit; > + return has_commit.result; > } > > static int submodule_needs_pushing(const char *path, struct oid_array > *commits) > diff --git a/t/t5531-deep-submodule-push.sh b/t/t5531-deep-submodule-push.sh > index 0f84a53146..39cb2c1c34 100755 > --- a/t/t5531-deep-submodule-push.sh > +++ b/t/t5531-deep-submodule-push.sh > @@ -298,6 +298,16 @@ test_expect_success 'push succeeds if submodule commit > disabling recursion from > ) > ' > > +test_expect_success 'submodule entry pointing at a tag is error' ' > + git -C work/gar/bage tag -a test1 -m "tag" && > + tag=$(git -C work/gar/bage rev-parse test1^{tag}) && > + git -C work update-index --cacheinfo 16 "$tag" gar/bage && > + git -C work commit -m "bad commit" && > + test_when_finished "git -C work reset --hard HEAD^" && > + test_must_fail git -C work push --recurse-submodules=on-demand > ../pub.git master 2>err && > + test_i18ngrep "is a tag, not a commit" err > +' > + > test_expect_success 'push fails if recurse submodules option passed as yes' ' > ( > cd work/gar/bage && > -- > 2.14.1.690.gbb1197296e > -- Brandon Williams
Re: [RFC PATCH v2 1/2] implement fetching of moved submodules
On 09/15, Heiko Voigt wrote: > We store the changed submodules paths to calculate which submodule needs > fetching. This does not work for moved submodules since their paths do > not stay the same in case of a moved submodules. In case of new > submodules we do not have a path in the current checkout, since they > just appeared in this fetch. > > It is more general to collect the submodule names for changes instead of > their paths to include the above cases. > > With the change described above we implement 'on-demand' fetching of > changes in moved submodules. > > Note: This does only work when repositories have a .gitmodules file. In > other words: It breaks if we do not get a name for a repository. > IIRC, consensus was that this is a requirement to get nice submodule > handling these days? > > NEEDSWORK: This breaks t5531 and t5545 because they do not use a > .gitmodules file. I will add a fallback to paths to help such users. > > Signed-off-by: Heiko Voigt > --- > This an update of the previous series[1] to the current master. The > fallback is still missing but now it should not conflict with any topics > in flight anymore (hopefully). So the idea is to collect changed submodule's name, instead of their path, so that if they happened to moved you don't have to worry about the path changing underneath you. This should be good once those tests get fixed. Thanks for working on cleaning this up! 
:) > > Cheers Heiko > > [1] https://public-inbox.org/git/20170817105349.gc52...@book.hvoigt.net/ > > submodule.c | 91 > + > t/t5526-fetch-submodules.sh | 35 + > 2 files changed, 85 insertions(+), 41 deletions(-) > > diff --git a/submodule.c b/submodule.c > index 3cea8221e0..38b9905e43 100644 > --- a/submodule.c > +++ b/submodule.c > @@ -21,7 +21,7 @@ > #include "parse-options.h" > > static int config_update_recurse_submodules = RECURSE_SUBMODULES_OFF; > -static struct string_list changed_submodule_paths = STRING_LIST_INIT_DUP; > +static struct string_list changed_submodule_names = STRING_LIST_INIT_DUP; > static int initialized_fetch_ref_tips; > static struct oid_array ref_tips_before_fetch; > static struct oid_array ref_tips_after_fetch; > @@ -667,11 +667,11 @@ const struct submodule *submodule_from_ce(const struct > cache_entry *ce) > } > > static struct oid_array *submodule_commits(struct string_list *submodules, > -const char *path) > +const char *name) > { > struct string_list_item *item; > > - item = string_list_insert(submodules, path); > + item = string_list_insert(submodules, name); > if (item->util) > return (struct oid_array *) item->util; > > @@ -680,39 +680,34 @@ static struct oid_array *submodule_commits(struct > string_list *submodules, > return (struct oid_array *) item->util; > } > > +struct collect_changed_submodules_cb_data { > + struct string_list *changed; > + const struct object_id *commit_oid; > +}; > + > static void collect_changed_submodules_cb(struct diff_queue_struct *q, > struct diff_options *options, > void *data) > { > + struct collect_changed_submodules_cb_data *me = data; > + struct string_list *changed = me->changed; > + const struct object_id *commit_oid = me->commit_oid; > int i; > - struct string_list *changed = data; > > for (i = 0; i < q->nr; i++) { > struct diff_filepair *p = q->queue[i]; > struct oid_array *commits; > + const struct submodule *submodule; > + > if (!S_ISGITLINK(p->two->mode)) > continue; > > - if 
(S_ISGITLINK(p->one->mode)) { > - /* > - * NEEDSWORK: We should honor the name configured in > - * the .gitmodules file of the commit we are examining > - * here to be able to correctly follow submodules > - * being moved around. > - */ > - commits = submodule_commits(changed, p->two->path); > - oid_array_append(commits, &p->two->oid); > - } else { > - /* Submodule is new or was moved here */ > - /* > - * NEEDSWORK: When the .git directories of submodules > - * live inside the superprojects .git directory some > - * day we should fetch new submodules directly into > - * that location too when config or options request > - * that so they can be checked out from there. > - */ > + submodule = submodule_from_path(commit_oid, p->two->path); > + if (!submodule) >
Re: [RFC PATCH v2 2/2] submodule: simplify decision tree whether to or not to fetch
On 09/15, Heiko Voigt wrote: > To make extending this logic later easier. > > Signed-off-by: Heiko Voigt I like the result of this patch, its much cleaner. > --- > submodule.c | 74 > ++--- > 1 file changed, 37 insertions(+), 37 deletions(-) > > diff --git a/submodule.c b/submodule.c > index 38b9905e43..fa44fc59f2 100644 > --- a/submodule.c > +++ b/submodule.c > @@ -1118,6 +1118,31 @@ struct submodule_parallel_fetch { > }; > #define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0} > > +static int get_fetch_recurse_config(const struct submodule *submodule, > + struct submodule_parallel_fetch *spf) > +{ > + if (spf->command_line_option != RECURSE_SUBMODULES_DEFAULT) > + return spf->command_line_option; > + > + if (submodule) { > + char *key; > + const char *value; > + > + int fetch_recurse = submodule->fetch_recurse; > + key = xstrfmt("submodule.%s.fetchRecurseSubmodules", > submodule->name); > + if (!repo_config_get_string_const(the_repository, key, &value)) > { > + fetch_recurse = parse_fetch_recurse_submodules_arg(key, > value); > + } > + free(key); > + > + if (fetch_recurse != RECURSE_SUBMODULES_NONE) > + /* local config overrules everything except commandline > */ > + return fetch_recurse; > + } > + > + return spf->default_option; > +} > + > static int get_next_submodule(struct child_process *cp, > struct strbuf *err, void *data, void **task_cb) > { > @@ -1137,46 +1162,21 @@ static int get_next_submodule(struct child_process > *cp, > > submodule = submodule_from_path(&null_oid, ce->name); > > - default_argv = "yes"; > - if (spf->command_line_option == RECURSE_SUBMODULES_DEFAULT) { > - int fetch_recurse = RECURSE_SUBMODULES_NONE; > - > - if (submodule) { > - char *key; > - const char *value; > - > - fetch_recurse = submodule->fetch_recurse; > - key = > xstrfmt("submodule.%s.fetchRecurseSubmodules", submodule->name); > - if > (!repo_config_get_string_const(the_repository, key, &value)) { > - fetch_recurse = > parse_fetch_recurse_submodules_arg(key, value); > - 
} > - free(key); > - } > - > - if (fetch_recurse != RECURSE_SUBMODULES_NONE) { > - if (fetch_recurse == RECURSE_SUBMODULES_OFF) > - continue; > - if (fetch_recurse == > RECURSE_SUBMODULES_ON_DEMAND) { > - if > (!unsorted_string_list_lookup(&changed_submodule_names, > - > submodule->name)) > - continue; > - default_argv = "on-demand"; > - } > - } else { > - if (spf->default_option == > RECURSE_SUBMODULES_OFF) > - continue; > - if (spf->default_option == > RECURSE_SUBMODULES_ON_DEMAND) { > - if > (!unsorted_string_list_lookup(&changed_submodule_names, > - > submodule->name)) > - continue; > - default_argv = "on-demand"; > - } > - } > - } else if (spf->command_line_option == > RECURSE_SUBMODULES_ON_DEMAND) { > - if > (!unsorted_string_list_lookup(&changed_submodule_names, > + switch (get_fetch_recurse_config(submodule, spf)) > + { > + default: > + case RECURSE_SUBMODULES_DEFAULT: > + case RECURSE_SUBMODULES_ON_DEMAND: > + if (!submodule || > !unsorted_string_list_lookup(&changed_submodule_names, >submodule->name)) > continue; > default_argv = "on-demand"; > + break; > + case RECURSE_SUBMODULES_ON: > + default_argv = "yes"; > + break; > + case RECURSE_SUBMODULES_OFF: > + continue; > } > > strbuf_addf(&submodule_path, "%s/%s", spf->work_tree, ce->name); > -- > 2.14.1.145.gb3622a4 > -- Bra
Re: [PATCH] protocol: make parse_protocol_version() private
On 09/17, Ramsay Jones wrote: > > Signed-off-by: Ramsay Jones > --- > > Hi Brandon, > > If you need to re-roll your 'bw/protocol-v1' branch, could you please > squash this into the relevant patch (commit 45954f179e, "protocol: > introduce protocol extention mechanisms", 13-09-2017). > > This assumes you agree that this symbol does not need to be public; if > not, then please just ignore! ;-) Thanks! I've updated my local version of the series to reflect this. > > Thanks! > > ATB, > Ramsay Jones > > protocol.c | 2 +- > protocol.h | 1 - > 2 files changed, 1 insertion(+), 2 deletions(-) > > diff --git a/protocol.c b/protocol.c > index 1b16c7b9a..369503065 100644 > --- a/protocol.c > +++ b/protocol.c > @@ -2,7 +2,7 @@ > #include "config.h" > #include "protocol.h" > > -enum protocol_version parse_protocol_version(const char *value) > +static enum protocol_version parse_protocol_version(const char *value) > { > if (!strcmp(value, "0")) > return protocol_v0; > diff --git a/protocol.h b/protocol.h > index 2fa6486d0..18f9a5235 100644 > --- a/protocol.h > +++ b/protocol.h > @@ -7,7 +7,6 @@ enum protocol_version { > protocol_v1 = 1, > }; > > -extern enum protocol_version parse_protocol_version(const char *value); > extern enum protocol_version get_protocol_version_config(void); > extern enum protocol_version determine_protocol_version_server(void); > extern enum protocol_version determine_protocol_version_client(const char > *server_response); > -- > 2.14.0 -- Brandon Williams
RE: [PATCH v6 08/12] fsmonitor: add a test tool to dump the index extension
> -Original Message- > From: Torsten Bögershausen [mailto:tbo...@web.de] > Sent: Monday, September 18, 2017 11:43 AM > To: Ben Peart ; Junio C Hamano > ; Ben Peart > Cc: david.tur...@twosigma.com; ava...@gmail.com; > christian.cou...@gmail.com; git@vger.kernel.org; > johannes.schinde...@gmx.de; pclo...@gmail.com; p...@peff.net > Subject: Re: [PATCH v6 08/12] fsmonitor: add a test tool to dump the index > extension > > On 2017-09-18 15:38, Ben Peart wrote: > > > > > > On 9/17/2017 4:02 AM, Junio C Hamano wrote: > >> Ben Peart writes: > >> > >>> diff --git a/t/helper/test-dump-fsmonitor.c > >>> b/t/helper/test-dump-fsmonitor.c new file mode 100644 index > >>> 00..482d749bb9 > >>> --- /dev/null > >>> +++ b/t/helper/test-dump-fsmonitor.c > >>> @@ -0,0 +1,21 @@ > >>> +#include "cache.h" > >>> + > >>> +int cmd_main(int ac, const char **av) { > >>> + struct index_state *istate = &the_index; > >>> + int i; > >>> + > >>> + setup_git_directory(); > >>> + if (do_read_index(istate, get_index_file(), 0) < 0) > >>> + die("unable to read index file"); > >>> + if (!istate->fsmonitor_last_update) { > >>> + printf("no fsmonitor\n"); > >>> + return 0; > >>> + } > >>> + printf("fsmonitor last update %"PRIuMAX"\n", > >>> istate->fsmonitor_last_update); > >> > >> After pushing this out and had Travis complain, I queued a squash on > >> top of this to cast the argument to (uintmax_t), like you did in an > >> earlier step (I think it was [PATCH 04/12]). > >> > > > > Thanks. I'll update this to cast it as (uint64_t) as that is what > > get/put_be64 use. As far as I can tell they both map to the same > > thing (unsigned long long) so there isn't functional difference. > (Just to double-check): This is the way to print "PRIuMAX" correctly (on all > platforms): > > printf("fsmonitor last update %"PRIuMAX"\n", (uintmax_t)istate- > >fsmonitor_last_update); > Should I just make the variable type itself uintmax_t and then just skip the cast altogether? 
I went with uint64_t because that is what getnanotime returned.
[PATCH 2/2] read_info_alternates: warn on non-trivial errors
When we fail to open $GIT_DIR/info/alternates, we silently assume there are no alternates. This is the right thing to do for ENOENT, but not for other errors.

A hard error is probably overkill here. If we fail to read an alternates file then either we'll complete our operation anyway, or we'll fail to find some needed object. Either way, a warning is a good idea. And we already have a helper function to handle this pattern; let's just call warn_on_fopen_errors().

Note that technically the errno from strbuf_read_file() might be from a read() error, not open(). But since read() would never return ENOENT or ENOTDIR, and since it produces a generic "unable to access" error, it's suitable for handling errors from either.

Signed-off-by: Jeff King
---
I'm pretty comfortable with the rationale in the final paragraph. But if we're concerned that warn_on_fopen_errors() might change, we can swap out strbuf_read_file() for fopen()+strbuf_read().

 sha1_file.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/sha1_file.c b/sha1_file.c
index b1e4193679..9cec326298 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -432,6 +432,7 @@ static void read_info_alternates(const char * relative_base, int depth)
 	path = xstrfmt("%s/info/alternates", relative_base);
 
 	if (strbuf_read_file(&buf, path, 1024) < 0) {
+		warn_on_fopen_errors(path);
 		free(path);
 		return;
 	}
-- 
2.14.1.1014.g252e627ae0
[PATCH 1/2] read_info_alternates: read contents into strbuf
The link_alt_odb_entries() function has always taken a ptr/len pair as input. Until cf3c635210 (alternates: accept double-quoted paths, 2016-12-12), we made a copy of those bytes in a string. But after that commit, we switched to parsing the input left-to-right, and we ignore "len" totally, instead reading until we hit a NUL.

This has mostly gone unnoticed for a few reasons:

  1. All but one caller passes a NUL-terminated string, with "len" pointing to the NUL.

  2. The remaining caller, read_info_alternates(), passes in an mmap'd file. Unless the file is an exact multiple of the page size, it will generally be followed by NUL padding to the end of the page, which just works.

The easiest way to demonstrate the problem is to build with:

  make SANITIZE=address NO_MMAP=Nope test

Any test which involves $GIT_DIR/info/alternates will fail, as the mmap emulation (correctly) does not add an extra NUL, and ASAN complains about reading past the end of the buffer.

One solution would be to teach link_alt_odb_entries() to respect "len". But it's actually a bit tricky, since we depend on unquote_c_style() under the hood, and it has no ptr/len variant. We could also just make a NUL-terminated copy of the input bytes and operate on that. But since all but one caller is already passing a string, let's instead just fix that caller to provide NUL-terminated input in the first place, by swapping out mmap for strbuf_read_file().

There's no advantage to using mmap on the alternates file. It's not expected to be large (and anyway, we're copying its contents into an in-memory linked list). Nor is using git_open() buying us anything here, since we don't keep the descriptor open for a long period of time.

Let's also drop the "len" parameter entirely from link_alt_odb_entries(), since it's completely ignored. That will avoid any new callers re-introducing a similar bug.
Reported-by: Michael Haggerty Signed-off-by: Jeff King --- I didn't reproduce with the mmap even-page-size thing, but I think it's the same reason we didn't notice the "git log -G" read-past-mmap issues for a long time. Which makes testing with ASAN and NO_MMAP an interesting combination, as it should find out any similar cases (and after this, the whole test suite passes with that configuration). sha1_file.c | 29 + 1 file changed, 9 insertions(+), 20 deletions(-) diff --git a/sha1_file.c b/sha1_file.c index 5f71bbac3e..b1e4193679 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -398,7 +398,7 @@ static const char *parse_alt_odb_entry(const char *string, return end; } -static void link_alt_odb_entries(const char *alt, int len, int sep, +static void link_alt_odb_entries(const char *alt, int sep, const char *relative_base, int depth) { struct strbuf objdirbuf = STRBUF_INIT; @@ -427,28 +427,17 @@ static void link_alt_odb_entries(const char *alt, int len, int sep, static void read_info_alternates(const char * relative_base, int depth) { - char *map; - size_t mapsz; - struct stat st; char *path; - int fd; + struct strbuf buf = STRBUF_INIT; path = xstrfmt("%s/info/alternates", relative_base); - fd = git_open(path); - free(path); - if (fd < 0) - return; - if (fstat(fd, &st) || (st.st_size == 0)) { - close(fd); + if (strbuf_read_file(&buf, path, 1024) < 0) { + free(path); return; } - mapsz = xsize_t(st.st_size); - map = xmmap(NULL, mapsz, PROT_READ, MAP_PRIVATE, fd, 0); - close(fd); - link_alt_odb_entries(map, mapsz, '\n', relative_base, depth); - - munmap(map, mapsz); + link_alt_odb_entries(buf.buf, '\n', relative_base, depth); + strbuf_release(&buf); } struct alternate_object_database *alloc_alt_odb(const char *dir) @@ -503,7 +492,7 @@ void add_to_alternates_file(const char *reference) if (commit_lock_file(lock)) die_errno("unable to move new alternates file into place"); if (alt_odb_tail) - link_alt_odb_entries(reference, strlen(reference), '\n', NULL, 0); + 
link_alt_odb_entries(reference, '\n', NULL, 0); } free(alts); } @@ -516,7 +505,7 @@ void add_to_alternates_memory(const char *reference) */ prepare_alt_odb(); - link_alt_odb_entries(reference, strlen(reference), '\n', NULL, 0); + link_alt_odb_entries(reference, '\n', NULL, 0); } /* @@ -619,7 +608,7 @@ void prepare_alt_odb(void) if (!alt) alt = ""; alt_odb_tail = &alt_odb_list; - link_alt_odb_entries(alt, strlen(alt), PATH_SEP, NULL, 0); + link_alt_odb_entries(alt, PATH_SEP, NULL, 0); read_info_alternates(get_object_directory(), 0); } -- 2.14.1.1014.g252e627ae0
[PATCH 0/2] fix read past end of array in alternates files
This series fixes a regression in v2.11.1 where we might read past the end of an mmap'd buffer. It was introduced in cf3c635210, but I didn't base the patch on there, for a few reasons:

  1. There's a trivial conflict when merging up (because of git_open_noatime() becoming just git_open() in the interim).

  2. The reproduction advice relies on our SANITIZE Makefile knob, which didn't exist back then.

  3. The second patch does not apply there because we don't have warn_on_fopen_errors(). Though admittedly it could be applied separately after merging up; it's just a clean-up on top.

It does apply on the current "maint".

  [1/2]: read_info_alternates: read contents into strbuf
  [2/2]: read_info_alternates: warn on non-trivial errors

 sha1_file.c | 30 ++
 1 file changed, 10 insertions(+), 20 deletions(-)

-Peff
Re: [PATCH v6 08/12] fsmonitor: add a test tool to dump the index extension
On 2017-09-18 15:38, Ben Peart wrote: > > > On 9/17/2017 4:02 AM, Junio C Hamano wrote: >> Ben Peart writes: >> >>> diff --git a/t/helper/test-dump-fsmonitor.c b/t/helper/test-dump-fsmonitor.c >>> new file mode 100644 >>> index 00..482d749bb9 >>> --- /dev/null >>> +++ b/t/helper/test-dump-fsmonitor.c >>> @@ -0,0 +1,21 @@ >>> +#include "cache.h" >>> + >>> +int cmd_main(int ac, const char **av) >>> +{ >>> + struct index_state *istate = &the_index; >>> + int i; >>> + >>> + setup_git_directory(); >>> + if (do_read_index(istate, get_index_file(), 0) < 0) >>> + die("unable to read index file"); >>> + if (!istate->fsmonitor_last_update) { >>> + printf("no fsmonitor\n"); >>> + return 0; >>> + } >>> + printf("fsmonitor last update %"PRIuMAX"\n", >>> istate->fsmonitor_last_update); >> >> After pushing this out and had Travis complain, I queued a squash on >> top of this to cast the argument to (uintmax_t), like you did in an >> earlier step (I think it was [PATCH 04/12]). >> > > Thanks. I'll update this to cast it as (uint64_t) as that is what get/put_be64 > use. As far as I can tell they both map to the same thing (unsigned long > long) > so there isn't functional difference. (Just to double-check): This is the way to print "PRIuMAX" correctly (on all platforms): printf("fsmonitor last update %"PRIuMAX"\n", (uintmax_t)istate->fsmonitor_last_update);
Re: [PATCH v6 10/12] fsmonitor: add test cases for fsmonitor extension
On 9/17/2017 12:47 AM, Junio C Hamano wrote:

Ben Peart writes:

	+write_integration_script() {
	+	write_script .git/hooks/fsmonitor-test<<-\EOF
	+	if [ "$#" -ne 2 ]; then
	+		echo "$0: exactly 2 arguments expected"
	+		exit 2
	+	fi
	+	if [ "$1" != 1 ]; then
	+		echo -e "Unsupported core.fsmonitor hook version.\n" >&2
	+		exit 1
	+	fi

In addition to the "echo -e" thing pointed out earlier, these look somewhat unusual in our shell scripts, relative to what Documentation/CodingGuidelines tells us to do:

- We prefer a space between the function name and the parentheses, and no space inside the parentheses. The opening "{" should also be on the same line.

	(incorrect) my_function(){ ...
	(correct) my_function () { ...

- We prefer "test" over "[ ... ]".

- Do not write control structures on a single line with semicolon. "then" should be on the next line for if statements, and "do" should be on the next line for "while" and "for".

	(incorrect)
	if test -f hello; then
		do this
	fi

	(correct)
	if test -f hello
	then
		do this
	fi

I'm happy to make these changes. I understand the difficulty of creating a consistent coding style, especially after the fact. Copy/paste is usually a developer's best friend, as it allows you to avoid a lot of errors by reusing existing, tested code. One of the times it backfires is when that code doesn't match the current desired coding style.

I only point these out to help lend some additional impetus to the effort to formalize the coding style and to provide tooling to handle what should mostly be a mechanical process. IMO, the goal should be to save the maintainer and contributors the cost of having to write up and respond to formatting feedback. :)

Some stats on these same coding style errors in the current bash scripts:

	298 instances of "[a-z]\(\).*\{" ie "function_name() {" (no space)
	140 instances of "if \[ .* \]" (not using the preferred "test")
	293 instances of "if .*; then"

Wouldn't it be great not to have to write up style feedback for when these all get copy/pasted into new scripts? :)

	diff --git a/t/t7519/fsmonitor-watchman b/t/t7519/fsmonitor-watchman
	new file mode 100755
	index 00..aaee5d1fe3
	--- /dev/null
	+++ b/t/t7519/fsmonitor-watchman
	@@ -0,0 +1,128 @@
	+#!/usr/bin/perl
	+
	+use strict;
	+use warnings;
	+use IPC::Open2;
	+ ...
	+ open (my $fh, ">", ".git/watchman-query.json");
	+ print $fh "[\"query\", \"$git_work_tree\", { \
	+	\"since\": $time, \
	+	\"fields\": [\"name\"], \
	+	\"expression\": [\"not\", [\"allof\", [\"since\", $time, \"cclock\"], [\"not\", \"exists\"]]] \
	+	}]";
	+ close $fh;
	+
	+ print CHLD_IN "[\"query\", \"$git_work_tree\", { \
	+	\"since\": $time, \
	+	\"fields\": [\"name\"], \
	+	\"expression\": [\"not\", [\"allof\", [\"since\", $time, \"cclock\"], [\"not\", \"exists\"]]] \
	+	}]";

This looks painful to read, write and maintain. IIRC, Perl supports the <<HERE document syntax.

I agree! I'm definitely *not* a perl developer so was unaware of this construct. A few minutes with stack overflow and now I can clean this up.

	+} \ No newline at end of file

Oops. Thanks.
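For quick reference, the three guideline points can be combined into one runnable sketch (the say_hello function and the hello file are made up for the example):

```sh
#!/bin/sh
# Style the guidelines ask for: a space before the parentheses,
# "test" instead of "[ ... ]", and "then" on its own line.

say_hello () {
	if test -f hello
	then
		echo "hello exists"
	else
		echo "hello is missing"
	fi
}

: >hello	# create an empty file so the first branch is taken
say_hello
rm -f hello
```

Run as written, this prints "hello exists"; without the `: >hello` it would print "hello is missing".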
Re: What's cooking in git.git (Aug 2017, #05; Tue, 22)
Hi Junio,

On Sat, 16 Sep 2017, Junio C Hamano wrote:

> And as you alluded to, we may need to see if we can help making it
> easier to do the latter when needed.

That mistakes it for "Someone Else's Problem".

> Johannes Schindelin writes:
>
> > That's what you are buying into by having these "What's cooking" mails
> > that are in no usable way connected to the original threads.
>
> For the above reason, I do not think this is a particularly useful
> stance to take.

I agree, but this is the process you chose to use.

> Do you have a concrete suggestion to make these individual entries more
> helpful for people who may want to go back to the original thread in
> doing so? In-reply-to: or References: fields of the "What's cooking"
> report would not help. I often have the message IDs that made/helped me
> make these individual comments in the description; they alone would not
> react to mouse clicks, though.

Oh gawd, not even more stuff piled onto the mail format. Please stop.

> Having said that, I'd expect that individual contributors have past
> messages pertaining to the smaller numbers of their own topics in flight
> in their mailbox than the project-wide "What's cooking" report covers,
> so perhaps an effort to devise such a mechanism may result in an
> over-engineering waste nobody finds useful.

I dunno. I frequently find myself putting off that What's cooking mail because it simply takes too long to study. It probably tries to serve too many purposes at the same time, and thereby none.

In the same vein as "to a hammer, everything looks like a nail": when it comes to project management, a dedicated tool will always beat a general-purpose mailing list.

Ciao,
Dscho
Re: [PATCH v6 12/12] fsmonitor: add a performance test
Hi Ben,

sorry for not catching this earlier:

On Fri, 15 Sep 2017, Ben Peart wrote:

> [...]
> +
> +int cmd_dropcaches(void)
> +{
> +	HANDLE hProcess = GetCurrentProcess();
> +	HANDLE hToken;
> +	int status;
> +
> +	if (!OpenProcessToken(hProcess, TOKEN_QUERY | TOKEN_ADJUST_PRIVILEGES, &hToken))
> +		return error("Can't open current process token");
> +
> +	if (!GetPrivilege(hToken, "SeProfileSingleProcessPrivilege", 1))
> +		return error("Can't get SeProfileSingleProcessPrivilege");
> +
> +	CloseHandle(hToken);
> +
> +	HMODULE ntdll = LoadLibrary("ntdll.dll");

Git's source code still tries to abide by C90, and for simplicity's sake, this extends to the Windows-specific part. Therefore, the `ntdll` variable needs to be declared at the beginning of the function (I do agree that it makes for better code to reduce the scope of variables, but C90 simply did not allow variables to be declared in the middle of functions).

I wanted to send a patch addressing this in the obvious way, but then I encountered these lines:

> +	DWORD(WINAPI *NtSetSystemInformation)(INT, PVOID, ULONG) =
> +		(DWORD(WINAPI *)(INT, PVOID, ULONG))GetProcAddress(ntdll, "NtSetSystemInformation");
> +	if (!NtSetSystemInformation)
> +		return error("Can't get function addresses, wrong Windows version?");

It turns out that we have seen this plenty of times in Git for Windows' fork, so much so that we came up with a nice helper to make this all a bit more robust and a bit more obvious, too: the DECLARE_PROC_ADDR and INIT_PROC_ADDR helpers in compat/win32/lazyload.h.

Maybe this would be the perfect excuse to integrate this patch into upstream Git?
This would be the patch (you can also cherry-pick it from 25c4dc3a73352e72e995594cf1b4afa46e93d040 in https://github.com/dscho/git): -- snip -- >From 25c4dc3a73352e72e995594cf1b4afa46e93d040 Mon Sep 17 00:00:00 2001 From: Johannes Schindelin Date: Tue, 10 Jan 2017 23:14:20 +0100 Subject: [PATCH] Win32: simplify loading of DLL functions Dynamic loading of DLL functions is duplicated in several places in Git for Windows' source code. This patch adds a pair of macros to simplify the process: the DECLARE_PROC_ADDR(, , , ..) macro to be used at the beginning of a code block, and the INIT_PROC_ADDR() macro to call before using the declared function. The return value of the INIT_PROC_ADDR() call has to be checked; If it is NULL, the function was not found in the specified DLL. Example: DECLARE_PROC_ADDR(kernel32.dll, BOOL, CreateHardLinkW, LPCWSTR, LPCWSTR, LPSECURITY_ATTRIBUTES); if (!INIT_PROC_ADDR(CreateHardLinkW)) return error("Could not find CreateHardLinkW() function"; if (!CreateHardLinkW(source, target, NULL)) return error("could not create hardlink from %S to %S", source, target); return 0; Signed-off-by: Karsten Blees Signed-off-by: Johannes Schindelin --- compat/win32/lazyload.h | 44 1 file changed, 44 insertions(+) create mode 100644 compat/win32/lazyload.h diff --git a/compat/win32/lazyload.h b/compat/win32/lazyload.h new file mode 100644 index 000..91c10dad2fb --- /dev/null +++ b/compat/win32/lazyload.h @@ -0,0 +1,44 @@ +#ifndef LAZYLOAD_H +#define LAZYLOAD_H + +/* simplify loading of DLL functions */ + +struct proc_addr { + const char *const dll; + const char *const function; + FARPROC pfunction; + unsigned initialized : 1; +}; + +/* Declares a function to be loaded dynamically from a DLL. */ +#define DECLARE_PROC_ADDR(dll, rettype, function, ...) \ + static struct proc_addr proc_addr_##function = \ + { #dll, #function, NULL, 0 }; \ + static rettype (WINAPI *function)(__VA_ARGS__) + +/* + * Loads a function from a DLL (once-only). 
+ * Returns non-NULL function pointer on success. + * Returns NULL + errno == ENOSYS on failure. + */ +#define INIT_PROC_ADDR(function) \ + (function = get_proc_addr(&proc_addr_##function)) + +static inline void *get_proc_addr(struct proc_addr *proc) +{ + /* only do this once */ + if (!proc->initialized) { + HANDLE hnd; + proc->initialized = 1; + hnd = LoadLibraryExA(proc->dll, NULL, +LOAD_LIBRARY_SEARCH_SYSTEM32); + if (hnd) + proc->pfunction = GetProcAddress(hnd, proc->function); + } + /* set ENOSYS if DLL or function was not found */ + if (!proc->pfunction) + errno = ENOSYS; + return proc->pfunction; +} + +#endif -- snap -- With this patch, this fixup to your patch would make things compile (you can also cherry-pick d05996fb61027512b8ab31a36c4a7a677dea11bb from my fork): -- snipsnap -- >From d05996fb61027512b8ab31a36c4a7a677dea11bb Mon Sep 17 00:00:00 2001 From: Johannes Schinde
Re: [RFC PATCH 0/2] Add named reference to latest push cert
Hello, everyone.

Sorry for being late in this thread, I was making sure I didn't say anything outrageously wrong.

> That's Stefan; I wouldn't have suggested any approach that uses the
> blob whose sole purpose is to serve as a temporary storage area to
> pass the information to the hook as part of the permanent record.

Hmm, as far as I understand, *this* is the status quo. We get an ephemeral sha1/oid only if you have a hook attached. Otherwise, you will never see the object at all, even if you have --signed set.

I suggested preparing this RFC/patch for the following reasons (please let me know if my understanding of any of this is wrong...):

- I find it weird that the CLI allows for a --signed push, yet nowhere in receive-pack's feedback are you told whether the push certificate is computed/stored/handled at all. I think that, at the latest, receive-pack should let the user know whether the signed push succeeded, or if there is no hook attached to handle it.

- *If there is a hook*, the blob is computed, but it is up to the hook itself to store it *somewhere*. This makes me feel like it's somewhat of a useless waste of computation if the hook is not meant to handle it anyway (which is just a post-receive hook). I find it rather weird that --signed is a builtin flag, yet is handled on the server side only partially (just my two cents).

- To me, the way push certificates are handled now feels like having git commit -S produce a detached signature that you have to embed somehow in the resulting commit object. (As a matter of fact, many points on [1] seem to back this notion, and even recall some drawbacks of push certificates the way they are handled today.)

I understand the concurrency concerns, so I agree with Stefan's solution, although I don't know how big of a deal it would be, given that git already supports --atomic pushes (admittedly, I haven't checked if there are any guards in place for someone who pushes millions of refs atomically).
It'd be completely understandable to have a setting to disable handling of --signed pushes and, ideally, a way to warn the user that this feature is disabled if --signed is sent.

Maybe I'm missing the bigger picture, but to me it feels that a named ref to the latest push certificate would make it easier to handle/tool/sync around the push certificate solution?

Thanks,
-Santiago.

[1] https://public-inbox.org/git/CAJo=hJvWbjEM9E5AjPHgmQ=eY8xf=Q=xtukeu2ur7auuqea...@mail.gmail.com/
Re: [PATCH v6 10/12] fsmonitor: add test cases for fsmonitor extension
On 9/16/2017 11:27 AM, Torsten Bögershausen wrote:
> On 2017-09-15 21:20, Ben Peart wrote:
>> +if [ "$1" != 1 ]
>> +then
>> +	echo -e "Unsupported core.fsmonitor hook version.\n" >&2
>> +	exit 1
>> +fi
>
> The "echo -e" is not portable. (It was detected by a tighter version of
> the lint script, which I have here, but have not yet sent to the list :-(
>
> This will do:
>
> 	echo "Unsupported core.fsmonitor hook version." >&2

Thanks, I'll fix these and the ones in the t/t7519 directory.
Re: [PATCH v6 04/12] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.
On 9/18/2017 9:32 AM, David Turner wrote:
-Original Message-
From: Ben Peart [mailto:peart...@gmail.com]
Sent: Monday, September 18, 2017 9:07 AM
To: David Turner ; 'Ben Peart'
Cc: ava...@gmail.com; christian.cou...@gmail.com; git@vger.kernel.org; gits...@pobox.com; johannes.schinde...@gmx.de; pclo...@gmail.com; p...@peff.net
Subject: Re: [PATCH v6 04/12] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.

Thanks for taking the time to review/provide feedback!

On 9/15/2017 5:35 PM, David Turner wrote:
-Original Message-
From: Ben Peart [mailto:benpe...@microsoft.com]
Sent: Friday, September 15, 2017 3:21 PM
To: benpe...@microsoft.com
Cc: David Turner ; ava...@gmail.com; christian.cou...@gmail.com; git@vger.kernel.org; gits...@pobox.com; johannes.schinde...@gmx.de; pclo...@gmail.com; p...@peff.net
Subject: [PATCH v6 04/12] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.

+int git_config_get_fsmonitor(void)
+{
+	if (git_config_get_pathname("core.fsmonitor", &core_fsmonitor))
+		core_fsmonitor = getenv("GIT_FSMONITOR_TEST");
+
+	if (core_fsmonitor && !*core_fsmonitor)
+		core_fsmonitor = NULL;
+
+	if (core_fsmonitor)
+		return 1;
+
+	return 0;
+}

This function's return values are backwards relative to the rest of the git_config_* functions.

I'm confused. If core.fsmonitor is configured, it returns 1. If it is not configured, it returns 0. I don't make use of the -1 /* default value */ option as I didn't see any use/value in this case. What is backwards?

The other git_config_* functions return 1 for error and 0 for success.

Hmm, I followed the model (i.e., copy/paste :)) used by the untracked cache. If you look at how it uses the return value, it is 0 == false == remove the extension, 1 == true == add the extension. I'm doing the same with fsmonitor.
static void tweak_untracked_cache(struct index_state *istate)
{
	switch (git_config_get_untracked_cache()) {
	case -1: /* keep: do nothing */
		break;
	case 0: /* false */
		remove_untracked_cache(istate);
		break;
	case 1: /* true */
		add_untracked_cache(istate);
		break;
	default: /* unknown value: do nothing */
		break;
	}
}

void tweak_fsmonitor(struct index_state *istate)
{
	switch (git_config_get_fsmonitor()) {
	case -1: /* keep: do nothing */
		break;
	case 0: /* false */
		remove_fsmonitor(istate);
		break;
	case 1: /* true */
		add_fsmonitor(istate);
		break;
	default: /* unknown value: do nothing */
		break;
	}
}

[snip]

+	/*
+	 * With fsmonitor, we can trust the untracked cache's valid field.
+	 */

Did you intend to make a comment here? Sorry. I was going to make a comment that I didn't see how that could work since we weren't touching the untracked cache here, but then I saw the bit further down. I'm still not sure it works (see comment on 10/12), but at least it could in theory work. The logic here assumes that when the index is written out, it is valid including the untracked cache and the CE_FSMONITOR_VALID bits. Therefore it should still all be valid as of the time the fsmonitor was queried and the index saved. Another way of thinking about this is that the fsmonitor extension is simply adding another persisted bit on each index entry. It just gets persisted/restored differently than the other persisted bits. Obviously, before we can use it assuming it reflects the *current* state of the working directory, we have to refresh the bits via the refresh logic.
Re: [PATCH v6 08/12] fsmonitor: add a test tool to dump the index extension
On 9/17/2017 4:02 AM, Junio C Hamano wrote: Ben Peart writes:

diff --git a/t/helper/test-dump-fsmonitor.c b/t/helper/test-dump-fsmonitor.c
new file mode 100644
index 00..482d749bb9
--- /dev/null
+++ b/t/helper/test-dump-fsmonitor.c
@@ -0,0 +1,21 @@
+#include "cache.h"
+
+int cmd_main(int ac, const char **av)
+{
+	struct index_state *istate = &the_index;
+	int i;
+
+	setup_git_directory();
+	if (do_read_index(istate, get_index_file(), 0) < 0)
+		die("unable to read index file");
+	if (!istate->fsmonitor_last_update) {
+		printf("no fsmonitor\n");
+		return 0;
+	}
+	printf("fsmonitor last update %"PRIuMAX"\n", istate->fsmonitor_last_update);

After pushing this out and having Travis complain, I queued a squash on top of this to cast the argument to (uintmax_t), like you did in an earlier step (I think it was [PATCH 04/12]). Thanks. I'll update this to cast it as (uint64_t), as that is what get/put_be64 use. As far as I can tell they both map to the same thing (unsigned long long), so there isn't a functional difference.
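The squash matters because %PRIuMAX formally expects a uintmax_t argument; passing a bare uint64_t through varargs only happens to work on platforms where the two typedefs coincide. A standalone sketch of the portable form (show_last_update and the timestamp value are illustrative, not from the series):

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Format a 64-bit fsmonitor-style timestamp portably.  The (uintmax_t)
 * cast is required: uint64_t may be `unsigned long` on one platform and
 * `unsigned long long` on another, but PRIuMAX always matches uintmax_t. */
static void show_last_update(char *buf, size_t n, uint64_t last_update)
{
	snprintf(buf, n, "fsmonitor last update %" PRIuMAX,
		 (uintmax_t)last_update);
}
```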
Re: [PATCH] t2018: remove unwanted empty line
On Monday 18 September 2017 12:52 AM, Kevin Daudt wrote: Why is this empty line unwanted? This kind of whitespace can help separate logical sections, just like paragraphs would. You're right. It seems I sent that fix hastily because I hadn't encountered such separation before (forgetting the fact that git has contributors whose styles of writing tests vary widely). I should stop and think the next time rather than spamming the list. Sorry for this. --- Kaartic
RE: [PATCH v6 04/12] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.
> -Original Message-
> From: Ben Peart [mailto:peart...@gmail.com]
> Sent: Monday, September 18, 2017 9:07 AM
> To: David Turner ; 'Ben Peart'
> Cc: ava...@gmail.com; christian.cou...@gmail.com; git@vger.kernel.org;
> gits...@pobox.com; johannes.schinde...@gmx.de; pclo...@gmail.com;
> p...@peff.net
> Subject: Re: [PATCH v6 04/12] fsmonitor: teach git to optionally utilize a file
> system monitor to speed up detecting new or changed files.
>
> Thanks for taking the time to review/provide feedback!
>
> On 9/15/2017 5:35 PM, David Turner wrote:
> >> -Original Message-
> >> From: Ben Peart [mailto:benpe...@microsoft.com]
> >> Sent: Friday, September 15, 2017 3:21 PM
> >> To: benpe...@microsoft.com
> >> Cc: David Turner ; ava...@gmail.com;
> >> christian.cou...@gmail.com; git@vger.kernel.org; gits...@pobox.com;
> >> johannes.schinde...@gmx.de; pclo...@gmail.com; p...@peff.net
> >> Subject: [PATCH v6 04/12] fsmonitor: teach git to optionally utilize
> >> a file system monitor to speed up detecting new or changed files.
> >
> >> +int git_config_get_fsmonitor(void)
> >> +{
> >> +	if (git_config_get_pathname("core.fsmonitor", &core_fsmonitor))
> >> +		core_fsmonitor = getenv("GIT_FSMONITOR_TEST");
> >> +
> >> +	if (core_fsmonitor && !*core_fsmonitor)
> >> +		core_fsmonitor = NULL;
> >> +
> >> +	if (core_fsmonitor)
> >> +		return 1;
> >> +
> >> +	return 0;
> >> +}
> >
> > This function's return values are backwards relative to the rest of the
> > git_config_* functions.
>
> I'm confused. If core.fsmonitor is configured, it returns 1. If it is not
> configured, it returns 0. I don't make use of the -1 /* default value */ option
> as I didn't see any use/value in this case. What is backwards?

The other git_config_* functions return 1 for error and 0 for success.

> [snip]
>
> > +	/*
> > +	 * With fsmonitor, we can trust the untracked cache's valid field.
> > +	 */
> >
> > Did you intend to make a comment here?

Sorry.
I was going to make a comment that I didn't see how that could work since we weren't touching the untracked cache here, but then I saw the bit further down. I'm still not sure it works (see comment on 10/12), but at least it could in theory work.
Re: [PATCH v6 05/12] fsmonitor: add documentation for the fsmonitor extension.
On 9/17/2017 4:03 AM, Junio C Hamano wrote: Ben Peart writes: +[[fsmonitor-watchman]] +fsmonitor-watchman +~~~ I've queued a mini squash on top to make sure the line aligns with the length of the string above it by adding three ~'s here. Thanks, I'll do the same assuming there will be another re-roll.
Re: [PATCH v6 05/12] fsmonitor: add documentation for the fsmonitor extension.
On 9/15/2017 3:43 PM, David Turner wrote: -Original Message- From: Ben Peart [mailto:benpe...@microsoft.com] Sent: Friday, September 15, 2017 3:21 PM To: benpe...@microsoft.com Cc: David Turner ; ava...@gmail.com; christian.cou...@gmail.com; git@vger.kernel.org; gits...@pobox.com; johannes.schinde...@gmx.de; pclo...@gmail.com; p...@peff.net Subject: [PATCH v6 05/12] fsmonitor: add documentation for the fsmonitor extension. This includes the core.fsmonitor setting, the query-fsmonitor hook, and the fsmonitor index extension. Signed-off-by: Ben Peart --- Documentation/config.txt | 6 ++ Documentation/githooks.txt | 23 +++ Documentation/technical/index-format.txt | 19 +++ 3 files changed, 48 insertions(+) diff --git a/Documentation/config.txt b/Documentation/config.txt index dc4e3f58a2..c196007a27 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -413,6 +413,12 @@ core.protectNTFS:: 8.3 "short" names. Defaults to `true` on Windows, and `false` elsewhere. +core.fsmonitor:: + If set, the value of this variable is used as a command which + will identify all files that may have changed since the + requested date/time. This information is used to speed up git by + avoiding unnecessary processing of files that have not changed. I'm confused here. You have a file called "fsmonitor-watchman", which seems to discuss the protocol for core.fsmonitor scripts in general, and you have this documentation, which does not link to that file. Can you clarify this? I'll add the missing link to the documentation in githooks.txt. The documentation should be enough for someone to develop another integration script. The fsmonitor-watchman script allows people to easily use this patch series with the existing Watchman monitor but it can certainly also be used as a sample for how to integrate with another file system monitor. +The hook should output to stdout the list of all files in the working +directory that may have changed since the requested time. 
The logic +should be inclusive so that it does not miss any potential changes. +"It is OK to include files which have not actually changed. Newly-created and deleted files should also be included. When files are renamed, both the old and the new name should be included." Also, please discuss case sensitivity issues (e.g. on OS X). +The paths should be relative to the root of the working directory and +be separated by a single NUL. + - 32-bit version number: the current supported version is 1. + + - 64-bit time: the extension data reflects all changes through the given + time which is stored as the nanoseconds elapsed since midnight, + January 1, 1970. Nit: Please specify signed or unsigned for these. (I expect to be getting out of cryosleep around 2262, and I want to know if my old git repos will keep working...) While I'm not opposed to specifying unsigned, I did notice that the only place signed/unsigned is specified today is in "index entry." Everywhere else doesn't specify so I left it off for consistency. I've not seen negative version numbers nor negative time so am not entirely sure it is necessary to clarify. :) + - 32-bit bitmap size: the size of the CE_FSMONITOR_VALID bitmap. + + - An ewah bitmap, the n-th bit indicates whether the n-th index entry +is not CE_FSMONITOR_VALID.
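The on-disk layout described above (32-bit version, 64-bit nanosecond timestamp, 32-bit bitmap size, then the ewah payload) can be decoded with a few big-endian reads, assuming the network byte order used by the rest of the index format. A rough sketch, not the code from the series (the struct and function names are made up):

```c
#include <stdint.h>
#include <stdio.h>

static uint32_t be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t be64(const unsigned char *p)
{
	return ((uint64_t)be32(p) << 32) | be32(p + 4);
}

struct fsmonitor_header {
	uint32_t version;	/* currently 1 */
	uint64_t last_update;	/* ns since midnight, January 1, 1970 */
	uint32_t bitmap_size;	/* size of the CE_FSMONITOR_VALID bitmap */
};

/* Decode the fixed-size header; the ewah bitmap follows it.  Returns 0 on
 * success, -1 on a truncated buffer or an unsupported version. */
static int parse_fsmonitor_header(const unsigned char *p, size_t len,
				  struct fsmonitor_header *h)
{
	if (len < 16)
		return -1;
	h->version = be32(p);
	h->last_update = be64(p + 4);
	h->bitmap_size = be32(p + 12);
	return h->version == 1 ? 0 : -1;
}
```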
Re: [PATCH v6 04/12] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.
Thanks for taking the time to review/provide feedback!

On 9/15/2017 5:35 PM, David Turner wrote:
-Original Message-
From: Ben Peart [mailto:benpe...@microsoft.com]
Sent: Friday, September 15, 2017 3:21 PM
To: benpe...@microsoft.com
Cc: David Turner ; ava...@gmail.com; christian.cou...@gmail.com; git@vger.kernel.org; gits...@pobox.com; johannes.schinde...@gmx.de; pclo...@gmail.com; p...@peff.net
Subject: [PATCH v6 04/12] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.

+int git_config_get_fsmonitor(void)
+{
+	if (git_config_get_pathname("core.fsmonitor", &core_fsmonitor))
+		core_fsmonitor = getenv("GIT_FSMONITOR_TEST");
+
+	if (core_fsmonitor && !*core_fsmonitor)
+		core_fsmonitor = NULL;
+
+	if (core_fsmonitor)
+		return 1;
+
+	return 0;
+}

This function's return values are backwards relative to the rest of the git_config_* functions.

I'm confused. If core.fsmonitor is configured, it returns 1. If it is not configured, it returns 0. I don't make use of the -1 /* default value */ option as I didn't see any use/value in this case. What is backwards?

[snip]

+	/*
+	 * With fsmonitor, we can trust the untracked cache's valid field.
+	 */

Did you intend to make a comment here?

[snip]

+int read_fsmonitor_extension(struct index_state *istate, const void *data,
+	unsigned long sz)
+{

If git_config_get_fsmonitor returns 0, fsmonitor_dirty will leak. Good catch! Thank you.

[snip]

+	/* a fsmonitor process can return '*' to indicate all entries are invalid */

That's not documented in your documentation. Also, I'm not sure I like it: what if I have a file whose name starts with '*'? Yeah, that would be silly, but this indicates the need for the protocol to have some sort of signaling mechanism that's out-of-band. Maybe have some key\0value\0 pairs and then \0\0 and then the list of files? Or, if you want to keep it really simple, allow an entry of '/' (which is an invalid filename) to mean 'all'.
Yeah, this was an optimization I added late in the game to get around an issue in Watchman where it returns the name of every file the first time you query it (rather than the set of files that have actually changed since the requested time). I didn't realize the wildcard '*' was a valid character for a filename. I'll switch to '/' as you suggest, as I don't want to complicate things unnecessarily to handle this relatively rare optimization. I'll also get it documented properly. Thanks!

+void add_fsmonitor(struct index_state *istate)
+{
+	int i;
+
+	if (!istate->fsmonitor_last_update) {

[snip]

+	/* reset the untracked cache */

Is this really necessary? Shouldn't the untracked cache be in a correct state already? When fsmonitor is not turned on, I'm not explicitly invalidating untracked cache directory entries as git makes changes to files. While I doubt the sequence of 1) git making changes to files, *then* 2) turning on fsmonitor ever happens, I thought it better to be safe than sorry rather than assume that pattern can never happen in the future. Especially since turning on the extension is rare and the cost is low.

+/*
+ * Clear the given cache entries CE_FSMONITOR_VALID bit and invalidate

Nit: "s/entries/entry's/".
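The '/' sentinel is easy to special-case on the receiving side, since '/' can never be a path relative to the working-tree root. A rough sketch of parsing the hook's NUL-delimited reply (parse_reply and the buffer layout are illustrative, not the code from this series):

```c
#include <stdio.h>
#include <string.h>

/* Parse a hypothetical fsmonitor hook reply: a NUL-separated list of
 * changed paths, or the single entry "/" meaning "everything may have
 * changed".  Returns 1 when all entries must be invalidated, 0 otherwise. */
static int parse_reply(const char *buf, size_t len)
{
	const char *p;

	/* "/" plus its terminating NUL is the all-invalid sentinel. */
	if (len == 2 && buf[0] == '/' && buf[1] == '\0')
		return 1;

	/* Otherwise, walk the NUL-separated path entries one by one. */
	for (p = buf; p < buf + len; p += strlen(p) + 1)
		printf("changed: %s\n", p);	/* invalidate this one path */
	return 0;
}
```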
Re: [PATCH] test-drop-caches: mark file local symbols with static
On 9/17/2017 12:14 PM, Ramsay Jones wrote:

Signed-off-by: Ramsay Jones
---
Hi Ben,

If you need to re-roll your 'bp/fsmonitor' branch, could you please squash this into the relevant patch (commit c6b5a28941, "fsmonitor: add a performance test", 15-09-2017).

Absolutely. Thanks!

I'd really appreciate some feedback on whether this works properly on platforms other than Windows. Thanks!

ATB,
Ramsay Jones

 t/helper/test-drop-caches.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/t/helper/test-drop-caches.c b/t/helper/test-drop-caches.c
index 717079865..17590170f 100644
--- a/t/helper/test-drop-caches.c
+++ b/t/helper/test-drop-caches.c
@@ -2,7 +2,7 @@

 #if defined(GIT_WINDOWS_NATIVE)

-int cmd_sync(void)
+static int cmd_sync(void)
 {
 	char Buffer[MAX_PATH];
 	DWORD dwRet;
@@ -49,7 +49,7 @@ typedef enum _SYSTEM_MEMORY_LIST_COMMAND {
 	MemoryCommandMax
 } SYSTEM_MEMORY_LIST_COMMAND;

-BOOL GetPrivilege(HANDLE TokenHandle, LPCSTR lpName, int flags)
+static BOOL GetPrivilege(HANDLE TokenHandle, LPCSTR lpName, int flags)
 {
 	BOOL bResult;
 	DWORD dwBufferLength;
@@ -77,7 +77,7 @@ BOOL GetPrivilege(HANDLE TokenHandle, LPCSTR lpName, int flags)
 	return bResult;
 }

-int cmd_dropcaches(void)
+static int cmd_dropcaches(void)
 {
 	HANDLE hProcess = GetCurrentProcess();
 	HANDLE hToken;
@@ -118,36 +118,36 @@ int cmd_dropcaches(void)

 #elif defined(__linux__)

-int cmd_sync(void)
+static int cmd_sync(void)
 {
 	return system("sync");
 }

-int cmd_dropcaches(void)
+static int cmd_dropcaches(void)
 {
 	return system("echo 3 | sudo tee /proc/sys/vm/drop_caches");
 }

 #elif defined(__APPLE__)

-int cmd_sync(void)
+static int cmd_sync(void)
 {
 	return system("sync");
 }

-int cmd_dropcaches(void)
+static int cmd_dropcaches(void)
 {
 	return system("sudo purge");
 }

 #else

-int cmd_sync(void)
+static int cmd_sync(void)
 {
 	return 0;
 }

-int cmd_dropcaches(void)
+static int cmd_dropcaches(void)
 {
 	return error("drop caches not implemented on this platform");
 }
Re: RFC v3: Another proposed hash function transition plan
Hi Johannes, > SHA-256 got much more cryptanalysis than SHA3-256 […]. I do not think this is true. Keccak/SHA-3 actually got (and is still getting) a lot of cryptanalysis, with papers published at renowned crypto conferences [1]. Keccak/SHA-3 is recognized to have a significant safety margin. E.g., one can cut the number of rounds in half (as in Keyak or KangarooTwelve) and still get a very strong function. I don't think we could say the same for SHA-256 or SHA-512… Kind regards, Gilles, for the Keccak team [1] https://keccak.team/third_party.html
Re: [PATCH 1/3] sha1_name: Create perf test for find_unique_abbrev()
On 9/17/2017 5:51 PM, Junio C Hamano wrote: Derrick Stolee writes:

+int cmd_main(int ac, const char **av)
+{
+	setup_git_directory();

As far as I recall, we do not (yet) allow declaration after statement in our codebase. Move this down to make it after all decls. Will fix.

+
+	unsigned int hash_delt = 0x13579BDF;
+	unsigned int hash_base = 0x01020304;
+	struct object_id oid;
+
+	int i, count = 0;
+	int n = sizeof(struct object_id) / sizeof(int);

It probably is technically OK to assume sizeof(int) always equals sizeof(unsigned), but because you use 'n' _only_ to work with uint and never with int, it would make more sense to match. Will fix. I also notice that "n" should be const. But I do not think we want this "clever" optimization that involves 'n' in the first place.

+	while (count++ < 10) {
+		for (i = 0; i < n; i++)
+			((unsigned int *)oid.hash)[i] = hash_base;

Does it make sense to assume that uint is always 4 bytes (so this code won't work if it is 8 bytes on your platform), and is doing this faster than using platform-optimized memcpy()? I'm not sure what you mean by using memcpy to improve this, because it would require calling memcpy in the inner loop, such as:

	for (i = 0; i < n; i++)
		memcpy(oid.hash + i * sizeof(unsigned), &hash_base, sizeof(unsigned));

I'm probably misunderstanding your intended use of memcpy().

+		find_unique_abbrev(oid.hash, MINIMUM_ABBREV);
+
+		hash_base += hash_delt;
+	}

I also wonder if this is measuring the right thing. I am guessing that by making many queries for a unique abbreviation of "random" (well, it is deterministic, but my point is these queries are not based on the names of objects that exist in the repository) hashes, this test measures how much time it takes for us to decide that such a random hash does not exist.
In real-life use, we make the opposite query far more frequently: we have an object that we _know_ exists in the repository and we try to find a sufficient length to disambiguate it from others, and I suspect that these two use different logic. Don't you need to be measuring the time it takes to compute the shortest abbreviation of an object that exists instead? First, this doesn't just measure the time it takes to determine non-existence, because it finds the len required to disambiguate an "incoming" hash from all known hashes. When testing, I put in a simple printf to report the resulting abbreviation so I could see how often it needed to be extended. In this sense, the test exposes the while loop that is removed by PATCH 2/3. Second, your criticism about extant hashes is valid, and one I struggled to reconcile. I see two issues with testing known hashes:

1. By determining the hash exists, we have inspected the file that contains it (either a .idx or the loose-object directory). This has side effects that warm up the file cache, so the looped method is artificially faster at finding matching objects. The effect is particularly significant on a repo with exactly one packfile.

2. If we iterate over the entire set of objects, this test takes O(N*t(N)) time, where t(N) is the average time to compute a minimum abbreviation. For large repos, this can take several minutes. Instead, even with the constant 100,000 test hashes, we have an O(t(N)) test. We could avoid the asymptotic growth by limiting the number of existing hashes we use, but how do we find a sufficiently uniform sample from them?

By looking at some other perf tests, I see that we can add a prerequisite action. I will investigate making another helper that uniformly selects a set of hashes from the repo and writes them to stdout in a random order. p0008-abbrev.sh will run the helper as a prerequisite, saving the data to a file. test-abbrev will read the hashes from stdin to test find_unique_abbrev.
This should avoid the side-effects I mentioned (test-abbrev will not warm up indexes) while also testing abbreviation lengths for existing objects. I'll get started on these changes and send a new patch with new perf data in a couple days. Please let me know if there is a better way to measure performance for this change. Thanks, -Stolee
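The prerequisite sampling step described above can be approximated from the shell today. This is only a sketch run inside the repository under test; the output file name is illustrative, `sort -R` is one way to shuffle (shuf(1) would serve equally well where available), and the real series may do this in a C helper instead:

```shell
# Emit every object name in the repository in random order; a perf script
# could save this and feed a fixed-size slice to the test helper on stdin.
# --batch-all-objects covers both packed and loose objects.
git cat-file --batch-all-objects --batch-check='%(objectname)' |
sort -R >hashes.txt
```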