Re: [PATCH] for_each_string_list_item(): behave correctly for empty list
On 09/19/2017 02:08 AM, Stefan Beller wrote:
>> I am hoping that this last one is not allowed and we can use the
>> "same condition is checked every time we loop" version that hides
>> the ugliness inside the macro.
>
> By which you are referring to Jonathan's solution posted.
> Maybe we can combine the two solutions (checking for the list
> to not be NULL once, by Jonathan) and using an outer structure
> (SZEDER's solution) by replacing the condition by a for loop,
> roughly (untested):
>
>  #define for_each_string_list_item(item,list) \
> -	for (item = (list)->items; item < (list)->items + (list)->nr; ++item)
> +	for (; list; list = NULL) \
> +	for (item = (list)->items; item < (list)->items + (list)->nr; ++item)
>
> as that would not mingle with any dangling else clause.
> It is also just one statement, such that
>
>	if (bla)
>		for_each_string_list_item(item, list) {
>			baz(item);
>		}
>	else
>		foo;
>
> still works.
>
> Are there downsides to this combined approach?

On the plus side, it's pleasantly devious; I wouldn't have thought of
using a `for` loop for the initial test.

But it doesn't work as written, because (1) we don't need to guard
against `list` being NULL, but rather `list->items`; and (2) we don't
have the liberty to set `list = NULL` (or `list->items = NULL`),
because `list` is owned by the caller and we shouldn't modify it.

The following is a bit closer:

	#define for_each_string_list_item(item,list) \
		for (item = (list)->items; item; item = NULL) \
			for (; item < (list)->items + (list)->nr; ++item)

But I think that also fails, because a callsite that does

	for_each_string_list_item(myitem, mylist)
		if (myitem->util)
			break;

would expect that `myitem` is still set after breaking out of the
loop, whereas the outer `for` loop would reset it to NULL.
If `break` were an expression, we could do something like

	#define for_each_string_list_item(item,list) \
		for (item = (list)->items; item; break) \
			for (; item < (list)->items + (list)->nr; ++item)

So I think we're still left with the suggestions of Jonathan or Gábor,
or the bigger change of initializing `string_list::items` to point at
an empty sentinel array (similar to `strbuf_slopbuf`) rather than NULL.

Personally, I think that Jonathan's approach makes the most sense,
unless somebody wants to jump in and implement a `string_list_slopbuf`.

By the way, I wonder if any open-coded loops over `string_lists` make
the same mistake as the macro?

Michael
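The clobbering described in the thread above is easy to demonstrate with a standalone program. The struct and macro names below are simplified stand-ins for git's `string_list` types, not the real API; the macro is the two-level `for` variant under discussion:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for git's string_list types (not the real API). */
struct item {
	const char *string;
	void *util;
};

struct list {
	struct item *items;
	size_t nr;
};

/*
 * The combined macro under discussion: the outer one-shot `for` skips
 * the body entirely when `items` is NULL, and its "increment" sets the
 * iteration variable to NULL once the inner loop is left -- including
 * when it is left via `break`.
 */
#define for_each_item(it, l) \
	for (it = (l)->items; it; it = NULL) \
		for (; it < (l)->items + (l)->nr; ++it)

/*
 * Visit items until one with a non-NULL util is seen; report how many
 * were visited and where the iteration variable ended up.
 */
static int visit(struct list *l, struct item **after)
{
	struct item *it;
	int n = 0;

	for_each_item(it, l) {
		n++;
		if (it->util)
			break;
	}
	*after = it;	/* always NULL: `break` only leaves the inner
			 * loop, then the outer loop's `it = NULL` runs */
	return n;
}
```

Running `visit()` on a three-item list whose second item has `util` set returns 2, but the caller's pointer comes back NULL instead of pointing at the item that triggered the `break` — exactly the objection raised above. The empty-list case (`items == NULL`) is handled safely: the body never runs.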
Re: [PATCH v2 1/4] mktag: add option which allows the tagger field to be omitted
On Tue, 2017-09-19 at 12:01 +0900, Junio C Hamano wrote:
> Hmph. I cannot shake this nagging feeling that this is probably a
> solution that is overly narrow to a single problem that won't scale
> into the future.
>
> [...snip good point...]
>
> If we drop the "verification" step from the above, that essentially
> becomes an equivalent to "hash-object -t tag -w --stdin".
>
> So I now have to wonder if it may be sufficient to use "hash-object"
> in filter-branch, without doing this "allow malformed data that we
> would not permit if the tag were being created today, only to help
> replaying an old, already broken data" change to "git mktag".
>
> Is there something that makes "hash-object" insufficient (like it
> still does some extra checks we would want to disable and cannot
> work as a replacement for your "--allow-missing-tagger")?

I've done a couple of quick tests and it looks like it will work. I'll
run a few more checks and repost.

Ian.
[PATCH v2 03/21] packed_ref_cache: add a backlink to the associated `packed_ref_store`
It will prove convenient in upcoming patches.

Signed-off-by: Michael Haggerty
---
 refs/packed-backend.c | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index e411501871..a3d9210cb0 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -7,7 +7,15 @@
 #include "../iterator.h"
 #include "../lockfile.h"
 
+struct packed_ref_store;
+
 struct packed_ref_cache {
+	/*
+	 * A back-pointer to the packed_ref_store with which this
+	 * cache is associated:
+	 */
+	struct packed_ref_store *refs;
+
 	struct ref_cache *cache;
 
 	/*
@@ -154,7 +162,7 @@ static const char *parse_ref_line(struct strbuf *line, struct object_id *oid)
 }
 
 /*
- * Read from `packed_refs_file` into a newly-allocated
+ * Read from the `packed-refs` file into a newly-allocated
  * `packed_ref_cache` and return it. The return value will already
  * have its reference count incremented.
  *
@@ -182,7 +190,7 @@ static const char *parse_ref_line(struct strbuf *line, struct object_id *oid)
  * compatibility with older clients, but we do not require it
  * (i.e., "peeled" is a no-op if "fully-peeled" is set).
  */
-static struct packed_ref_cache *read_packed_refs(const char *packed_refs_file)
+static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
 {
 	FILE *f;
 	struct packed_ref_cache *packed_refs = xcalloc(1, sizeof(*packed_refs));
@@ -191,11 +199,12 @@ static struct packed_ref_cache *read_packed_refs(const char *packed_refs_file)
 	enum { PEELED_NONE, PEELED_TAGS, PEELED_FULLY } peeled = PEELED_NONE;
 	struct ref_dir *dir;
 
+	packed_refs->refs = refs;
 	acquire_packed_ref_cache(packed_refs);
 	packed_refs->cache = create_ref_cache(NULL, NULL);
 	packed_refs->cache->root->flag &= ~REF_INCOMPLETE;
 
-	f = fopen(packed_refs_file, "r");
+	f = fopen(refs->path, "r");
 	if (!f) {
 		if (errno == ENOENT) {
 			/*
@@ -205,7 +214,7 @@ static struct packed_ref_cache *read_packed_refs(const char *packed_refs_file)
 			 */
 			return packed_refs;
 		} else {
-			die_errno("couldn't read %s", packed_refs_file);
+			die_errno("couldn't read %s", refs->path);
 		}
 	}
 
@@ -218,7 +227,7 @@ static struct packed_ref_cache *read_packed_refs(const char *packed_refs_file)
 		const char *traits;
 
 		if (!line.len || line.buf[line.len - 1] != '\n')
-			die("unterminated line in %s: %s", packed_refs_file, line.buf);
+			die("unterminated line in %s: %s", refs->path, line.buf);
 
 		if (skip_prefix(line.buf, "# pack-refs with:", &traits)) {
 			if (strstr(traits, " fully-peeled "))
@@ -258,7 +267,7 @@ static struct packed_ref_cache *read_packed_refs(const char *packed_refs_file)
 			last->flag |= REF_KNOWS_PEELED;
 		} else {
 			strbuf_setlen(&line, line.len - 1);
-			die("unexpected line in %s: %s", packed_refs_file, line.buf);
+			die("unexpected line in %s: %s", refs->path, line.buf);
 		}
 	}
 
@@ -293,7 +302,7 @@ static struct packed_ref_cache *get_packed_ref_cache(struct packed_ref_store *re
 	validate_packed_ref_cache(refs);
 
 	if (!refs->cache)
-		refs->cache = read_packed_refs(refs->path);
+		refs->cache = read_packed_refs(refs);
 
 	return refs->cache;
 }
-- 
2.14.1
[PATCH v2 19/21] ref_cache: remove support for storing peeled values
Now that the `packed-refs` backend doesn't use `ref_cache`, there is nobody left who might want to store peeled values of references in `ref_cache`. So remove that feature. Signed-off-by: Michael Haggerty --- refs/packed-backend.c | 9 - refs/ref-cache.c | 42 +- refs/ref-cache.h | 32 ++-- 3 files changed, 11 insertions(+), 72 deletions(-) diff --git a/refs/packed-backend.c b/refs/packed-backend.c index b4a01637a6..b1571872fa 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -2,7 +2,6 @@ #include "../config.h" #include "../refs.h" #include "refs-internal.h" -#include "ref-cache.h" #include "packed-backend.h" #include "../iterator.h" #include "../lockfile.h" @@ -226,6 +225,14 @@ static NORETURN void die_invalid_line(const char *path, } +/* + * This value is set in `base.flags` if the peeled value of the + * current reference is known. In that case, `peeled` contains the + * correct peeled value for the reference, which might be `null_sha1` + * if the reference is not a tag or if it is broken. + */ +#define REF_KNOWS_PEELED 0x40 + /* * An iterator over a packed-refs file that is currently mmapped. */ diff --git a/refs/ref-cache.c b/refs/ref-cache.c index 54dfb5218c..4f850e1b5c 100644 --- a/refs/ref-cache.c +++ b/refs/ref-cache.c @@ -38,7 +38,6 @@ struct ref_entry *create_ref_entry(const char *refname, FLEX_ALLOC_STR(ref, name, refname); oidcpy(&ref->u.value.oid, oid); - oidclr(&ref->u.value.peeled); ref->flag = flag; return ref; } @@ -491,49 +490,10 @@ static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator) } } -enum peel_status peel_entry(struct ref_entry *entry, int repeel) -{ - enum peel_status status; - - if (entry->flag & REF_KNOWS_PEELED) { - if (repeel) { - entry->flag &= ~REF_KNOWS_PEELED; - oidclr(&entry->u.value.peeled); - } else { - return is_null_oid(&entry->u.value.peeled) ? 
- PEEL_NON_TAG : PEEL_PEELED; - } - } - if (entry->flag & REF_ISBROKEN) - return PEEL_BROKEN; - if (entry->flag & REF_ISSYMREF) - return PEEL_IS_SYMREF; - - status = peel_object(entry->u.value.oid.hash, entry->u.value.peeled.hash); - if (status == PEEL_PEELED || status == PEEL_NON_TAG) - entry->flag |= REF_KNOWS_PEELED; - return status; -} - static int cache_ref_iterator_peel(struct ref_iterator *ref_iterator, struct object_id *peeled) { - struct cache_ref_iterator *iter = - (struct cache_ref_iterator *)ref_iterator; - struct cache_ref_iterator_level *level; - struct ref_entry *entry; - - level = &iter->levels[iter->levels_nr - 1]; - - if (level->index == -1) - die("BUG: peel called before advance for cache iterator"); - - entry = level->dir->entries[level->index]; - - if (peel_entry(entry, 0)) - return -1; - oidcpy(peeled, &entry->u.value.peeled); - return 0; + return peel_object(ref_iterator->oid->hash, peeled->hash); } static int cache_ref_iterator_abort(struct ref_iterator *ref_iterator) diff --git a/refs/ref-cache.h b/refs/ref-cache.h index a082bfb06c..eda65e73ed 100644 --- a/refs/ref-cache.h +++ b/refs/ref-cache.h @@ -38,14 +38,6 @@ struct ref_value { * referred to by the last reference in the symlink chain. */ struct object_id oid; - - /* -* If REF_KNOWS_PEELED, then this field holds the peeled value -* of this reference, or null if the reference is known not to -* be peelable. See the documentation for peel_ref() for an -* exact definition of "peelable". -*/ - struct object_id peeled; }; /* @@ -97,21 +89,14 @@ struct ref_dir { * public values; see refs.h. */ -/* - * The field ref_entry->u.value.peeled of this value entry contains - * the correct peeled value for the reference, which might be - * null_sha1 if the reference is not a tag or if it is broken. 
- */ -#define REF_KNOWS_PEELED 0x10 - /* ref_entry represents a directory of references */ -#define REF_DIR 0x20 +#define REF_DIR 0x10 /* * Entry has not yet been read from disk (used only for REF_DIR * entries representing loose references) */ -#define REF_INCOMPLETE 0x40 +#define REF_INCOMPLETE 0x20 /* * A ref_entry represents either a reference or a "subdirectory" of @@ -252,17 +237,4 @@ struct ref_iterator *cache_ref_iterator_begin(struct ref_cache *cache, const char *prefix, int prime_dir); -/* - * Peel the entry (if possible) and return its new peel_status. If - * repeel is true, re-peel the entry even if ther
[PATCH v2 21/21] packed-backend.c: rename a bunch of things and update comments
We've made huge changes to this file, and some of the old names and comments are no longer very fitting. So rename a bunch of things: * `struct packed_ref_cache` → `struct snapshot` * `acquire_packed_ref_cache()` → `acquire_snapshot()` * `release_packed_ref_buffer()` → `clear_snapshot_buffer()` * `release_packed_ref_cache()` → `release_snapshot()` * `clear_packed_ref_cache()` → `clear_snapshot()` * `struct packed_ref_entry` → `struct snapshot_record` * `cmp_packed_ref_entries()` → `cmp_packed_ref_records()` * `cmp_entry_to_refname()` → `cmp_record_to_refname()` * `sort_packed_refs()` → `sort_snapshot()` * `read_packed_refs()` → `create_snapshot()` * `validate_packed_ref_cache()` → `validate_snapshot()` * `get_packed_ref_cache()` → `get_snapshot()` * Renamed local variables and struct members accordingly. Also update a bunch of comments to reflect the renaming and the accumulated changes that the code has undergone. Signed-off-by: Michael Haggerty --- refs/packed-backend.c | 422 +++--- 1 file changed, 232 insertions(+), 190 deletions(-) diff --git a/refs/packed-backend.c b/refs/packed-backend.c index 92b837a237..b9351da843 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -37,10 +37,30 @@ static enum mmap_strategy mmap_strategy = MMAP_OK; struct packed_ref_store; -struct packed_ref_cache { +/* + * A `snapshot` represents one snapshot of a `packed-refs` file. + * + * Normally, this will be a mmapped view of the contents of the + * `packed-refs` file at the time the snapshot was created. However, + * if the `packed-refs` file was not sorted, this might point at heap + * memory holding the contents of the `packed-refs` file with its + * records sorted by refname. + * + * `snapshot` instances are reference counted (via + * `acquire_snapshot()` and `release_snapshot()`). This is to prevent + * an instance from disappearing while an iterator is still iterating + * over it. Instances are garbage collected when their `referrers` + * count goes to zero. 
+ * + * The most recent `snapshot`, if available, is referenced by the + * `packed_ref_store`. Its freshness is checked whenever + * `get_snapshot()` is called; if the existing snapshot is obsolete, a + * new snapshot is taken. + */ +struct snapshot { /* * A back-pointer to the packed_ref_store with which this -* cache is associated: +* snapshot is associated: */ struct packed_ref_store *refs; @@ -61,26 +81,42 @@ struct packed_ref_cache { size_t header_len; /* -* What is the peeled state of this cache? (This is usually -* determined from the header of the "packed-refs" file.) +* What is the peeled state of the `packed-refs` file that +* this snapshot represents? (This is usually determined from +* the file's header.) */ enum { PEELED_NONE, PEELED_TAGS, PEELED_FULLY } peeled; /* -* Count of references to the data structure in this instance, -* including the pointer from files_ref_store::packed if any. -* The data will not be freed as long as the reference count -* is nonzero. +* Count of references to this instance, including the pointer +* from `packed_ref_store::snapshot`, if any. The instance +* will not be freed as long as the reference count is +* nonzero. */ unsigned int referrers; - /* The metadata from when this packed-refs cache was read */ + /* +* The metadata of the `packed-refs` file from which this +* snapshot was created, used to tell if the file has been +* replaced since we read it. +*/ struct stat_validity validity; }; /* - * A container for `packed-refs`-related data. It is not (yet) a - * `ref_store`. + * A `ref_store` representing references stored in a `packed-refs` + * file. It implements the `ref_store` interface, though it has some + * limitations: + * + * - It cannot store symbolic references. + * + * - It cannot store reflogs. + * + * - It does not support reference renaming (though it could). + * + * On the other hand, it can be locked outside of a reference + * transaction. 
In that case, it remains locked even after the + * transaction is done and the new `packed-refs` file is activated. */ struct packed_ref_store { struct ref_store base; @@ -91,10 +127,10 @@ struct packed_ref_store { char *path; /* -* A cache of the values read from the `packed-refs` file, if -* it might still be current; otherwise, NULL. +* A snapshot of the values read from the `packed-refs` file, +* if it might still be current; otherwise, NULL. */ - struct packed_ref_cache *cache; + struct snapshot *snapshot; /* * Lock used for the "packed-refs" file. Note that this (and @@ -111,43 +147,42 @@ struct packed_ref_store { }; /*
[PATCH v2 06/21] read_packed_refs(): only check for a header at the top of the file
This tightens up the parsing a bit; previously, stray header-looking
lines would have been processed.

Signed-off-by: Michael Haggerty
---
 refs/packed-backend.c | 35 ++++++++++++++++++++++++-----------
 1 file changed, 24 insertions(+), 11 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 154abbd83a..141f02b9c8 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -255,11 +255,34 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
 	pos = buf;
 	eof = buf + size;
 
+	/* If the file has a header line, process it: */
+	if (pos < eof && *pos == '#') {
+		const char *traits;
+
+		eol = memchr(pos, '\n', eof - pos);
+		if (!eol)
+			die_unterminated_line(refs->path, pos, eof - pos);
+
+		strbuf_add(&line, pos, eol + 1 - pos);
+
+		if (!skip_prefix(line.buf, "# pack-refs with:", &traits))
+			die_invalid_line(refs->path, pos, eof - pos);
+
+		if (strstr(traits, " fully-peeled "))
+			peeled = PEELED_FULLY;
+		else if (strstr(traits, " peeled "))
+			peeled = PEELED_TAGS;
+		/* perhaps other traits later as well */
+
+		/* The "+ 1" is for the LF character. */
+		pos = eol + 1;
+		strbuf_reset(&line);
+	}
+
 	dir = get_ref_dir(packed_refs->cache->root);
 	while (pos < eof) {
 		struct object_id oid;
 		const char *refname;
-		const char *traits;
 
 		eol = memchr(pos, '\n', eof - pos);
 		if (!eol)
@@ -267,15 +290,6 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
 
 		strbuf_add(&line, pos, eol + 1 - pos);
 
-		if (skip_prefix(line.buf, "# pack-refs with:", &traits)) {
-			if (strstr(traits, " fully-peeled "))
-				peeled = PEELED_FULLY;
-			else if (strstr(traits, " peeled "))
-				peeled = PEELED_TAGS;
-			/* perhaps other traits later as well */
-			goto next_line;
-		}
-
 		refname = parse_ref_line(&line, &oid);
 		if (refname) {
 			int flag = REF_ISPACKED;
@@ -307,7 +321,6 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
 			die_invalid_line(refs->path, line.buf, line.len);
 		}
 
-	next_line:
 		/* The "+ 1" is for the LF character. */
 		pos = eol + 1;
 		strbuf_reset(&line);
-- 
2.14.1
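The structural change in this patch — recognizing a header only as the very first line, so that header-looking lines elsewhere are rejected — can be sketched in isolation. All names below are hypothetical stand-ins, not git's code:

```c
#include <string.h>

/*
 * Standalone sketch of the parsing structure introduced above: the
 * header is accepted only as the very first line of the buffer; a
 * '#' line anywhere else is an error (git would die_invalid_line()).
 * parse_buffer() is a hypothetical helper, not git's API.
 */
static const char HEADER_PREFIX[] = "# pack-refs with:";

/* Return 1 on success, 0 on a malformed or stray header line. */
static int parse_buffer(const char *buf, int *saw_header, int *nrefs)
{
	const char *pos = buf, *eof = buf + strlen(buf);
	const char *eol;

	*saw_header = 0;
	*nrefs = 0;

	/* If the file has a header line, process it: */
	if (pos < eof && *pos == '#') {
		eol = memchr(pos, '\n', eof - pos);
		if (!eol || strncmp(pos, HEADER_PREFIX, strlen(HEADER_PREFIX)))
			return 0;
		*saw_header = 1;
		pos = eol + 1;
	}

	/* In the remaining lines, '#' is no longer special: */
	while (pos < eof) {
		eol = memchr(pos, '\n', eof - pos);
		if (!eol)
			return 0;	/* unterminated line */
		if (*pos == '#')
			return 0;	/* stray header-looking line */
		(*nrefs)++;
		pos = eol + 1;
	}
	return 1;
}
```

A buffer whose header comes first parses cleanly; the same header line appearing after a ref record is rejected, which is the tightening the commit message describes.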
[PATCH v2 17/21] ref_store: implement `refs_peel_ref()` generically
We're about to stop storing packed refs in a `ref_cache`. That means that the only way we have left to optimize `peel_ref()` is by checking whether the reference being peeled is the one currently being iterated over (in `current_ref_iter`), and if so, using `ref_iterator_peel()`. But this can be done generically; it doesn't have to be implemented per-backend. So implement `refs_peel_ref()` in `refs.c` and remove the `peel_ref()` method from the refs API. This removes the last callers of a couple of functions, so delete them. More cleanup to come... Signed-off-by: Michael Haggerty --- refs.c| 18 +- refs/files-backend.c | 38 -- refs/packed-backend.c | 36 refs/refs-internal.h | 3 --- 4 files changed, 17 insertions(+), 78 deletions(-) diff --git a/refs.c b/refs.c index 101c107ee8..c5e6f79c77 100644 --- a/refs.c +++ b/refs.c @@ -1735,7 +1735,23 @@ int refs_pack_refs(struct ref_store *refs, unsigned int flags) int refs_peel_ref(struct ref_store *refs, const char *refname, unsigned char *sha1) { - return refs->be->peel_ref(refs, refname, sha1); + int flag; + unsigned char base[20]; + + if (current_ref_iter && current_ref_iter->refname == refname) { + struct object_id peeled; + + if (ref_iterator_peel(current_ref_iter, &peeled)) + return -1; + hashcpy(sha1, peeled.hash); + return 0; + } + + if (refs_read_ref_full(refs, refname, + RESOLVE_REF_READING, base, &flag)) + return -1; + + return peel_object(base, sha1); } int peel_ref(const char *refname, unsigned char *sha1) diff --git a/refs/files-backend.c b/refs/files-backend.c index 35648c89fc..7d12de88d0 100644 --- a/refs/files-backend.c +++ b/refs/files-backend.c @@ -655,43 +655,6 @@ static int lock_raw_ref(struct files_ref_store *refs, return ret; } -static int files_peel_ref(struct ref_store *ref_store, - const char *refname, unsigned char *sha1) -{ - struct files_ref_store *refs = - files_downcast(ref_store, REF_STORE_READ | REF_STORE_ODB, - "peel_ref"); - int flag; - unsigned char base[20]; - - if (current_ref_iter && 
current_ref_iter->refname == refname) { - struct object_id peeled; - - if (ref_iterator_peel(current_ref_iter, &peeled)) - return -1; - hashcpy(sha1, peeled.hash); - return 0; - } - - if (refs_read_ref_full(ref_store, refname, - RESOLVE_REF_READING, base, &flag)) - return -1; - - /* -* If the reference is packed, read its ref_entry from the -* cache in the hope that we already know its peeled value. -* We only try this optimization on packed references because -* (a) forcing the filling of the loose reference cache could -* be expensive and (b) loose references anyway usually do not -* have REF_KNOWS_PEELED. -*/ - if (flag & REF_ISPACKED && - !refs_peel_ref(refs->packed_ref_store, refname, sha1)) - return 0; - - return peel_object(base, sha1); -} - struct files_ref_iterator { struct ref_iterator base; @@ -3012,7 +2975,6 @@ struct ref_storage_be refs_be_files = { files_initial_transaction_commit, files_pack_refs, - files_peel_ref, files_create_symref, files_delete_refs, files_rename_ref, diff --git a/refs/packed-backend.c b/refs/packed-backend.c index 2cf2f7f73d..6a930a440c 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -849,26 +849,6 @@ static struct packed_ref_cache *get_packed_ref_cache(struct packed_ref_store *re return refs->cache; } -static struct ref_dir *get_packed_ref_dir(struct packed_ref_cache *packed_ref_cache) -{ - return get_ref_dir(packed_ref_cache->cache->root); -} - -static struct ref_dir *get_packed_refs(struct packed_ref_store *refs) -{ - return get_packed_ref_dir(get_packed_ref_cache(refs)); -} - -/* - * Return the ref_entry for the given refname from the packed - * references. If it does not exist, return NULL. 
- */ -static struct ref_entry *get_packed_ref(struct packed_ref_store *refs, - const char *refname) -{ - return find_ref_entry(get_packed_refs(refs), refname); -} - static int packed_read_raw_ref(struct ref_store *ref_store, const char *refname, unsigned char *sha1, struct strbuf *referent, unsigned int *type) @@ -895,21 +875,6 @@ static int packed_read_raw_ref(struct ref_store *ref_store, return 0; } -static int packed_peel_ref(struct ref_store *ref_store, -
[PATCH v2 01/21] ref_iterator: keep track of whether the iterator output is ordered
References are iterated over in order by refname, but reflogs are not. Some consumers of reference iteration care about the difference. Teach each `ref_iterator` to keep track of whether its output is ordered. `overlay_ref_iterator` is one of the picky consumers. Add a sanity check in `overlay_ref_iterator_begin()` to verify that its inputs are ordered. Signed-off-by: Michael Haggerty --- refs.c| 4 refs/files-backend.c | 16 +--- refs/iterator.c | 15 ++- refs/packed-backend.c | 2 +- refs/ref-cache.c | 2 +- refs/ref-cache.h | 3 ++- refs/refs-internal.h | 23 +++ 7 files changed, 46 insertions(+), 19 deletions(-) diff --git a/refs.c b/refs.c index b0106b8162..101c107ee8 100644 --- a/refs.c +++ b/refs.c @@ -1309,6 +1309,10 @@ struct ref_iterator *refs_ref_iterator_begin( if (trim) iter = prefix_ref_iterator_begin(iter, "", trim); + /* Sanity check for subclasses: */ + if (!iter->ordered) + BUG("reference iterator is not ordered"); + return iter; } diff --git a/refs/files-backend.c b/refs/files-backend.c index 961424a4ea..35648c89fc 100644 --- a/refs/files-backend.c +++ b/refs/files-backend.c @@ -762,7 +762,7 @@ static struct ref_iterator *files_ref_iterator_begin( const char *prefix, unsigned int flags) { struct files_ref_store *refs; - struct ref_iterator *loose_iter, *packed_iter; + struct ref_iterator *loose_iter, *packed_iter, *overlay_iter; struct files_ref_iterator *iter; struct ref_iterator *ref_iterator; unsigned int required_flags = REF_STORE_READ; @@ -772,10 +772,6 @@ static struct ref_iterator *files_ref_iterator_begin( refs = files_downcast(ref_store, required_flags, "ref_iterator_begin"); - iter = xcalloc(1, sizeof(*iter)); - ref_iterator = &iter->base; - base_ref_iterator_init(ref_iterator, &files_ref_iterator_vtable); - /* * We must make sure that all loose refs are read before * accessing the packed-refs file; this avoids a race @@ -811,7 +807,13 @@ static struct ref_iterator *files_ref_iterator_begin( refs->packed_ref_store, prefix, 0, 
DO_FOR_EACH_INCLUDE_BROKEN); - iter->iter0 = overlay_ref_iterator_begin(loose_iter, packed_iter); + overlay_iter = overlay_ref_iterator_begin(loose_iter, packed_iter); + + iter = xcalloc(1, sizeof(*iter)); + ref_iterator = &iter->base; + base_ref_iterator_init(ref_iterator, &files_ref_iterator_vtable, + overlay_iter->ordered); + iter->iter0 = overlay_iter; iter->flags = flags; return ref_iterator; @@ -2084,7 +2086,7 @@ static struct ref_iterator *files_reflog_iterator_begin(struct ref_store *ref_st struct ref_iterator *ref_iterator = &iter->base; struct strbuf sb = STRBUF_INIT; - base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable); + base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable, 0); files_reflog_path(refs, &sb, NULL); iter->dir_iterator = dir_iterator_begin(sb.buf); iter->ref_store = ref_store; diff --git a/refs/iterator.c b/refs/iterator.c index 4cf449ef66..c475360f0a 100644 --- a/refs/iterator.c +++ b/refs/iterator.c @@ -25,9 +25,11 @@ int ref_iterator_abort(struct ref_iterator *ref_iterator) } void base_ref_iterator_init(struct ref_iterator *iter, - struct ref_iterator_vtable *vtable) + struct ref_iterator_vtable *vtable, + int ordered) { iter->vtable = vtable; + iter->ordered = !!ordered; iter->refname = NULL; iter->oid = NULL; iter->flags = 0; @@ -72,7 +74,7 @@ struct ref_iterator *empty_ref_iterator_begin(void) struct empty_ref_iterator *iter = xcalloc(1, sizeof(*iter)); struct ref_iterator *ref_iterator = &iter->base; - base_ref_iterator_init(ref_iterator, &empty_ref_iterator_vtable); + base_ref_iterator_init(ref_iterator, &empty_ref_iterator_vtable, 1); return ref_iterator; } @@ -205,6 +207,7 @@ static struct ref_iterator_vtable merge_ref_iterator_vtable = { }; struct ref_iterator *merge_ref_iterator_begin( + int ordered, struct ref_iterator *iter0, struct ref_iterator *iter1, ref_iterator_select_fn *select, void *cb_data) { @@ -219,7 +222,7 @@ struct ref_iterator *merge_ref_iterator_begin( * references through only if 
they exist in both iterators. */ - base_ref_iterator_init(ref_iterator, &merge_ref_iterator_vtable); + base_ref_iterator_init(ref_iterator, &merge_ref_iterator_vtable, ordered); iter->iter0 = iter0; iter->iter1 = iter1; iter->select = select; @@ -268,9 +271,11 @@ struct ref_itera
[PATCH v2 07/21] read_packed_refs(): make parsing of the header line more robust
The old code parsed the traits in the `packed-refs` header by looking
for the string " trait " (i.e., the name of the trait with a space on
either side) in the header line. This is fragile, because if any other
implementation of Git forgets to write the trailing space, the last
trait would silently be ignored (and the error might never be
noticed).

So instead, use `string_list_split_in_place()` to split the traits
into tokens, then use `unsorted_string_list_has_string()` to look for
the tokens we are interested in. This means that we can read the
traits correctly even if the header line is missing a trailing space
(or indeed, if it is missing the space after the colon, or if it has
multiple spaces somewhere).

However, older Git clients (and perhaps other Git implementations)
still require the surrounding spaces, so we still have to output the
header with a trailing space.

Signed-off-by: Michael Haggerty
---
 refs/packed-backend.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 141f02b9c8..a45e3ff92f 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -257,25 +257,30 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
 
 	/* If the file has a header line, process it: */
 	if (pos < eof && *pos == '#') {
-		const char *traits;
+		char *p;
+		struct string_list traits = STRING_LIST_INIT_NODUP;
 
 		eol = memchr(pos, '\n', eof - pos);
 		if (!eol)
			die_unterminated_line(refs->path, pos, eof - pos);
 
-		strbuf_add(&line, pos, eol + 1 - pos);
+		strbuf_add(&line, pos, eol - pos);
 
-		if (!skip_prefix(line.buf, "# pack-refs with:", &traits))
+		if (!skip_prefix(line.buf, "# pack-refs with:", (const char **)&p))
 			die_invalid_line(refs->path, pos, eof - pos);
 
-		if (strstr(traits, " fully-peeled "))
+		string_list_split_in_place(&traits, p, ' ', -1);
+
+		if (unsorted_string_list_has_string(&traits, "fully-peeled"))
 			peeled = PEELED_FULLY;
-		else if (strstr(traits, " peeled "))
+		else if (unsorted_string_list_has_string(&traits, "peeled"))
 			peeled = PEELED_TAGS;
 		/* perhaps other traits later as well */
 
 		/* The "+ 1" is for the LF character. */
 		pos = eol + 1;
+
+		string_list_clear(&traits, 0);
 		strbuf_reset(&line);
 	}
 
@@ -610,7 +615,11 @@ int packed_refs_is_locked(struct ref_store *ref_store)
 
 /*
  * The packed-refs header line that we write out. Perhaps other
- * traits will be added later. The trailing space is required.
+ * traits will be added later.
+ *
+ * Note that earlier versions of Git used to parse these traits by
+ * looking for " trait " in the line. For this reason, the space after
+ * the colon and the trailing space are required.
  */
 static const char PACKED_REFS_HEADER[] =
 	"# pack-refs with: peeled fully-peeled \n";
-- 
2.14.1
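The robustness gain can be seen in a standalone comparison of the two approaches. `has_trait()` below is a hypothetical stand-in that tokenizes with plain `strtok()` on a local copy; git itself uses `string_list_split_in_place()` plus `unsorted_string_list_has_string()`:

```c
#include <string.h>

/*
 * Hypothetical stand-in for the new parsing approach: split the
 * traits portion of the header on spaces and compare whole tokens, so
 * no surrounding spaces are required.  Not git's code -- strtok() on
 * a bounded local copy substitutes for string_list_split_in_place().
 */
static int has_trait(const char *traits, const char *want)
{
	char buf[256];
	char *tok;

	strncpy(buf, traits, sizeof(buf) - 1);
	buf[sizeof(buf) - 1] = '\0';

	for (tok = strtok(buf, " "); tok; tok = strtok(NULL, " "))
		if (!strcmp(tok, want))
			return 1;
	return 0;
}
```

With the old `strstr(traits, " fully-peeled ")` check, a header that lacks the trailing space silently drops its last trait; the token-based check still finds it.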
[PATCH v2 12/21] packed-backend.c: reorder some definitions
No code has been changed. This will make subsequent patches more
self-contained.

Signed-off-by: Michael Haggerty
---
 refs/packed-backend.c | 48 ++++++++++++++++++++++++------------------------
 1 file changed, 24 insertions(+), 24 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 724c88631d..0fe41a7203 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -36,30 +36,6 @@ struct packed_ref_cache {
 	struct stat_validity validity;
 };
 
-/*
- * Increment the reference count of *packed_refs.
- */
-static void acquire_packed_ref_cache(struct packed_ref_cache *packed_refs)
-{
-	packed_refs->referrers++;
-}
-
-/*
- * Decrease the reference count of *packed_refs. If it goes to zero,
- * free *packed_refs and return true; otherwise return false.
- */
-static int release_packed_ref_cache(struct packed_ref_cache *packed_refs)
-{
-	if (!--packed_refs->referrers) {
-		free_ref_cache(packed_refs->cache);
-		stat_validity_clear(&packed_refs->validity);
-		free(packed_refs);
-		return 1;
-	} else {
-		return 0;
-	}
-}
-
 /*
  * A container for `packed-refs`-related data. It is not (yet) a
  * `ref_store`.
@@ -92,6 +68,30 @@ struct packed_ref_store {
 	struct tempfile tempfile;
 };
 
+/*
+ * Increment the reference count of *packed_refs.
+ */
+static void acquire_packed_ref_cache(struct packed_ref_cache *packed_refs)
+{
+	packed_refs->referrers++;
+}
+
+/*
+ * Decrease the reference count of *packed_refs. If it goes to zero,
+ * free *packed_refs and return true; otherwise return false.
+ */
+static int release_packed_ref_cache(struct packed_ref_cache *packed_refs)
+{
+	if (!--packed_refs->referrers) {
+		free_ref_cache(packed_refs->cache);
+		stat_validity_clear(&packed_refs->validity);
+		free(packed_refs);
+		return 1;
+	} else {
+		return 0;
+	}
+}
+
 struct ref_store *packed_ref_store_create(const char *path,
					  unsigned int store_flags)
 {
-- 
2.14.1
[PATCH v2 20/21] mmapped_ref_iterator: inline into `packed_ref_iterator`
Since `packed_ref_iterator` is now delegating to `mmapped_ref_iterator` rather than `cache_ref_iterator` to do the heavy lifting, there is no need to keep the two iterators separate. So "inline" `mmapped_ref_iterator` into `packed_ref_iterator`. This removes a bunch of boilerplate. Signed-off-by: Michael Haggerty --- refs/packed-backend.c | 284 -- 1 file changed, 114 insertions(+), 170 deletions(-) diff --git a/refs/packed-backend.c b/refs/packed-backend.c index b1571872fa..92b837a237 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -225,157 +225,6 @@ static NORETURN void die_invalid_line(const char *path, } -/* - * This value is set in `base.flags` if the peeled value of the - * current reference is known. In that case, `peeled` contains the - * correct peeled value for the reference, which might be `null_sha1` - * if the reference is not a tag or if it is broken. - */ -#define REF_KNOWS_PEELED 0x40 - -/* - * An iterator over a packed-refs file that is currently mmapped. 
- */ -struct mmapped_ref_iterator { - struct ref_iterator base; - - struct packed_ref_cache *packed_refs; - - /* The current position in the mmapped file: */ - const char *pos; - - /* The end of the mmapped file: */ - const char *eof; - - struct object_id oid, peeled; - - struct strbuf refname_buf; -}; - -static int mmapped_ref_iterator_advance(struct ref_iterator *ref_iterator) -{ - struct mmapped_ref_iterator *iter = - (struct mmapped_ref_iterator *)ref_iterator; - const char *p = iter->pos, *eol; - - strbuf_reset(&iter->refname_buf); - - if (iter->pos == iter->eof) - return ref_iterator_abort(ref_iterator); - - iter->base.flags = REF_ISPACKED; - - if (iter->eof - p < GIT_SHA1_HEXSZ + 2 || - parse_oid_hex(p, &iter->oid, &p) || - !isspace(*p++)) - die_invalid_line(iter->packed_refs->refs->path, -iter->pos, iter->eof - iter->pos); - - eol = memchr(p, '\n', iter->eof - p); - if (!eol) - die_unterminated_line(iter->packed_refs->refs->path, - iter->pos, iter->eof - iter->pos); - - strbuf_add(&iter->refname_buf, p, eol - p); - iter->base.refname = iter->refname_buf.buf; - - if (check_refname_format(iter->base.refname, REFNAME_ALLOW_ONELEVEL)) { - if (!refname_is_safe(iter->base.refname)) - die("packed refname is dangerous: %s", - iter->base.refname); - oidclr(&iter->oid); - iter->base.flags |= REF_BAD_NAME | REF_ISBROKEN; - } - if (iter->packed_refs->peeled == PEELED_FULLY || - (iter->packed_refs->peeled == PEELED_TAGS && -starts_with(iter->base.refname, "refs/tags/"))) - iter->base.flags |= REF_KNOWS_PEELED; - - iter->pos = eol + 1; - - if (iter->pos < iter->eof && *iter->pos == '^') { - p = iter->pos + 1; - if (iter->eof - p < GIT_SHA1_HEXSZ + 1 || - parse_oid_hex(p, &iter->peeled, &p) || - *p++ != '\n') - die_invalid_line(iter->packed_refs->refs->path, -iter->pos, iter->eof - iter->pos); - iter->pos = p; - - /* -* Regardless of what the file header said, we -* definitely know the value of *this* reference. 
But -* we suppress it if the reference is broken: -*/ - if ((iter->base.flags & REF_ISBROKEN)) { - oidclr(&iter->peeled); - iter->base.flags &= ~REF_KNOWS_PEELED; - } else { - iter->base.flags |= REF_KNOWS_PEELED; - } - } else { - oidclr(&iter->peeled); - } - - return ITER_OK; -} - -static int mmapped_ref_iterator_peel(struct ref_iterator *ref_iterator, - struct object_id *peeled) -{ - struct mmapped_ref_iterator *iter = - (struct mmapped_ref_iterator *)ref_iterator; - - if ((iter->base.flags & REF_KNOWS_PEELED)) { - oidcpy(peeled, &iter->peeled); - return is_null_oid(&iter->peeled) ? -1 : 0; - } else if ((iter->base.flags & (REF_ISBROKEN | REF_ISSYMREF))) { - return -1; - } else { - return !!peel_object(iter->oid.hash, peeled->hash); - } -} - -static int mmapped_ref_iterator_abort(struct ref_iterator *ref_iterator) -{ - struct mmapped_ref_iterator *iter = - (struct mmapped_ref_iterator *)ref_iterator; - - release_packed_ref_cache(iter->packed_refs); - strbuf_release(&iter->refname_buf); - base_ref_iterator_free(ref_iterator); - return IT
[PATCH v2 09/21] packed_ref_cache: remember the file-wide peeling state
Rather than store the peeling state (i.e., the one defined by traits in the `packed-refs` file header line) in a local variable in `read_packed_refs()`, store it permanently in `packed_ref_cache`. This will be needed when we stop reading all packed refs at once. Signed-off-by: Michael Haggerty --- refs/packed-backend.c | 17 - 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/refs/packed-backend.c b/refs/packed-backend.c index 2b80f244c8..ae276f3445 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -18,6 +18,12 @@ struct packed_ref_cache { struct ref_cache *cache; + /* +* What is the peeled state of this cache? (This is usually +* determined from the header of the "packed-refs" file.) +*/ + enum { PEELED_NONE, PEELED_TAGS, PEELED_FULLY } peeled; + /* * Count of references to the data structure in this instance, * including the pointer from files_ref_store::packed if any. @@ -195,13 +201,13 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs) char *buf; const char *pos, *eol, *eof; struct strbuf tmp = STRBUF_INIT; - enum { PEELED_NONE, PEELED_TAGS, PEELED_FULLY } peeled = PEELED_NONE; struct ref_dir *dir; packed_refs->refs = refs; acquire_packed_ref_cache(packed_refs); packed_refs->cache = create_ref_cache(NULL, NULL); packed_refs->cache->root->flag &= ~REF_INCOMPLETE; + packed_refs->peeled = PEELED_NONE; fd = open(refs->path, O_RDONLY); if (fd < 0) { @@ -244,9 +250,9 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs) string_list_split_in_place(&traits, p, ' ', -1); if (unsorted_string_list_has_string(&traits, "fully-peeled")) - peeled = PEELED_FULLY; + packed_refs->peeled = PEELED_FULLY; else if (unsorted_string_list_has_string(&traits, "peeled")) - peeled = PEELED_TAGS; + packed_refs->peeled = PEELED_TAGS; /* perhaps other traits later as well */ /* The "+ 1" is for the LF character. 
*/ @@ -282,8 +288,9 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs) oidclr(&oid); flag |= REF_BAD_NAME | REF_ISBROKEN; } - if (peeled == PEELED_FULLY || - (peeled == PEELED_TAGS && starts_with(refname, "refs/tags/"))) + if (packed_refs->peeled == PEELED_FULLY || + (packed_refs->peeled == PEELED_TAGS && +starts_with(refname, "refs/tags/"))) flag |= REF_KNOWS_PEELED; entry = create_ref_entry(refname, &oid, flag); add_ref_entry(dir, entry); -- 2.14.1
[PATCH v2 16/21] packed_read_raw_ref(): read the reference from the mmapped buffer
Instead of reading the reference from the `ref_cache`, read it directly from the mmapped buffer. Signed-off-by: Michael Haggerty --- refs/packed-backend.c | 14 +- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/refs/packed-backend.c b/refs/packed-backend.c index 53d0b7e3d6..2cf2f7f73d 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -875,18 +875,22 @@ static int packed_read_raw_ref(struct ref_store *ref_store, { struct packed_ref_store *refs = packed_downcast(ref_store, REF_STORE_READ, "read_raw_ref"); - - struct ref_entry *entry; + struct packed_ref_cache *packed_refs = get_packed_ref_cache(refs); + const char *rec; *type = 0; - entry = get_packed_ref(refs, refname); - if (!entry) { + rec = find_reference_location(packed_refs, refname, 1); + + if (!rec) { + /* refname is not a packed reference. */ errno = ENOENT; return -1; } - hashcpy(sha1, entry->u.value.oid.hash); + if (get_sha1_hex(rec, sha1)) + die_invalid_line(refs->path, rec, packed_refs->eof - rec); + *type = REF_ISPACKED; return 0; } -- 2.14.1
[PATCH v2 11/21] mmapped_ref_iterator_advance(): no peeled value for broken refs
If a reference is broken, suppress its peeled value. Signed-off-by: Michael Haggerty --- refs/packed-backend.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/refs/packed-backend.c b/refs/packed-backend.c index 312116a99d..724c88631d 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -234,9 +234,15 @@ static int mmapped_ref_iterator_advance(struct ref_iterator *ref_iterator) /* * Regardless of what the file header said, we -* definitely know the value of *this* reference: +* definitely know the value of *this* reference. But +* we suppress it if the reference is broken: */ - iter->base.flags |= REF_KNOWS_PEELED; + if ((iter->base.flags & REF_ISBROKEN)) { + oidclr(&iter->peeled); + iter->base.flags &= ~REF_KNOWS_PEELED; + } else { + iter->base.flags |= REF_KNOWS_PEELED; + } } else { oidclr(&iter->peeled); } -- 2.14.1
[PATCH v2 00/21] Read `packed-refs` using mmap()
This is v2 of a patch series that changes the reading and caching of the `packed-refs` file to use `mmap()`. Thanks to Junio, Stefan, and Johannes for their comments about v1 [1].

The main change since v1 is to accommodate Windows, which doesn't let you replace a file using `rename()` if the file is currently mmapped. This is unfortunate, because it means that Windows will never get the O(N) → O(lg N) improvement for reading single references that more capable systems can now enjoy. The background was discussed on the mailing list [2]. The bottom line is that on Windows, keeping the `packed-refs` file mmapped would be tantamount to holding a reader lock on that file, preventing anybody (even unrelated processes) from changing the `packed-refs` file while it is mmapped. This is even worse than the situation for packfiles (which is solved using `close_all_packs()`), because a packfile, once created, never needs to be replaced: every packfile has a filename that is determined from its contents. The worst that can happen if a packfile is locked is that another process cannot remove it, but that is not critical for correctness. The `packed-refs` file, on the other hand, always has the same filename and needs to be overwritten for correctness.

So the approach taken here is to introduce a new compile-time option, `MMAP_PREVENTS_DELETE`. When this option is set, the `packed-refs` file is read quickly into memory and then closed. Even in that case, though, this branch brings significant performance benefits, because instead of parsing the whole file and storing it in lots of little objects in a `ref_cache` (which also involves a lot of small memory allocations), we copy the verbatim contents of the file into memory. Then we use the same binary search techniques to find any references that we need to read, just as we would do if the file were memory mapped.
This means that we only have to fully parse the references that we are interested in, and hardly have to allocate any additional memory.

I did some more careful benchmarks of this code vs. Git 2.14.1 on a repository that is not quite as pathological. The test repo has 110k references that are fully packed in a `packed-refs` file that has the `sorted` trait. The current version is compiled three ways:

* `NO_MMAP=YesPlease`: prevents all use of `mmap()`. This variant is O(N) when reading a single reference.

* `MMAP_PREVENTS_DELETE=YesPlease`: uses mmap for the initial read, but quickly copies the contents to heap-allocated memory and munmaps right away. This variant is also O(N) when reading a single reference.

* default (mmap enabled): the `packed-refs` file is kept mmapped for as long as it is in use.

The commands that I timed were as follows:

    # for-each-ref, warm cache:
    $ git -C lots-of-refs for-each-ref --format="%(objectname) %(refname)" >/dev/null

    # rev-parse, warm cache (this command was run 10 times then the total
    # time divided by 10):
    $ git -C lots-of-refs rev-parse --verify refs/remotes/origin/pr/38733

    # rev-parse, cold cache (but git binary warm):
    $ sync ; sudo sh -c 'echo 3 >/proc/sys/vm/drop_caches'; git rev-parse v1.0.0; time git -C lots-of-refs rev-parse --verify refs/remotes/origin/pr/38733

(Note that the `rev-parse` commands involve a handful of reference lookups as the argument is DWIMmed.)

Results:

                                    for-each-ref   rev-parse    rev-parse
                                    warm cache     warm cache   cold cache
    ------------------------------------------------------------------------
    v2.14.1                         92 ms          23.7 ms      30 ms
    NO_MMAP=YesPlease               83 ms           3.4 ms      10 ms
    MMAP_PREVENTS_DELETE=YesPlease  82 ms           3.5 ms      11 ms
    default (mmap enabled)          81 ms           0.8 ms       6 ms

So this branch is a little bit faster at iterating over all references, but it really has big advantages when looking up single references. The advantage is smaller if `NO_MMAP` or `MMAP_PREVENTS_DELETE` is set, but is still quite significant.

This branch is also available from my fork on GitHub [3] as branch `mmap-packed-refs`.
My main uncertainties are:

1. Does this code actually work on Windows?

2. Did I implement the new compile-time option correctly? (I just cargo-culted some existing options.) Is there some existing option that I could piggy-back off of instead of adding a new one?

3. Is a compile-time option sufficient, or would the `mmap()` option need to be configurable at runtime, or even tested at repository create time as is done for some other filesystem properties in `create_default_files()`?

Michael

[1] https://public-inbox.org/git/cover.1505319366.git.mhag...@alum.mit.edu/
[2] https://public-inbox.org/git/alpine.DEB.2.21.1.1709142101560.4132@virtualbox/
    https://public-inbox.org/git/alpine.DEB.2.21.1.1709150012550.219280@virtualbox/
[3] https://github.com/mhagger/git

Jeff King (1): pr
[PATCH v2 02/21] prefix_ref_iterator: break when we leave the prefix
From: Jeff King If the underlying iterator is ordered, then `prefix_ref_iterator` can stop as soon as it sees a refname that comes after the prefix. This will rarely make a big difference now, because `ref_cache_iterator` only iterates over the directory containing the prefix (and usually the prefix will span a whole directory anyway). But if *hint, hint* a future reference backend doesn't itself know where to stop the iteration, then this optimization will be a big win. Note that there is no guarantee that the underlying iterator doesn't include output preceding the prefix, so we have to skip over any unwanted references before we get to the ones that we want. Signed-off-by: Jeff King Signed-off-by: Michael Haggerty --- refs/iterator.c | 32 +++- 1 file changed, 31 insertions(+), 1 deletion(-) diff --git a/refs/iterator.c b/refs/iterator.c index c475360f0a..bd35da4e62 100644 --- a/refs/iterator.c +++ b/refs/iterator.c @@ -287,6 +287,20 @@ struct prefix_ref_iterator { int trim; }; +/* Return -1, 0, 1 if refname is before, inside, or after the prefix. */ +static int compare_prefix(const char *refname, const char *prefix) +{ + while (*prefix) { + if (*refname != *prefix) + return ((unsigned char)*refname < (unsigned char)*prefix) ? 
-1 : +1; + + refname++; + prefix++; + } + + return 0; +} + static int prefix_ref_iterator_advance(struct ref_iterator *ref_iterator) { struct prefix_ref_iterator *iter = @@ -294,9 +308,25 @@ static int prefix_ref_iterator_advance(struct ref_iterator *ref_iterator) int ok; while ((ok = ref_iterator_advance(iter->iter0)) == ITER_OK) { - if (!starts_with(iter->iter0->refname, iter->prefix)) + int cmp = compare_prefix(iter->iter0->refname, iter->prefix); + + if (cmp < 0) continue; + if (cmp > 0) { + /* +* If the source iterator is ordered, then we +* can stop the iteration as soon as we see a +* refname that comes after the prefix: +*/ + if (iter->iter0->ordered) { + ok = ref_iterator_abort(iter->iter0); + break; + } else { + continue; + } + } + if (iter->trim) { /* * It is nonsense to trim off characters that -- 2.14.1
[PATCH v2 15/21] packed_ref_iterator_begin(): iterate using `mmapped_ref_iterator`
Now that we have an efficient way to iterate, in order, over the mmapped contents of the `packed-refs` file, we can use that directly to implement reference iteration for the `packed_ref_store`, rather than iterating over the `ref_cache`. This is the next step towards getting rid of the `ref_cache` entirely. Signed-off-by: Michael Haggerty --- refs/packed-backend.c | 109 -- 1 file changed, 106 insertions(+), 3 deletions(-) diff --git a/refs/packed-backend.c b/refs/packed-backend.c index f336690f16..53d0b7e3d6 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -397,6 +397,27 @@ static int cmp_packed_ref_entries(const void *v1, const void *v2) } } +/* + * Compare a packed-refs record pointed to by `rec` to the specified + * NUL-terminated refname. + */ +static int cmp_entry_to_refname(const char *rec, const char *refname) +{ + const char *r1 = rec + GIT_SHA1_HEXSZ + 1; + const char *r2 = refname; + + while (1) { + if (*r1 == '\n') + return *r2 ? -1 : 0; + if (!*r2) + return 1; + if (*r1 != *r2) + return (unsigned char)*r1 < (unsigned char)*r2 ? -1 : +1; + r1++; + r2++; + } +} + /* * `packed_refs->buf` is not known to be sorted. Check whether it is, * and if not, sort it into new memory and munmap/free the old @@ -503,6 +524,17 @@ static const char *find_start_of_record(const char *buf, const char *p) return p; } +/* + * Return a pointer to the start of the record following the record + * that contains `*p`. If none is found before `end`, return `end`. + */ +static const char *find_end_of_record(const char *p, const char *end) +{ + while (++p < end && (p[-1] != '\n' || p[0] == '^')) + ; + return p; +} + /* * We want to be able to compare mmapped reference records quickly, * without totally parsing them. We can do so because the records are @@ -591,6 +623,65 @@ static int load_contents(struct packed_ref_cache *packed_refs) return 1; } +/* + * Find the place in `cache->buf` where the start of the record for + * `refname` starts. 
If `mustexist` is true and the reference doesn't
+ * exist, then return NULL. If `mustexist` is false and the reference
+ * doesn't exist, then return the point where that reference would be
+ * inserted. In the latter mode, `refname` doesn't have to be a proper
+ * reference name; for example, one could search for "refs/replace/"
+ * to find the start of any replace references.
+ *
+ * The record is sought using a binary search, so `cache->buf` must be
+ * sorted.
+ */
+static const char *find_reference_location(struct packed_ref_cache *cache,
+					   const char *refname, int mustexist)
+{
+	/*
+	 * This is not *quite* a garden-variety binary search, because
+	 * the data we're searching is made up of records, and we
+	 * always need to find the beginning of a record to do a
+	 * comparison. A "record" here is one line for the reference
+	 * itself and zero or one peel lines that start with '^'. Our
+	 * loop invariant is described in the next two comments.
+	 */
+
+	/*
+	 * A pointer to the character at the start of a record whose
+	 * preceding records all have reference names that come
+	 * *before* `refname`.
+	 */
+	const char *lo = cache->buf + cache->header_len;
+
+	/*
+	 * A pointer to the first character of a record whose
+	 * reference name comes *after* `refname`.
+	 */
+	const char *hi = cache->eof;
+
+	while (lo < hi) {
+		const char *mid, *rec;
+		int cmp;
+
+		mid = lo + (hi - lo) / 2;
+		rec = find_start_of_record(lo, mid);
+		cmp = cmp_entry_to_refname(rec, refname);
+		if (cmp < 0) {
+			lo = find_end_of_record(mid, hi);
+		} else if (cmp > 0) {
+			hi = rec;
+		} else {
+			return rec;
+		}
+	}
+
+	if (mustexist)
+		return NULL;
+	else
+		return lo;
+}
+
 /*
  * Read from the `packed-refs` file into a newly-allocated
  * `packed_ref_cache` and return it.
The return value will already @@ -887,6 +978,8 @@ static struct ref_iterator *packed_ref_iterator_begin( const char *prefix, unsigned int flags) { struct packed_ref_store *refs; + struct packed_ref_cache *packed_refs; + const char *start; struct packed_ref_iterator *iter; struct ref_iterator *ref_iterator; unsigned int required_flags = REF_STORE_READ; @@ -904,13 +997,23 @@ static struct ref_iterator *packed_ref_iterator_begin( * the packed-ref cache is up to date with what is on dis
[PATCH v2 10/21] mmapped_ref_iterator: add iterator over a packed-refs file
Add a new `mmapped_ref_iterator`, which can iterate over the references in an mmapped `packed-refs` file directly. Use this iterator from `read_packed_refs()` to fill the packed refs cache. Note that we are not yet willing to promise that the new iterator generates its output in order. That doesn't matter for now, because the packed refs cache doesn't care what order it is filled. This change adds a lot of boilerplate without providing any obvious benefits. The benefits will come soon, when we get rid of the `ref_cache` for packed references altogether. Signed-off-by: Michael Haggerty --- refs/packed-backend.c | 207 -- 1 file changed, 152 insertions(+), 55 deletions(-) diff --git a/refs/packed-backend.c b/refs/packed-backend.c index ae276f3445..312116a99d 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -163,6 +163,141 @@ static NORETURN void die_invalid_line(const char *path, } +/* + * An iterator over a packed-refs file that is currently mmapped. + */ +struct mmapped_ref_iterator { + struct ref_iterator base; + + struct packed_ref_cache *packed_refs; + + /* The current position in the mmapped file: */ + const char *pos; + + /* The end of the mmapped file: */ + const char *eof; + + struct object_id oid, peeled; + + struct strbuf refname_buf; +}; + +static int mmapped_ref_iterator_advance(struct ref_iterator *ref_iterator) +{ + struct mmapped_ref_iterator *iter = + (struct mmapped_ref_iterator *)ref_iterator; + const char *p = iter->pos, *eol; + + strbuf_reset(&iter->refname_buf); + + if (iter->pos == iter->eof) + return ref_iterator_abort(ref_iterator); + + iter->base.flags = REF_ISPACKED; + + if (iter->eof - p < GIT_SHA1_HEXSZ + 2 || + parse_oid_hex(p, &iter->oid, &p) || + !isspace(*p++)) + die_invalid_line(iter->packed_refs->refs->path, +iter->pos, iter->eof - iter->pos); + + eol = memchr(p, '\n', iter->eof - p); + if (!eol) + die_unterminated_line(iter->packed_refs->refs->path, + iter->pos, iter->eof - iter->pos); + + 
strbuf_add(&iter->refname_buf, p, eol - p); + iter->base.refname = iter->refname_buf.buf; + + if (check_refname_format(iter->base.refname, REFNAME_ALLOW_ONELEVEL)) { + if (!refname_is_safe(iter->base.refname)) + die("packed refname is dangerous: %s", + iter->base.refname); + oidclr(&iter->oid); + iter->base.flags |= REF_BAD_NAME | REF_ISBROKEN; + } + if (iter->packed_refs->peeled == PEELED_FULLY || + (iter->packed_refs->peeled == PEELED_TAGS && +starts_with(iter->base.refname, "refs/tags/"))) + iter->base.flags |= REF_KNOWS_PEELED; + + iter->pos = eol + 1; + + if (iter->pos < iter->eof && *iter->pos == '^') { + p = iter->pos + 1; + if (iter->eof - p < GIT_SHA1_HEXSZ + 1 || + parse_oid_hex(p, &iter->peeled, &p) || + *p++ != '\n') + die_invalid_line(iter->packed_refs->refs->path, +iter->pos, iter->eof - iter->pos); + iter->pos = p; + + /* +* Regardless of what the file header said, we +* definitely know the value of *this* reference: +*/ + iter->base.flags |= REF_KNOWS_PEELED; + } else { + oidclr(&iter->peeled); + } + + return ITER_OK; +} + +static int mmapped_ref_iterator_peel(struct ref_iterator *ref_iterator, + struct object_id *peeled) +{ + struct mmapped_ref_iterator *iter = + (struct mmapped_ref_iterator *)ref_iterator; + + if ((iter->base.flags & REF_KNOWS_PEELED)) { + oidcpy(peeled, &iter->peeled); + return is_null_oid(&iter->peeled) ? 
-1 : 0; + } else if ((iter->base.flags & (REF_ISBROKEN | REF_ISSYMREF))) { + return -1; + } else { + return !!peel_object(iter->oid.hash, peeled->hash); + } +} + +static int mmapped_ref_iterator_abort(struct ref_iterator *ref_iterator) +{ + struct mmapped_ref_iterator *iter = + (struct mmapped_ref_iterator *)ref_iterator; + + release_packed_ref_cache(iter->packed_refs); + strbuf_release(&iter->refname_buf); + base_ref_iterator_free(ref_iterator); + return ITER_DONE; +} + +static struct ref_iterator_vtable mmapped_ref_iterator_vtable = { + mmapped_ref_iterator_advance, + mmapped_ref_iterator_peel, + mmapped_ref_iterator_abort +}; + +struct ref_iterator *mmapped_ref_iterator_begin( + const char *packed_refs_file, +
[PATCH v2 18/21] packed_ref_store: get rid of the `ref_cache` entirely
Now that everything has been changed to read what it needs directly out of the `packed-refs` file, `packed_ref_store` doesn't need to maintain a `ref_cache` at all. So get rid of it. First of all, this will save a lot of memory and lots of little allocations. Instead of needing to store complicated parsed data structures in memory, we just mmap the file (potentially sharing memory with other processes) and parse only what we need. Moreover, since the mmapped access to the file reads only the parts of the file that it needs, this might save reading all of the data from disk at all (at least if the file starts out sorted). Signed-off-by: Michael Haggerty --- refs/packed-backend.c | 29 ++--- 1 file changed, 2 insertions(+), 27 deletions(-) diff --git a/refs/packed-backend.c b/refs/packed-backend.c index 6a930a440c..b4a01637a6 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -45,8 +45,6 @@ struct packed_ref_cache { */ struct packed_ref_store *refs; - struct ref_cache *cache; - /* Is the `packed-refs` file currently mmapped? 
*/ int mmapped; @@ -148,7 +146,6 @@ static void release_packed_ref_buffer(struct packed_ref_cache *packed_refs) static int release_packed_ref_cache(struct packed_ref_cache *packed_refs) { if (!--packed_refs->referrers) { - free_ref_cache(packed_refs->cache); stat_validity_clear(&packed_refs->validity); release_packed_ref_buffer(packed_refs); free(packed_refs); @@ -718,15 +715,10 @@ static const char *find_reference_location(struct packed_ref_cache *cache, static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs) { struct packed_ref_cache *packed_refs = xcalloc(1, sizeof(*packed_refs)); - struct ref_dir *dir; - struct ref_iterator *iter; int sorted = 0; - int ok; packed_refs->refs = refs; acquire_packed_ref_cache(packed_refs); - packed_refs->cache = create_ref_cache(NULL, NULL); - packed_refs->cache->root->flag &= ~REF_INCOMPLETE; packed_refs->peeled = PEELED_NONE; if (!load_contents(packed_refs)) @@ -799,23 +791,6 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs) packed_refs->eof = buf_copy + size; } - dir = get_ref_dir(packed_refs->cache->root); - iter = mmapped_ref_iterator_begin( - packed_refs, - packed_refs->buf + packed_refs->header_len, - packed_refs->eof); - while ((ok = ref_iterator_advance(iter)) == ITER_OK) { - struct ref_entry *entry = - create_ref_entry(iter->refname, iter->oid, iter->flags); - - if ((iter->flags & REF_KNOWS_PEELED)) - ref_iterator_peel(iter, &entry->u.value.peeled); - add_ref_entry(dir, entry); - } - - if (ok != ITER_DONE) - die("error reading packed-refs file %s", refs->path); - return packed_refs; } @@ -974,8 +949,8 @@ static struct ref_iterator *packed_ref_iterator_begin( else start = packed_refs->buf + packed_refs->header_len; - iter->iter0 = mmapped_ref_iterator_begin( - packed_refs, start, packed_refs->eof); + iter->iter0 = mmapped_ref_iterator_begin(packed_refs, +start, packed_refs->eof); iter->flags = flags; -- 2.14.1
[PATCH v2 04/21] die_unterminated_line(), die_invalid_line(): new functions
Extract some helper functions for reporting errors. While we're at it, prevent them from spewing unlimited output to the terminal. These functions will soon have more callers. These functions accept the problematic line as a `(ptr, len)` pair rather than a NUL-terminated string, and `die_invalid_line()` checks for an EOL itself, because these calling conventions will be convenient for future callers. (Efficiency is not a concern here because these functions are only ever called if the `packed-refs` file is corrupt.) Signed-off-by: Michael Haggerty --- refs/packed-backend.c | 28 +--- 1 file changed, 25 insertions(+), 3 deletions(-) diff --git a/refs/packed-backend.c b/refs/packed-backend.c index a3d9210cb0..5c50c223ef 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -161,6 +161,29 @@ static const char *parse_ref_line(struct strbuf *line, struct object_id *oid) return ref; } +static NORETURN void die_unterminated_line(const char *path, + const char *p, size_t len) +{ + if (len < 80) + die("unterminated line in %s: %.*s", path, (int)len, p); + else + die("unterminated line in %s: %.75s...", path, p); +} + +static NORETURN void die_invalid_line(const char *path, + const char *p, size_t len) +{ + const char *eol = memchr(p, '\n', len); + + if (!eol) + die_unterminated_line(path, p, len); + else if (eol - p < 80) + die("unexpected line in %s: %.*s", path, (int)(eol - p), p); + else + die("unexpected line in %s: %.75s...", path, p); + +} + /* * Read from the `packed-refs` file into a newly-allocated * `packed_ref_cache` and return it. 
The return value will already @@ -227,7 +250,7 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs) const char *traits; if (!line.len || line.buf[line.len - 1] != '\n') - die("unterminated line in %s: %s", refs->path, line.buf); + die_unterminated_line(refs->path, line.buf, line.len); if (skip_prefix(line.buf, "# pack-refs with:", &traits)) { if (strstr(traits, " fully-peeled ")) @@ -266,8 +289,7 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs) */ last->flag |= REF_KNOWS_PEELED; } else { - strbuf_setlen(&line, line.len - 1); - die("unexpected line in %s: %s", refs->path, line.buf); + die_invalid_line(refs->path, line.buf, line.len); } } -- 2.14.1
[PATCH v2 08/21] read_packed_refs(): read references with minimal copying
Instead of copying data from the `packed-refs` file one line at a time and then processing it, process the data in place as much as possible. Also, instead of processing one line per iteration of the main loop, process a reference line plus its corresponding peeled line (if present) together. Note that this change slightly tightens up the parsing of the `packed-refs` file. Previously, the parser would have accepted multiple "peeled" lines for a single reference (ignoring all but the last one). Now it would reject that. Signed-off-by: Michael Haggerty --- refs/packed-backend.c | 101 -- 1 file changed, 40 insertions(+), 61 deletions(-) diff --git a/refs/packed-backend.c b/refs/packed-backend.c index a45e3ff92f..2b80f244c8 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -134,33 +134,6 @@ static void clear_packed_ref_cache(struct packed_ref_store *refs) } } -/* The length of a peeled reference line in packed-refs, including EOL: */ -#define PEELED_LINE_LENGTH 42 - -/* - * Parse one line from a packed-refs file. Write the SHA1 to sha1. - * Return a pointer to the refname within the line (null-terminated), - * or NULL if there was a problem.
- */ -static const char *parse_ref_line(struct strbuf *line, struct object_id *oid) -{ - const char *ref; - - if (parse_oid_hex(line->buf, oid, &ref) < 0) - return NULL; - if (!isspace(*ref++)) - return NULL; - - if (isspace(*ref)) - return NULL; - - if (line->buf[line->len - 1] != '\n') - return NULL; - line->buf[--line->len] = 0; - - return ref; -} - static NORETURN void die_unterminated_line(const char *path, const char *p, size_t len) { @@ -221,8 +194,7 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs) size_t size; char *buf; const char *pos, *eol, *eof; - struct ref_entry *last = NULL; - struct strbuf line = STRBUF_INIT; + struct strbuf tmp = STRBUF_INIT; enum { PEELED_NONE, PEELED_TAGS, PEELED_FULLY } peeled = PEELED_NONE; struct ref_dir *dir; @@ -264,9 +236,9 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs) if (!eol) die_unterminated_line(refs->path, pos, eof - pos); - strbuf_add(&line, pos, eol - pos); + strbuf_add(&tmp, pos, eol - pos); - if (!skip_prefix(line.buf, "# pack-refs with:", (const char **)&p)) + if (!skip_prefix(tmp.buf, "# pack-refs with:", (const char **)&p)) die_invalid_line(refs->path, pos, eof - pos); string_list_split_in_place(&traits, p, ' ', -1); @@ -281,61 +253,68 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs) pos = eol + 1; string_list_clear(&traits, 0); - strbuf_reset(&line); + strbuf_reset(&tmp); } dir = get_ref_dir(packed_refs->cache->root); while (pos < eof) { + const char *p = pos; struct object_id oid; const char *refname; + int flag = REF_ISPACKED; + struct ref_entry *entry = NULL; - eol = memchr(pos, '\n', eof - pos); + if (eof - pos < GIT_SHA1_HEXSZ + 2 || + parse_oid_hex(p, &oid, &p) || + !isspace(*p++)) + die_invalid_line(refs->path, pos, eof - pos); + + eol = memchr(p, '\n', eof - p); if (!eol) die_unterminated_line(refs->path, pos, eof - pos); - strbuf_add(&line, pos, eol + 1 - pos); + strbuf_add(&tmp, p, eol - p); + 
refname = tmp.buf; + + if (check_refname_format(refname, REFNAME_ALLOW_ONELEVEL)) { + if (!refname_is_safe(refname)) + die("packed refname is dangerous: %s", refname); + oidclr(&oid); + flag |= REF_BAD_NAME | REF_ISBROKEN; + } + if (peeled == PEELED_FULLY || + (peeled == PEELED_TAGS && starts_with(refname, "refs/tags/"))) + flag |= REF_KNOWS_PEELED; + entry = create_ref_entry(refname, &oid, flag); + add_ref_entry(dir, entry); - refname = parse_ref_line(&line, &oid); - if (refname) { - int flag = REF_ISPACKED; + pos = eol + 1; + + if (pos < eof && *pos == '^') { + p = pos + 1; + if (eof - p < GIT_SHA1_HEXSZ + 1 || + parse_oid_hex(p, &entry->u.value.peeled, &p) || + *p++ != '\n')
[PATCH v2 05/21] read_packed_refs(): use mmap to read the `packed-refs` file
It's still done in a pretty stupid way, involving more data copying than necessary. That will improve in future commits. Signed-off-by: Michael Haggerty --- refs/packed-backend.c | 42 -- 1 file changed, 32 insertions(+), 10 deletions(-) diff --git a/refs/packed-backend.c b/refs/packed-backend.c index 5c50c223ef..154abbd83a 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -215,8 +215,12 @@ static NORETURN void die_invalid_line(const char *path, */ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs) { - FILE *f; struct packed_ref_cache *packed_refs = xcalloc(1, sizeof(*packed_refs)); + int fd; + struct stat st; + size_t size; + char *buf; + const char *pos, *eol, *eof; struct ref_entry *last = NULL; struct strbuf line = STRBUF_INIT; enum { PEELED_NONE, PEELED_TAGS, PEELED_FULLY } peeled = PEELED_NONE; @@ -227,8 +231,8 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs) packed_refs->cache = create_ref_cache(NULL, NULL); packed_refs->cache->root->flag &= ~REF_INCOMPLETE; - f = fopen(refs->path, "r"); - if (!f) { + fd = open(refs->path, O_RDONLY); + if (fd < 0) { if (errno == ENOENT) { /* * This is OK; it just means that no @@ -241,16 +245,27 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs) } } - stat_validity_update(&packed_refs->validity, fileno(f)); + stat_validity_update(&packed_refs->validity, fd); + + if (fstat(fd, &st) < 0) + die_errno("couldn't stat %s", refs->path); + + size = xsize_t(st.st_size); + buf = xmmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0); + pos = buf; + eof = buf + size; dir = get_ref_dir(packed_refs->cache->root); - while (strbuf_getwholeline(&line, f, '\n') != EOF) { + while (pos < eof) { struct object_id oid; const char *refname; const char *traits; - if (!line.len || line.buf[line.len - 1] != '\n') - die_unterminated_line(refs->path, line.buf, line.len); + eol = memchr(pos, '\n', eof - pos); + if (!eol) + 
die_unterminated_line(refs->path, pos, eof - pos); + + strbuf_add(&line, pos, eol + 1 - pos); if (skip_prefix(line.buf, "# pack-refs with:", &traits)) { if (strstr(traits, " fully-peeled ")) @@ -258,7 +273,7 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs) else if (strstr(traits, " peeled ")) peeled = PEELED_TAGS; /* perhaps other traits later as well */ - continue; + goto next_line; } refname = parse_ref_line(&line, &oid); @@ -291,11 +306,18 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs) } else { die_invalid_line(refs->path, line.buf, line.len); } + + next_line: + /* The "+ 1" is for the LF character. */ + pos = eol + 1; + strbuf_reset(&line); } - fclose(f); - strbuf_release(&line); + if (munmap(buf, size)) + die_errno("error ummapping packed-refs file"); + close(fd); + strbuf_release(&line); return packed_refs; } -- 2.14.1
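The open/fstat/mmap/scan shape that this patch switches `read_packed_refs()` to can be sketched in isolation. This toy uses plain POSIX calls where git would use its `xmmap()`/`die_errno()` helpers, and merely counts lines instead of building a ref cache:

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Count newline-terminated lines in a file by mmapping it and
 * scanning with memchr(), the same pos/eof/eol walk as the patch.
 * Returns -1 on error, otherwise the number of complete lines.
 */
static long count_lines_mmap(const char *path)
{
    int fd = open(path, O_RDONLY);
    struct stat st;
    char *buf;
    const char *pos, *eof, *eol;
    long n = 0;

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {
        close(fd);
        return 0;           /* mmap of an empty file would fail */
    }
    buf = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (buf == MAP_FAILED) {
        close(fd);
        return -1;
    }
    pos = buf;
    eof = buf + st.st_size;
    while (pos < eof) {
        eol = memchr(pos, '\n', eof - pos);
        if (!eol)
            break;          /* unterminated final line: stop */
        n++;
        pos = eol + 1;      /* the "+ 1" skips the LF */
    }
    munmap(buf, st.st_size);
    close(fd);
    return n;
}
```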
[PATCH v2 13/21] packed_ref_cache: keep the `packed-refs` file mmapped if possible
Keep a copy of the `packed-refs` file contents in memory for as long as a `packed_ref_cache` object is in use: * If the system allows it, keep the `packed-refs` file mmapped. * If not (either because the system doesn't support `mmap()` at all, or because a file that is currently mmapped cannot be replaced via `rename()`), then make a copy of the file's contents in heap-allocated space, and keep that around instead. We base the choice of behavior on a new build-time switch, `MMAP_PREVENTS_DELETE`. By default, this switch is set for Windows variants. This whole change is still pointless, because we only read the `packed-refs` file contents immediately after instantiating the `packed_ref_cache`. But that will soon change. Signed-off-by: Michael Haggerty --- Makefile | 10 +++ config.mak.uname | 3 + refs/packed-backend.c | 184 ++ 3 files changed, 155 insertions(+), 42 deletions(-) diff --git a/Makefile b/Makefile index f2bb7f2f63..7a49f99c4f 100644 --- a/Makefile +++ b/Makefile @@ -200,6 +200,9 @@ all:: # # Define NO_MMAP if you want to avoid mmap. # +# Define MMAP_PREVENTS_DELETE if a file that is currently mmapped cannot be +# deleted or cannot be replaced using rename(). +# # Define NO_SYS_POLL_H if you don't have sys/poll.h. # # Define NO_POLL if you do not have or don't want to use poll(). 
@@ -1383,6 +1386,13 @@ else COMPAT_OBJS += compat/win32mmap.o endif endif +ifdef MMAP_PREVENTS_DELETE + BASIC_CFLAGS += -DMMAP_PREVENTS_DELETE +else + ifdef USE_WIN32_MMAP + BASIC_CFLAGS += -DMMAP_PREVENTS_DELETE + endif +endif ifdef OBJECT_CREATION_USES_RENAMES COMPAT_CFLAGS += -DOBJECT_CREATION_MODE=1 endif diff --git a/config.mak.uname b/config.mak.uname index 6604b130f8..685a80d138 100644 --- a/config.mak.uname +++ b/config.mak.uname @@ -184,6 +184,7 @@ ifeq ($(uname_O),Cygwin) UNRELIABLE_FSTAT = UnfortunatelyYes SPARSE_FLAGS = -isystem /usr/include/w32api -Wno-one-bit-signed-bitfield OBJECT_CREATION_USES_RENAMES = UnfortunatelyNeedsTo + MMAP_PREVENTS_DELETE = UnfortunatelyYes COMPAT_OBJS += compat/cygwin.o FREAD_READS_DIRECTORIES = UnfortunatelyYes endif @@ -353,6 +354,7 @@ ifeq ($(uname_S),Windows) NO_ST_BLOCKS_IN_STRUCT_STAT = YesPlease NO_NSEC = YesPlease USE_WIN32_MMAP = YesPlease + MMAP_PREVENTS_DELETE = UnfortunatelyYes # USE_NED_ALLOCATOR = YesPlease UNRELIABLE_FSTAT = UnfortunatelyYes OBJECT_CREATION_USES_RENAMES = UnfortunatelyNeedsTo @@ -501,6 +503,7 @@ ifneq (,$(findstring MINGW,$(uname_S))) NO_ST_BLOCKS_IN_STRUCT_STAT = YesPlease NO_NSEC = YesPlease USE_WIN32_MMAP = YesPlease + MMAP_PREVENTS_DELETE = UnfortunatelyYes USE_NED_ALLOCATOR = YesPlease UNRELIABLE_FSTAT = UnfortunatelyYes OBJECT_CREATION_USES_RENAMES = UnfortunatelyNeedsTo diff --git a/refs/packed-backend.c b/refs/packed-backend.c index 0fe41a7203..4981516f1e 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -7,6 +7,35 @@ #include "../iterator.h" #include "../lockfile.h" +enum mmap_strategy { + /* +* Don't use mmap() at all for reading `packed-refs`. +*/ + MMAP_NONE, + + /* +* Can use mmap() for reading `packed-refs`, but the file must +* not remain mmapped. This is the usual option on Windows, +* where you cannot rename a new version of a file onto a file +* that is currently mmapped. 
+*/ + MMAP_TEMPORARY, + + /* +* It is OK to leave the `packed-refs` file mmapped while +* arbitrary other code is running. +*/ + MMAP_OK +}; + +#if defined(NO_MMAP) +static enum mmap_strategy mmap_strategy = MMAP_NONE; +#elif defined(MMAP_PREVENTS_DELETE) +static enum mmap_strategy mmap_strategy = MMAP_TEMPORARY; +#else +static enum mmap_strategy mmap_strategy = MMAP_OK; +#endif + struct packed_ref_store; struct packed_ref_cache { @@ -18,6 +47,21 @@ struct packed_ref_cache { struct ref_cache *cache; + /* Is the `packed-refs` file currently mmapped? */ + int mmapped; + + /* +* The contents of the `packed-refs` file. If the file is +* mmapped, this points at the mmapped contents of the file. +* If not, this points at heap-allocated memory containing the +* contents. If there were no contents (e.g., because the file +* didn't exist), `buf` and `eof` are both NULL. +*/ + char *buf, *eof; + + /* The size of the header line, if any; otherwise, 0: */ + size_t header_len; + /* * What is the peeled state of this cache? (This is usually * determined from the header of the "packed-refs" file.) @@ -76,6 +120,26 @@ static void acquire_packed_ref_cache(struct packed_ref_cache *packed_refs) packed_refs->
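The strategy selection plus heap-copy fallback described above can be sketched roughly as follows; `NO_MMAP`/`MMAP_PREVENTS_DELETE` are treated as hypothetical compile-time defines here, and the struct is a simplified stand-in for git's `packed_ref_cache`, not the real thing:

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

enum mmap_strategy { MMAP_NONE, MMAP_TEMPORARY, MMAP_OK };

/* Compile-time selection mirroring the patch's #if ladder: */
#if defined(NO_MMAP)
static const enum mmap_strategy strategy = MMAP_NONE;
#elif defined(MMAP_PREVENTS_DELETE)
static const enum mmap_strategy strategy = MMAP_TEMPORARY;
#else
static const enum mmap_strategy strategy = MMAP_OK;
#endif

struct contents {
    char *buf;      /* file contents (mmapped or heap-allocated) */
    size_t size;
    int mmapped;    /* nonzero if buf must be munmap()ed */
};

/*
 * Load a file per the strategy: keep it mmapped only under MMAP_OK;
 * otherwise copy it into heap memory, so the file itself can be
 * replaced via rename() while we hold the contents.
 */
static int load_contents(const char *path, struct contents *c)
{
    int fd = open(path, O_RDONLY);
    struct stat st;

    c->buf = NULL;
    c->size = 0;
    c->mmapped = 0;
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {
        close(fd);
        return 0;           /* no contents: buf stays NULL */
    }
    c->size = st.st_size;
    if (strategy == MMAP_OK) {
        c->buf = mmap(NULL, c->size, PROT_READ, MAP_PRIVATE, fd, 0);
        c->mmapped = (c->buf != MAP_FAILED);
        if (!c->mmapped)
            c->buf = NULL;
    }
    if (!c->buf) {          /* MMAP_NONE/MMAP_TEMPORARY, or mmap failed */
        ssize_t got;
        c->buf = malloc(c->size);
        got = c->buf ? read(fd, c->buf, c->size) : -1;
        if (got != (ssize_t)c->size) {
            free(c->buf);
            c->buf = NULL;
        }
    }
    close(fd);
    return c->buf ? 0 : -1;
}
```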
[PATCH v2 14/21] read_packed_refs(): ensure that references are ordered when read
It doesn't actually matter now, because the references are only iterated over to fill the associated `ref_cache`, which itself puts them in the correct order. But we want to get rid of the `ref_cache`, so we want to be able to iterate directly over the `packed-refs` buffer, and then the iteration will need to be ordered correctly. In fact, we already write the `packed-refs` file sorted, but it is possible that other Git clients don't get it right. So let's not assume that a `packed-refs` file is sorted unless it is explicitly declared to be so via a `sorted` trait in its header line. If it is *not* declared to be sorted, then scan quickly through the file to check. If it is found to be out of order, then sort the records into a new memory-only copy. This checking and sorting is done quickly, without parsing the full file contents. However, it needs a little bit of care to avoid reading past the end of the buffer even if the `packed-refs` file is corrupt. Since *we* always write the file correctly sorted, include that trait when we write or rewrite a `packed-refs` file. This means that the scan described in the previous paragraph should only have to be done for `packed-refs` files that were written by older versions of the Git command-line client, or by other clients that haven't yet learned to write the `sorted` trait. If `packed-refs` was already sorted, then (if the system allows it) we can use the mmapped file contents directly. But if the system doesn't allow a file that is currently mmapped to be replaced using `rename()`, then it would be bad for us to keep the file mmapped for any longer than necessary. So, on such systems, always make a copy of the file contents, either as part of the sorting process, or afterwards. 
Signed-off-by: Michael Haggerty --- refs/packed-backend.c | 223 +++--- 1 file changed, 212 insertions(+), 11 deletions(-) diff --git a/refs/packed-backend.c b/refs/packed-backend.c index 4981516f1e..f336690f16 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -51,11 +51,12 @@ struct packed_ref_cache { int mmapped; /* -* The contents of the `packed-refs` file. If the file is -* mmapped, this points at the mmapped contents of the file. -* If not, this points at heap-allocated memory containing the -* contents. If there were no contents (e.g., because the file -* didn't exist), `buf` and `eof` are both NULL. +* The contents of the `packed-refs` file. If the file was +* already sorted, this points at the mmapped contents of the +* file. If not, this points at heap-allocated memory +* containing the contents, sorted. If there were no contents +* (e.g., because the file didn't exist), `buf` and `eof` are +* both NULL. */ char *buf, *eof; @@ -358,7 +359,7 @@ struct ref_iterator *mmapped_ref_iterator_begin( if (!packed_refs->buf) return empty_ref_iterator_begin(); - base_ref_iterator_init(ref_iterator, &mmapped_ref_iterator_vtable, 0); + base_ref_iterator_init(ref_iterator, &mmapped_ref_iterator_vtable, 1); iter->packed_refs = packed_refs; acquire_packed_ref_cache(iter->packed_refs); @@ -371,6 +372,170 @@ struct ref_iterator *mmapped_ref_iterator_begin( return ref_iterator; } +struct packed_ref_entry { + const char *start; + size_t len; +}; + +static int cmp_packed_ref_entries(const void *v1, const void *v2) +{ + const struct packed_ref_entry *e1 = v1, *e2 = v2; + const char *r1 = e1->start + GIT_SHA1_HEXSZ + 1; + const char *r2 = e2->start + GIT_SHA1_HEXSZ + 1; + + while (1) { + if (*r1 == '\n') + return *r2 == '\n' ? 0 : -1; + if (*r1 != *r2) { + if (*r2 == '\n') + return 1; + else + return (unsigned char)*r1 < (unsigned char)*r2 ? -1 : +1; + } + r1++; + r2++; + } +} + +/* + * `packed_refs->buf` is not known to be sorted. 
Check whether it is, + * and if not, sort it into new memory and munmap/free the old + * storage. + */ +static void sort_packed_refs(struct packed_ref_cache *packed_refs) +{ + struct packed_ref_entry *entries = NULL; + size_t alloc = 0, nr = 0; + int sorted = 1; + const char *pos, *eof, *eol; + size_t len, i; + char *new_buffer, *dst; + + pos = packed_refs->buf + packed_refs->header_len; + eof = packed_refs->eof; + len = eof - pos; + + if (!len) + return; + + /* +* Initialize entries based on a crude estimate of the number +* of references in the file (we'll grow it below if needed): +*/ + ALLOC_GROW(entries, len / 80 + 20, alloc); + + while (pos < eof) { +
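The newline-terminated comparison in `cmp_packed_ref_entries()` can be exercised on its own. Here is a self-contained version (with `HEXSZ` standing in for `GIT_SHA1_HEXSZ`) driven through `qsort()`; note how `'\n'` is made to sort before every refname byte, so a strict prefix sorts first:

```c
#include <stdlib.h>
#include <string.h>

#define HEXSZ 40  /* stand-in for GIT_SHA1_HEXSZ */

struct packed_ref_entry {
    const char *start;  /* start of a "<oid> <refname>\n" record */
    size_t len;
};

/*
 * Compare two records by refname, where each refname begins
 * HEXSZ + 1 bytes into the record and ends at '\n' -- the same
 * comparison loop as the patch's cmp_packed_ref_entries().
 */
static int cmp_entries(const void *v1, const void *v2)
{
    const struct packed_ref_entry *e1 = v1, *e2 = v2;
    const char *r1 = e1->start + HEXSZ + 1;
    const char *r2 = e2->start + HEXSZ + 1;

    while (1) {
        if (*r1 == '\n')
            return *r2 == '\n' ? 0 : -1;
        if (*r1 != *r2) {
            if (*r2 == '\n')
                return 1;
            return (unsigned char)*r1 < (unsigned char)*r2 ? -1 : +1;
        }
        r1++;
        r2++;
    }
}
```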
Re: RFC: Design and code of partial clones (now, missing commits and trees OK)
Jonathan Tan writes: > For those interested in partial clones and/or missing objects in repos, > I've updated my original partialclone patches to not require an explicit > list of promises. Fetch/clone still only permits exclusion of blobs, but > the infrastructure is there for a local repo to support missing trees > and commits as well. > ... > Demo > > > Obtain a repository. > > $ make prefix=$HOME/local install > $ cd $HOME/tmp > $ git clone https://github.com/git/git > > Make it advertise the new feature and allow requests for arbitrary blobs. > > $ git -C git config uploadpack.advertiseblobmaxbytes 1 > $ git -C git config uploadpack.allowanysha1inwant 1 > > Perform the partial clone and check that it is indeed smaller. Specify > "file://" in order to test the partial clone mechanism. (If not, Git will > perform a local clone, which unselectively copies every object.) > > $ git clone --blob-max-bytes=10 "file://$(pwd)/git" git2 > $ git clone "file://$(pwd)/git" git3 > $ du -sh git2 git3 > 116M git2 > 129M git3 > > Observe that the new repo is automatically configured to fetch missing objects > from the original repo. Subsequent fetches will also be partial. > > $ cat git2/.git/config > [core] > repositoryformatversion = 1 > filemode = true > bare = false > logallrefupdates = true > [remote "origin"] > url = [snip] > fetch = +refs/heads/*:refs/remotes/origin/* > blobmaxbytes = 10 > [extensions] > partialclone = origin > [branch "master"] > remote = origin > merge = refs/heads/master The above sequence of events makes quite a lot of sense. And the following description of how it is designed (snipped) is clear enough (at least to me) to allow me to say that I quite like it.
Re: [OUTREACHY] pack: make packed_git_mru global a value instead of a pointer
On Mon, Sep 18, 2017 at 04:17:24PM -0700, Jonathan Nieder wrote: > phionah bugosi wrote: > > > Just to re-echo a previous change requested before in one of the mail > > threads, we currently have two global variables declared: > > > > struct mru packed_git_mru_storage; > > struct mru *packed_git_mru = &packed_git_mru_storage; > > > > We normally use pointers in C to point or refer to the same location > > or space in memory from multiple places. That means in situations > > where we will have the pointer be assigned to in many places in our > > code. But in this case, we do not have any other logic referring or > > assigning to the pointer packed_git_mru. In simple words we actually > > don't need a pointer in this case. > > > > Therefore this patch makes packed_git_mru global a value instead of > > a pointer and makes use of list.h > > > > Signed-off-by: phionah bugosi > > --- > > builtin/pack-objects.c | 5 +++-- > > cache.h| 7 --- > > list.h | 6 ++ > > packfile.c | 12 ++-- > > 4 files changed, 15 insertions(+), 15 deletions(-) > > *puzzled* This appears to already be in "pu", with a different author. > Did you independently make the same change? Or are you asking for a > progress report on that change, and just phrasing it in a confusing > way? I pointed Phionah at your #leftoverbits comment in: https://public-inbox.org/git/20170912172839.gb144...@aiede.mtv.corp.google.com/ about moving packed_git_mru to list.h. But I'm afraid I wasn't very clear in giving further guidance. The goal is to build on _top_ of the patch in that message, and convert the doubly-linked list implementation used in "struct mru" to use the shared code in list.h. That should mostly involve touching mru.c and mru.h, though I think we'd need to tweak the name of the "next" pointer during the traversal, too. -Peff
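For readers unfamiliar with list.h-style intrusive lists, the conversion Peff describes boils down to embedding the link pair in the entry and splicing on every access. A rough sketch with a local re-implementation of the list primitives (this is not git's actual `list.h`, and `struct pack` is a hypothetical stand-in for `packed_git`):

```c
#include <stddef.h>

/* Minimal intrusive doubly-linked list, list.h style: */
struct list_head { struct list_head *next, *prev; };

#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_del(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
}

static void list_add(struct list_head *e, struct list_head *h)
{
    /* insert e right after h (i.e. at the front of the list) */
    e->next = h->next;
    e->prev = h;
    h->next->prev = e;
    h->next = e;
}

/* An MRU-tracked packfile stand-in: the node lives in the entry. */
struct pack {
    int id;
    struct list_head mru;
};

/* Mark a pack most-recently-used: unlink it and re-add at the front. */
static void mru_mark(struct list_head *head, struct pack *p)
{
    list_del(&p->mru);
    list_add(&p->mru, head);
}
```

Traversal then walks `head->next` and recovers the entry with `list_entry()`, which is the renaming of the "next" pointer Peff alludes to.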
Re: [PATCH 2/2] read_info_alternates: warn on non-trivial errors
On Mon, Sep 18, 2017 at 07:46:24PM -0700, Jonathan Nieder wrote: > Jeff King wrote: > > > When we fail to open $GIT_DIR/info/alternates, we silently > > assume there are no alternates. This is the right thing to > > do for ENOENT, but not for other errors. > > > > A hard error is probably overkill here. If we fail to read > > an alternates file then either we'll complete our operation > > anyway, or we'll fail to find some needed object. Either > > way, a warning is a good idea. And we already have a helper > > function to handle this pattern; let's just call > > warn_on_fopen_error(). > > I think I prefer a hard error. What kind of cases are you imagining > where it would be better to warn? > > E.g. for EIO, erroring out so that the user can try again seems better > than hoping that the application will be able to cope with the more > subtle error that comes from discovering some objects are missing. > > For EACCES, I can see how it makes sense to warn and move on, but no > other errors like that are occurring to me. > > Thoughts? Yes, EACCES is exactly the example that came to mind. But most importantly, we don't know at this point whether the alternate is even relevant to the current operation. So calling die() here would make some cases fail that would otherwise succeed. That's usually not a big deal, but we've had cases where being lazy about dying helps people use git itself to help repair the case. Admittedly most of those chicken-and-egg problems are centered around git-config (e.g., using "git config --edit" to repair broken config), so it's perhaps less important here. But it seems like a reasonable principle to follow in general. If there's a compelling reason to die hard, I'd reconsider it. But I can't really think of one (if we were _writing_ a new alternates file and encountered a read error of the old data we're copying, then I think it would be very important to bail before claiming a successful write. But that's not what's happening here). -Peff
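The errno-dependent behavior being debated can be modeled as a tiny stand-in for `warn_on_fopen_error()` (the function name and return convention here are illustrative, not git's): stay silent for the "file simply isn't there" errors, warn but keep going for everything else.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Returns 1 if a warning was printed, 0 if the error was silently
 * ignorable (ENOENT/ENOTDIR: no alternates file is fine).  Either
 * way the caller continues; nothing here calls die().
 */
static int warn_on_fopen_error_sketch(const char *path, int saved_errno)
{
    if (saved_errno == ENOENT || saved_errno == ENOTDIR)
        return 0;
    fprintf(stderr, "warning: unable to access '%s': %s\n",
            path, strerror(saved_errno));
    return 1;
}
```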
Re: [PATCH 1/2] read_info_alternates: read contents into strbuf
On Mon, Sep 18, 2017 at 07:42:53PM -0700, Jonathan Nieder wrote: > Jeff King wrote: > > > Reported-by: Michael Haggerty > > Signed-off-by: Jeff King > > --- > > sha1_file.c | 29 + > > 1 file changed, 9 insertions(+), 20 deletions(-) > > Thanks for tracking it down. To be fair, Michael did most of the work in identifying and bisecting the bug. He even wrote a very similar patch in parallel; I just swooped in at the end. > > path = xstrfmt("%s/info/alternates", relative_base); > > - fd = git_open(path); > > - free(path); > > - if (fd < 0) > > - return; > > - if (fstat(fd, &st) || (st.st_size == 0)) { > > - close(fd); > > + if (strbuf_read_file(&buf, path, 1024) < 0) { > > + free(path); > > return; > > strbuf_read_file is careful to release buf on failure, so this doesn't > leak (but it's a bit subtle). Yep. I didn't think it was worth calling out with a comment since the "don't allocate on failure" pattern is common to most of the strbuf functions. > What happened to the !st.st_size case? Is it eliminated for > simplicity? Good question, and the answer is yes. Obviously we can bail early on an empty file, but I don't think there's any reason to complicate the code with it (the original seems to come from d5a63b9983 (Alternate object pool mechanism updates., 2005-08-14), where it avoided a corner case that has long since been deleted). -Peff
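The "nothing left allocated on failure" property Jonathan calls subtle can be sketched without the strbuf API at all; this hypothetical helper mimics the contract of `strbuf_read_file()` with plain malloc/realloc (names and the 1024-byte hint are illustrative only):

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Read a whole file into a malloc'd, NUL-terminated buffer,
 * returning its length or -1.  Like strbuf_read_file(), nothing is
 * left allocated on failure, so callers can simply `return`.
 */
static long read_file_buf(const char *path, char **out)
{
    FILE *f = fopen(path, "r");
    char *buf = NULL;
    size_t len = 0, alloc = 0;

    *out = NULL;
    if (!f)
        return -1;
    for (;;) {
        size_t got;
        if (len + 1024 + 1 > alloc) {
            char *tmp = realloc(buf, alloc = len + 1024 + 1);
            if (!tmp) {
                free(buf);          /* release on failure */
                fclose(f);
                return -1;
            }
            buf = tmp;
        }
        got = fread(buf + len, 1, 1024, f);
        len += got;
        if (got < 1024)
            break;
    }
    if (ferror(f)) {
        free(buf);                  /* release on failure */
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[len] = '\0';
    *out = buf;
    return (long)len;
}
```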
Re: [PATCH 0/2] fix read past end of array in alternates files
On Mon, Sep 18, 2017 at 07:36:03PM -0700, Jonathan Nieder wrote: > Jeff King wrote: > > > This series fixes a regression in v2.11.1 where we might read past the > > end of an mmap'd buffer. It was introduced in cf3c635210, > > The above information is super helpful. Can it go in one of the commit > messages? Er, didn't I? > > base the patch on there, for a few reasons: > > > > 1. There's a trivial conflict when merging up (because of > > git_open_noatime() becoming just git_open() in the interim). > > > > 2. The reproduction advice relies on our SANITIZE Makefile knob, which > > didn't exist back then. > > > > 3. The second patch does not apply there because we don't have > > warn_on_fopen_errors(). Though admittedly it could be applied > > separately after merging up; it's just a clean-up on top. > > Even this part could go in a commit message, but it's fine for it not > to. IMHO this kind of meta information doesn't belong in the commit message. It's useful to the maintainer to know where to apply the patch, but I don't think it helps somebody who is reading "git log" output. -Peff
Re: [PATCH v2] Improve performance of git status --ignored
Jameson Miller writes: > Improve the performance of the directory listing logic when it wants to list > non-empty ignored directories. In order to show non-empty ignored directories, > the existing logic will recursively iterate through all contents of an ignored > directory. This change introduces the optimization to stop iterating through > the contents once it finds the first file. Wow, such an obviously correct optimization. Very nicely explained, too. > This can have a significant > improvement in 'git status --ignored' performance in repositories with a large > number of files in ignored directories. > > For an example of the performance difference on an example repository with > 196,000 files in 400 ignored directories: > > | Command| Time (s) | > | -- | - | > | git status | 1.2 | > | git status --ignored (old) | 3.9 | > | git status --ignored (new) | 1.4 | > > Signed-off-by: Jameson Miller > --- I wish all the contributions I have to accept are as nicely done as this one ;-) Thanks. > dir.c | 47 +-- > 1 file changed, 41 insertions(+), 6 deletions(-) > > diff --git a/dir.c b/dir.c > index 1c55dc3..1d17b80 100644 > --- a/dir.c > +++ b/dir.c > @@ -49,7 +49,7 @@ struct cached_dir { > static enum path_treatment read_directory_recursive(struct dir_struct *dir, > struct index_state *istate, const char *path, int len, > struct untracked_cache_dir *untracked, > - int check_only, const struct pathspec *pathspec); > + int check_only, int stop_at_first_file, const struct pathspec > *pathspec); We might want to make check_only and stop_at_first_file into a single "unsigned flags" used as a collection of bits, but we can wait until we start feeling the urge to add the third boolean parameter to this function (at which point I'd probably demand a preliminary clean-up to merge these two into a single flags word before adding the third one as a new bit in that word). 
> @@ -1404,8 +1404,13 @@ static enum path_treatment treat_directory(struct > dir_struct *dir, > > untracked = lookup_untracked(dir->untracked, untracked, >dirname + baselen, len - baselen); > + > + /* > + * If this is an excluded directory, then we only need to check if > + * the directory contains any files. > + */ > return read_directory_recursive(dir, istate, dirname, len, > - untracked, 1, pathspec); > + untracked, 1, exclude, pathspec); Nicely explained in the in-code comment. I'd assume that you want your microsoft e-mail address used on the signed-off-by line to appear as the author, so I'll tweak this a bit to make it so (otherwise, your 8...@gmail.com would become the author). Thanks.
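Junio's suggested clean-up, folding the two booleans into one flags word, might look roughly like this toy (the flag names and the scan function are hypothetical, not git's; `entries[]` marks files with 1 so the short-circuit that the patch adds is visible):

```c
#include <stddef.h>

#define DIR_CHECK_ONLY          (1u << 0)   /* hypothetical flag names */
#define DIR_STOP_AT_FIRST_FILE  (1u << 1)

enum path_treatment { path_none, path_untracked };

/*
 * Toy stand-in for the recursive scan: with DIR_STOP_AT_FIRST_FILE,
 * return as soon as any file is seen instead of visiting every
 * entry.  *visited counts entries examined, to show the saving.
 */
static enum path_treatment scan(const int *entries, size_t n,
                                unsigned flags, size_t *visited)
{
    enum path_treatment state = path_none;
    size_t i;

    *visited = 0;
    for (i = 0; i < n; i++) {
        (*visited)++;
        if (entries[i]) {
            state = path_untracked;
            if (flags & DIR_STOP_AT_FIRST_FILE)
                return state;   /* one file is enough to decide */
        }
    }
    return state;
}
```

A third mode would then cost only a new bit, not another parameter at every call site.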
Re: [PATCH] rev-parse: add --is-shallow-repository
Jonathan Nieder writes: >> test_expect_success 'showing the superproject correctly' ' > > With the two tweaks mentioned above, > Reviewed-by: Jonathan Nieder I agree with the fixes to the test titles suggested, so I'll queue the patch with the fixes squashed in. Hearing "yeah, the titles were copy-pasted without adjusting, thanks for fixing, Jonathan!" sent by Øystein would be super nice. Thanks, both.
Re: [PATCH] t/README: fix typo and grammatically improve a sentence
Kaartic Sivaraam writes: > Signed-off-by: Kaartic Sivaraam > --- > t/README | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/t/README b/t/README > index 2f95860369751..4b079e4494d93 100644 > --- a/t/README > +++ b/t/README > @@ -265,12 +265,12 @@ or: > > $ sh ./t9200-git-cvsexport-commit.sh --run='-3 21' > > -As noted above, the test set is built going though items left to > -right, so this: > +As noted above, the test set is built by going through the items > +from left to right, so this: > > $ sh ./t9200-git-cvsexport-commit.sh --run='1-4 !3' > > -will run tests 1, 2, and 4. Items that comes later have higher > +will run tests 1, 2, and 4. Items that come later have higher > precedence. It means that this: > > $ sh ./t9200-git-cvsexport-commit.sh --run='!3 1-4' > > -- > https://github.com/git/git/pull/404 Both of these look to me obvious improvements. I'll queue them unless other people object. Thanks.
Re: [RFC PATCH 0/2] Add named reference to latest push cert
Junio C Hamano writes: > But the point is that we do not want such an overhead in core, as > all of that is a useless waste of the cycle for a site that wants to > store the push certificate away outside of the repository itself. By this I do not mean that cert blobs shouldn't become part of the in-repository record in _all_ installations that receive signed push certificates. An easy to reuse example shipped together with our source would be a good thing, and an easy to enable sample hook may even be better. It's just that I do not want to have any "solution" in the core part that everybody that wants to accept push certs must run, if the "solution" is not something we can endorse as the best current practice. And I do not yet see how anything along the lines of the patch that started this thread, or its extension by wrapping them with a commit chain, would become that "best current practice". A sample hook, on the other hand, is simpler for people to experiment with, and we might even come up with a universally useful best current practice out of such an experiment, though.
Re: [PATCH v2 1/4] mktag: add option which allows the tagger field to be omitted
Ian Campbell writes: > This can be useful e.g. in `filter-branch` when rewriting tags produced by > older versions of Git, such as v2.6.12-rc2..v2.6.13-rc3 in the Linux kernel > source tree: > > $ git cat-file tag v2.6.12-rc2 > object 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 > type commit > tag v2.6.12-rc2 > > Linux v2.6.12-rc2 release > -BEGIN PGP SIGNATURE- > Version: GnuPG v1.2.4 (GNU/Linux) > > iD8DBQBCbW8ZF3YsRnbiHLsRAgFRAKCq/TkuDaEombFABkPqYgGCgWN2lQCcC0qc > wznDbFU45A54dZC8RZ5JxyE= > =ESRP > -END PGP SIGNATURE- > > $ git cat-file tag v2.6.12-rc2 | git mktag > error: char76: could not find "tagger " > fatal: invalid tag signature file > $ git cat-file tag v2.6.12-rc2 | git mktag --allow-missing-tagger > 9e734775f7c22d2f89943ad6c745571f1930105f > > To that end, pass the new option to `mktag` in `filter-branch`. Hmph. I cannot shake this nagging feeling that this is probably a solution that is overly narrow to a single problem that won't scale into the future. As there is no guarantee that "missing-tagger" will stay the only kind of broken tag object we'd produce in one version, only to notice our mistake and forbid it in another version, the approach of adding an '--allow-missing-tagger' option would imply that we'd end up adding more and more such options, and filter-branch will need to use all these '--allow-this-breakage' options we would ever add. Even though I fully agree with the problem you are trying to solve (i.e. we want to be able to replay an old history without our tool rejecting the data we have), it was my first reaction when I read this patch. IOW, my first reaction was "perhaps a single option '--allow-broken' to cover the currently known and any future shapes of malformed tag data is more appropriate". But then, if we look at the body of cmd_mktag(), it looks like this: int cmd_mktag(...)
{ read input into strbuf buf; call verify_tag on buf to sanity check; call write_sha1_file() on the contents of buf as a tag; report the object name; } If we drop the "verification" step from the above, that essentially becomes equivalent to "hash-object -t tag -w --stdin". So I now have to wonder if it may be sufficient to use "hash-object" in filter-branch, without doing this "allow malformed data that we would not permit if the tag were being created today, only to help replaying an old, already broken data" change to "git mktag". Is there something that makes "hash-object" insufficient (like it still does some extra checks we would want to disable and cannot work as a replacement for your "--allow-missing-tagger")? Thanks. > Signed-off-by: Ian Campbell > --- > Documentation/git-mktag.txt | 9 +++- > builtin/mktag.c | 100 > +--- > git-filter-branch.sh| 2 +- > t/t3800-mktag.sh| 33 ++- > 4 files changed, 98 insertions(+), 46 deletions(-) > > diff --git a/Documentation/git-mktag.txt b/Documentation/git-mktag.txt > index fa6a75612..c720c7419 100644 > --- a/Documentation/git-mktag.txt > +++ b/Documentation/git-mktag.txt > @@ -9,7 +9,7 @@ git-mktag - Creates a tag object > SYNOPSIS > > [verse] > -'git mktag' > +'git mktag' [--allow-missing-tagger] > > DESCRIPTION > --- > @@ -34,6 +34,13 @@ exists, is separated by a blank line from the header. The > message part may contain a signature that Git itself doesn't > care about, but that can be verified with gpg. > > +OPTIONS > +--- > +--allow-missing-tagger:: > + Allow the `tagger` line in the header to be omitted. This is > + rarely desirable but may be useful in recreating tags created > + by older Git.
> + > GIT > --- > Part of the linkgit:git[1] suite > diff --git a/builtin/mktag.c b/builtin/mktag.c > index 031b750f0..0f5dae8d5 100644 > --- a/builtin/mktag.c > +++ b/builtin/mktag.c > @@ -1,4 +1,5 @@ > #include "builtin.h" > +#include "parse-options.h" > #include "tag.h" > > /* > @@ -15,6 +16,8 @@ > * the shortest possible tagger-line. > */ > > +static int allow_missing_tagger; > + > /* > * We refuse to tag something we can't verify. Just because. > */ > @@ -41,8 +44,9 @@ static int verify_tag(char *buffer, unsigned long size) > unsigned char sha1[20]; > const char *object, *type_line, *tag_line, *tagger_line, *lb, *rb; > size_t len; > + const unsigned long min_size = allow_missing_tagger ? 71 : 84; > > - if (size < 84) > + if (size < min_size) > return error("wanna fool me ? you obviously got the size wrong > !"); > > buffer[size] = 0; > @@ -98,46 +102,46 @@ static int verify_tag(char *buffer, unsigned long size) > /* Verify the tagger line */ > tagger_line = tag_line; > > -
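The specific check that `--allow-missing-tagger` would relax can be modeled in isolation. This sketch only looks for a `tagger ` line in the header (which ends at the first empty line) and is not mktag's actual `verify_tag()`:

```c
#include <string.h>

/*
 * Return 0 if the tag buffer's header contains a "tagger " line,
 * or if allow_missing_tagger permits its absence; -1 otherwise.
 */
static int check_tagger(const char *buf, int allow_missing_tagger)
{
    const char *line = buf;

    while (*line && *line != '\n') {    /* '\n' at line start ends header */
        if (!strncmp(line, "tagger ", 7))
            return 0;                   /* found a tagger line */
        line = strchr(line, '\n');
        if (!line)
            break;                      /* header ran off the buffer */
        line++;                         /* advance to the next line */
    }
    return allow_missing_tagger ? 0 : -1;
}
```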
Re: [PATCH 2/2] t9010-*.sh: skip all tests if the PIPE prereq is missing
Ramsay Jones wrote: > Every test in this file, except one, is marked with the PIPE prereq. > However, that lone test ('set up svn repo') only performs some setup > work and checks whether the following test should be executed (by > setting an additional SVNREPO prerequisite). Since the following test > also requires the PIPE prerequisite, performing the setup test, when the > PIPE prerequisite is missing, is simply wasted effort. Use the skip-all > test facility to skip all tests when the PIPE prerequisite is missing. > > Signed-off-by: Ramsay Jones > --- > t/t9010-svn-fe.sh | 55 > --- > 1 file changed, 28 insertions(+), 27 deletions(-) It was always the intention that this test script eventually be able to run on Windows, but it was not meant to be. It would need to use a socket or something for backflow to work on Windows. Reviewed-by: Jonathan Nieder
Re: [PATCH 1/2] test-lib: use more compact expression in PIPE prerequisite
Ramsay Jones wrote: > Signed-off-by: Ramsay Jones > --- > t/test-lib.sh | 10 ++ > 1 file changed, 2 insertions(+), 8 deletions(-) Reviewed-by: Jonathan Nieder
Re: [PATCH 2/2] read_info_alternates: warn on non-trivial errors
Jeff King wrote: > When we fail to open $GIT_DIR/info/alternates, we silently > assume there are no alternates. This is the right thing to > do for ENOENT, but not for other errors. > > A hard error is probably overkill here. If we fail to read > an alternates file then either we'll complete our operation > anyway, or we'll fail to find some needed object. Either > way, a warning is a good idea. And we already have a helper > function to handle this pattern; let's just call > warn_on_fopen_error(). I think I prefer a hard error. What kind of cases are you imagining where it would be better to warn? E.g. for EIO, erroring out so that the user can try again seems better than hoping that the application will be able to cope with the more subtle error that comes from discovering some objects are missing. For EACCES, I can see how it makes sense to warn and move on, but no other errors like that are occurring to me. Thoughts? Thanks, Jonathan > Note that technically the errno from strbuf_read_file() > might be from a read() error, not open(). But since read() > would never return ENOENT or ENOTDIR, and since it produces > a generic "unable to access" error, it's suitable for > handling errors from either.
Re: [PATCH 1/2] read_info_alternates: read contents into strbuf
Jeff King wrote: > Reported-by: Michael Haggerty > Signed-off-by: Jeff King > --- > sha1_file.c | 29 + > 1 file changed, 9 insertions(+), 20 deletions(-) Thanks for tracking it down. Reviewed-by: Jonathan Nieder [...] > +++ b/sha1_file.c [...] > @@ -427,28 +427,17 @@ static void link_alt_odb_entries(const char *alt, int > len, int sep, > > static void read_info_alternates(const char * relative_base, int depth) > { > - char *map; > - size_t mapsz; > - struct stat st; > char *path; > - int fd; > + struct strbuf buf = STRBUF_INIT; > > path = xstrfmt("%s/info/alternates", relative_base); > - fd = git_open(path); > - free(path); > - if (fd < 0) > - return; > - if (fstat(fd, &st) || (st.st_size == 0)) { > - close(fd); > + if (strbuf_read_file(&buf, path, 1024) < 0) { > + free(path); > return; strbuf_read_file is careful to release buf on failure, so this doesn't leak (but it's a bit subtle). What happened to the !st.st_size case? Is it eliminated for simplicity? The rest looks good. Thanks, Jonathan
Re: [PATCH 0/2] fix read past end of array in alternates files
Jeff King wrote: > This series fixes a regression in v2.11.1 where we might read past the > end of an mmap'd buffer. It was introduced in cf3c635210, The above information is super helpful. Can it go in one of the commit messages? > but I didn't > base the patch on there, for a few reasons: > > 1. There's a trivial conflict when merging up (because of > git_open_noatime() becoming just git_open() in the interim). > > 2. The reproduction advice relies on our SANITIZE Makefile knob, which > didn't exist back then. > > 3. The second patch does not apply there because we don't have > warn_on_fopen_errors(). Though admittedly it could be applied > separately after merging up; it's just a clean-up on top. Even this part could go in a commit message, but it's fine for it not to. Thanks, Jonathan
Re: [PATCH] rev-parse: add --is-shallow-repository
Hi, Øystein Walle wrote: > Running `git fetch --unshallow` on a repo that is not in fact shallow > produces a fatal error message. Hm, can you say more about the context? From a certain point of view, it might make sense for that command to succeed instead: if the repo is already unshallow, then why should "fetch --unshallow" complain instead of declaring victory? > Add a helper to rev-parse that scripters > can use to determine whether a repo is shallow or not. > > Signed-off-by: Øystein Walle > --- > Documentation/git-rev-parse.txt | 3 +++ > builtin/rev-parse.c | 5 + > t/t1500-rev-parse.sh| 15 +++ > 3 files changed, 23 insertions(+) Regardless, this new rev-parse --is-shallow helper looks like a good feature. [...] > --- a/builtin/rev-parse.c > +++ b/builtin/rev-parse.c > @@ -868,6 +868,11 @@ int cmd_rev_parse(int argc, const char **argv, const > char *prefix) > : "false"); > continue; > } > + if (!strcmp(arg, "--is-shallow-repository")) { > + printf("%s\n", is_repository_shallow() ? "true" > + : "false"); > + continue; > + } The implementation is straightforward and correct. [...] > --- a/t/t1500-rev-parse.sh > +++ b/t/t1500-rev-parse.sh Thanks for writing tests. \o/ > @@ -116,6 +116,21 @@ test_expect_success 'git-path inside sub-dir' ' > test_cmp expect actual > ' > > +test_expect_success 'git-path shallow repository' ' What does git-path mean here? I wonder if it's a copy/paste error. Did you mean something like test_expect_success 'rev-parse --is-shallow-repository in shallow repo' ' ? > + test_commit test_commit && > + echo true >expect && > + git clone --depth 1 --no-local . shallow && > + test_when_finished "rm -rf shallow" && > + git -C shallow rev-parse --is-shallow-repository >actual && > + test_cmp expect actual > +' > + > +test_expect_success 'git-path notshallow repository' ' Likewise: should this be test_expect_success 'rev-parse --is-shallow-repository in non-shallow repo' ' ?
> + echo false >expect && > + git rev-parse --is-shallow-repository >actual && > + test_cmp expect actual > +' > + > test_expect_success 'showing the superproject correctly' ' With the two tweaks mentioned above, Reviewed-by: Jonathan Nieder Thanks.
Re: What's cooking in git.git (Aug 2017, #05; Tue, 22)
Johannes Schindelin writes: >> Do you have a concrete suggestion to make these individual entries more >> helpful for people who may want to go back to the original thread in doing >> so? In-reply-to: or References: fields of the "What's cooking" report >> would not help. I often have the message IDs that made/helped me make >> these individual comments in the description; they alone would not react >> to mouse clicks, though. > > Oh gawd, not even more stuff piled onto the mail format. Please stop. > ... > It probably tries to serve too many purposes at the same time, and thereby > none. Well, this was started as my attempt to give a public service that shows a summary of what is happening in the entire integration tree, as there was nothing like that before (and going to github.com and looking at 'pu' branch would not give you an easy overview). As many people contribute many topics to the project, complaining that it talks about too many topics would not get you anywhere. If you find the "What's cooking" report not serving your needs, and if no one else finds it serving his or her needs, then I can stop sending these out, of course, but I am not getting the impression that we are at that point, at least not yet.
Re: What's cooking in git.git (Sep 2017, #03; Fri, 15)
Lars Schneider writes: > SZEDER noticed a bug in this series that I was about to fix: > https://public-inbox.org/git/3b175d35-5b1c-43cd-a7e9-85693335b...@gmail.com/ > > I assume at this point a new patch is better than a re-roll, right? Thanks, yes indeed.
Re: [RFC PATCH 0/2] Add named reference to latest push cert
Santiago Torres writes: > - *if there is a hook* the blob is computed, but it is up to the > hook itself to store it *somewhere*. This makes me feel like it's > somewhat of a useless waste of computation if the hook is not > meant to handle it anyway (which is just a post-receive hook). I > find it rather weird that --signed is a builtin flag, and is > handled on the server side only partially (just my two cents). The way it was envisioned to be used is that the repository meant to be protected by collected push certs may not be trusted as the permanent store for push certs by all hosting sites. The hook may be told the name of a blob to read its contents and is expected to store it away to somewhere else. The only reason why we use blob is because creating a blob in response to pushes being executed in parallel will result in different blobs unless there is hash collision. Instead of us having to come up with and use a different mechanism to create a unique temporary filename and feed that to hook, reusing blob as such was the simplest. > I understand the concurrency concerns, so I agree with stefan's > solution, although I don't know how big of a deal it would be, if git > already supports --atomic pushes (admittedly, I haven't checked if there > are any guards in place for someone who pushes millions of refs > atomically). It'd be completely understandable to have a setting to > disable handling of --signed pushes and, ideally, a way to warn the user > of this feature being disabled if --signed is sent. I do not think atomic helps at all, when one atomic push updates branch A while another atomic push updates branch B. They can still go in parallel, and their certificates must both be stored. You can somehow serialize them and create a single strand of pearls to advance a single ref, or you can let both fork two histories to store the push certs from these two pushes and have somebody create a merge commit to join the history.
But the point is that we do not want such an overhead in core, as all of that is a useless waste of cycles for a site that wants to store the push certificate away outside of the repository itself.
Re: [PATCH 1/3] sha1_name: Create perf test for find_unique_abbrev()
Derrick Stolee writes: >> But I do not think we want this "clever" optimization that involves >> 'n' in the first place. >>> + while (count++ < 10) { >>> + for (i = 0; i < n; i++) >>> + ((unsigned int*)oid.hash)[i] = hash_base; >> >> Does it make sense to assume that uint is always 4-byte (so this >> code won't work if it is 8-byte on your platform) and doing this is >> faster than using platform-optimized memcpy()? > > I'm not sure what you mean by using memcpy to improve this, because > it would require calling memcpy in the inner loop, such as > > for (i = 0; i < n; i++) > memcpy(oid.hash + i * sizeof(unsigned), &hash_base, > sizeof(unsigned)); Sorry, I left it without saying as I thought it was obvious, but what I meant was to use a whole "struct oid", not just a single unsigned (repeated), as the hash that is tested. If you have an array of object names you use in the test, then for (count = 0; count < limit; count++) { hashcpy(&oid.hash, &samples[count]); ... do the probing ... } > First, this doesn't just measure the time it takes to determine non-existence, Sorry, my phrasing was indeed misleading. I do not know if the time we spend to see if we have or do not have the object is the largest cycle spender in these codepaths (and even if it were, I do not think that is what you are trying to optimize in these patches anyway). But if I recall correctly, the ways we come up with the unique abbreviation for an object that exists and an object that does not exist are different? And because most of the time (think: "git log -p" output) we would be finding abbreviation for objects that we do have, benchmarking and tweaking the code that comes up with an object that does not exist is not optimizing for the right case. Back when I wrote that initial response, I didn't check how different the code was between the two cases, but now I did.
It seems that in both cases we start from the shortest-allowed length and then extend the same way, and the only difference between these two cases is that we return immediately when our candidate name is long enough not to match any existing object when the full name refers to an object we do not have, while we return only when the ambiguity is resolved. I _think_ these amount to the same computation (i.e. when an object with the full name we have exists, the amount of computation we need to come up with its unique abbreviation is the same as the computation we need to find the unique abbreviation for the same name in another repository that has identical set of objects, minus that single object), so from that point of view, throwing random hashes, most of which would not name any existing object, and measuring how much time it takes to run get_short_oid() to compute the minimum length of the unique prefix should be sufficient. Thanks.
Re: [PATCH] describe: teach --match to handle branches and remotes
On Mon, Sep 18, 2017 at 4:52 PM, Junio C Hamano wrote: > Max Kirillov writes: > >> --match :: >> Only consider tags matching the given `glob(7)` pattern, >> - excluding the "refs/tags/" prefix. This can be used to avoid >> - leaking private tags from the repository. If given multiple times, a >> - list of patterns will be accumulated, and tags matching any of the >> - patterns will be considered. Use `--no-match` to clear and reset the >> - list of patterns. >> + excluding the "refs/tags/" prefix. If used with `--all`, it also >> + considers local branches and remote-tracking references matching the >> + pattern, excluding respectively "refs/heads/" and "refs/remotes/" >> + prefix; references of other types are never considered. If given >> + multiple times, a list of patterns will be accumulated, and tags >> + matching any of the patterns will be considered. Use `--no-match` to >> + clear and reset the list of patterns. >> >> --exclude :: >> Do not consider tags matching the given `glob(7)` pattern, excluding >> - the "refs/tags/" prefix. This can be used to narrow the tag space and >> - find only tags matching some meaningful criteria. If given multiple >> - times, a list of patterns will be accumulated and tags matching any >> - of the patterns will be excluded. When combined with --match a tag will >> - be considered when it matches at least one --match pattern and does not >> + the "refs/tags/" prefix. If used with `--all`, it also does not >> consider >> + local branches and remote-tracking references matching the pattern, >> + excluding respectively "refs/heads/" and "refs/remotes/" prefix; >> + references of other types are never considered. If given multiple >> times, >> + a list of patterns will be accumulated and tags matching any of the >> + patterns will be excluded. When combined with --match a tag will be >> + considered when it matches at least one --match pattern and does not >> match any of the --exclude patterns. 
Use `--no-exclude` to clear and >> reset the list of patterns. > > OK, I find this written clearly enough. > >> diff --git a/builtin/describe.c b/builtin/describe.c >> index 94ff2fba0b..2a2e998063 100644 >> --- a/builtin/describe.c >> +++ b/builtin/describe.c >> @@ -124,6 +124,22 @@ static void add_to_known_names(const char *path, >> } >> } >> >> +/* Drops prefix. Returns NULL if the path is not expected with current >> settings. */ >> +static const char *get_path_to_match(int is_tag, int all, const char *path) >> +{ >> + if (is_tag) >> + return path + 10; > > This is a faithful conversion of the existing code that wants to > behave the same as original, but a bit more on this later. > >> + else if (all) { >> + if (starts_with(path, "refs/heads/")) >> + return path + 11; /* "refs/heads/..." */ >> + else if (starts_with(path, "refs/remotes/")) >> + return path + 13; /* "refs/remotes/..." */ >> + else >> + return 0; > > I think you can use skip_prefix() to avoid counting the length of > the prefix yourself, i.e. > > else if (all) { > const char *body; > > if (skip_prefix(path, "refs/heads/", &body)) > return body; > else if (skip_prefix(path, "refs/remotes/", &body)) > ... > } > > Whether you do the above or not, the last one that returns 0 should > return NULL (to the language it is the same thing, but to humans, we > write NULL when it is the null pointer, not the number 0). > >> + } else > > return NULL; >> +} > > Perhaps the whole thing may want to be a bit more simplified, like: > > static const char *skip_ref_prefix(const char *path, int all) > { > const char *prefix[] = { > "refs/tags/", "refs/heads/", "refs/remotes/" > }; > const char *body; > int cnt; > int bound = all ? ARRAY_SIZE(prefix) : 1; > I found the implicit use of "bound = 1" to mean "we only care about tags" a bit weird here. I guess it's not really that big a deal overall, and this is definitely cleaner than the original implementation.
> for (cnt = 0; cnt < bound; cnt++) > if (skip_prefix(path, prefix[cnt], &body)) > return body; > return NULL; > } > > The hardcoded +10 for "is_tag" case assumes that nothing other than > "refs/tags/something" would ever be used to call into this function > when is_tag is true, and that may well be true in the current code > and have been so ever since the original code was written, but it > still smells like an invitation for future bugs. > > I dunno. > >> + >> static int get_name(const ch
Re: [PATCH 1/2] read_info_alternates: read contents into strbuf
Jeff King writes: > We could also just make a NUL-terminated copy of the input > bytes and operate on that. But since all but one caller > already is passing a string, instead let's just fix that > caller to provide NUL-terminated input in the first place, > by swapping out mmap for strbuf_read_file(). > ... > Let's also drop the "len" parameter entirely from > link_alt_odb_entries(), since it's completely ignored. That > will avoid any new callers re-introducing a similar bug. Both design decisions make perfect sense to me. > sha1_file.c | 29 + > 1 file changed, 9 insertions(+), 20 deletions(-) And diffstat also agrees that it is a good change ;-)
Re: [PATCH] for_each_string_list_item(): behave correctly for empty list
> I am hoping that this last one is not allowed and we can use the > "same condition is checked every time we loop" version that hides > the ugliness inside the macro. By which you are referring to Jonathan's solution posted. Maybe we can combine the two solutions (checking for the list to not be NULL once, by Jonathan) and using an outer structure (SZEDER's solution) by replacing the condition by a for loop, roughly (untested): #define for_each_string_list_item(item,list) \ - for (item = (list)->items; item < (list)->items + (list)->nr; ++item) +for (; list; list = NULL) +for (item = (list)->items; item < (list)->items + (list)->nr; ++item) as that would not mingle with any dangling else clause. It is also just one statement, such that if (bla) for_each_string_list_item { baz(item); } else foo; still works. Are there downsides to this combined approach?
Re: [PATCH] describe: teach --match to handle branches and remotes
Max Kirillov writes: > --match :: > Only consider tags matching the given `glob(7)` pattern, > - excluding the "refs/tags/" prefix. This can be used to avoid > - leaking private tags from the repository. If given multiple times, a > - list of patterns will be accumulated, and tags matching any of the > - patterns will be considered. Use `--no-match` to clear and reset the > - list of patterns. > + excluding the "refs/tags/" prefix. If used with `--all`, it also > + considers local branches and remote-tracking references matching the > + pattern, excluding respectively "refs/heads/" and "refs/remotes/" > + prefix; references of other types are never considered. If given > + multiple times, a list of patterns will be accumulated, and tags > + matching any of the patterns will be considered. Use `--no-match` to > + clear and reset the list of patterns. > > --exclude :: > Do not consider tags matching the given `glob(7)` pattern, excluding > - the "refs/tags/" prefix. This can be used to narrow the tag space and > - find only tags matching some meaningful criteria. If given multiple > - times, a list of patterns will be accumulated and tags matching any > - of the patterns will be excluded. When combined with --match a tag will > - be considered when it matches at least one --match pattern and does not > + the "refs/tags/" prefix. If used with `--all`, it also does not consider > + local branches and remote-tracking references matching the pattern, > + excluding respectively "refs/heads/" and "refs/remotes/" prefix; > + references of other types are never considered. If given multiple times, > + a list of patterns will be accumulated and tags matching any of the > + patterns will be excluded. When combined with --match a tag will be > + considered when it matches at least one --match pattern and does not > match any of the --exclude patterns. Use `--no-exclude` to clear and > reset the list of patterns. OK, I find this written clearly enough. 
> diff --git a/builtin/describe.c b/builtin/describe.c > index 94ff2fba0b..2a2e998063 100644 > --- a/builtin/describe.c > +++ b/builtin/describe.c > @@ -124,6 +124,22 @@ static void add_to_known_names(const char *path, > } > } > > +/* Drops prefix. Returns NULL if the path is not expected with current > settings. */ > +static const char *get_path_to_match(int is_tag, int all, const char *path) > +{ > + if (is_tag) > + return path + 10; This is a faithful conversion of the existing code that wants to behave the same as original, but a bit more on this later. > + else if (all) { > + if (starts_with(path, "refs/heads/")) > + return path + 11; /* "refs/heads/..." */ > + else if (starts_with(path, "refs/remotes/")) > + return path + 13; /* "refs/remotes/..." */ > + else > + return 0; I think you can use skip_prefix() to avoid counting the length of the prefix yourself, i.e. else if (all) { const char *body; if (skip_prefix(path, "refs/heads/", &body)) return body; else if (skip_prefix(path, "refs/remotes/", &body)) ... } Whether you do the above or not, the last one that returns 0 should return NULL (to the language it is the same thing, but to humans, we write NULL when it is the null pointer, not the number 0). > + } else > + return NULL; > +} Perhaps the whole thing may want to be a bit more simplified, like: static const char *skip_ref_prefix(const char *path, int all) { const char *prefix[] = { "refs/tags/", "refs/heads/", "refs/remotes/" }; const char *body; int cnt; int bound = all ? ARRAY_SIZE(prefix) : 1; for (cnt = 0; cnt < bound; cnt++) if (skip_prefix(path, prefix[cnt], &body)) return body; return NULL; } The hardcoded +10 for "is_tag" case assumes that nothing other than "refs/tags/something" would ever be used to call into this function when is_tag is true, and that may well be true in the current code and have been so ever since the original code was written, but it still smells like an invitation for future bugs. I dunno.
> + > static int get_name(const char *path, const struct object_id *oid, int flag, > void *cb_data) > { > int is_tag = starts_with(path, "refs/tags/"); > @@ -140,12 +156,13 @@ static int get_name(const char *path, const struct > object_id *oid, int flag, voi >*/ > if (exclude_patterns.nr) { > struct string_list_item *item; > + const char *path_to_match = get_path_to_match(is_tag, all, > path); > +
Re: [OUTREACHY] pack: make packed_git_mru global a value instead of a pointer
Hi, phionah bugosi wrote: > Just to re-echo a change requested before in one of the mail > threads, we currently have two global variables declared: > > struct mru packed_git_mru_storage; > struct mru *packed_git_mru = &packed_git_mru_storage; > > We normally use pointers in C to point or refer to the same location > or space in memory from multiple places. That means in situations > where we will have the pointer be assigned to in many places in our > code. But in this case, we do not have any other logic referring or > assigning to the pointer packed_git_mru. In simple words we actually > don't need a pointer in this case. > > Therefore this patch makes packed_git_mru global a value instead of > a pointer and makes use of list.h > > Signed-off-by: phionah bugosi > --- > builtin/pack-objects.c | 5 +++-- > cache.h| 7 --- > list.h | 6 ++ > packfile.c | 12 ++-- > 4 files changed, 15 insertions(+), 15 deletions(-) *puzzled* This appears to already be in "pu", with a different author. Did you independently make the same change? Or are you asking for a progress report on that change, and just phrasing it in a confusing way? Confused, Jonathan
Re: RFC v3: Another proposed hash function transition plan
Hi, Gilles Van Assche wrote: > Hi Johannes, >> SHA-256 got much more cryptanalysis than SHA3-256 […]. > > I do not think this is true. Keccak/SHA-3 actually got (and is still > getting) a lot of cryptanalysis, with papers published at renowned > crypto conferences [1]. > > Keccak/SHA-3 is recognized to have a significant safety margin. E.g., > one can cut the number of rounds in half (as in Keyak or KangarooTwelve) > and still get a very strong function. I don't think we could say the > same for SHA-256 or SHA-512… I just wanted to thank you for paying attention to this conversation and weighing in. Most of the regulars in the git project are not crypto experts. This kind of extra information (and e.g. [2]) is very useful to us. Thanks, Jonathan > Kind regards, > Gilles, for the Keccak team > > [1] https://keccak.team/third_party.html [2] https://public-inbox.org/git/91a34c5b-7844-3db2-cf29-411df5bcf...@noekeon.org/
Re: RFC v3: Another proposed hash function transition plan
Hi Gilles, On Mon, 18 Sep 2017, Gilles Van Assche wrote: > > SHA-256 got much more cryptanalysis than SHA3-256 […]. > > I do not think this is true. Please read what I said again: SHA-256 got much more cryptanalysis than SHA3-256. I never said that SHA3-256 got little cryptanalysis. Personally, I think that SHA3-256 got a ton more cryptanalysis than SHA-1, and that SHA-256 *still* got more cryptanalysis. But my opinion does not count, really. However, the two experts I pestered with questions over questions left me with that strong impression, and their opinion does count. > Keccak/SHA-3 actually got (and is still getting) a lot of cryptanalysis, > with papers published at renowned crypto conferences [1]. > > Keccak/SHA-3 is recognized to have a significant safety margin. E.g., > one can cut the number of rounds in half (as in Keyak or KangarooTwelve) > and still get a very strong function. I don't think we could say the > same for SHA-256 or SHA-512… Again, I do not want to criticize SHA3/Keccak. Personally, I have a lot of respect for Keccak. I also have a lot of respect for everybody who scrutinized the SHA2 family of algorithms. I also respect the fact that there are more implementations of SHA-256, and thanks to everybody seeming to demand SHA-256 checksums instead of SHA-1 or MD5 for downloads, bugs in those implementations are probably discovered relatively quickly, and I also cannot ignore the prospect of hardware support for SHA-256. In any case, having SHA3 as a fallback in case SHA-256 gets broken seems like a very good safety net to me. Ciao, Johannes
Re: [PATCH 2/8] protocol: introduce protocol extension mechanisms
>> instead compared to the proposed configuration above. >> (Even better yet, then people could play around with "v1 only" >> and see how it falls apart on old servers) > > Except we can't start with an explicit whitelist because we must > fall back to v0 if v1 isn't supported; otherwise we would break people. > > That is unless we have the semantics of: If not configured v0 will be > used, otherwise only use the configured protocol versions. > Good point. Thinking about this, how about: If not configured, we do as we want. (i.e. Git has full control over its decision-making process, which for now is "favor v0 over v1 as we are experimenting with v1". This strategy may change in the future to "prefer highest version number that both client and server can speak".) If it is configured, "use highest configured number from the given set". ?
Re: [PATCH v2] add test for bug in git-mv for recursive submodules
>> Took a little while but here is a more clean patch creating individual >> submodules for the nesting. >> >> Cheers Heiko Thanks for writing this test! > > Thanks. Stefan, does this look good to you now? Yes, though there are nits below. > It is not quite clear which step is expected to fail with the > current code by reading the test or the proposed log message. Does > "mv" refuse to work and we do not get to run "status", or does > "status" report a failure, or do we fail well before that? git-mv failing seems like a new possibility without incurring another process spawn with the new repository object. (Though then we could also just fix the recursed submodule) > The log message that only says "This does not work when ..." is not > helpful in figuring it out, either. Something like "This does not > work and fails to update the paths for its configurations" or > whatever that describes "what actually happens" (in contrast to > "what ought to happen", which you described clearly) should be > there. > > Description on how you happened to have discovered the issue feels a > lot less relevant compared to that, and it is totally useless if it > is unclear what the issue is in the first place. > >> t/t7001-mv.sh | 25 + >> 1 file changed, 25 insertions(+) >> >> diff --git a/t/t7001-mv.sh b/t/t7001-mv.sh >> index e365d1ff77..cbc5fb37fe 100755 >> --- a/t/t7001-mv.sh >> +++ b/t/t7001-mv.sh >> @@ -491,4 +491,29 @@ test_expect_success 'moving a submodule in nested >> directories' ' >> test_cmp actual expect >> ' >> >> +test_expect_failure 'moving nested submodules' ' >> + git commit -am "cleanup commit" && >> + mkdir sub_nested_nested && >> + (cd sub_nested_nested && We seem to have different styles for nested shell. I prefer outside command && ( first nested command here && ... as that aligns indentation to the nesting level. 
I have seen the style you use a lot in the test suite, and we do not have a guideline in Documentation/CodingGuidelines, so I do not complain too loudly. ;) >> + touch nested_level2 && >> + git init && >> + git add . && >> + git commit -m "nested level 2" >> + ) && >> + mkdir sub_nested && >> + (cd sub_nested && >> + touch nested_level1 && >> + git init && >> + git add . && >> + git commit -m "nested level 1" >> + git submodule add ../sub_nested_nested && >> + git commit -m "add nested level 2" >> + ) && >> + git submodule add ./sub_nested nested_move && >> + git commit -m "add nested_move" && >> + git submodule update --init --recursive && So far a nice setup! >> + git mv nested_move sub_nested_moved && This is the offending command that produces the bug, as it will break most subsequent commands, such as >> + git status git-status is one of the basic commands. Without status to function, I think it is hard to recover your repo without a lot of in-depth knowledge of Git (submodules). I wonder if git-status should complain more gracefully and fall back to one of --ignore-submodules={dirty, all}, that actually still works. Maybe we could introduce a new default mode for this flag, that is "none-except-on-error", though this sounds as if we're fixing symptoms instead of the root cause.
Re: [PATCH 2/8] protocol: introduce protocol extension mechanisms
On 09/18, Stefan Beller wrote: > >> From a users POV this may be frustrating as I would imagine that > >> people want to run > >> > >> git config --global protocol.version 2 > >> > >> to try it out and then realize that some of their hosts do not speak > >> 2, so they have to actually configure it per repo/remote. > > > > The point would be to be able to set this globally, not per-repo. Even > > if a repo doesn't speak v2 then it should be able to gracefully degrade > > to v1 without the user having to do anything. The reason why there is > > this escape hatch is if doing the protocol negotiation out of band > > causing issues with communicating with a server that it can be shut off. > > In the current situation it is easy to assume that if v1 (and not v0) > is configured, that the users intent is "to try out v1 and fallback > gracefully to v0". > > But this will change over time in the future! > > Eventually people will have the desire to say: > "Use version N+1, but never version N", because N has > performance or security issues; the user might not want > to bother to try N or even actively want to be affirmed that > Git will never use version N. > > In this future we need a mechanism, that either contains a > white list or black list of protocols. To keep it simple (I assume > there won't be many protocol versions), a single white list will do. > > However transitioning from the currently proposed "try the new > configured thing and fallback to whatever" to "this is the exact list > of options that Git will try for you" will be hard, as we may break people > if we do not unconditionally fall back to v0. > > That is why I propose to start with an explicit white list as then we > do not have to have a transition plan or otherwise work around the > issue. Also it doesn't hurt now to use > > git config --global protocol.version v1,v0 > > instead compared to the proposed configuration above. 
> (Even better yet, then people could play around with "v1 only" > and see how it falls apart on old servers) Except we can't start with an explicit whitelist because we must fall back to v0 if v1 isn't supported; otherwise we would break people. That is unless we have the semantics of: If not configured v0 will be used, otherwise only use the configured protocol versions. -- Brandon Williams
Re: [PATCH 2/8] protocol: introduce protocol extension mechanisms
>> From a users POV this may be frustrating as I would imagine that >> people want to run >> >> git config --global protocol.version 2 >> >> to try it out and then realize that some of their hosts do not speak >> 2, so they have to actually configure it per repo/remote. > > The point would be to be able to set this globally, not per-repo. Even > if a repo doesn't speak v2 then it should be able to gracefully degrade > to v1 without the user having to do anything. The reason why there is > this escape hatch is if doing the protocol negotiation out of band > causing issues with communicating with a server that it can be shut off. In the current situation it is easy to assume that if v1 (and not v0) is configured, that the users intent is "to try out v1 and fallback gracefully to v0". But this will change over time in the future! Eventually people will have the desire to say: "Use version N+1, but never version N", because N has performance or security issues; the user might not want to bother to try N or even actively want to be affirmed that Git will never use version N. In this future we need a mechanism, that either contains a white list or black list of protocols. To keep it simple (I assume there won't be many protocol versions), a single white list will do. However transitioning from the currently proposed "try the new configured thing and fallback to whatever" to "this is the exact list of options that Git will try for you" will be hard, as we may break people if we do not unconditionally fall back to v0. That is why I propose to start with an explicit white list as then we do not have to have a transition plan or otherwise work around the issue. Also it doesn't hurt now to use git config --global protocol.version v1,v0 instead compared to the proposed configuration above. (Even better yet, then people could play around with "v1 only" and see how it falls apart on old servers)
RE: [PATCH v6 12/12] fsmonitor: add a performance test
> -Original Message-
> From: Johannes Schindelin [mailto:johannes.schinde...@gmx.de]
> Sent: Monday, September 18, 2017 10:25 AM
> To: Ben Peart
> Cc: david.tur...@twosigma.com; ava...@gmail.com;
> christian.cou...@gmail.com; git@vger.kernel.org; gits...@pobox.com;
> pclo...@gmail.com; p...@peff.net
> Subject: Re: [PATCH v6 12/12] fsmonitor: add a performance test
>
> Hi Ben,
>
> sorry for not catching this earlier:
>
> On Fri, 15 Sep 2017, Ben Peart wrote:
>
> > [...]
> > +
> > +int cmd_dropcaches(void)
> > +{
> > +	HANDLE hProcess = GetCurrentProcess();
> > +	HANDLE hToken;
> > +	int status;
> > +
> > +	if (!OpenProcessToken(hProcess, TOKEN_QUERY | TOKEN_ADJUST_PRIVILEGES, &hToken))
> > +		return error("Can't open current process token");
> > +
> > +	if (!GetPrivilege(hToken, "SeProfileSingleProcessPrivilege", 1))
> > +		return error("Can't get SeProfileSingleProcessPrivilege");
> > +
> > +	CloseHandle(hToken);
> > +
> > +	HMODULE ntdll = LoadLibrary("ntdll.dll");
>
> Git's source code still tries to abide by C90, and for simplicity's sake, this
> extends to the Windows-specific part. Therefore, the `ntdll` variable needs to
> be declared at the beginning of the function (I do agree that it makes for
> better code to reduce the scope of variables, but C90 simply did not allow
> variables to be declared in the middle of functions).

Thanks Johannes, I'll fix that.
>
> I wanted to send a patch addressing this in the obvious way, but then I
> encountered these lines:
>
> > +	DWORD(WINAPI *NtSetSystemInformation)(INT, PVOID, ULONG) =
> > +		(DWORD(WINAPI *)(INT, PVOID, ULONG))GetProcAddress(ntdll, "NtSetSystemInformation");
> > +	if (!NtSetSystemInformation)
> > +		return error("Can't get function addresses, wrong Windows version?");
>
> It turns out that we have seen this plenty of times in Git for Windows'
> fork, so much so that we came up with a nice helper to make this all a bit
> more robust and a bit more obvious, too: the DECLARE_PROC_ADDR and
> INIT_PROC_ADDR helpers in compat/win32/lazyload.h.
>
> Maybe this would be the perfect excuse to integrate this patch into upstream
> Git?

This patch is pretty hefty already. How about you push this capability upstream and I take advantage of it in a later patch? :)

> This would be the patch (you can also cherry-pick it from
> 25c4dc3a73352e72e995594cf1b4afa46e93d040 in
> https://github.com/dscho/git):
>
> -- snip --
> From 25c4dc3a73352e72e995594cf1b4afa46e93d040 Mon Sep 17 00:00:00 2001
> From: Johannes Schindelin
> Date: Tue, 10 Jan 2017 23:14:20 +0100
> Subject: [PATCH] Win32: simplify loading of DLL functions
>
> Dynamic loading of DLL functions is duplicated in several places in Git for
> Windows' source code.
>
> This patch adds a pair of macros to simplify the process: the
> DECLARE_PROC_ADDR(<dll>, <return-type>, <function-name>, ...) macro to be
> used at the beginning of a code block, and the INIT_PROC_ADDR(<function-name>)
> macro to call before using the declared function. The return value of the
> INIT_PROC_ADDR() call has to be checked; if it is NULL, the function was not
> found in the specified DLL.
>
> Example:
>
> 	DECLARE_PROC_ADDR(kernel32.dll, BOOL, CreateHardLinkW,
> 			LPCWSTR, LPCWSTR, LPSECURITY_ATTRIBUTES);
>
> 	if (!INIT_PROC_ADDR(CreateHardLinkW))
> 		return error("Could not find CreateHardLinkW() function");
>
> 	if (!CreateHardLinkW(source, target, NULL))
> 		return error("could not create hardlink from %S to %S",
> 			source, target);
> 	return 0;
>
> Signed-off-by: Karsten Blees
> Signed-off-by: Johannes Schindelin
> ---
> compat/win32/lazyload.h | 44 ++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 44 insertions(+)
> create mode 100644 compat/win32/lazyload.h
>
> diff --git a/compat/win32/lazyload.h b/compat/win32/lazyload.h
> new file mode 100644
> index 000..91c10dad2fb
> --- /dev/null
> +++ b/compat/win32/lazyload.h
> @@ -0,0 +1,44 @@
> +#ifndef LAZYLOAD_H
> +#define LAZYLOAD_H
> +
> +/* simplify loading of DLL functions */
> +
> +struct proc_addr {
> +	const char *const dll;
> +	const char *const function;
> +	FARPROC pfunction;
> +	unsigned initialized : 1;
> +};
> +
> +/* Declares a function to be loaded dynamically from a DLL. */
> +#define DECLARE_PROC_ADDR(dll, rettype, function, ...) \
> +	static struct proc_addr proc_addr_##function = \
> +	{ #dll, #function, NULL, 0 }; \
> +	static rettype (WINAPI *function)(__VA_ARGS__)
> +
> +/*
> + * Loads a function from a DLL (once-only).
> + * Returns non-NULL function pointer on success.
> + * Returns
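The once-only caching that INIT_PROC_ADDR performs can be illustrated in portable C. In this sketch a table lookup stands in for `LoadLibrary`/`GetProcAddress`, since the Windows APIs are not available off-platform; apart from the `proc_addr` bookkeeping mirrored from lazyload.h, all names here are my own:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mirrors lazyload.h's bookkeeping: dll/function names, a cached
 * pointer, and an "initialized" bit so the lookup runs at most once. */
struct proc_addr {
	const char *dll;
	const char *function;
	void (*pfunction)(void);
	unsigned initialized : 1;
};

static int lookups_performed; /* instrumentation for the sketch */

static void stub_beep(void) { }

/* Stand-in for GetProcAddress(LoadLibrary(dll), function). */
static void (*lookup_symbol(const char *dll, const char *function))(void)
{
	(void)dll;
	lookups_performed++;
	if (!strcmp(function, "Beep"))
		return stub_beep;
	return NULL;
}

/* The INIT_PROC_ADDR idea: resolve on first use, then serve the cache.
 * A NULL result means the function was not found and must be checked. */
static void (*init_proc_addr(struct proc_addr *proc))(void)
{
	if (!proc->initialized) {
		proc->initialized = 1;
		proc->pfunction = lookup_symbol(proc->dll, proc->function);
	}
	return proc->pfunction;
}
```

Note that a failed lookup is cached too: the `initialized` bit, not the pointer, records whether resolution has been attempted.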
Re: [RFC PATCH 0/2] Add named reference to latest push cert
On Mon, Sep 18, 2017 at 7:22 AM, Santiago Torres wrote:
> Hello, everyone.
>
> Sorry for being late in this thread, I was making sure I didn't say
> anything outrageously wrong.
>
>> That's Stefan; I wouldn't have suggested any approach that uses the
>> blob whose sole purpose is to serve as a temporary storage area to
>> pass the information to the hook as part of the permanent record.

I put out a vague design that seemed to have more advantages in my mind at the time of writing. :)

> Hmm, as far as I understand *this* is the status quo. We get an
> ephemeral sha1/oid only if you have a hook attached. Otherwise, you will
> never see the object at all, even if you have --signed set.
>
> I suggested preparing this RFC/Patch because of the following reasons
> (please let me know if my understanding of any of this is wrong...):
>
> - I find it weird that the cli allows for a --signed push and
>   nowhere in receive-pack's feedback are you told whether the push
>   certificate is computed/stored/handled at all. I think that, at the
>   latest, receive-pack should let the user know whether the signed
>   push succeeded, or if there is no hook attached to handle it.

How would a user benefit from it? (Are there cases where the organisation deliberately wants the user not to know what happened to their push cert? Do we care about these cases?)

> - *if there is a hook* the blob is computed, but it is up to the
>   hook itself to store it *somewhere*. This makes me feel like it's
>   somewhat of a wasted computation if the hook is not meant to
>   handle it anyway (which is just a post-receive hook). I find it
>   rather weird that --signed is a builtin flag, and is handled on
>   the server side only partially (just my two cents).

I agree, but many features in Git start out small and only partially implemented.

> - To me, the way push certificates are handled now feels like having
>   git commit -S produce a detached signature that you have to
>   embed somehow in the resulting commit object.
> (As a matter of fact, many points on [1] seem to back this notion,
> and even recall some drawbacks of push certificates as they are
> handled today.)
>
> I understand the concurrency concerns, so I agree with Stefan's
> solution, although I don't know how big of a deal it would be, if git
> already supports --atomic pushes (admittedly, I haven't checked if there
> are any guards in place for someone who pushes millions of refs
> atomically). It'd be completely understandable to have a setting to
> disable handling of --signed pushes and, ideally, a way to warn the user
> of this feature being disabled if --signed is sent.

That makes sense.
Re: [PATCH] pack: make packed_git_mru global a value instead of a pointer
On Sun, Sep 17, 2017 at 1:44 PM, Christian Couder wrote:
> On Sun, Sep 17, 2017 at 10:42 PM, Christian Couder wrote:
>> On Sun, Sep 17, 2017 at 10:17 PM, phionah bugosi wrote:
>>> Signed-off-by: phionah bugosi
>>> ---
>>> builtin/pack-objects.c |  5 +++--
>>> cache.h                |  7 -------
>>> list.h                 |  6 ++++++
>>> packfile.c             | 12 ++++++------
>>> 4 files changed, 15 insertions(+), 15 deletions(-)
>>>
>>> This patch makes packed_git_mru global a value instead of a pointer and
>>> makes use of list.h
>>
>> Please explain _why_ you are doing that (why is the resulting code better).
>
> Also the explanations should be in the commit message, that is above
> your "Signed-off-by: ..." line.

A similar patch exists at origin/jn/per-repo-object-store-fixes^^ (I haven't checked if there are differences and, if so, which patch is better; all I am noting here is that there has been work very similar to this one a couple of days prior.)
[PATCH v2] Improve performance of git status --ignored
Improve the performance of the directory listing logic when it wants to list non-empty ignored directories. In order to show non-empty ignored directories, the existing logic will recursively iterate through all contents of an ignored directory. This change introduces the optimization to stop iterating through the contents once it finds the first file. This can have a significant improvement in 'git status --ignored' performance in repositories with a large number of files in ignored directories. For an example of the performance difference on an example repository with 196,000 files in 400 ignored directories: | Command| Time (s) | | -- | - | | git status | 1.2 | | git status --ignored (old) | 3.9 | | git status --ignored (new) | 1.4 | Signed-off-by: Jameson Miller --- dir.c | 47 +-- 1 file changed, 41 insertions(+), 6 deletions(-) diff --git a/dir.c b/dir.c index 1c55dc3..1d17b80 100644 --- a/dir.c +++ b/dir.c @@ -49,7 +49,7 @@ struct cached_dir { static enum path_treatment read_directory_recursive(struct dir_struct *dir, struct index_state *istate, const char *path, int len, struct untracked_cache_dir *untracked, - int check_only, const struct pathspec *pathspec); + int check_only, int stop_at_first_file, const struct pathspec *pathspec); static int get_dtype(struct dirent *de, struct index_state *istate, const char *path, int len); @@ -1404,8 +1404,13 @@ static enum path_treatment treat_directory(struct dir_struct *dir, untracked = lookup_untracked(dir->untracked, untracked, dirname + baselen, len - baselen); + + /* +* If this is an excluded directory, then we only need to check if +* the directory contains any files. +*/ return read_directory_recursive(dir, istate, dirname, len, - untracked, 1, pathspec); + untracked, 1, exclude, pathspec); } /* @@ -1633,7 +1638,7 @@ static enum path_treatment treat_path_fast(struct dir_struct *dir, * with check_only set. 
*/ return read_directory_recursive(dir, istate, path->buf, path->len, - cdir->ucd, 1, pathspec); + cdir->ucd, 1, 0, pathspec); /* * We get path_recurse in the first run when * directory_exists_in_index() returns index_nonexistent. We @@ -1793,12 +1798,20 @@ static void close_cached_dir(struct cached_dir *cdir) * Also, we ignore the name ".git" (even if it is not a directory). * That likely will not change. * + * If 'stop_at_first_file' is specified, 'path_excluded' is returned + * to signal that a file was found. This is the least significant value that + * indicates that a file was encountered that does not depend on the order of + * whether an untracked or exluded path was encountered first. + * * Returns the most significant path_treatment value encountered in the scan. + * If 'stop_at_first_file' is specified, `path_excluded` is the most + * significant path_treatment value that will be returned. */ + static enum path_treatment read_directory_recursive(struct dir_struct *dir, struct index_state *istate, const char *base, int baselen, struct untracked_cache_dir *untracked, int check_only, - const struct pathspec *pathspec) + int stop_at_first_file, const struct pathspec *pathspec) { struct cached_dir cdir; enum path_treatment state, subdir_state, dir_state = path_none; @@ -1832,12 +1845,34 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir, subdir_state = read_directory_recursive(dir, istate, path.buf, path.len, ud, -check_only, pathspec); +check_only, stop_at_first_file, pathspec); if (subdir_state > dir_state) dir_state = subdir_state; } if (check_only) { + if (stop_at_first_file) { + /* +* If stopping at first file, then +* signal that a file was found by +* returning `path_excluded`. This is +* to return a consistent value +* regardless of whether an ignored or +* excluded file happened to be +* encountered 1st. +* +
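The effect of `stop_at_first_file` can be sketched with a small in-memory tree. The enum ordering mimics dir.c's "more significant" ranking; everything else (the node struct, the visit counter) is invented for the demo and is not Git code:

```c
#include <assert.h>
#include <stddef.h>

/* Ordered as in dir.c: higher values are "more significant". */
enum path_treatment {
	path_none = 0,
	path_recurse,
	path_excluded,
	path_untracked
};

struct tnode {
	int is_file;
	enum path_treatment treatment;  /* only meaningful for files */
	struct tnode *children[8];      /* NULL-terminated */
};

static int nodes_visited;

/*
 * Walk the tree and return the most significant treatment seen.
 * With stop_at_first_file set, return path_excluded as soon as any
 * file is found: the caller only asked "is this ignored directory
 * non-empty?", so the scan can be cut short, and capping the answer
 * at path_excluded keeps the result independent of whether an
 * untracked or an excluded entry happened to be encountered first.
 */
static enum path_treatment scan(const struct tnode *dir,
				int stop_at_first_file)
{
	enum path_treatment state = path_none;
	int i;

	for (i = 0; dir->children[i]; i++) {
		const struct tnode *c = dir->children[i];
		enum path_treatment sub;

		nodes_visited++;
		sub = c->is_file ? c->treatment
				 : scan(c, stop_at_first_file);
		if (stop_at_first_file && sub > path_none)
			return path_excluded; /* early exit */
		if (sub > state)
			state = sub;
	}
	return state;
}
```

The full scan visits every entry; the early-exit scan returns after the first file, which is where the reported speedup for large ignored directories comes from.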
[PATCH v2] Improve performance of git status --ignored
This is the second version of my patches to improve handling of ignored files I have decided to break the original patch series into two parts: 1) Perf improvements to handling ignored directories 2) Expose extra options to control which ignored files are displayed by git status. This patch will address #1, and I will follow up with another patch for #2. This patch improves the performance of 'git status --ignored' by cutting out unnecessary work when determining if an ignored directory is empty or not. The current logic will recursively enumerate through all contents of an ignored directory to determine whether it is empty or not. The new logic will return after the 1st file is encountered. This can result in a significant speedup in work dirs with a lot of files in ignored directories. As an example of the performance improvement, here is a representative example of a repository with ~190,000 files in ~400 ignored directories: | Command| Time (s) | | -- | | | git status |1.2 | | git status --ignored (old) |3.9 | | git status --ignored (new) |1.4 | Jameson Miller (1): Improve performance of git status --ignored dir.c | 47 +-- 1 file changed, 41 insertions(+), 6 deletions(-) -- 2.7.4
[OUTREACHY] pack: make packed_git_mru global a value instead of a pointer
Just to reecho a previous change requested before in one of the mail threads, we currently have two global variables declared: struct mru packed_git_mru_storage; struct mru *packed_git_mru = &packed_git_mru_storage; We normally use pointers in C to point or refer to the same location or space in memory from multiple places. That means in situations where we will have the pointer be assigned to in many places in our code. But in this case, we do not have any other logic refering or assigning to the pointer packed_git_mru. In simple words we actually dont need a pointer in this case. Therefore this patch makes packed_git_mru global a value instead of a pointer and makes use of list.h Signed-off-by: phionah bugosi --- builtin/pack-objects.c | 5 +++-- cache.h| 7 --- list.h | 6 ++ packfile.c | 12 ++-- 4 files changed, 15 insertions(+), 15 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index a57b4f0..189123f 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -1,5 +1,6 @@ #include "builtin.h" #include "cache.h" +#include "list.h" #include "config.h" #include "attr.h" #include "object.h" @@ -1012,7 +1013,7 @@ static int want_object_in_pack(const unsigned char *sha1, return want; } - for (entry = packed_git_mru->head; entry; entry = entry->next) { + for (entry = packed_git_mru.head; entry; entry = entry->next) { struct packed_git *p = entry->item; off_t offset; @@ -1030,7 +1031,7 @@ static int want_object_in_pack(const unsigned char *sha1, } want = want_found_object(exclude, p); if (!exclude && want > 0) - mru_mark(packed_git_mru, entry); + mru_mark(&packed_git_mru, entry); if (want != -1) return want; } diff --git a/cache.h b/cache.h index a916bc7..c8d7086 100644 --- a/cache.h +++ b/cache.h @@ -1585,13 +1585,6 @@ extern struct packed_git { char pack_name[FLEX_ARRAY]; /* more */ } *packed_git; -/* - * A most-recently-used ordered version of the packed_git list, which can - * be iterated instead of packed_git (and marked via 
mru_mark). - */ -struct mru; -extern struct mru *packed_git_mru; - struct pack_entry { off_t offset; unsigned char sha1[20]; diff --git a/list.h b/list.h index a226a87..525703b 100644 --- a/list.h +++ b/list.h @@ -26,6 +26,12 @@ #define LIST_H 1 /* + * A most-recently-used ordered version of the packed_git list, which can + * be iterated instead of packed_git (and marked via mru_mark). + */ +extern struct mru packed_git_mru + +/* * The definitions of this file are adopted from those which can be * found in the Linux kernel headers to enable people familiar with the * latter find their way in these sources as well. diff --git a/packfile.c b/packfile.c index f86fa05..61a61aa 100644 --- a/packfile.c +++ b/packfile.c @@ -1,4 +1,5 @@ #include "cache.h" +#include "list.h" #include "mru.h" #include "pack.h" #include "dir.h" @@ -41,8 +42,7 @@ static size_t peak_pack_mapped; static size_t pack_mapped; struct packed_git *packed_git; -static struct mru packed_git_mru_storage; -struct mru *packed_git_mru = &packed_git_mru_storage; +struct mru packed_git_mru; #define SZ_FMT PRIuMAX static inline uintmax_t sz_fmt(size_t s) { return s; } @@ -861,9 +861,9 @@ static void prepare_packed_git_mru(void) { struct packed_git *p; - mru_clear(packed_git_mru); + mru_clear(&packed_git_mru); for (p = packed_git; p; p = p->next) - mru_append(packed_git_mru, p); + mru_append(&packed_git_mru, p); } static int prepare_packed_git_run_once = 0; @@ -1832,9 +1832,9 @@ int find_pack_entry(const unsigned char *sha1, struct pack_entry *e) if (!packed_git) return 0; - for (p = packed_git_mru->head; p; p = p->next) { + for (p = packed_git_mru.head; p; p = p->next) { if (fill_pack_entry(sha1, e, p->item)) { - mru_mark(packed_git_mru, p); + mru_mark(&packed_git_mru, p); return 1; } } -- 2.7.4
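The change the patch argues for boils down to the following pattern, shown here with a toy MRU-style list; none of these helpers are Git's actual mru API, they just illustrate the value-vs-pointer choice:

```c
#include <assert.h>
#include <stddef.h>

struct mru_entry {
	int item;
	struct mru_entry *next;
};

struct mru {
	struct mru_entry *head;
};

/*
 * Before: two names for one object, with the pointer never reassigned:
 *     static struct mru packed_git_mru_storage;
 *     struct mru *packed_git_mru = &packed_git_mru_storage;
 * After: a single global value; callers simply pass &packed_git_mru
 * whenever a function needs a pointer.
 */
static struct mru packed_git_mru;

static struct mru_entry pool[8]; /* toy allocator for the sketch */
static int pool_used;

static void mru_append(struct mru *mru, int item)
{
	struct mru_entry *e = &pool[pool_used++];

	e->item = item;
	e->next = mru->head;
	mru->head = e;
}

static int mru_first(const struct mru *mru)
{
	return mru->head ? mru->head->item : -1;
}
```

Since nothing ever points `packed_git_mru` at a different object, the pointer indirection buys nothing; dropping it removes one global and one level of indirection at every use site.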
[PATCH] rev-parse: add --is-shallow-repository
Running `git fetch --unshallow` on a repo that is not in fact shallow produces a fatal error message. Add a helper to rev-parse that scripters can use to determine whether a repo is shallow or not. Signed-off-by: Øystein Walle --- Documentation/git-rev-parse.txt | 3 +++ builtin/rev-parse.c | 5 +++++ t/t1500-rev-parse.sh | 15 +++++++++++++++ 3 files changed, 23 insertions(+) diff --git a/Documentation/git-rev-parse.txt b/Documentation/git-rev-parse.txt index b1293f24b..0917b8207 100644 --- a/Documentation/git-rev-parse.txt +++ b/Documentation/git-rev-parse.txt @@ -235,6 +235,9 @@ print a message to stderr and exit with nonzero status. --is-bare-repository:: When the repository is bare print "true", otherwise "false". +--is-shallow-repository:: + When the repository is shallow print "true", otherwise "false". + --resolve-git-dir <path>:: Check if <path> is a valid repository or a gitfile that points at a valid repository, and print the location of the diff --git a/builtin/rev-parse.c b/builtin/rev-parse.c index 2bd28d3c0..c923207f2 100644 --- a/builtin/rev-parse.c +++ b/builtin/rev-parse.c @@ -868,6 +868,11 @@ int cmd_rev_parse(int argc, const char **argv, const char *prefix) : "false"); continue; } + if (!strcmp(arg, "--is-shallow-repository")) { + printf("%s\n", is_repository_shallow() ? "true" + : "false"); + continue; + } if (!strcmp(arg, "--shared-index-path")) { if (read_cache() < 0) die(_("Could not read the index")); diff --git a/t/t1500-rev-parse.sh b/t/t1500-rev-parse.sh index 03d3c7f6d..9d3433a30 100755 --- a/t/t1500-rev-parse.sh +++ b/t/t1500-rev-parse.sh @@ -116,6 +116,21 @@ test_expect_success 'git-path inside sub-dir' ' test_cmp expect actual ' +test_expect_success 'git-path shallow repository' ' + test_commit test_commit && + echo true >expect && + git clone --depth 1 --no-local . 
shallow && + test_when_finished "rm -rf shallow" && + git -C shallow rev-parse --is-shallow-repository >actual && + test_cmp expect actual +' + +test_expect_success 'git-path notshallow repository' ' + echo false >expect && + git rev-parse --is-shallow-repository >actual && + test_cmp expect actual +' + test_expect_success 'showing the superproject correctly' ' git rev-parse --show-superproject-working-tree >out && test_must_be_empty out && -- 2.11.0.485.g4e59582
Re: [PATCH 2/8] protocol: introduce protocol extention mechanisms
On 09/13, Stefan Beller wrote: > On Wed, Sep 13, 2017 at 2:54 PM, Brandon Williams wrote: > > Create protocol.{c,h} and provide functions which future servers and > > clients can use to determine which protocol to use or is being used. > > > > Also introduce the 'GIT_PROTOCOL' environment variable which will be > > used to communicate a colon separated list of keys with optional values > > to a server. Unknown keys and values must be tolerated. This mechanism > > is used to communicate which version of the wire protocol a client would > > like to use with a server. > > > > Signed-off-by: Brandon Williams > > --- > > Documentation/config.txt | 16 +++ > > Documentation/git.txt| 5 > > Makefile | 1 + > > cache.h | 7 + > > protocol.c | 72 > > > > protocol.h | 15 ++ > > 6 files changed, 116 insertions(+) > > create mode 100644 protocol.c > > create mode 100644 protocol.h > > > > diff --git a/Documentation/config.txt b/Documentation/config.txt > > index dc4e3f58a..d5b28a32c 100644 > > --- a/Documentation/config.txt > > +++ b/Documentation/config.txt > > @@ -2517,6 +2517,22 @@ The protocol names currently used by git are: > > `hg` to allow the `git-remote-hg` helper) > > -- > > > > +protocol.version:: > > It would be cool to set a set of versions that are good. (I am not sure > if that can be deferred to a later patch.) > > Consider we'd have versions 0,1,2,3,4 in the future: > In an ideal world the client and server would talk using v4 > as it is the most advanced protocol, right? > Maybe a security/performance issue is found on the server side > with say protocol v3. Still no big deal as we are speaking v4. > But then consider an issue is found on the client side with v4. > Then the client would happily talk 0..3 while the server would > love to talk using 0,1,2,4. 
> > The way I think about protocol version negotiation is that > both parties involved have a set of versions that they tolerate > to talk (they might understand more than the tolerated set, but the > user forbade some), and the goal of the negotiation is to find > the highest version number that is part of both the server set > and client set. So quite naturally with this line of thinking the > configuration is to configure a set of versions, which is what > I propose here. Maybe even in the wire format, separated > with colons? I'm sure it wouldn't take too much to change this to be a multi-valued config. Because after this series there is just v0 and v1 I didn't think through this case too much. If others agree then I can go ahead and make it so in a reroll. > > > + If set, clients will attempt to communicate with a server using > > + the specified protocol version. If unset, no attempt will be > > + made by the client to communicate using a particular protocol > > + version, this results in protocol version 0 being used. > > This sounds as if we're going to be really shy at first and only > users that care will try out new versions at their own risk. > From a users POV this may be frustrating as I would imagine that > people want to run > > git config --global protocol.version 2 > > to try it out and then realize that some of their hosts do not speak > 2, so they have to actually configure it per repo/remote. The point would be to be able to set this globally, not per-repo. Even if a repo doesn't speak v2 then it should be able to gracefully degrade to v1 without the user having to do anything. The reason why there is this escape hatch is if doing the protocol negotiation out of band causing issues with communicating with a server that it can be shut off. > > + Supported versions: > > > +* `0` - the original wire protocol. > > In the future this may be misleading as it doesn't specify the date of > when it was original. e.g. 
are capabilities already supported in "original"? > > Maybe phrase it as "wire protocol as of v2.14" ? (Though this sounds > as if new capabilities added in the future are not allowed) Yeah I can see how this could be misleading, though I'm not sure how best to word it to avoid that. > > > > + > > +extern enum protocol_version parse_protocol_version(const char *value); > > +extern enum protocol_version get_protocol_version_config(void); > > +extern enum protocol_version determine_protocol_version_server(void); > > +extern enum protocol_version determine_protocol_version_client(const char > > *server_response); > > Here is a good place to have some comments. -- Brandon Williams
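The negotiation model Stefan describes, where each side tolerates a set of versions and the goal is the highest member of the intersection, is easy to state in code. This is a sketch of the idea only; Git's actual negotiation is not structured like this:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the highest version present in both the client's and the
 * server's tolerated sets, or -1 if the sets do not intersect.
 */
static int negotiate_version(const int *client, size_t nc,
			     const int *server, size_t ns)
{
	int best = -1;
	size_t i, j;

	for (i = 0; i < nc; i++)
		for (j = 0; j < ns; j++)
			if (client[i] == server[j] && client[i] > best)
				best = client[i];
	return best;
}
```

In Stefan's hypothetical, a client tolerating {0,1,2,3} and a server tolerating {0,1,2,4} would settle on 2, even though each side individually knows a higher version.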
Re: [PATCH 3/8] daemon: recognize hidden request arguments
On 09/13, Stefan Beller wrote: > On Wed, Sep 13, 2017 at 2:54 PM, Brandon Williams wrote: > > A normal request to git-daemon is structured as > > "command path/to/repo\0host=..\0" and due to a bug in an old version of > > git-daemon 73bb33a94 (daemon: Strictly parse the "extra arg" part of the > > command, 2009-06-04) we aren't able to place any extra args (separated > > by NULs) besides the host. > > > > In order to get around this limitation teach git-daemon to recognize > > additional request arguments hidden behind a second NUL byte. Requests > > can then be structured like: > > "command path/to/repo\0host=..\0\0version=1\0key=value\0". git-daemon > > can then parse out the extra arguments and set 'GIT_PROTOCOL' > > accordingly. > > > > Signed-off-by: Brandon Williams > > --- > > daemon.c | 71 > > +++- > > 1 file changed, 61 insertions(+), 10 deletions(-) > > > > diff --git a/daemon.c b/daemon.c > > index 30747075f..250dbf82c 100644 > > --- a/daemon.c > > +++ b/daemon.c > > @@ -282,7 +282,7 @@ static const char *path_ok(const char *directory, > > struct hostinfo *hi) > > return NULL;/* Fallthrough. 
Deny by default */ > > } > > > > -typedef int (*daemon_service_fn)(void); > > +typedef int (*daemon_service_fn)(const struct argv_array *env); > > struct daemon_service { > > const char *name; > > const char *config_name; > > @@ -363,7 +363,7 @@ static int run_access_hook(struct daemon_service > > *service, const char *dir, > > } > > > > static int run_service(const char *dir, struct daemon_service *service, > > - struct hostinfo *hi) > > + struct hostinfo *hi, const struct argv_array *env) > > { > > const char *path; > > int enabled = service->enabled; > > @@ -422,7 +422,7 @@ static int run_service(const char *dir, struct > > daemon_service *service, > > */ > > signal(SIGTERM, SIG_IGN); > > > > - return service->fn(); > > + return service->fn(env); > > } > > > > static void copy_to_log(int fd) > > @@ -462,25 +462,34 @@ static int run_service_command(struct child_process > > *cld) > > return finish_command(cld); > > } > > > > -static int upload_pack(void) > > +static int upload_pack(const struct argv_array *env) > > { > > struct child_process cld = CHILD_PROCESS_INIT; > > argv_array_pushl(&cld.args, "upload-pack", "--strict", NULL); > > argv_array_pushf(&cld.args, "--timeout=%u", timeout); > > + > > + argv_array_pushv(&cld.env_array, env->argv); > > + > > return run_service_command(&cld); > > } > > > > -static int upload_archive(void) > > +static int upload_archive(const struct argv_array *env) > > { > > struct child_process cld = CHILD_PROCESS_INIT; > > argv_array_push(&cld.args, "upload-archive"); > > + > > + argv_array_pushv(&cld.env_array, env->argv); > > + > > return run_service_command(&cld); > > } > > > > -static int receive_pack(void) > > +static int receive_pack(const struct argv_array *env) > > { > > struct child_process cld = CHILD_PROCESS_INIT; > > argv_array_push(&cld.args, "receive-pack"); > > + > > + argv_array_pushv(&cld.env_array, env->argv); > > + > > return run_service_command(&cld); > > } > > > > @@ -574,7 +583,7 @@ static void 
canonicalize_client(struct strbuf *out, > > const char *in) > > /* > > * Read the host as supplied by the client connection. > > */ > > -static void parse_host_arg(struct hostinfo *hi, char *extra_args, int > > buflen) > > +static char *parse_host_arg(struct hostinfo *hi, char *extra_args, int > > buflen) > > { > > char *val; > > int vallen; > > @@ -602,6 +611,39 @@ static void parse_host_arg(struct hostinfo *hi, char > > *extra_args, int buflen) > > if (extra_args < end && *extra_args) > > die("Invalid request"); > > } > > + > > + return extra_args; > > +} > > + > > +static void parse_extra_args(struct argv_array *env, const char > > *extra_args, > > +int buflen) > > +{ > > + const char *end = extra_args + buflen; > > + struct strbuf git_protocol = STRBUF_INIT; > > + > > + for (; extra_args < end; extra_args += strlen(extra_args) + 1) { > > + const char *arg = extra_args; > > + > > + /* > > +* Parse the extra arguments, adding most to 'git_protocol' > > +* which will be used to set the 'GIT_PROTOCOL' envvar in > > the > > +* service that will be run. > > +* > > +* If there ends up being a particular arg in the future > > that > > +* git-daemon needs to parse specificly (like the 'host' > > arg) > > +* then it can be parsed here and not added to > > 'git_protocol'. > > +*/ >
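The request layout the patch describes, regular arguments, then a second NUL, then the hidden extra arguments, can be parsed with a sketch like this. The function and variable names are mine, not daemon.c's:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Scan a git-daemon request of the form
 *     "command path\0host=..\0\0version=1\0key=value\0"
 * and collect pointers to the arguments hidden behind the second NUL,
 * i.e. everything after the first empty string. Returns the number of
 * extra args found, at most 'max'.
 */
static int parse_hidden_args(const char *buf, size_t len,
			     const char **out, int max)
{
	const char *end = buf + len;
	int seen_empty = 0, n = 0;

	while (buf < end && n < max) {
		if (!*buf)
			seen_empty = 1; /* the second NUL appears as "" */
		else if (seen_empty)
			out[n++] = buf;
		buf += strlen(buf) + 1;
	}
	return n;
}
```

An old git-daemon that strictly parses only "command\0host\0" never walks past the first empty string, which is exactly why hiding the new arguments behind a second NUL is backward-compatible.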
Re: [PATCH 2/4] push, fetch: error out for submodule entries not pointing to commits
On 09/12, Jonathan Nieder wrote: > From: Stefan Beller > > The check_has_commit helper uses resolves a submodule entry to a > commit, when validating its existence. As a side effect this means > tolerates a submodule entry pointing to a tag, which is not a valid > submodule entry that git commands would know how to cope with. > > Tighten the check to require an actual commit, not a tag pointing to a > commit. > > Also improve the error handling when a submodule entry points to > non-commit (e.g., a blob) to error out instead of warning and > pretending the pointed to object doesn't exist. > > Signed-off-by: Stefan Beller > Signed-off-by: Jonathan Nieder Looks good! > --- > submodule.c| 33 + > t/t5531-deep-submodule-push.sh | 10 ++ > 2 files changed, 35 insertions(+), 8 deletions(-) > > diff --git a/submodule.c b/submodule.c > index 3cea8221e0..e0da55920d 100644 > --- a/submodule.c > +++ b/submodule.c > @@ -767,19 +767,36 @@ static int append_oid_to_argv(const struct object_id > *oid, void *data) > return 0; > } > > +struct has_commit_data { > + int result; > + const char *path; > +}; > + > static int check_has_commit(const struct object_id *oid, void *data) > { > - int *has_commit = data; > + struct has_commit_data *cb = data; > > - if (!lookup_commit_reference(oid)) > - *has_commit = 0; > + enum object_type type = sha1_object_info(oid->hash, NULL); > > - return 0; > + switch (type) { > + case OBJ_COMMIT: > + return 0; > + case OBJ_BAD: > + /* > + * Object is missing or invalid. If invalid, an error message > + * has already been printed. > + */ > + cb->result = 0; > + return 0; > + default: > + die(_("submodule entry '%s' (%s) is a %s, not a commit"), > + cb->path, oid_to_hex(oid), typename(type)); > + } > } > > static int submodule_has_commits(const char *path, struct oid_array *commits) > { > - int has_commit = 1; > + struct has_commit_data has_commit = { 1, path }; > > /* >* Perform a cheap, but incorrect check for the existence of 'commits'. 
> @@ -795,7 +812,7 @@ static int submodule_has_commits(const char *path, struct > oid_array *commits) > > oid_array_for_each_unique(commits, check_has_commit, &has_commit); > > - if (has_commit) { > + if (has_commit.result) { > /* >* Even if the submodule is checked out and the commit is >* present, make sure it exists in the submodule's object store > @@ -814,12 +831,12 @@ static int submodule_has_commits(const char *path, > struct oid_array *commits) > cp.dir = path; > > if (capture_command(&cp, &out, GIT_MAX_HEXSZ + 1) || out.len) > - has_commit = 0; > + has_commit.result = 0; > > strbuf_release(&out); > } > > - return has_commit; > + return has_commit.result; > } > > static int submodule_needs_pushing(const char *path, struct oid_array > *commits) > diff --git a/t/t5531-deep-submodule-push.sh b/t/t5531-deep-submodule-push.sh > index 0f84a53146..39cb2c1c34 100755 > --- a/t/t5531-deep-submodule-push.sh > +++ b/t/t5531-deep-submodule-push.sh > @@ -298,6 +298,16 @@ test_expect_success 'push succeeds if submodule commit > disabling recursion from > ) > ' > > +test_expect_success 'submodule entry pointing at a tag is error' ' > + git -C work/gar/bage tag -a test1 -m "tag" && > + tag=$(git -C work/gar/bage rev-parse test1^{tag}) && > + git -C work update-index --cacheinfo 16 "$tag" gar/bage && > + git -C work commit -m "bad commit" && > + test_when_finished "git -C work reset --hard HEAD^" && > + test_must_fail git -C work push --recurse-submodules=on-demand > ../pub.git master 2>err && > + test_i18ngrep "is a tag, not a commit" err > +' > + > test_expect_success 'push fails if recurse submodules option passed as yes' ' > ( > cd work/gar/bage && > -- > 2.14.1.690.gbb1197296e > -- Brandon Williams
Re: [RFC PATCH v2 1/2] implement fetching of moved submodules
On 09/15, Heiko Voigt wrote: > We store the changed submodules paths to calculate which submodule needs > fetching. This does not work for moved submodules since their paths do > not stay the same in case of a moved submodules. In case of new > submodules we do not have a path in the current checkout, since they > just appeared in this fetch. > > It is more general to collect the submodule names for changes instead of > their paths to include the above cases. > > With the change described above we implement 'on-demand' fetching of > changes in moved submodules. > > Note: This does only work when repositories have a .gitmodules file. In > other words: It breaks if we do not get a name for a repository. > IIRC, consensus was that this is a requirement to get nice submodule > handling these days? > > NEEDSWORK: This breaks t5531 and t5545 because they do not use a > .gitmodules file. I will add a fallback to paths to help such users. > > Signed-off-by: Heiko Voigt > --- > This an update of the previous series[1] to the current master. The > fallback is still missing but now it should not conflict with any topics > in flight anymore (hopefully). So the idea is to collect changed submodule's name, instead of their path, so that if they happened to moved you don't have to worry about the path changing underneath you. This should be good once those tests get fixed. Thanks for working on cleaning this up! 
:) > > Cheers Heiko > > [1] https://public-inbox.org/git/20170817105349.gc52...@book.hvoigt.net/ > > submodule.c | 91 > + > t/t5526-fetch-submodules.sh | 35 + > 2 files changed, 85 insertions(+), 41 deletions(-) > > diff --git a/submodule.c b/submodule.c > index 3cea8221e0..38b9905e43 100644 > --- a/submodule.c > +++ b/submodule.c > @@ -21,7 +21,7 @@ > #include "parse-options.h" > > static int config_update_recurse_submodules = RECURSE_SUBMODULES_OFF; > -static struct string_list changed_submodule_paths = STRING_LIST_INIT_DUP; > +static struct string_list changed_submodule_names = STRING_LIST_INIT_DUP; > static int initialized_fetch_ref_tips; > static struct oid_array ref_tips_before_fetch; > static struct oid_array ref_tips_after_fetch; > @@ -667,11 +667,11 @@ const struct submodule *submodule_from_ce(const struct > cache_entry *ce) > } > > static struct oid_array *submodule_commits(struct string_list *submodules, > -const char *path) > +const char *name) > { > struct string_list_item *item; > > - item = string_list_insert(submodules, path); > + item = string_list_insert(submodules, name); > if (item->util) > return (struct oid_array *) item->util; > > @@ -680,39 +680,34 @@ static struct oid_array *submodule_commits(struct > string_list *submodules, > return (struct oid_array *) item->util; > } > > +struct collect_changed_submodules_cb_data { > + struct string_list *changed; > + const struct object_id *commit_oid; > +}; > + > static void collect_changed_submodules_cb(struct diff_queue_struct *q, > struct diff_options *options, > void *data) > { > + struct collect_changed_submodules_cb_data *me = data; > + struct string_list *changed = me->changed; > + const struct object_id *commit_oid = me->commit_oid; > int i; > - struct string_list *changed = data; > > for (i = 0; i < q->nr; i++) { > struct diff_filepair *p = q->queue[i]; > struct oid_array *commits; > + const struct submodule *submodule; > + > if (!S_ISGITLINK(p->two->mode)) > continue; > > - if 
(S_ISGITLINK(p->one->mode)) { > - /* > - * NEEDSWORK: We should honor the name configured in > - * the .gitmodules file of the commit we are examining > - * here to be able to correctly follow submodules > - * being moved around. > - */ > - commits = submodule_commits(changed, p->two->path); > - oid_array_append(commits, &p->two->oid); > - } else { > - /* Submodule is new or was moved here */ > - /* > - * NEEDSWORK: When the .git directories of submodules > - * live inside the superprojects .git directory some > - * day we should fetch new submodules directly into > - * that location too when config or options request > - * that so they can be checked out from there. > - */ > + submodule = submodule_from_path(commit_oid, p->two->path); > + if (!submodule) >
Re: [RFC PATCH v2 2/2] submodule: simplify decision tree whether to or not to fetch
On 09/15, Heiko Voigt wrote: > To make extending this logic later easier. > > Signed-off-by: Heiko Voigt I like the result of this patch, its much cleaner. > --- > submodule.c | 74 > ++--- > 1 file changed, 37 insertions(+), 37 deletions(-) > > diff --git a/submodule.c b/submodule.c > index 38b9905e43..fa44fc59f2 100644 > --- a/submodule.c > +++ b/submodule.c > @@ -1118,6 +1118,31 @@ struct submodule_parallel_fetch { > }; > #define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0} > > +static int get_fetch_recurse_config(const struct submodule *submodule, > + struct submodule_parallel_fetch *spf) > +{ > + if (spf->command_line_option != RECURSE_SUBMODULES_DEFAULT) > + return spf->command_line_option; > + > + if (submodule) { > + char *key; > + const char *value; > + > + int fetch_recurse = submodule->fetch_recurse; > + key = xstrfmt("submodule.%s.fetchRecurseSubmodules", > submodule->name); > + if (!repo_config_get_string_const(the_repository, key, &value)) > { > + fetch_recurse = parse_fetch_recurse_submodules_arg(key, > value); > + } > + free(key); > + > + if (fetch_recurse != RECURSE_SUBMODULES_NONE) > + /* local config overrules everything except commandline > */ > + return fetch_recurse; > + } > + > + return spf->default_option; > +} > + > static int get_next_submodule(struct child_process *cp, > struct strbuf *err, void *data, void **task_cb) > { > @@ -1137,46 +1162,21 @@ static int get_next_submodule(struct child_process > *cp, > > submodule = submodule_from_path(&null_oid, ce->name); > > - default_argv = "yes"; > - if (spf->command_line_option == RECURSE_SUBMODULES_DEFAULT) { > - int fetch_recurse = RECURSE_SUBMODULES_NONE; > - > - if (submodule) { > - char *key; > - const char *value; > - > - fetch_recurse = submodule->fetch_recurse; > - key = > xstrfmt("submodule.%s.fetchRecurseSubmodules", submodule->name); > - if > (!repo_config_get_string_const(the_repository, key, &value)) { > - fetch_recurse = > parse_fetch_recurse_submodules_arg(key, value); > - 
} > - free(key); > - } > - > - if (fetch_recurse != RECURSE_SUBMODULES_NONE) { > - if (fetch_recurse == RECURSE_SUBMODULES_OFF) > - continue; > - if (fetch_recurse == > RECURSE_SUBMODULES_ON_DEMAND) { > - if > (!unsorted_string_list_lookup(&changed_submodule_names, > - > submodule->name)) > - continue; > - default_argv = "on-demand"; > - } > - } else { > - if (spf->default_option == > RECURSE_SUBMODULES_OFF) > - continue; > - if (spf->default_option == > RECURSE_SUBMODULES_ON_DEMAND) { > - if > (!unsorted_string_list_lookup(&changed_submodule_names, > - > submodule->name)) > - continue; > - default_argv = "on-demand"; > - } > - } > - } else if (spf->command_line_option == > RECURSE_SUBMODULES_ON_DEMAND) { > - if > (!unsorted_string_list_lookup(&changed_submodule_names, > + switch (get_fetch_recurse_config(submodule, spf)) > + { > + default: > + case RECURSE_SUBMODULES_DEFAULT: > + case RECURSE_SUBMODULES_ON_DEMAND: > + if (!submodule || > !unsorted_string_list_lookup(&changed_submodule_names, >submodule->name)) > continue; > default_argv = "on-demand"; > + break; > + case RECURSE_SUBMODULES_ON: > + default_argv = "yes"; > + break; > + case RECURSE_SUBMODULES_OFF: > + continue; > } > > strbuf_addf(&submodule_path, "%s/%s", spf->work_tree, ce->name); > -- > 2.14.1.145.gb3622a4 > -- Bra
Re: [PATCH] protocol: make parse_protocol_version() private
On 09/17, Ramsay Jones wrote: > > Signed-off-by: Ramsay Jones > --- > > Hi Brandon, > > If you need to re-roll your 'bw/protocol-v1' branch, could you please > squash this into the relevant patch (commit 45954f179e, "protocol: > introduce protocol extention mechanisms", 13-09-2017). > > This assumes you agree that this symbol does not need to be public; if > not, then please just ignore! ;-) Thanks! I've updated my local version of the series to reflect this. > > Thanks! > > ATB, > Ramsay Jones > > protocol.c | 2 +- > protocol.h | 1 - > 2 files changed, 1 insertion(+), 2 deletions(-) > > diff --git a/protocol.c b/protocol.c > index 1b16c7b9a..369503065 100644 > --- a/protocol.c > +++ b/protocol.c > @@ -2,7 +2,7 @@ > #include "config.h" > #include "protocol.h" > > -enum protocol_version parse_protocol_version(const char *value) > +static enum protocol_version parse_protocol_version(const char *value) > { > if (!strcmp(value, "0")) > return protocol_v0; > diff --git a/protocol.h b/protocol.h > index 2fa6486d0..18f9a5235 100644 > --- a/protocol.h > +++ b/protocol.h > @@ -7,7 +7,6 @@ enum protocol_version { > protocol_v1 = 1, > }; > > -extern enum protocol_version parse_protocol_version(const char *value); > extern enum protocol_version get_protocol_version_config(void); > extern enum protocol_version determine_protocol_version_server(void); > extern enum protocol_version determine_protocol_version_client(const char > *server_response); > -- > 2.14.0 -- Brandon Williams
RE: [PATCH v6 08/12] fsmonitor: add a test tool to dump the index extension
> -Original Message- > From: Torsten Bögershausen [mailto:tbo...@web.de] > Sent: Monday, September 18, 2017 11:43 AM > To: Ben Peart ; Junio C Hamano > ; Ben Peart > Cc: david.tur...@twosigma.com; ava...@gmail.com; > christian.cou...@gmail.com; git@vger.kernel.org; > johannes.schinde...@gmx.de; pclo...@gmail.com; p...@peff.net > Subject: Re: [PATCH v6 08/12] fsmonitor: add a test tool to dump the index > extension > > On 2017-09-18 15:38, Ben Peart wrote: > > > > > > On 9/17/2017 4:02 AM, Junio C Hamano wrote: > >> Ben Peart writes: > >> > >>> diff --git a/t/helper/test-dump-fsmonitor.c > >>> b/t/helper/test-dump-fsmonitor.c new file mode 100644 index > >>> 00..482d749bb9 > >>> --- /dev/null > >>> +++ b/t/helper/test-dump-fsmonitor.c > >>> @@ -0,0 +1,21 @@ > >>> +#include "cache.h" > >>> + > >>> +int cmd_main(int ac, const char **av) { > >>> + struct index_state *istate = &the_index; > >>> + int i; > >>> + > >>> + setup_git_directory(); > >>> + if (do_read_index(istate, get_index_file(), 0) < 0) > >>> + die("unable to read index file"); > >>> + if (!istate->fsmonitor_last_update) { > >>> + printf("no fsmonitor\n"); > >>> + return 0; > >>> + } > >>> + printf("fsmonitor last update %"PRIuMAX"\n", > >>> istate->fsmonitor_last_update); > >> > >> After pushing this out and had Travis complain, I queued a squash on > >> top of this to cast the argument to (uintmax_t), like you did in an > >> earlier step (I think it was [PATCH 04/12]). > >> > > > > Thanks. I'll update this to cast it as (uint64_t) as that is what > > get/put_be64 use. As far as I can tell they both map to the same > > thing (unsigned long long) so there isn't functional difference. > (Just to double-check): This is the way to print "PRIuMAX" correctly (on all > platforms): > > printf("fsmonitor last update %"PRIuMAX"\n", (uintmax_t)istate- > >fsmonitor_last_update); > Should I just make the variable type itself uintmax_t and then just skip the cast altogether? 
I went with uint64_t because that is what getnanotime returned.
[PATCH 2/2] read_info_alternates: warn on non-trivial errors
When we fail to open $GIT_DIR/info/alternates, we silently assume there are no alternates. This is the right thing to do for ENOENT, but not for other errors.

A hard error is probably overkill here. If we fail to read an alternates file then either we'll complete our operation anyway, or we'll fail to find some needed object. Either way, a warning is a good idea. And we already have a helper function to handle this pattern; let's just call warn_on_fopen_errors().

Note that technically the errno from strbuf_read_file() might be from a read() error, not open(). But since read() would never return ENOENT or ENOTDIR, and since it produces a generic "unable to access" error, it's suitable for handling errors from either.

Signed-off-by: Jeff King
---
I'm pretty comfortable with the rationale in the final paragraph. But if we're concerned that warn_on_fopen_errors() might change, we can swap out strbuf_read_file() for fopen()+strbuf_read().

 sha1_file.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/sha1_file.c b/sha1_file.c
index b1e4193679..9cec326298 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -432,6 +432,7 @@ static void read_info_alternates(const char * relative_base, int depth)
 	path = xstrfmt("%s/info/alternates", relative_base);
 
 	if (strbuf_read_file(&buf, path, 1024) < 0) {
+		warn_on_fopen_errors(path);
 		free(path);
 		return;
 	}
-- 
2.14.1.1014.g252e627ae0
[PATCH 1/2] read_info_alternates: read contents into strbuf
The link_alt_odb_entries() function has always taken a ptr/len pair as input. Until cf3c635210 (alternates: accept double-quoted paths, 2016-12-12), we made a copy of those bytes in a string. But after that commit, we switched to parsing the input left-to-right, and we ignore "len" totally, instead reading until we hit a NUL.

This has mostly gone unnoticed for a few reasons:

  1. All but one caller passes a NUL-terminated string, with "len" pointing to the NUL.

  2. The remaining caller, read_info_alternates(), passes in an mmap'd file. Unless the file is an exact multiple of the page size, it will generally be followed by NUL padding to the end of the page, which just works.

The easiest way to demonstrate the problem is to build with:

  make SANITIZE=address NO_MMAP=Nope test

Any test which involves $GIT_DIR/info/alternates will fail, as the mmap emulation (correctly) does not add an extra NUL, and ASAN complains about reading past the end of the buffer.

One solution would be to teach link_alt_odb_entries() to respect "len". But it's actually a bit tricky, since we depend on unquote_c_style() under the hood, and it has no ptr/len variant. We could also just make a NUL-terminated copy of the input bytes and operate on that. But since all but one caller is already passing a string, let's instead just fix that caller to provide NUL-terminated input in the first place, by swapping out mmap for strbuf_read_file().

There's no advantage to using mmap on the alternates file. It's not expected to be large (and anyway, we're copying its contents into an in-memory linked list). Nor is using git_open() buying us anything here, since we don't keep the descriptor open for a long period of time.

Let's also drop the "len" parameter entirely from link_alt_odb_entries(), since it's completely ignored. That will avoid any new callers re-introducing a similar bug.
Reported-by: Michael Haggerty Signed-off-by: Jeff King --- I didn't reproduce with the mmap even-page-size thing, but I think it's the same reason we didn't notice the "git log -G" read-past-mmap issues for a long time. Which makes testing with ASAN and NO_MMAP an interesting combination, as it should find out any similar cases (and after this, the whole test suite passes with that configuration). sha1_file.c | 29 + 1 file changed, 9 insertions(+), 20 deletions(-) diff --git a/sha1_file.c b/sha1_file.c index 5f71bbac3e..b1e4193679 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -398,7 +398,7 @@ static const char *parse_alt_odb_entry(const char *string, return end; } -static void link_alt_odb_entries(const char *alt, int len, int sep, +static void link_alt_odb_entries(const char *alt, int sep, const char *relative_base, int depth) { struct strbuf objdirbuf = STRBUF_INIT; @@ -427,28 +427,17 @@ static void link_alt_odb_entries(const char *alt, int len, int sep, static void read_info_alternates(const char * relative_base, int depth) { - char *map; - size_t mapsz; - struct stat st; char *path; - int fd; + struct strbuf buf = STRBUF_INIT; path = xstrfmt("%s/info/alternates", relative_base); - fd = git_open(path); - free(path); - if (fd < 0) - return; - if (fstat(fd, &st) || (st.st_size == 0)) { - close(fd); + if (strbuf_read_file(&buf, path, 1024) < 0) { + free(path); return; } - mapsz = xsize_t(st.st_size); - map = xmmap(NULL, mapsz, PROT_READ, MAP_PRIVATE, fd, 0); - close(fd); - link_alt_odb_entries(map, mapsz, '\n', relative_base, depth); - - munmap(map, mapsz); + link_alt_odb_entries(buf.buf, '\n', relative_base, depth); + strbuf_release(&buf); } struct alternate_object_database *alloc_alt_odb(const char *dir) @@ -503,7 +492,7 @@ void add_to_alternates_file(const char *reference) if (commit_lock_file(lock)) die_errno("unable to move new alternates file into place"); if (alt_odb_tail) - link_alt_odb_entries(reference, strlen(reference), '\n', NULL, 0); + 
link_alt_odb_entries(reference, '\n', NULL, 0); } free(alts); } @@ -516,7 +505,7 @@ void add_to_alternates_memory(const char *reference) */ prepare_alt_odb(); - link_alt_odb_entries(reference, strlen(reference), '\n', NULL, 0); + link_alt_odb_entries(reference, '\n', NULL, 0); } /* @@ -619,7 +608,7 @@ void prepare_alt_odb(void) if (!alt) alt = ""; alt_odb_tail = &alt_odb_list; - link_alt_odb_entries(alt, strlen(alt), PATH_SEP, NULL, 0); + link_alt_odb_entries(alt, PATH_SEP, NULL, 0); read_info_alternates(get_object_directory(), 0); } -- 2.14.1.1014.g252e627ae0
[PATCH 0/2] fix read past end of array in alternates files
This series fixes a regression in v2.11.1 where we might read past the end of an mmap'd buffer. It was introduced in cf3c635210, but I didn't base the patch on there, for a few reasons:

  1. There's a trivial conflict when merging up (because of git_open_noatime() becoming just git_open() in the interim).

  2. The reproduction advice relies on our SANITIZE Makefile knob, which didn't exist back then.

  3. The second patch does not apply there because we don't have warn_on_fopen_errors(). Though admittedly it could be applied separately after merging up; it's just a clean-up on top.

It does apply on the current "maint".

  [1/2]: read_info_alternates: read contents into strbuf
  [2/2]: read_info_alternates: warn on non-trivial errors

 sha1_file.c | 30 ++
 1 file changed, 10 insertions(+), 20 deletions(-)

-Peff
Re: [PATCH v6 08/12] fsmonitor: add a test tool to dump the index extension
On 2017-09-18 15:38, Ben Peart wrote: > > > On 9/17/2017 4:02 AM, Junio C Hamano wrote: >> Ben Peart writes: >> >>> diff --git a/t/helper/test-dump-fsmonitor.c b/t/helper/test-dump-fsmonitor.c >>> new file mode 100644 >>> index 00..482d749bb9 >>> --- /dev/null >>> +++ b/t/helper/test-dump-fsmonitor.c >>> @@ -0,0 +1,21 @@ >>> +#include "cache.h" >>> + >>> +int cmd_main(int ac, const char **av) >>> +{ >>> + struct index_state *istate = &the_index; >>> + int i; >>> + >>> + setup_git_directory(); >>> + if (do_read_index(istate, get_index_file(), 0) < 0) >>> + die("unable to read index file"); >>> + if (!istate->fsmonitor_last_update) { >>> + printf("no fsmonitor\n"); >>> + return 0; >>> + } >>> + printf("fsmonitor last update %"PRIuMAX"\n", >>> istate->fsmonitor_last_update); >> >> After pushing this out and had Travis complain, I queued a squash on >> top of this to cast the argument to (uintmax_t), like you did in an >> earlier step (I think it was [PATCH 04/12]). >> > > Thanks. I'll update this to cast it as (uint64_t) as that is what get/put_be64 > use. As far as I can tell they both map to the same thing (unsigned long > long) > so there isn't functional difference. (Just to double-check): This is the way to print "PRIuMAX" correctly (on all platforms): printf("fsmonitor last update %"PRIuMAX"\n", (uintmax_t)istate->fsmonitor_last_update);
Re: [PATCH v6 10/12] fsmonitor: add test cases for fsmonitor extension
On 9/17/2017 12:47 AM, Junio C Hamano wrote:

Ben Peart writes:

	+write_integration_script() {
	+	write_script .git/hooks/fsmonitor-test<<-\EOF
	+	if [ "$#" -ne 2 ]; then
	+		echo "$0: exactly 2 arguments expected"
	+		exit 2
	+	fi
	+	if [ "$1" != 1 ]; then
	+		echo -e "Unsupported core.fsmonitor hook version.\n" >&2
	+		exit 1
	+	fi

In addition to the "echo -e" thing pointed out earlier, these look somewhat unusual in our shell scripts, relative to what Documentation/CodingGuidelines tells us to do:

- We prefer a space between the function name and the parentheses, and no space inside the parentheses. The opening "{" should also be on the same line.

	(incorrect) my_function(){ ...
	(correct) my_function () { ...

- We prefer "test" over "[ ... ]".

- Do not write control structures on a single line with semicolon. "then" should be on the next line for if statements, and "do" should be on the next line for "while" and "for".

	(incorrect)
	if test -f hello; then
		do this
	fi

	(correct)
	if test -f hello
	then
		do this
	fi

I'm happy to make these changes. I understand the difficulty of creating a consistent coding style, especially after the fact. Copy/paste is usually a developer's best friend, as it allows you to avoid a lot of errors by reusing existing, tested code. One of the times it backfires is when that code doesn't match the current desired coding style.

I only point these out to help lend some additional impetus to the effort to formalize the coding style and to provide tooling to handle what should mostly be a mechanical process. IMO, the goal should be to save the maintainer and contributors the cost of having to write up and respond to formatting feedback. :)

Some stats on these same coding style errors in the current bash scripts:

	298 instances of "[a-z]\(\).*\{" ie "function_name() {" (no space)
	140 instances of "if \[ .* \]" (not using the preferred "test")
	293 instances of "if .*; then"

Wouldn't it be great not to have to write up style feedback for when these all get copy/pasted into new scripts? :)

	diff --git a/t/t7519/fsmonitor-watchman b/t/t7519/fsmonitor-watchman
	new file mode 100755
	index 00..aaee5d1fe3
	--- /dev/null
	+++ b/t/t7519/fsmonitor-watchman
	@@ -0,0 +1,128 @@
	+#!/usr/bin/perl
	+
	+use strict;
	+use warnings;
	+use IPC::Open2;
	+ ...
	+ open (my $fh, ">", ".git/watchman-query.json");
	+ print $fh "[\"query\", \"$git_work_tree\", { \
	+	\"since\": $time, \
	+	\"fields\": [\"name\"], \
	+	\"expression\": [\"not\", [\"allof\", [\"since\", $time, \"cclock\"], [\"not\", \"exists\"]]] \
	+	}]";
	+ close $fh;
	+
	+ print CHLD_IN "[\"query\", \"$git_work_tree\", { \
	+	\"since\": $time, \
	+	\"fields\": [\"name\"], \
	+	\"expression\": [\"not\", [\"allof\", [\"since\", $time, \"cclock\"], [\"not\", \"exists\"]]] \
	+	}]";

This looks painful to read, write and maintain. IIRC, Perl supports the <<HERE document syntax.

I agree! I'm definitely *not* a perl developer so was unaware of this construct. A few minutes with stack overflow and now I can clean this up.

	+} \ No newline at end of file

Oops. Thanks.
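For quick reference, the three guideline points can be combined into one runnable sketch (the say_hello function and the hello file are made up for the example):

```sh
#!/bin/sh
# Style the guidelines ask for: a space before the parentheses,
# "test" instead of "[ ... ]", and "then" on its own line.

say_hello () {
	if test -f hello
	then
		echo "hello exists"
	else
		echo "hello is missing"
	fi
}

: >hello	# create an empty file so the first branch is taken
say_hello
rm -f hello
```

Run as written, this prints "hello exists"; without the `: >hello` it would print "hello is missing".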
Re: What's cooking in git.git (Aug 2017, #05; Tue, 22)
Hi Junio,

On Sat, 16 Sep 2017, Junio C Hamano wrote:

> And as you alluded to, we may need to see if we can help making it
> easier to do the latter when needed.

That mistakes it for "Someone Else's Problem".

> Johannes Schindelin writes:
>
> > That's what you are buying into by having these "What's cooking" mails
> > that are in no usable way connected to the original threads.
>
> For the above reason, I do not think this is a particularly useful
> stance to take.

I agree, but this is the process you chose to use.

> Do you have a concrete suggestion to make these individual entries more
> helpful for people who may want to go back to the original thread in
> doing so? In-reply-to: or References: fields of the "What's cooking"
> report would not help. I often have the message IDs that made/helped me
> make these individual comments in the description; they alone would not
> react to mouse clicks, though.

Oh gawd, not even more stuff piled onto the mail format. Please stop.

> Having said that, I'd expect that individual contributors have past
> messages pertaining to the smaller numbers of their own topics in flight
> in their mailbox than the project-wide "What's cooking" report covers,
> so perhaps an effort to devise such a mechanism may result in an
> over-engineering waste nobody finds useful.

I dunno. I frequently find myself putting off that What's cooking mail because it simply takes too long to study. It probably tries to serve too many purposes at the same time, and thereby none.

In the same vein as "to a hammer, everything looks like a nail": when it comes to project management, a dedicated tool will always beat a general-purpose mailing list.

Ciao,
Dscho
Re: [PATCH v6 12/12] fsmonitor: add a performance test
Hi Ben,

sorry for not catching this earlier:

On Fri, 15 Sep 2017, Ben Peart wrote:

> [...]
> +
> +int cmd_dropcaches(void)
> +{
> +	HANDLE hProcess = GetCurrentProcess();
> +	HANDLE hToken;
> +	int status;
> +
> +	if (!OpenProcessToken(hProcess, TOKEN_QUERY | TOKEN_ADJUST_PRIVILEGES, &hToken))
> +		return error("Can't open current process token");
> +
> +	if (!GetPrivilege(hToken, "SeProfileSingleProcessPrivilege", 1))
> +		return error("Can't get SeProfileSingleProcessPrivilege");
> +
> +	CloseHandle(hToken);
> +
> +	HMODULE ntdll = LoadLibrary("ntdll.dll");

Git's source code still tries to abide by C90, and for simplicity's sake, this extends to the Windows-specific part. Therefore, the `ntdll` variable needs to be declared at the beginning of the function (I do agree that it makes for better code to reduce the scope of variables, but C90 simply did not allow variables to be declared in the middle of functions).

I wanted to send a patch addressing this in the obvious way, but then I encountered these lines:

> +	DWORD(WINAPI *NtSetSystemInformation)(INT, PVOID, ULONG) =
> +		(DWORD(WINAPI *)(INT, PVOID, ULONG))GetProcAddress(ntdll, "NtSetSystemInformation");
> +	if (!NtSetSystemInformation)
> +		return error("Can't get function addresses, wrong Windows version?");

It turns out that we have seen this plenty of times in Git for Windows' fork, so much so that we came up with a nice helper to make this all a bit more robust and a bit more obvious, too: the DECLARE_PROC_ADDR and INIT_PROC_ADDR helpers in compat/win32/lazyload.h.

Maybe this would be the perfect excuse to integrate this patch into upstream Git?
This would be the patch (you can also cherry-pick it from 25c4dc3a73352e72e995594cf1b4afa46e93d040 in https://github.com/dscho/git): -- snip -- >From 25c4dc3a73352e72e995594cf1b4afa46e93d040 Mon Sep 17 00:00:00 2001 From: Johannes Schindelin Date: Tue, 10 Jan 2017 23:14:20 +0100 Subject: [PATCH] Win32: simplify loading of DLL functions Dynamic loading of DLL functions is duplicated in several places in Git for Windows' source code. This patch adds a pair of macros to simplify the process: the DECLARE_PROC_ADDR(, , , ..) macro to be used at the beginning of a code block, and the INIT_PROC_ADDR() macro to call before using the declared function. The return value of the INIT_PROC_ADDR() call has to be checked; If it is NULL, the function was not found in the specified DLL. Example: DECLARE_PROC_ADDR(kernel32.dll, BOOL, CreateHardLinkW, LPCWSTR, LPCWSTR, LPSECURITY_ATTRIBUTES); if (!INIT_PROC_ADDR(CreateHardLinkW)) return error("Could not find CreateHardLinkW() function"; if (!CreateHardLinkW(source, target, NULL)) return error("could not create hardlink from %S to %S", source, target); return 0; Signed-off-by: Karsten Blees Signed-off-by: Johannes Schindelin --- compat/win32/lazyload.h | 44 1 file changed, 44 insertions(+) create mode 100644 compat/win32/lazyload.h diff --git a/compat/win32/lazyload.h b/compat/win32/lazyload.h new file mode 100644 index 000..91c10dad2fb --- /dev/null +++ b/compat/win32/lazyload.h @@ -0,0 +1,44 @@ +#ifndef LAZYLOAD_H +#define LAZYLOAD_H + +/* simplify loading of DLL functions */ + +struct proc_addr { + const char *const dll; + const char *const function; + FARPROC pfunction; + unsigned initialized : 1; +}; + +/* Declares a function to be loaded dynamically from a DLL. */ +#define DECLARE_PROC_ADDR(dll, rettype, function, ...) \ + static struct proc_addr proc_addr_##function = \ + { #dll, #function, NULL, 0 }; \ + static rettype (WINAPI *function)(__VA_ARGS__) + +/* + * Loads a function from a DLL (once-only). 
+ * Returns non-NULL function pointer on success. + * Returns NULL + errno == ENOSYS on failure. + */ +#define INIT_PROC_ADDR(function) \ + (function = get_proc_addr(&proc_addr_##function)) + +static inline void *get_proc_addr(struct proc_addr *proc) +{ + /* only do this once */ + if (!proc->initialized) { + HANDLE hnd; + proc->initialized = 1; + hnd = LoadLibraryExA(proc->dll, NULL, +LOAD_LIBRARY_SEARCH_SYSTEM32); + if (hnd) + proc->pfunction = GetProcAddress(hnd, proc->function); + } + /* set ENOSYS if DLL or function was not found */ + if (!proc->pfunction) + errno = ENOSYS; + return proc->pfunction; +} + +#endif -- snap -- With this patch, this fixup to your patch would make things compile (you can also cherry-pick d05996fb61027512b8ab31a36c4a7a677dea11bb from my fork): -- snipsnap -- >From d05996fb61027512b8ab31a36c4a7a677dea11bb Mon Sep 17 00:00:00 2001 From: Johannes Schinde
Re: [RFC PATCH 0/2] Add named reference to latest push cert
Hello, everyone.

Sorry for being late in this thread, I was making sure I didn't say anything outrageously wrong.

> That's Stefan; I wouldn't have suggested any approach that uses the
> blob whose sole purpose is to serve as a temporary storage area to
> pass the information to the hook as part of the permanent record.

Hmm, as far as I understand, *this* is the status quo. We get an ephemeral sha1/oid only if you have a hook attached. Otherwise, you will never see the object at all, even if you have --signed set.

I suggested preparing this RFC/patch for the following reasons (please let me know if my understanding of any of this is wrong...):

- I find it weird that the CLI allows for a --signed push, yet nowhere in receive-pack's feedback are you told whether the push certificate is computed/stored/handled at all. I think that, at the latest, receive-pack should let the user know whether the signed push succeeded, or if there is no hook attached to handle it.

- *If there is a hook*, the blob is computed, but it is up to the hook itself to store it *somewhere*. This makes me feel like it's somewhat of a useless waste of computation if the hook is not meant to handle it anyway (which is just a post-receive hook). I find it rather weird that --signed is a builtin flag, yet is handled on the server side only partially (just my two cents).

- To me, the way push certificates are handled now feels like having git commit -S produce a detached signature that you have to embed somehow in the resulting commit object. (As a matter of fact, many points on [1] seem to back this notion, and even recall some drawbacks of push certificates the way they are handled today.)

I understand the concurrency concerns, so I agree with Stefan's solution, although I don't know how big of a deal it would be, given that git already supports --atomic pushes (admittedly, I haven't checked if there are any guards in place for someone who pushes millions of refs atomically).
It'd be completely understandable to have a setting to disable handling of --signed pushes and, ideally, a way to warn the user that this feature is disabled if --signed is sent.

Maybe I'm missing the bigger picture, but to me it feels that a named ref to the latest push certificate would make it easier to handle/tool/sync around the push certificate solution?

Thanks,
-Santiago.

[1] https://public-inbox.org/git/CAJo=hJvWbjEM9E5AjPHgmQ=eY8xf=Q=xtukeu2ur7auuqea...@mail.gmail.com/
Re: [PATCH v6 10/12] fsmonitor: add test cases for fsmonitor extension
On 9/16/2017 11:27 AM, Torsten Bögershausen wrote:
> On 2017-09-15 21:20, Ben Peart wrote:
>> +if [ "$1" != 1 ]
>> +then
>> +	echo -e "Unsupported core.fsmonitor hook version.\n" >&2
>> +	exit 1
>> +fi
>
> The "echo -e" is not portable. (It was detected by a tighter version of
> the lint script, which I have here, but have not yet sent to the list :-(
>
> This will do:
>
> 	echo "Unsupported core.fsmonitor hook version." >&2

Thanks, I'll fix these and the ones in the t/t7519 directory.
Re: [PATCH v6 04/12] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.
On 9/18/2017 9:32 AM, David Turner wrote:
-Original Message-
From: Ben Peart [mailto:peart...@gmail.com]
Sent: Monday, September 18, 2017 9:07 AM
To: David Turner ; 'Ben Peart'
Cc: ava...@gmail.com; christian.cou...@gmail.com; git@vger.kernel.org; gits...@pobox.com; johannes.schinde...@gmx.de; pclo...@gmail.com; p...@peff.net
Subject: Re: [PATCH v6 04/12] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.

Thanks for taking the time to review/provide feedback!

On 9/15/2017 5:35 PM, David Turner wrote:
-Original Message-
From: Ben Peart [mailto:benpe...@microsoft.com]
Sent: Friday, September 15, 2017 3:21 PM
To: benpe...@microsoft.com
Cc: David Turner ; ava...@gmail.com; christian.cou...@gmail.com; git@vger.kernel.org; gits...@pobox.com; johannes.schinde...@gmx.de; pclo...@gmail.com; p...@peff.net
Subject: [PATCH v6 04/12] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.

+int git_config_get_fsmonitor(void)
+{
+	if (git_config_get_pathname("core.fsmonitor", &core_fsmonitor))
+		core_fsmonitor = getenv("GIT_FSMONITOR_TEST");
+
+	if (core_fsmonitor && !*core_fsmonitor)
+		core_fsmonitor = NULL;
+
+	if (core_fsmonitor)
+		return 1;
+
+	return 0;
+}

This function's return values are backwards relative to the rest of the git_config_* functions.

I'm confused. If core.fsmonitor is configured, it returns 1. If it is not configured, it returns 0. I don't make use of the -1 /* default value */ option as I didn't see any use/value in this case. What is backwards?

The other git_config_* functions return 1 for error and 0 for success.

Hmm, I followed the model (i.e., copy/paste :)) used by the untracked cache. If you look at how it uses the return value, it is 0 == false == remove the extension, 1 == true == add the extension. I'm doing the same with fsmonitor.
static void tweak_untracked_cache(struct index_state *istate)
{
	switch (git_config_get_untracked_cache()) {
	case -1: /* keep: do nothing */
		break;
	case 0: /* false */
		remove_untracked_cache(istate);
		break;
	case 1: /* true */
		add_untracked_cache(istate);
		break;
	default: /* unknown value: do nothing */
		break;
	}
}

void tweak_fsmonitor(struct index_state *istate)
{
	switch (git_config_get_fsmonitor()) {
	case -1: /* keep: do nothing */
		break;
	case 0: /* false */
		remove_fsmonitor(istate);
		break;
	case 1: /* true */
		add_fsmonitor(istate);
		break;
	default: /* unknown value: do nothing */
		break;
	}
}

[snip]

+	/*
+	 * With fsmonitor, we can trust the untracked cache's valid field.
+	 */

Did you intend to make a comment here? Sorry. I was going to make a comment that I didn't see how that could work since we weren't touching the untracked cache here, but then I saw the bit further down. I'm still not sure it works (see comment on 10/12), but at least it could in theory work. The logic here assumes that when the index is written out, it is valid including the untracked cache and the CE_FSMONITOR_VALID bits. Therefore it should still all be valid as of the time the fsmonitor was queried and the index saved. Another way of thinking about this is that the fsmonitor extension is simply adding another persisted bit on each index entry. It just gets persisted/restored differently than the other persisted bits. Obviously, before we can use it assuming it reflects the *current* state of the working directory, we have to refresh the bits via the refresh logic.
Re: [PATCH v6 08/12] fsmonitor: add a test tool to dump the index extension
On 9/17/2017 4:02 AM, Junio C Hamano wrote: Ben Peart writes:

diff --git a/t/helper/test-dump-fsmonitor.c b/t/helper/test-dump-fsmonitor.c
new file mode 100644
index 00..482d749bb9
--- /dev/null
+++ b/t/helper/test-dump-fsmonitor.c
@@ -0,0 +1,21 @@
+#include "cache.h"
+
+int cmd_main(int ac, const char **av)
+{
+	struct index_state *istate = &the_index;
+	int i;
+
+	setup_git_directory();
+	if (do_read_index(istate, get_index_file(), 0) < 0)
+		die("unable to read index file");
+	if (!istate->fsmonitor_last_update) {
+		printf("no fsmonitor\n");
+		return 0;
+	}
+	printf("fsmonitor last update %"PRIuMAX"\n", istate->fsmonitor_last_update);

After pushing this out and having Travis complain, I queued a squash on top of this to cast the argument to (uintmax_t), like you did in an earlier step (I think it was [PATCH 04/12]). Thanks. I'll update this to cast it as (uint64_t), as that is what get/put_be64 use. As far as I can tell they both map to the same thing (unsigned long long), so there isn't a functional difference.
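The squash matters because %PRIuMAX formally expects a uintmax_t argument; passing a bare uint64_t through varargs only happens to work on platforms where the two typedefs coincide. A standalone sketch of the portable form (show_last_update and the timestamp value are illustrative, not from the series):

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Format a 64-bit fsmonitor-style timestamp portably.  The (uintmax_t)
 * cast is required: uint64_t may be `unsigned long` on one platform and
 * `unsigned long long` on another, but PRIuMAX always matches uintmax_t. */
static void show_last_update(char *buf, size_t n, uint64_t last_update)
{
	snprintf(buf, n, "fsmonitor last update %" PRIuMAX,
		 (uintmax_t)last_update);
}
```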
Re: [PATCH] t2018: remove unwanted empty line
On Monday 18 September 2017 12:52 AM, Kevin Daudt wrote: Why is this empty line unwanted? This kind of whitespace can help separate logical sections, just like paragraphs would. You're right. It seems I sent that fix hastily because I hadn't encountered such separation before (forgetting the fact that git has contributors whose styles of writing tests vary widely). I should stop and think the next time rather than spamming the list. Sorry for this. --- Kaartic
RE: [PATCH v6 04/12] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.
> -Original Message-
> From: Ben Peart [mailto:peart...@gmail.com]
> Sent: Monday, September 18, 2017 9:07 AM
> To: David Turner ; 'Ben Peart'
> Cc: ava...@gmail.com; christian.cou...@gmail.com; git@vger.kernel.org;
> gits...@pobox.com; johannes.schinde...@gmx.de; pclo...@gmail.com;
> p...@peff.net
> Subject: Re: [PATCH v6 04/12] fsmonitor: teach git to optionally utilize a file
> system monitor to speed up detecting new or changed files.
>
> Thanks for taking the time to review/provide feedback!
>
> On 9/15/2017 5:35 PM, David Turner wrote:
> >> -Original Message-
> >> From: Ben Peart [mailto:benpe...@microsoft.com]
> >> Sent: Friday, September 15, 2017 3:21 PM
> >> To: benpe...@microsoft.com
> >> Cc: David Turner ; ava...@gmail.com;
> >> christian.cou...@gmail.com; git@vger.kernel.org; gits...@pobox.com;
> >> johannes.schinde...@gmx.de; pclo...@gmail.com; p...@peff.net
> >> Subject: [PATCH v6 04/12] fsmonitor: teach git to optionally utilize
> >> a file system monitor to speed up detecting new or changed files.
> >
> >> +int git_config_get_fsmonitor(void)
> >> +{
> >> +	if (git_config_get_pathname("core.fsmonitor", &core_fsmonitor))
> >> +		core_fsmonitor = getenv("GIT_FSMONITOR_TEST");
> >> +
> >> +	if (core_fsmonitor && !*core_fsmonitor)
> >> +		core_fsmonitor = NULL;
> >> +
> >> +	if (core_fsmonitor)
> >> +		return 1;
> >> +
> >> +	return 0;
> >> +}
> >
> > This function's return values are backwards relative to the rest of the
> > git_config_* functions.
>
> I'm confused. If core.fsmonitor is configured, it returns 1. If it is not
> configured, it returns 0. I don't make use of the -1 /* default value */ option
> as I didn't see any use/value in this case. What is backwards?

The other git_config_* functions return 1 for error and 0 for success.

> [snip]
>
> > +	/*
> > +	 * With fsmonitor, we can trust the untracked cache's valid field.
> > +	 */
> >
> > Did you intend to make a comment here?

Sorry.
I was going to make a comment that I didn't see how that could work since we weren't touching the untracked cache here, but then I saw the bit further down. I'm still not sure it works (see comment on 10/12), but at least it could in theory work.
Re: [PATCH v6 05/12] fsmonitor: add documentation for the fsmonitor extension.
On 9/17/2017 4:03 AM, Junio C Hamano wrote: Ben Peart writes: +[[fsmonitor-watchman]] +fsmonitor-watchman +~~~ I've queued a mini squash on top to make sure the line aligns with the length of the string above it by adding three ~'s here. Thanks, I'll do the same assuming there will be another re-roll.
Re: [PATCH v6 05/12] fsmonitor: add documentation for the fsmonitor extension.
On 9/15/2017 3:43 PM, David Turner wrote: -Original Message- From: Ben Peart [mailto:benpe...@microsoft.com] Sent: Friday, September 15, 2017 3:21 PM To: benpe...@microsoft.com Cc: David Turner ; ava...@gmail.com; christian.cou...@gmail.com; git@vger.kernel.org; gits...@pobox.com; johannes.schinde...@gmx.de; pclo...@gmail.com; p...@peff.net Subject: [PATCH v6 05/12] fsmonitor: add documentation for the fsmonitor extension. This includes the core.fsmonitor setting, the query-fsmonitor hook, and the fsmonitor index extension. Signed-off-by: Ben Peart --- Documentation/config.txt | 6 ++ Documentation/githooks.txt | 23 +++ Documentation/technical/index-format.txt | 19 +++ 3 files changed, 48 insertions(+) diff --git a/Documentation/config.txt b/Documentation/config.txt index dc4e3f58a2..c196007a27 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -413,6 +413,12 @@ core.protectNTFS:: 8.3 "short" names. Defaults to `true` on Windows, and `false` elsewhere. +core.fsmonitor:: + If set, the value of this variable is used as a command which + will identify all files that may have changed since the + requested date/time. This information is used to speed up git by + avoiding unnecessary processing of files that have not changed. I'm confused here. You have a file called "fsmonitor-watchman", which seems to discuss the protocol for core.fsmonitor scripts in general, and you have this documentation, which does not link to that file. Can you clarify this? I'll add the missing link to the documentation in githooks.txt. The documentation should be enough for someone to develop another integration script. The fsmonitor-watchman script allows people to easily use this patch series with the existing Watchman monitor but it can certainly also be used as a sample for how to integrate with another file system monitor. +The hook should output to stdout the list of all files in the working +directory that may have changed since the requested time. 
The logic +should be inclusive so that it does not miss any potential changes. +"It is OK to include files which have not actually changed. Newly-created and deleted files should also be included. When files are renamed, both the old and the new name should be included." Also, please discuss case sensitivity issues (e.g. on OS X). +The paths should be relative to the root of the working directory and +be separated by a single NUL. + - 32-bit version number: the current supported version is 1. + + - 64-bit time: the extension data reflects all changes through the given + time which is stored as the nanoseconds elapsed since midnight, + January 1, 1970. Nit: Please specify signed or unsigned for these. (I expect to be getting out of cryosleep around 2262, and I want to know if my old git repos will keep working...) While I'm not opposed to specifying unsigned, I did notice that the only place signed/unsigned is specified today is in "index entry." Everywhere else doesn't specify so I left it off for consistency. I've not seen negative version numbers nor negative time so am not entirely sure it is necessary to clarify. :) + - 32-bit bitmap size: the size of the CE_FSMONITOR_VALID bitmap. + + - An ewah bitmap, the n-th bit indicates whether the n-th index entry +is not CE_FSMONITOR_VALID.
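The on-disk layout described above (32-bit version, 64-bit nanosecond timestamp, 32-bit bitmap size, then the ewah payload) can be decoded with a few big-endian reads, assuming the network byte order used by the rest of the index format. A rough sketch, not the code from the series (the struct and function names are made up):

```c
#include <stdint.h>
#include <stdio.h>

static uint32_t be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t be64(const unsigned char *p)
{
	return ((uint64_t)be32(p) << 32) | be32(p + 4);
}

struct fsmonitor_header {
	uint32_t version;	/* currently 1 */
	uint64_t last_update;	/* ns since midnight, January 1, 1970 */
	uint32_t bitmap_size;	/* size of the CE_FSMONITOR_VALID bitmap */
};

/* Decode the fixed-size header; the ewah bitmap follows it.  Returns 0 on
 * success, -1 on a truncated buffer or an unsupported version. */
static int parse_fsmonitor_header(const unsigned char *p, size_t len,
				  struct fsmonitor_header *h)
{
	if (len < 16)
		return -1;
	h->version = be32(p);
	h->last_update = be64(p + 4);
	h->bitmap_size = be32(p + 12);
	return h->version == 1 ? 0 : -1;
}
```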
Re: [PATCH v6 04/12] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.
Thanks for taking the time to review/provide feedback!

On 9/15/2017 5:35 PM, David Turner wrote:
-Original Message-
From: Ben Peart [mailto:benpe...@microsoft.com]
Sent: Friday, September 15, 2017 3:21 PM
To: benpe...@microsoft.com
Cc: David Turner ; ava...@gmail.com; christian.cou...@gmail.com; git@vger.kernel.org; gits...@pobox.com; johannes.schinde...@gmx.de; pclo...@gmail.com; p...@peff.net
Subject: [PATCH v6 04/12] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.

+int git_config_get_fsmonitor(void)
+{
+	if (git_config_get_pathname("core.fsmonitor", &core_fsmonitor))
+		core_fsmonitor = getenv("GIT_FSMONITOR_TEST");
+
+	if (core_fsmonitor && !*core_fsmonitor)
+		core_fsmonitor = NULL;
+
+	if (core_fsmonitor)
+		return 1;
+
+	return 0;
+}

This function's return values are backwards relative to the rest of the git_config_* functions.

I'm confused. If core.fsmonitor is configured, it returns 1. If it is not configured, it returns 0. I don't make use of the -1 /* default value */ option as I didn't see any use/value in this case. What is backwards?

[snip]

+	/*
+	 * With fsmonitor, we can trust the untracked cache's valid field.
+	 */

Did you intend to make a comment here?

[snip]

+int read_fsmonitor_extension(struct index_state *istate, const void *data,
+	unsigned long sz)
+{

If git_config_get_fsmonitor returns 0, fsmonitor_dirty will leak. Good catch! Thank you.

[snip]

+	/* a fsmonitor process can return '*' to indicate all entries are invalid */

That's not documented in your documentation. Also, I'm not sure I like it: what if I have a file whose name starts with '*'? Yeah, that would be silly, but this indicates the need for the protocol to have some sort of signaling mechanism that's out-of-band. Maybe have some key\0value\0 pairs and then \0\0 and then the list of files? Or, if you want to keep it really simple, allow an entry of '/' (which is an invalid filename) to mean 'all'.
Yeah, this was an optimization I added late in the game to get around an issue in Watchman where it returns the name of every file the first time you query it (rather than the set of files that have actually changed since the requested time). I didn't realize the wildcard '*' was a valid character for a filename. I'll switch to '/' as you suggest, as I don't want to complicate things unnecessarily to handle this relatively rare optimization. I'll also get it documented properly. Thanks!

+void add_fsmonitor(struct index_state *istate)
+{
+	int i;
+
+	if (!istate->fsmonitor_last_update) {

[snip]

+	/* reset the untracked cache */

Is this really necessary? Shouldn't the untracked cache be in a correct state already? When fsmonitor is not turned on, I'm not explicitly invalidating untracked cache directory entries as git makes changes to files. While I doubt the sequence of 1) git making changes to files, *then* 2) turning on fsmonitor ever happens, I thought it better to be safe than sorry rather than assume that pattern can never happen in the future. Especially since turning on the extension is rare and the cost is low.

+/*
+ * Clear the given cache entries CE_FSMONITOR_VALID bit and invalidate

Nit: "s/entries/entry's/".
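The '/' sentinel is easy to special-case on the receiving side, since '/' can never be a path relative to the working-tree root. A rough sketch of parsing the hook's NUL-delimited reply (parse_reply and the buffer layout are illustrative, not the code from this series):

```c
#include <stdio.h>
#include <string.h>

/* Parse a hypothetical fsmonitor hook reply: a NUL-separated list of
 * changed paths, or the single entry "/" meaning "everything may have
 * changed".  Returns 1 when all entries must be invalidated, 0 otherwise. */
static int parse_reply(const char *buf, size_t len)
{
	const char *p;

	/* "/" plus its terminating NUL is the all-invalid sentinel. */
	if (len == 2 && buf[0] == '/' && buf[1] == '\0')
		return 1;

	/* Otherwise, walk the NUL-separated path entries one by one. */
	for (p = buf; p < buf + len; p += strlen(p) + 1)
		printf("changed: %s\n", p);	/* invalidate this one path */
	return 0;
}
```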
Re: [PATCH] test-drop-caches: mark file local symbols with static
On 9/17/2017 12:14 PM, Ramsay Jones wrote:

Signed-off-by: Ramsay Jones
---
Hi Ben,

If you need to re-roll your 'bp/fsmonitor' branch, could you please squash this into the relevant patch (commit c6b5a28941, "fsmonitor: add a performance test", 15-09-2017).

Absolutely. Thanks!

I'd really appreciate some feedback on whether this works properly on platforms other than Windows. Thanks!

ATB,
Ramsay Jones

 t/helper/test-drop-caches.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/t/helper/test-drop-caches.c b/t/helper/test-drop-caches.c
index 717079865..17590170f 100644
--- a/t/helper/test-drop-caches.c
+++ b/t/helper/test-drop-caches.c
@@ -2,7 +2,7 @@

 #if defined(GIT_WINDOWS_NATIVE)

-int cmd_sync(void)
+static int cmd_sync(void)
 {
 	char Buffer[MAX_PATH];
 	DWORD dwRet;
@@ -49,7 +49,7 @@ typedef enum _SYSTEM_MEMORY_LIST_COMMAND {
 	MemoryCommandMax
 } SYSTEM_MEMORY_LIST_COMMAND;

-BOOL GetPrivilege(HANDLE TokenHandle, LPCSTR lpName, int flags)
+static BOOL GetPrivilege(HANDLE TokenHandle, LPCSTR lpName, int flags)
 {
 	BOOL bResult;
 	DWORD dwBufferLength;
@@ -77,7 +77,7 @@ BOOL GetPrivilege(HANDLE TokenHandle, LPCSTR lpName, int flags)
 	return bResult;
 }

-int cmd_dropcaches(void)
+static int cmd_dropcaches(void)
 {
 	HANDLE hProcess = GetCurrentProcess();
 	HANDLE hToken;
@@ -118,36 +118,36 @@ int cmd_dropcaches(void)

 #elif defined(__linux__)

-int cmd_sync(void)
+static int cmd_sync(void)
 {
 	return system("sync");
 }

-int cmd_dropcaches(void)
+static int cmd_dropcaches(void)
 {
 	return system("echo 3 | sudo tee /proc/sys/vm/drop_caches");
 }

 #elif defined(__APPLE__)

-int cmd_sync(void)
+static int cmd_sync(void)
 {
 	return system("sync");
 }

-int cmd_dropcaches(void)
+static int cmd_dropcaches(void)
 {
 	return system("sudo purge");
 }

 #else

-int cmd_sync(void)
+static int cmd_sync(void)
 {
 	return 0;
 }

-int cmd_dropcaches(void)
+static int cmd_dropcaches(void)
 {
 	return error("drop caches not implemented on this platform");
 }
Re: RFC v3: Another proposed hash function transition plan
Hi Johannes, > SHA-256 got much more cryptanalysis than SHA3-256 […]. I do not think this is true. Keccak/SHA-3 actually got (and is still getting) a lot of cryptanalysis, with papers published at renowned crypto conferences [1]. Keccak/SHA-3 is recognized to have a significant safety margin. E.g., one can cut the number of rounds in half (as in Keyak or KangarooTwelve) and still get a very strong function. I don't think we could say the same for SHA-256 or SHA-512… Kind regards, Gilles, for the Keccak team [1] https://keccak.team/third_party.html
Re: [PATCH 1/3] sha1_name: Create perf test for find_unique_abbrev()
On 9/17/2017 5:51 PM, Junio C Hamano wrote: Derrick Stolee writes:

+int cmd_main(int ac, const char **av)
+{
+	setup_git_directory();

As far as I recall, we do not (yet) allow declaration after statement in our codebase. Move this down to make it after all decls. Will fix.

+
+	unsigned int hash_delt = 0x13579BDF;
+	unsigned int hash_base = 0x01020304;
+	struct object_id oid;
+
+	int i, count = 0;
+	int n = sizeof(struct object_id) / sizeof(int);

It probably is technically OK to assume sizeof(int) always equals sizeof(unsigned), but because you use 'n' _only_ to work with uint and never with int, it would make more sense to match. Will fix. I also notice that "n" should be const. But I do not think we want this "clever" optimization that involves 'n' in the first place.

+	while (count++ < 10) {
+		for (i = 0; i < n; i++)
+			((unsigned int *)oid.hash)[i] = hash_base;

Does it make sense to assume that uint is always 4 bytes (so this code won't work if it is 8 bytes on your platform), and is doing this faster than using platform-optimized memcpy()? I'm not sure what you mean by using memcpy to improve this, because it would require calling memcpy in the inner loop, such as:

	for (i = 0; i < n; i++)
		memcpy(oid.hash + i * sizeof(unsigned), &hash_base, sizeof(unsigned));

I'm probably misunderstanding your intended use of memcpy().

+		find_unique_abbrev(oid.hash, MINIMUM_ABBREV);
+
+		hash_base += hash_delt;
+	}

I also wonder if this is measuring the right thing. I am guessing that by making many queries for a unique abbreviation of "random" (well, it is deterministic, but my point is these queries are not based on the names of objects that exist in the repository) hashes, this test measures how much time it takes for us to decide that such a random hash does not exist.
In real-life use, we make the opposite query far more frequently: we have an object that we _know_ exists in the repository and we try to find a sufficient length to disambiguate it from others, and I suspect that these two use different logic. Don't you need to be measuring the time it takes to compute the shortest abbreviation of an object that exists instead? First, this doesn't just measure the time it takes to determine non-existence, because it finds the len required to disambiguate an "incoming" hash from all known hashes. When testing, I put in a simple printf to report the resulting abbreviation so I could see how often it needed to be extended. In this sense, the test exposes the while loop that is removed by PATCH 2/3. Second, your criticism about extant hashes is valid, and one I struggled to reconcile. I see two issues with testing known hashes:

1. By determining the hash exists, we have inspected the file that contains it (either a .idx or the loose-object directory). This has side effects that warm up the file cache, so the looped method is artificially faster at finding matching objects. The effect is particularly significant on a repo with exactly one packfile.

2. If we iterate over the entire set of objects, this test takes O(N*t(N)) time, where t(N) is the average time to compute a minimum abbreviation. For large repos, this can take several minutes. Instead, even with the constant 100,000 test hashes, we have an O(t(N)) test. We could avoid the asymptotic growth by limiting the number of existing hashes we use, but how do we find a sufficiently uniform sample from them?

By looking at some other perf tests, I see that we can add a prerequisite action. I will investigate making another helper that uniformly selects a set of hashes from the repo and writes them to stdout in a random order. p0008-abbrev.sh will run the helper as a prerequisite, saving the data to a file. test-abbrev will read the hashes from stdin to test find_unique_abbrev.
This should avoid the side-effects I mentioned (test-abbrev will not warm up indexes) while also testing abbreviation lengths for existing objects. I'll get started on these changes and send a new patch with new perf data in a couple days. Please let me know if there is a better way to measure performance for this change. Thanks, -Stolee
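The prerequisite sampling step described above can be approximated from the shell today. This is only a sketch run inside the repository under test; the output file name is illustrative, `sort -R` is one way to shuffle (shuf(1) would serve equally well where available), and the real series may do this in a C helper instead:

```shell
# Emit every object name in the repository in random order; a perf script
# could save this and feed a fixed-size slice to the test helper on stdin.
# --batch-all-objects covers both packed and loose objects.
git cat-file --batch-all-objects --batch-check='%(objectname)' |
sort -R >hashes.txt
```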