[PATCH v7 04/10] fsck: support referenced promisor objects

2017-12-08 Thread Jeff Hostetler
From: Jonathan Tan 

Teach fsck to not treat missing promisor objects indirectly pointed to
by refs as an error when extensions.partialclone is set.

Signed-off-by: Jonathan Tan 
---
 builtin/fsck.c   | 11 +++
 t/t0410-partial-clone.sh | 23 +++
 2 files changed, 34 insertions(+)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index ee937bb..4c2a56d 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -149,6 +149,15 @@ static int mark_object(struct object *obj, int type, void 
*data, struct fsck_opt
if (obj->flags & REACHABLE)
return 0;
obj->flags |= REACHABLE;
+
+   if (is_promisor_object(>oid))
+   /*
+* Further recursion does not need to be performed on this
+* object since it is a promisor object (so it does not need to
+* be added to "pending").
+*/
+   return 0;
+
if (!(obj->flags & HAS_OBJ)) {
if (parent && !has_object_file(>oid)) {
printf("broken link from %7s %s\n",
@@ -208,6 +217,8 @@ static void check_reachable_object(struct object *obj)
 * do a full fsck
 */
if (!(obj->flags & HAS_OBJ)) {
+   if (is_promisor_object(>oid))
+   return;
if (has_sha1_pack(obj->oid.hash))
return; /* it is in pack - forget about it */
printf("missing %s %s\n", printable_type(obj),
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index bf75162..4f9931f 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -102,4 +102,27 @@ test_expect_success 'missing ref object, but promised, 
passes fsck' '
git -C repo fsck
 '
 
+test_expect_success 'missing object, but promised, passes fsck' '
+   rm -rf repo &&
+   test_create_repo repo &&
+   test_commit -C repo 1 &&
+   test_commit -C repo 2 &&
+   test_commit -C repo 3 &&
+   git -C repo tag -a annotated_tag -m "annotated tag" &&
+
+   C=$(git -C repo rev-parse 1) &&
+   T=$(git -C repo rev-parse 2^{tree}) &&
+   B=$(git hash-object repo/3.t) &&
+   AT=$(git -C repo rev-parse annotated_tag) &&
+
+   promise_and_delete "$C" &&
+   promise_and_delete "$T" &&
+   promise_and_delete "$B" &&
+   promise_and_delete "$AT" &&
+
+   git -C repo config core.repositoryformatversion 1 &&
+   git -C repo config extensions.partialclone "arbitrary string" &&
+   git -C repo fsck
+'
+
 test_done
-- 
2.9.3



[PATCH v7 02/10] fsck: introduce partialclone extension

2017-12-08 Thread Jeff Hostetler
From: Jonathan Tan 

Currently, Git does not support repos with very large numbers of objects
or repos that wish to minimize manipulation of certain blobs (for
example, because they are very large) very well, even if the user
operates mostly on part of the repo, because Git is designed on the
assumption that every referenced object is available somewhere in the
repo storage. In such an arrangement, the full set of objects is usually
available in remote storage, ready to be lazily downloaded.

Teach fsck about the new state of affairs. In this commit, teach fsck
that missing promisor objects referenced from the reflog are not an
error case; in future commits, fsck will be taught about other cases.

Signed-off-by: Jonathan Tan 
---
 builtin/fsck.c   |  2 +-
 cache.h  |  3 +-
 packfile.c   | 77 +++--
 packfile.h   | 13 
 t/t0410-partial-clone.sh | 81 
 5 files changed, 171 insertions(+), 5 deletions(-)
 create mode 100755 t/t0410-partial-clone.sh

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 56afe40..2934299 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -398,7 +398,7 @@ static void fsck_handle_reflog_oid(const char *refname, 
struct object_id *oid,
xstrfmt("%s@{%"PRItime"}", refname, 
timestamp));
obj->flags |= USED;
mark_object_reachable(obj);
-   } else {
+   } else if (!is_promisor_object(oid)) {
error("%s: invalid reflog entry %s", refname, 
oid_to_hex(oid));
errors_found |= ERROR_REACHABLE;
}
diff --git a/cache.h b/cache.h
index 35e3f5e..c76f2e9 100644
--- a/cache.h
+++ b/cache.h
@@ -1587,7 +1587,8 @@ extern struct packed_git {
unsigned pack_local:1,
 pack_keep:1,
 freshened:1,
-do_not_close:1;
+do_not_close:1,
+pack_promisor:1;
unsigned char sha1[20];
struct revindex_entry *revindex;
/* something like ".git/objects/pack/x.pack" */
diff --git a/packfile.c b/packfile.c
index 4a5fe7a..234797c 100644
--- a/packfile.c
+++ b/packfile.c
@@ -8,6 +8,11 @@
 #include "list.h"
 #include "streaming.h"
 #include "sha1-lookup.h"
+#include "commit.h"
+#include "object.h"
+#include "tag.h"
+#include "tree-walk.h"
+#include "tree.h"
 
 char *odb_pack_name(struct strbuf *buf,
const unsigned char *sha1,
@@ -643,10 +648,10 @@ struct packed_git *add_packed_git(const char *path, 
size_t path_len, int local)
return NULL;
 
/*
-* ".pack" is long enough to hold any suffix we're adding (and
+* ".promisor" is long enough to hold any suffix we're adding (and
 * the use xsnprintf double-checks that)
 */
-   alloc = st_add3(path_len, strlen(".pack"), 1);
+   alloc = st_add3(path_len, strlen(".promisor"), 1);
p = alloc_packed_git(alloc);
memcpy(p->pack_name, path, path_len);
 
@@ -654,6 +659,10 @@ struct packed_git *add_packed_git(const char *path, size_t 
path_len, int local)
if (!access(p->pack_name, F_OK))
p->pack_keep = 1;
 
+   xsnprintf(p->pack_name + path_len, alloc - path_len, ".promisor");
+   if (!access(p->pack_name, F_OK))
+   p->pack_promisor = 1;
+
xsnprintf(p->pack_name + path_len, alloc - path_len, ".pack");
if (stat(p->pack_name, ) || !S_ISREG(st.st_mode)) {
free(p);
@@ -781,7 +790,8 @@ static void prepare_packed_git_one(char *objdir, int local)
if (ends_with(de->d_name, ".idx") ||
ends_with(de->d_name, ".pack") ||
ends_with(de->d_name, ".bitmap") ||
-   ends_with(de->d_name, ".keep"))
+   ends_with(de->d_name, ".keep") ||
+   ends_with(de->d_name, ".promisor"))
string_list_append(, path.buf);
else
report_garbage(PACKDIR_FILE_GARBAGE, path.buf);
@@ -1889,6 +1899,9 @@ int for_each_packed_object(each_packed_object_fn cb, void 
*data, unsigned flags)
for (p = packed_git; p; p = p->next) {
if ((flags & FOR_EACH_OBJECT_LOCAL_ONLY) && !p->pack_local)
continue;
+   if ((flags & FOR_EACH_OBJECT_PROMISOR_ONLY) &&
+   !p->pack_promisor)
+   continue;
if (open_pack_index(p)) {
pack_errors = 1;
continue;
@@ -1899,3 +1912,61 @@ int for_each_packed_object(each_packed_object_fn cb, 
void *data, unsigned flags)
}
return r ? r : pack_errors;
 }
+
+static int add_promisor_object(const struct object_id *oid,
+   

[PATCH v7 10/10] gc: do not repack promisor packfiles

2017-12-08 Thread Jeff Hostetler
From: Jonathan Tan 

Teach gc to stop traversal at promisor objects, and to leave promisor
packfiles alone. This has the effect of only repacking non-promisor
packfiles, and preserves the distinction between promisor packfiles and
non-promisor packfiles.

Signed-off-by: Jonathan Tan 
Signed-off-by: Jeff Hostetler 
---
 Documentation/git-pack-objects.txt | 11 
 builtin/gc.c   |  3 +++
 builtin/pack-objects.c | 37 --
 builtin/prune.c|  7 +
 builtin/repack.c   |  8 --
 t/t0410-partial-clone.sh   | 54 --
 6 files changed, 114 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-pack-objects.txt 
b/Documentation/git-pack-objects.txt
index aa403d0..81bc490 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -255,6 +255,17 @@ a missing object is encountered.  This is the default 
action.
 The form '--missing=allow-any' will allow object traversal to continue
 if a missing object is encountered.  Missing objects will silently be
 omitted from the results.
++
+The form '--missing=allow-promisor' is like 'allow-any', but will only
+allow object traversal to continue for EXPECTED promisor missing objects.
+Unexpected missing object will raise an error.
+
+--exclude-promisor-objects::
+   Omit objects that are known to be in the promisor remote.  (This
+   option has the purpose of operating only on locally created objects,
+   so that when we repack, we still maintain a distinction between
+   locally created objects [without .promisor] and objects from the
+   promisor remote [with .promisor].)  This is used with partial clone.
 
 SEE ALSO
 
diff --git a/builtin/gc.c b/builtin/gc.c
index 3c5eae0..77fa720 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -458,6 +458,9 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
argv_array_push(, prune_expire);
if (quiet)
argv_array_push(, "--no-progress");
+   if (repository_format_partial_clone)
+   argv_array_push(,
+   "--exclude-promisor-objects");
if (run_command_v_opt(prune.argv, RUN_GIT_CMD))
return error(FAILED_RUN, prune.argv[0]);
}
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 45ad35d..f5fc401 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -75,6 +75,8 @@ static int use_bitmap_index = -1;
 static int write_bitmap_index;
 static uint16_t write_bitmap_options;
 
+static int exclude_promisor_objects;
+
 static unsigned long delta_cache_size = 0;
 static unsigned long max_delta_cache_size = 256 * 1024 * 1024;
 static unsigned long cache_max_small_delta_size = 1000;
@@ -84,8 +86,9 @@ static unsigned long window_memory_limit = 0;
 static struct list_objects_filter_options filter_options;
 
 enum missing_action {
-   MA_ERROR = 0,/* fail if any missing objects are encountered */
-   MA_ALLOW_ANY,/* silently allow ALL missing objects */
+   MA_ERROR = 0,  /* fail if any missing objects are encountered */
+   MA_ALLOW_ANY,  /* silently allow ALL missing objects */
+   MA_ALLOW_PROMISOR, /* silently allow all missing PROMISOR objects */
 };
 static enum missing_action arg_missing_action;
 static show_object_fn fn_show_object;
@@ -2577,6 +2580,20 @@ static void show_object__ma_allow_any(struct object 
*obj, const char *name, void
show_object(obj, name, data);
 }
 
+static void show_object__ma_allow_promisor(struct object *obj, const char 
*name, void *data)
+{
+   assert(arg_missing_action == MA_ALLOW_PROMISOR);
+
+   /*
+* Quietly ignore EXPECTED missing objects.  This avoids problems with
+* staging them now and getting an odd error later.
+*/
+   if (!has_object_file(>oid) && is_promisor_object(>oid))
+   return;
+
+   show_object(obj, name, data);
+}
+
 static int option_parse_missing_action(const struct option *opt,
   const char *arg, int unset)
 {
@@ -2591,10 +2608,18 @@ static int option_parse_missing_action(const struct 
option *opt,
 
if (!strcmp(arg, "allow-any")) {
arg_missing_action = MA_ALLOW_ANY;
+   fetch_if_missing = 0;
fn_show_object = show_object__ma_allow_any;
return 0;
}
 
+   if (!strcmp(arg, "allow-promisor")) {
+   arg_missing_action = MA_ALLOW_PROMISOR;
+   fetch_if_missing = 0;
+   fn_show_object = show_object__ma_allow_promisor;
+   return 0;
+   }
+
die(_("invalid value for --missing"));
   

[PATCH v7 08/10] sha1_file: support lazily fetching missing objects

2017-12-08 Thread Jeff Hostetler
From: Jonathan Tan 

Teach sha1_file to fetch objects from the remote configured in
extensions.partialclone whenever an object is requested but missing.

The fetching of objects can be suppressed through a global variable.
This is used by fsck and index-pack.

However, by default, such fetching is not suppressed. This is meant as a
temporary measure to ensure that all Git commands work in such a
situation. Future patches will update some commands to either tolerate
missing objects (without fetching them) or be more efficient in fetching
them.

In order to determine the code changes in sha1_file.c necessary, I
investigated the following:
 (1) functions in sha1_file.c that take in a hash, without the user
 regarding how the object is stored (loose or packed)
 (2) functions in packfile.c (because I need to check callers that know
 about the loose/packed distinction and operate on both differently,
 and ensure that they can handle the concept of objects that are
 neither loose nor packed)

(1) is handled by the modification to sha1_object_info_extended().

For (2), I looked at for_each_packed_object and others.  For
for_each_packed_object, the callers either already work or are fixed in
this patch:
 - reachable - only to find recent objects
 - builtin/fsck - already knows about missing objects
 - builtin/cat-file - warning message added in this commit

Callers of the other functions do not need to be changed:
 - parse_pack_index
   - http - indirectly from http_get_info_packs
   - find_pack_entry_one
 - this searches a single pack that is provided as an argument; the
   caller already knows (through other means) that the sought object
   is in a specific pack
 - find_sha1_pack
   - fast-import - appears to be an optimization to not store a file if
 it is already in a pack
   - http-walker - to search through a struct alt_base
   - http-push - to search through remote packs
 - has_sha1_pack
   - builtin/fsck - already knows about promisor objects
   - builtin/count-objects - informational purposes only (check if loose
 object is also packed)
   - builtin/prune-packed - check if object to be pruned is packed (if
 not, don't prune it)
   - revision - used to exclude packed objects if requested by user
   - diff - just for optimization

Signed-off-by: Jonathan Tan 
---
 builtin/cat-file.c   |  2 ++
 builtin/fetch-pack.c |  2 ++
 builtin/fsck.c   |  3 +++
 builtin/index-pack.c |  6 ++
 cache.h  |  8 
 fetch-object.c   |  3 +++
 sha1_file.c  | 32 ++
 t/t0410-partial-clone.sh | 51 
 8 files changed, 99 insertions(+), 8 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index f5fa4fd..cf9ea5c 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -475,6 +475,8 @@ static int batch_objects(struct batch_options *opt)
 
for_each_loose_object(batch_loose_object, , 0);
for_each_packed_object(batch_packed_object, , 0);
+   if (repository_format_partial_clone)
+   warning("This repository has extensions.partialClone 
set. Some objects may not be loaded.");
 
cb.opt = opt;
cb.expand = 
diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 02abe72..15eeed7 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -53,6 +53,8 @@ int cmd_fetch_pack(int argc, const char **argv, const char 
*prefix)
struct oid_array shallow = OID_ARRAY_INIT;
struct string_list deepen_not = STRING_LIST_INIT_DUP;
 
+   fetch_if_missing = 0;
+
packet_trace_identity("fetch-pack");
 
memset(, 0, sizeof(args));
diff --git a/builtin/fsck.c b/builtin/fsck.c
index 578a7c8..3b76c0e 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -678,6 +678,9 @@ int cmd_fsck(int argc, const char **argv, const char 
*prefix)
int i;
struct alternate_object_database *alt;
 
+   /* fsck knows how to handle missing promisor objects */
+   fetch_if_missing = 0;
+
errors_found = 0;
check_replace_refs = 0;
 
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 24c2f05..a0a35e6 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1657,6 +1657,12 @@ int cmd_index_pack(int argc, const char **argv, const 
char *prefix)
unsigned foreign_nr = 1;/* zero is a "good" value, assume bad */
int report_end_of_input = 0;
 
+   /*
+* index-pack never needs to fetch missing objects, since it only
+* accesses the repo to do hash collision checks
+*/
+   fetch_if_missing = 0;
+
if (argc == 2 && !strcmp(argv[1], "-h"))
usage(index_pack_usage);
 
diff --git a/cache.h b/cache.h
index c76f2e9..6980072 100644
--- a/cache.h
+++ b/cache.h
@@ -1727,6 +1727,14 @@ 

[PATCH v7 07/10] introduce fetch-object: fetch one promisor object

2017-12-08 Thread Jeff Hostetler
From: Jonathan Tan 

Introduce fetch-object, providing the ability to fetch one object from a
promisor remote.

This uses fetch-pack. To do this, the transport mechanism has been
updated with 2 flags, "from-promisor" to indicate that the resulting
pack comes from a promisor remote (and thus should be annotated as such
by index-pack), and "no-dependents" to indicate that only the objects
themselves need to be fetched (but fetching additional objects is
nevertheless safe).

Whenever "no-dependents" is used, fetch-pack will refrain from using any
object flags, because it is most likely invoked as part of a dynamic
object fetch by another Git command (which may itself use object flags).
An alternative to this is to leave fetch-pack alone, and instead update
the allocation of flags so that fetch-pack's flags never overlap with
any others, but this will end up shrinking the number of flags available
to nearly every other Git command (that is, every Git command that
accesses objects), so the approach in this commit was used instead.

This will be tested in a subsequent commit.

Signed-off-by: Jonathan Tan 
---
 Documentation/gitremote-helpers.txt |  7 ++
 Makefile|  1 +
 builtin/fetch-pack.c|  8 +++
 builtin/index-pack.c| 16 ++---
 fetch-object.c  | 24 +++
 fetch-object.h  |  6 +
 fetch-pack.c| 48 +
 fetch-pack.h|  8 +++
 remote-curl.c   | 14 ++-
 transport.c |  8 +++
 transport.h | 11 +
 11 files changed, 126 insertions(+), 25 deletions(-)
 create mode 100644 fetch-object.c
 create mode 100644 fetch-object.h

diff --git a/Documentation/gitremote-helpers.txt 
b/Documentation/gitremote-helpers.txt
index 4a584f3..4b8c93e 100644
--- a/Documentation/gitremote-helpers.txt
+++ b/Documentation/gitremote-helpers.txt
@@ -466,6 +466,13 @@ set by Git if the remote helper has the 'option' 
capability.
Transmit  as a push option. As the push option
must not contain LF or NUL characters, the string is not encoded.
 
+'option from-promisor' {'true'|'false'}::
+   Indicate that these objects are being fetched from a promisor.
+
+'option no-dependents' {'true'|'false'}::
+   Indicate that only the objects wanted need to be fetched, not
+   their dependents.
+
 SEE ALSO
 
 linkgit:git-remote[1]
diff --git a/Makefile b/Makefile
index ca378a4..795e0c7 100644
--- a/Makefile
+++ b/Makefile
@@ -792,6 +792,7 @@ LIB_OBJS += ewah/ewah_bitmap.o
 LIB_OBJS += ewah/ewah_io.o
 LIB_OBJS += ewah/ewah_rlw.o
 LIB_OBJS += exec_cmd.o
+LIB_OBJS += fetch-object.o
 LIB_OBJS += fetch-pack.o
 LIB_OBJS += fsck.o
 LIB_OBJS += gettext.o
diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 366b9d1..02abe72 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -143,6 +143,14 @@ int cmd_fetch_pack(int argc, const char **argv, const char 
*prefix)
args.update_shallow = 1;
continue;
}
+   if (!strcmp("--from-promisor", arg)) {
+   args.from_promisor = 1;
+   continue;
+   }
+   if (!strcmp("--no-dependents", arg)) {
+   args.no_dependents = 1;
+   continue;
+   }
usage(fetch_pack_usage);
}
if (deepen_not.nr)
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 4f305a7..24c2f05 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1429,14 +1429,16 @@ static void write_special_file(const char *suffix, 
const char *msg,
if (close(fd) != 0)
die_errno(_("cannot close written %s file '%s'"),
  suffix, filename);
-   *report = suffix;
+   if (report)
+   *report = suffix;
}
strbuf_release(_buf);
 }
 
 static void final(const char *final_pack_name, const char *curr_pack_name,
  const char *final_index_name, const char *curr_index_name,
- const char *keep_msg, unsigned char *sha1)
+ const char *keep_msg, const char *promisor_msg,
+ unsigned char *sha1)
 {
const char *report = "pack";
struct strbuf pack_name = STRBUF_INIT;
@@ -1455,6 +1457,9 @@ static void final(const char *final_pack_name, const char 
*curr_pack_name,
if (keep_msg)
write_special_file("keep", keep_msg, final_pack_name, sha1,
   );
+   if (promisor_msg)
+   write_special_file("promisor", promisor_msg, final_pack_name,
+  sha1, NULL);
 

[PATCH v7 09/10] rev-list: support termination at promisor objects

2017-12-08 Thread Jeff Hostetler
From: Jonathan Tan 

Teach rev-list to support termination of an object traversal at any
object from a promisor remote (whether one that the local repo also has,
or one that the local repo knows about because it has another promisor
object that references it).

This will be used subsequently in gc and in the connectivity check used
by fetch.

For efficiency, if an object is referenced by a promisor object, and is
in the local repo only as a non-promisor object, object traversal will
not stop there. This is to avoid building the list of promisor object
references.

(In list-objects.c, the case where obj is NULL in process_blob() and
process_tree() do not need to be changed because those happen only when
there is a conflict between the expected type and the existing object.
If the object doesn't exist, an object will be synthesized, which is
fine.)

Signed-off-by: Jonathan Tan 
Signed-off-by: Jeff Hostetler 
---
 Documentation/rev-list-options.txt |  11 
 builtin/rev-list.c |  71 +++---
 list-objects.c |  29 ++-
 object.c   |   2 +-
 revision.c |  33 +++-
 revision.h |   5 +-
 t/t0410-partial-clone.sh   | 101 +
 7 files changed, 240 insertions(+), 12 deletions(-)

diff --git a/Documentation/rev-list-options.txt 
b/Documentation/rev-list-options.txt
index 8d8b7f4..0ce8ccd 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -745,10 +745,21 @@ The form '--missing=allow-any' will allow object 
traversal to continue
 if a missing object is encountered.  Missing objects will silently be
 omitted from the results.
 +
+The form '--missing=allow-promisor' is like 'allow-any', but will only
+allow object traversal to continue for EXPECTED promisor missing objects.
+Unexpected missing objects will raise an error.
++
 The form '--missing=print' is like 'allow-any', but will also print a
 list of the missing objects.  Object IDs are prefixed with a ``?'' character.
 endif::git-rev-list[]
 
+--exclude-promisor-objects::
+   (For internal use only.)  Prefilter object traversal at
+   promisor boundary.  This is used with partial clone.  This is
+   stronger than `--missing=allow-promisor` because it limits the
+   traversal, rather than just silencing errors about missing
+   objects.
+
 --no-walk[=(sorted|unsorted)]::
Only show the given commits, but do not traverse their ancestors.
This has no effect if a range is specified. If the argument
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 547a3e0..48f922d 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -15,6 +15,7 @@
 #include "progress.h"
 #include "reflog-walk.h"
 #include "oidset.h"
+#include "packfile.h"
 
 static const char rev_list_usage[] =
 "git rev-list [OPTION] ... [ -- paths... ]\n"
@@ -67,6 +68,7 @@ enum missing_action {
MA_ERROR = 0,/* fail if any missing objects are encountered */
MA_ALLOW_ANY,/* silently allow ALL missing objects */
MA_PRINT,/* print ALL missing objects in special section */
+   MA_ALLOW_PROMISOR, /* silently allow all missing PROMISOR objects */
 };
 static enum missing_action arg_missing_action;
 
@@ -197,6 +199,12 @@ static void finish_commit(struct commit *commit, void 
*data)
 
 static inline void finish_object__ma(struct object *obj)
 {
+   /*
+* Whether or not we try to dynamically fetch missing objects
+* from the server, we currently DO NOT have the object.  We
+* can either print, allow (ignore), or conditionally allow
+* (ignore) them.
+*/
switch (arg_missing_action) {
case MA_ERROR:
die("missing blob object '%s'", oid_to_hex(>oid));
@@ -209,25 +217,36 @@ static inline void finish_object__ma(struct object *obj)
oidset_insert(_objects, >oid);
return;
 
+   case MA_ALLOW_PROMISOR:
+   if (is_promisor_object(>oid))
+   return;
+   die("unexpected missing blob object '%s'",
+   oid_to_hex(>oid));
+   return;
+
default:
BUG("unhandled missing_action");
return;
}
 }
 
-static void finish_object(struct object *obj, const char *name, void *cb_data)
+static int finish_object(struct object *obj, const char *name, void *cb_data)
 {
struct rev_list_info *info = cb_data;
-   if (obj->type == OBJ_BLOB && !has_object_file(>oid))
+   if (obj->type == OBJ_BLOB && !has_object_file(>oid)) {
finish_object__ma(obj);
+   return 1;
+   }
if (info->revs->verify_objects && !obj->parsed && obj->type != 
OBJ_COMMIT)
parse_object(>oid);
+   return 0;
 

[PATCH v7 06/10] index-pack: refactor writing of .keep files

2017-12-08 Thread Jeff Hostetler
From: Jonathan Tan 

In a subsequent commit, index-pack will be taught to write ".promisor"
files which are similar to the ".keep" files it knows how to write.
Refactor the writing of ".keep" files, so that the implementation of
writing ".promisor" files becomes easier.

Signed-off-by: Jonathan Tan 
---
 builtin/index-pack.c | 99 
 1 file changed, 53 insertions(+), 46 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 8ec459f..4f305a7 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1389,15 +1389,58 @@ static void fix_unresolved_deltas(struct sha1file *f)
free(sorted_by_pos);
 }
 
+static const char *derive_filename(const char *pack_name, const char *suffix,
+  struct strbuf *buf)
+{
+   size_t len;
+   if (!strip_suffix(pack_name, ".pack", ))
+   die(_("packfile name '%s' does not end with '.pack'"),
+   pack_name);
+   strbuf_add(buf, pack_name, len);
+   strbuf_addch(buf, '.');
+   strbuf_addstr(buf, suffix);
+   return buf->buf;
+}
+
+static void write_special_file(const char *suffix, const char *msg,
+  const char *pack_name, const unsigned char *sha1,
+  const char **report)
+{
+   struct strbuf name_buf = STRBUF_INIT;
+   const char *filename;
+   int fd;
+   int msg_len = strlen(msg);
+
+   if (pack_name)
+   filename = derive_filename(pack_name, suffix, _buf);
+   else
+   filename = odb_pack_name(_buf, sha1, suffix);
+
+   fd = odb_pack_keep(filename);
+   if (fd < 0) {
+   if (errno != EEXIST)
+   die_errno(_("cannot write %s file '%s'"),
+ suffix, filename);
+   } else {
+   if (msg_len > 0) {
+   write_or_die(fd, msg, msg_len);
+   write_or_die(fd, "\n", 1);
+   }
+   if (close(fd) != 0)
+   die_errno(_("cannot close written %s file '%s'"),
+ suffix, filename);
+   *report = suffix;
+   }
+   strbuf_release(_buf);
+}
+
 static void final(const char *final_pack_name, const char *curr_pack_name,
  const char *final_index_name, const char *curr_index_name,
- const char *keep_name, const char *keep_msg,
- unsigned char *sha1)
+ const char *keep_msg, unsigned char *sha1)
 {
const char *report = "pack";
struct strbuf pack_name = STRBUF_INIT;
struct strbuf index_name = STRBUF_INIT;
-   struct strbuf keep_name_buf = STRBUF_INIT;
int err;
 
if (!from_stdin) {
@@ -1409,28 +1452,9 @@ static void final(const char *final_pack_name, const 
char *curr_pack_name,
die_errno(_("error while closing pack file"));
}
 
-   if (keep_msg) {
-   int keep_fd, keep_msg_len = strlen(keep_msg);
-
-   if (!keep_name)
-   keep_name = odb_pack_name(_name_buf, sha1, "keep");
-
-   keep_fd = odb_pack_keep(keep_name);
-   if (keep_fd < 0) {
-   if (errno != EEXIST)
-   die_errno(_("cannot write keep file '%s'"),
- keep_name);
-   } else {
-   if (keep_msg_len > 0) {
-   write_or_die(keep_fd, keep_msg, keep_msg_len);
-   write_or_die(keep_fd, "\n", 1);
-   }
-   if (close(keep_fd) != 0)
-   die_errno(_("cannot close written keep file 
'%s'"),
- keep_name);
-   report = "keep";
-   }
-   }
+   if (keep_msg)
+   write_special_file("keep", keep_msg, final_pack_name, sha1,
+  );
 
if (final_pack_name != curr_pack_name) {
if (!final_pack_name)
@@ -1472,7 +1496,6 @@ static void final(const char *final_pack_name, const char 
*curr_pack_name,
 
strbuf_release(_name);
strbuf_release(_name);
-   strbuf_release(_name_buf);
 }
 
 static int git_index_pack_config(const char *k, const char *v, void *cb)
@@ -1615,26 +1638,13 @@ static void show_pack_info(int stat_only)
}
 }
 
-static const char *derive_filename(const char *pack_name, const char *suffix,
-  struct strbuf *buf)
-{
-   size_t len;
-   if (!strip_suffix(pack_name, ".pack", ))
-   die(_("packfile name '%s' does not end with '.pack'"),
-   pack_name);
-   strbuf_add(buf, pack_name, len);
-   strbuf_addstr(buf, suffix);
-   

[PATCH v7 01/10] extension.partialclone: introduce partial clone extension

2017-12-08 Thread Jeff Hostetler
From: Jonathan Tan 

Introduce new repository extension option:
`extensions.partialclone`

See the update to Documentation/technical/repository-version.txt
in this patch for more information.

Signed-off-by: Jonathan Tan 
---
 Documentation/technical/repository-version.txt | 12 
 cache.h|  2 ++
 environment.c  |  1 +
 setup.c|  7 ++-
 4 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/Documentation/technical/repository-version.txt 
b/Documentation/technical/repository-version.txt
index 00ad379..e03eacc 100644
--- a/Documentation/technical/repository-version.txt
+++ b/Documentation/technical/repository-version.txt
@@ -86,3 +86,15 @@ for testing format-1 compatibility.
 When the config key `extensions.preciousObjects` is set to `true`,
 objects in the repository MUST NOT be deleted (e.g., by `git-prune` or
 `git repack -d`).
+
+`partialclone`
+~~
+
+When the config key `extensions.partialclone` is set, it indicates
+that the repo was created with a partial clone (or later performed
+a partial fetch) and that the remote may have omitted sending
+certain unwanted objects.  Such a remote is called a "promisor remote"
+and it promises that all such omitted objects can be fetched from it
+in the future.
+
+The value of this key is the name of the promisor remote.
diff --git a/cache.h b/cache.h
index 6440e2b..35e3f5e 100644
--- a/cache.h
+++ b/cache.h
@@ -860,10 +860,12 @@ extern int grafts_replace_parents;
 #define GIT_REPO_VERSION 0
 #define GIT_REPO_VERSION_READ 1
 extern int repository_format_precious_objects;
+extern char *repository_format_partial_clone;
 
 struct repository_format {
int version;
int precious_objects;
+   char *partial_clone; /* value of extensions.partialclone */
int is_bare;
char *work_tree;
struct string_list unknown_extensions;
diff --git a/environment.c b/environment.c
index 8289c25..e52aab3 100644
--- a/environment.c
+++ b/environment.c
@@ -27,6 +27,7 @@ int warn_ambiguous_refs = 1;
 int warn_on_object_refname_ambiguity = 1;
 int ref_paranoia = -1;
 int repository_format_precious_objects;
+char *repository_format_partial_clone;
 const char *git_commit_encoding;
 const char *git_log_output_encoding;
 const char *apply_default_whitespace;
diff --git a/setup.c b/setup.c
index 03f51e0..58536bd 100644
--- a/setup.c
+++ b/setup.c
@@ -420,7 +420,11 @@ static int check_repo_format(const char *var, const char 
*value, void *vdata)
;
else if (!strcmp(ext, "preciousobjects"))
data->precious_objects = git_config_bool(var, value);
-   else
+   else if (!strcmp(ext, "partialclone")) {
+   if (!value)
+   return config_error_nonbool(var);
+   data->partial_clone = xstrdup(value);
+   } else
string_list_append(>unknown_extensions, ext);
} else if (strcmp(var, "core.bare") == 0) {
data->is_bare = git_config_bool(var, value);
@@ -463,6 +467,7 @@ static int check_repository_format_gently(const char 
*gitdir, int *nongit_ok)
}
 
repository_format_precious_objects = candidate.precious_objects;
+   repository_format_partial_clone = candidate.partial_clone;
string_list_clear(_extensions, 0);
if (!has_common) {
if (candidate.is_bare != -1) {
-- 
2.9.3



[PATCH v7 00/10] Partial clone part 2: fsck and promisors

2017-12-08 Thread Jeff Hostetler
From: Jeff Hostetler 

This is V7 of part 2 of partial clone.  This builds upon V6 of part 1.

This version squashes the fixup commits that I added to the V6p2 series.
The net result is identical.

Jonathan Tan (10):
  extension.partialclone: introduce partial clone extension
  fsck: introduce partialclone extension
  fsck: support refs pointing to promisor objects
  fsck: support referenced promisor objects
  fsck: support promisor objects as CLI argument
  index-pack: refactor writing of .keep files
  introduce fetch-object: fetch one promisor object
  sha1_file: support lazily fetching missing objects
  rev-list: support termination at promisor objects
  gc: do not repack promisor packfiles

 Documentation/git-pack-objects.txt |  11 +
 Documentation/gitremote-helpers.txt|   7 +
 Documentation/rev-list-options.txt |  11 +
 Documentation/technical/repository-version.txt |  12 +
 Makefile   |   1 +
 builtin/cat-file.c |   2 +
 builtin/fetch-pack.c   |  10 +
 builtin/fsck.c |  26 +-
 builtin/gc.c   |   3 +
 builtin/index-pack.c   | 113 
 builtin/pack-objects.c |  37 ++-
 builtin/prune.c|   7 +
 builtin/repack.c   |   8 +-
 builtin/rev-list.c |  71 -
 cache.h|  13 +-
 environment.c  |   1 +
 fetch-object.c |  27 ++
 fetch-object.h |   6 +
 fetch-pack.c   |  48 ++--
 fetch-pack.h   |   8 +
 list-objects.c |  29 ++-
 object.c   |   2 +-
 packfile.c |  77 +-
 packfile.h |  13 +
 remote-curl.c  |  14 +-
 revision.c |  33 ++-
 revision.h |   5 +-
 setup.c|   7 +-
 sha1_file.c|  32 ++-
 t/t0410-partial-clone.sh   | 343 +
 transport.c|   8 +
 transport.h|  11 +
 32 files changed, 899 insertions(+), 97 deletions(-)
 create mode 100644 fetch-object.c
 create mode 100644 fetch-object.h
 create mode 100755 t/t0410-partial-clone.sh

-- 
2.9.3



[PATCH v7 03/10] fsck: support refs pointing to promisor objects

2017-12-08 Thread Jeff Hostetler
From: Jonathan Tan 

Teach fsck to not treat refs referring to missing promisor objects as an
error when extensions.partialclone is set.

For the purposes of warning about no default refs, such refs are still
treated as legitimate refs.

Signed-off-by: Jonathan Tan 
---
 builtin/fsck.c   |  8 
 t/t0410-partial-clone.sh | 24 
 2 files changed, 32 insertions(+)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 2934299..ee937bb 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -434,6 +434,14 @@ static int fsck_handle_ref(const char *refname, const 
struct object_id *oid,
 
obj = parse_object(oid);
if (!obj) {
+   if (is_promisor_object(oid)) {
+   /*
+* Increment default_refs anyway, because this is a
+* valid ref.
+*/
+default_refs++;
+return 0;
+   }
error("%s: invalid sha1 pointer %s", refname, oid_to_hex(oid));
errors_found |= ERROR_REACHABLE;
/* We'll continue with the rest despite the error.. */
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index 3ddb3b9..bf75162 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -13,6 +13,14 @@ pack_as_from_promisor () {
>repo/.git/objects/pack/pack-$HASH.promisor
 }
 
+promise_and_delete () {
+   HASH=$(git -C repo rev-parse "$1") &&
+   git -C repo tag -a -m message my_annotated_tag "$HASH" &&
+   git -C repo rev-parse my_annotated_tag | pack_as_from_promisor &&
+   git -C repo tag -d my_annotated_tag &&
+   delete_object repo "$HASH"
+}
+
 test_expect_success 'missing reflog object, but promised by a commit, passes 
fsck' '
test_create_repo repo &&
test_commit -C repo my_commit &&
@@ -78,4 +86,20 @@ test_expect_success 'missing reflog object alone fails fsck, 
even with extension
test_must_fail git -C repo fsck
 '
 
+test_expect_success 'missing ref object, but promised, passes fsck' '
+   rm -rf repo &&
+   test_create_repo repo &&
+   test_commit -C repo my_commit &&
+
+   A=$(git -C repo commit-tree -m a HEAD^{tree}) &&
+
+   # Reference $A only from ref
+   git -C repo branch my_branch "$A" &&
+   promise_and_delete "$A" &&
+
+   git -C repo config core.repositoryformatversion 1 &&
+   git -C repo config extensions.partialclone "arbitrary string" &&
+   git -C repo fsck
+'
+
 test_done
-- 
2.9.3



[PATCH] doc: reword gitworflows for neutrality

2017-12-08 Thread Daniel Bensoussan
Changed 'he' to 'them' to be more neutral in "gitworkflows.txt".

See discussion at: 
https://public-inbox.org/git/xmqqvahieeqy@gitster.mtv.corp.google.com/

Signed-off-by: Matthieu Moy 
Signed-off-by: Timothee Albertin 
Signed-off-by: Nathan Payre 
Signed-off-by: Daniel Bensoussan 
---
 Documentation/gitworkflows.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/gitworkflows.txt b/Documentation/gitworkflows.txt
index 02569d0..926e044 100644
--- a/Documentation/gitworkflows.txt
+++ b/Documentation/gitworkflows.txt
@@ -407,8 +407,8 @@ follows.
 `git pull  `
 =
 
-Occasionally, the maintainer may get merge conflicts when he tries to
-pull changes from downstream.  In this case, he can ask downstream to
+Occasionally, the maintainer may get merge conflicts when they try to
+pull changes from downstream.  In this case, they can ask downstream to
 do the merge and resolve the conflicts themselves (perhaps they will
 know better how to resolve them).  It is one of the rare cases where
 downstream 'should' merge from upstream.
-- 
2.11.0



Re: [PATCH v2 4/4] t/Makefile: introduce TEST_SHELL_PATH

2017-12-08 Thread Johannes Schindelin
Hi Peff,

the other three patches look good to me.

On Fri, 8 Dec 2017, Jeff King wrote:

> You may want to run the test suite with a different shell
> than you use to build Git. For instance, you may build with
> SHELL_PATH=/bin/sh (because it's faster, or it's what you
> expect to exist on systems where the build will be used) but
> want to run the test suite with bash (e.g., since that
> allows using "-x" reliably across the whole test suite).
> There's currently no good way to do this.
> 
> You might think that doing two separate make invocations,
> like:
> 
>   make &&
>   make -C t SHELL_PATH=/bin/bash
> 
> would work. And it _almost_ does. The second make will see
> our bash SHELL_PATH, and we'll use that to run the
> individual test scripts (or tell prove to use it to do so).
> So far so good.
> 
> But this breaks down when "--tee" or "--verbose-log" is
> used. Those options cause the test script to actually
> re-exec itself using $SHELL_PATH. But wait, wouldn't our
> second make invocation have set SHELL_PATH correctly in the
> environment?
> 
> Yes, but test-lib.sh sources GIT-BUILD-OPTIONS, which we
> built during the first "make". And that overrides the
> environment, giving us the original SHELL_PATH again.

... and we could simply see whether the environment variable
TEST_SHELL_PATH (which we would set in t/Makefile from the passed-in
SHELL_PATH) is set, and override it again.

I still think we can do without recording test-phase details in the
build-phase (which may, or may not, know what the test-phase wants to do).

In other words, I believe that we can make the invocation you mentioned
above work, by touching only t/Makefile (to pass SHELL_PATH as
TEST_SHELL_PATH) and t/test-lib.sh (to override the SHELL_PATH from
GIT-BUILD-OPTIONS with TEST_SHELL_PATH, if set).

Ciao,
Dscho


Re: [PATCH 1/1] diffcore: add a filter to find a specific blob

2017-12-08 Thread Junio C Hamano
Stefan Beller  writes:

> diff --git a/diffcore-blobfind.c b/diffcore-blobfind.c
> new file mode 100644
> index 00..e65c7cad6e
> --- /dev/null
> +++ b/diffcore-blobfind.c
> @@ -0,0 +1,41 @@
> +/*
> + * Copyright (c) 2017 Google Inc.
> + */
> +#include "cache.h"
> +#include "diff.h"
> +#include "diffcore.h"
> +
> +static void diffcore_filter_blobs(struct diff_queue_struct *q,
> +   struct diff_options *options)
> +{
> + int src, dst;
> +
> + if (!options->blobfind)
> + BUG("blobfind oidset not initialized???");
> +
> + for (src = dst = 0; src < q->nr; src++) {
> + struct diff_filepair *p = q->queue[src];
> +
> + if ((DIFF_FILE_VALID(p->one) &&
> +  oidset_contains(options->blobfind, >one->oid)) ||
> + (DIFF_FILE_VALID(p->two) &&
> +  oidset_contains(options->blobfind, >two->oid))) {

Shouldn't this make sure that !DIFF_PAIR_UNMERGED(p) before looking
at the sides of the pair?  I think an unmerged pair is queued with
filespecs whose modes are cleared, but we should not be relying on
that---I think we even had bugs we fixed by introducing an explicit
is_unmerged field to the filepair struct.

> @@ -2883,6 +2884,8 @@ int prepare_revision_walk(struct rev_info *revs)
>   simplify_merges(revs);
>   if (revs->children.name)
>   set_children(revs);
> + if (revs->diffopt.blobfind)
> + revs->simplify_history = 0;
>   return 0;
>  }

It makes sense to clear this bit by default, but is this an
unconditonal clearing?  In other words, is there a way for the user
to countermand this default and ask for merge simplification from
the command line, e.g. "diff --simplify-history --blobfind="?

> +test_description='test finding specific blobs in the revision walking'
> +. ./test-lib.sh
> +
> +test_expect_success 'setup ' '
> + git commit --allow-empty -m "empty initial commit" &&
> +
> + echo "Hello, world!" >greeting &&
> + git add greeting &&
> + git commit -m "add the greeting blob" && # borrowed from Git from the 
> Bottom Up
> + git tag -m "the blob" greeting $(git rev-parse HEAD:greeting) &&
> +
> + echo asdf >unrelated &&
> + git add unrelated &&
> + git commit -m "unrelated history" &&
> +
> + git revert HEAD^ &&
> +
> + git commit --allow-empty -m "another unrelated commit"
> +'
> +
> +test_expect_success 'find the greeting blob' '
> + cat >expect <<-EOF &&
> + Revert "add the greeting blob"
> + add the greeting blob
> + EOF
> +
> + git log --abbrev=12 --oneline --blobfind=greeting^{blob} >actual.raw &&
> + cut -c 14- actual.raw >actual &&
> + test_cmp expect actual

The hardcoded numbers look strange and smell like a workaround of
not asking for full object names.

Also, now it has the simplify-history bit in the change, don't we
want to check a mergy history, and also running "diff-files" while
the index has unmerged entries?

Other than that, the changes look quite sane and pleasnt read.

Thanks.


> +'
> +
> +test_done


[PATCH Outreachy 2/2] format: create docs for pretty.h

2017-12-08 Thread Olga Telezhnaya
Write some docs for functions in pretty.h.
Take it as a first draft, they would be changed later.

Signed-off-by: Olga Telezhnaia 
Mentored-by: Christian Couder 
Mentored by: Jeff King 
---
 pretty.h | 44 
 1 file changed, 44 insertions(+)

diff --git a/pretty.h b/pretty.h
index ef5167484fb64..5c85d94e332d7 100644
--- a/pretty.h
+++ b/pretty.h
@@ -48,6 +48,7 @@ struct pretty_print_context {
int graph_width;
 };
 
+/* Check whether commit format is mail. */
 static inline int cmit_fmt_is_mail(enum cmit_fmt fmt)
 {
return (fmt == CMIT_FMT_EMAIL || fmt == CMIT_FMT_MBOXRD);
@@ -57,31 +58,74 @@ struct userformat_want {
unsigned notes:1;
 };
 
+/* Set the flag "w->notes" if there is placeholder %N in "fmt". */
 void userformat_find_requirements(const char *fmt, struct userformat_want *w);
+
+/*
+ * Shortcut for invoking pretty_print_commit if we do not have any context.
+ * Context would be set empty except "fmt".
+ */
 void pp_commit_easy(enum cmit_fmt fmt, const struct commit *commit,
struct strbuf *sb);
+
+/*
+ * Get information about user and date from "line", format it and
+ * put it into "sb".
+ * Format of "line" must be readable for split_ident_line function.
+ * The resulting format is "what: name  date".
+ */
 void pp_user_info(struct pretty_print_context *pp, const char *what,
struct strbuf *sb, const char *line,
const char *encoding);
+
+/*
+ * Format title line of commit message taken from "msg_p" and
+ * put it into "sb".
+ * First line of "msg_p" is also affected.
+ */
 void pp_title_line(struct pretty_print_context *pp, const char **msg_p,
struct strbuf *sb, const char *encoding,
int need_8bit_cte);
+
+/*
+ * Get current state of commit message from "msg_p" and continue formatting
+ * by adding indentation and '>' signs. Put result into "sb".
+ */
 void pp_remainder(struct pretty_print_context *pp, const char **msg_p,
struct strbuf *sb, int indent);
 
+/*
+ * Create a text message about commit using given "format" and "context".
+ * Put the result to "sb".
+ * Please use this function for custom formats.
+ */
 void format_commit_message(const struct commit *commit,
const char *format, struct strbuf *sb,
const struct pretty_print_context *context);
 
+/*
+ * Parse given arguments from "arg", check it for correctness and
+ * fill struct rev_info.
+ */
 void get_commit_format(const char *arg, struct rev_info *);
 
+/*
+ * Make a commit message with all rules from given "pp"
+ * and put it into "sb".
+ * Please use this function if you have a context (candidate for "pp").
+ */
 void pretty_print_commit(struct pretty_print_context *pp,
const struct commit *commit,
struct strbuf *sb);
 
+/*
+ * Change line breaks in "msg" to "line_separator" and put it into "sb".
+ * Return "msg" itself.
+ */
 const char *format_subject(struct strbuf *sb, const char *msg,
const char *line_separator);
 
+/* Check if "cmit_fmt" will produce an empty output. */
 int commit_format_is_empty(enum cmit_fmt);
 
 #endif /* PRETTY_H */

--
https://github.com/git/git/pull/439


[PATCH Outreachy 1/2] format: create pretty.h file

2017-12-08 Thread Olga Telezhnaya
Create header for pretty.c to make formatting interface more structured.
This is a middle point, this file would be merged futher with other
files which contain formatting stuff.

Signed-off-by: Olga Telezhnaia 
Mentored-by: Christian Couder 
Mentored by: Jeff King 
---
 archive.c |  1 +
 builtin/notes.c   |  2 +-
 builtin/reset.c   |  2 +-
 builtin/show-branch.c |  2 +-
 combine-diff.c|  1 +
 commit.c  |  1 +
 commit.h  | 80 --
 diffcore-pickaxe.c|  1 +
 grep.c|  1 +
 log-tree.c|  1 +
 notes-cache.c |  1 +
 pretty.h  | 87 +++
 revision.h|  2 +-
 sequencer.c   |  1 +
 sha1_name.c   |  1 +
 submodule.c   |  1 +
 16 files changed, 101 insertions(+), 84 deletions(-)
 create mode 100644 pretty.h

diff --git a/archive.c b/archive.c
index 0b7b62af0c3ec..60607e8c00857 100644
--- a/archive.c
+++ b/archive.c
@@ -2,6 +2,7 @@
 #include "config.h"
 #include "refs.h"
 #include "commit.h"
+#include "pretty.h"
 #include "tree-walk.h"
 #include "attr.h"
 #include "archive.h"
diff --git a/builtin/notes.c b/builtin/notes.c
index 1a2c7d92ad7e7..7c8176164561b 100644
--- a/builtin/notes.c
+++ b/builtin/notes.c
@@ -12,7 +12,7 @@
 #include "builtin.h"
 #include "notes.h"
 #include "blob.h"
-#include "commit.h"
+#include "pretty.h"
 #include "refs.h"
 #include "exec_cmd.h"
 #include "run-command.h"
diff --git a/builtin/reset.c b/builtin/reset.c
index 906e541658230..e15f595799c40 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -12,7 +12,7 @@
 #include "lockfile.h"
 #include "tag.h"
 #include "object.h"
-#include "commit.h"
+#include "pretty.h"
 #include "run-command.h"
 #include "refs.h"
 #include "diff.h"
diff --git a/builtin/show-branch.c b/builtin/show-branch.c
index 2e24b5c330e8e..e8a4aa40cb4b6 100644
--- a/builtin/show-branch.c
+++ b/builtin/show-branch.c
@@ -1,6 +1,6 @@
 #include "cache.h"
 #include "config.h"
-#include "commit.h"
+#include "pretty.h"
 #include "refs.h"
 #include "builtin.h"
 #include "color.h"
diff --git a/combine-diff.c b/combine-diff.c
index 2505de119a2be..01ba1b03a06d2 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -1,5 +1,6 @@
 #include "cache.h"
 #include "commit.h"
+#include "pretty.h"
 #include "blob.h"
 #include "diff.h"
 #include "diffcore.h"
diff --git a/commit.c b/commit.c
index cab8d4455bdbd..ac17a27a4ab0a 100644
--- a/commit.c
+++ b/commit.c
@@ -1,6 +1,7 @@
 #include "cache.h"
 #include "tag.h"
 #include "commit.h"
+#include "pretty.h"
 #include "pkt-line.h"
 #include "utf8.h"
 #include "diff.h"
diff --git a/commit.h b/commit.h
index 99a3fea68d3f6..41a2067809444 100644
--- a/commit.h
+++ b/commit.h
@@ -121,93 +121,13 @@ struct commit_list *copy_commit_list(struct commit_list 
*list);
 
 void free_commit_list(struct commit_list *list);
 
-/* Commit formats */
-enum cmit_fmt {
-   CMIT_FMT_RAW,
-   CMIT_FMT_MEDIUM,
-   CMIT_FMT_DEFAULT = CMIT_FMT_MEDIUM,
-   CMIT_FMT_SHORT,
-   CMIT_FMT_FULL,
-   CMIT_FMT_FULLER,
-   CMIT_FMT_ONELINE,
-   CMIT_FMT_EMAIL,
-   CMIT_FMT_MBOXRD,
-   CMIT_FMT_USERFORMAT,
-
-   CMIT_FMT_UNSPECIFIED
-};
-
-static inline int cmit_fmt_is_mail(enum cmit_fmt fmt)
-{
-   return (fmt == CMIT_FMT_EMAIL || fmt == CMIT_FMT_MBOXRD);
-}
-
 struct rev_info; /* in revision.h, it circularly uses enum cmit_fmt */
 
-struct pretty_print_context {
-   /*
-* Callers should tweak these to change the behavior of pp_* functions.
-*/
-   enum cmit_fmt fmt;
-   int abbrev;
-   const char *after_subject;
-   int preserve_subject;
-   struct date_mode date_mode;
-   unsigned date_mode_explicit:1;
-   int print_email_subject;
-   int expand_tabs_in_log;
-   int need_8bit_cte;
-   char *notes_message;
-   struct reflog_walk_info *reflog_info;
-   struct rev_info *rev;
-   const char *output_encoding;
-   struct string_list *mailmap;
-   int color;
-   struct ident_split *from_ident;
-
-   /*
-* Fields below here are manipulated internally by pp_* functions and
-* should not be counted on by callers.
-*/
-   struct string_list in_body_headers;
-   int graph_width;
-};
-
-struct userformat_want {
-   unsigned notes:1;
-};
-
 extern int has_non_ascii(const char *text);
 extern const char *logmsg_reencode(const struct commit *commit,
   char **commit_encoding,
   const char *output_encoding);
-extern void get_commit_format(const char *arg, struct rev_info *);
-extern const char *format_subject(struct strbuf *sb, const char *msg,
- const char *line_separator);
-extern void userformat_find_requirements(const char *fmt, struct 

Kindly Assist Me

2017-12-08 Thread Mr. Sonami
Kindly Assist Me


In good faith from Mr.Sonami, actually could you please consider to
help me to relocate this sum of five million, three hundred thousand
dollars (US$5.3 m) to your country for establishing a medium industry
in your country? The said 5.3 million dollars was deposited in our
bank by Mr. Jean-Noël Rey a Swiss land citizen who died in a terrorist
attacks in 2016. We have never met before, but need not to worry as I
am using the only secured and confidential medium available to seek
for your foreign assistance in this business. I am contacting you
independently of my investigation and no one is informed of this
communication.

I contacted you in this transaction on my search for a reliable,
capable partner and I discovered that the late Swiss died along side
with his supposed to be next of kin through CNN News on terrorist
attacks. I will give you all vital information concerning the Swiss
and the 5.3 million dollars in our custody so that you will contact my
bank for them to release the money to you. You can come here in person
or you can request the bank to give you the contact of the bank
lawyer's who can represent your interest in the transfer process.

Reply and send this information to me at: sonami...@gmail.com

Your full Name.
Age...
Country
Occupation..
Your Telephone Numbers.

Have a nice day and good luck
Mr.Sonami


Re: What's cooking in git.git (Dec 2017, #02; Thu, 7)

2017-12-08 Thread Johannes Schindelin
Hi,

On Fri, 8 Dec 2017, Torsten Bögershausen wrote:

> > * tb/check-crlf-for-safe-crlf (2017-11-27) 1 commit
> >   (merged to 'next' on 2017-12-05 at 7adaa1fe01)
> >  + convert: tighten the safe autocrlf handling
> > 
> >  The "safe crlf" check incorrectly triggered for contents that does
> >  not use CRLF as line endings, which has been corrected.
> > 
> >  Broken on Windows???
> >  cf. 
> 
> Yes, broken on Windows. A fix is coming the next days.

We might want to consider using a saner Continuous Testing workflow, to
avoid re-testing (and re-finding) breakages in individual patch series
just because completely unrelated patch got updated.

I mean, yes, it seemed like a good idea a long time ago to have One Branch
that contains All The Patch Series Currently Cooking, back when our most
reliable (because only) test facilities were poor humans.

But we see how many more subtle bugs are spotted nowadays where Git's
source code is tested automatically on a growing number of Operating
System/CPU architecture "coordinates", and it is probably time to save
some human resources.

How about testing the individual branches instead?

This would save me a ton of time, as bisecting is just too expensive given
the scattered base commits of the branches smooshed into `pu`. (There is a
new Git/Error.pm breakage in pu for about a week that I simply have not
gotten around to, or better put: that I did not want to tackle given the
time committment).)

Ciao,
Dscho

[PATCH] refs: drop "clear packed-refs while locked" assertion

2017-12-08 Thread Jeff King
This patch fixes a regression in v2.14.0. It's actually fixed already in
v2.15.0 because all of the packed-ref code there was rewritten. So
there's no point in applying this on "master" or even "maint". But I
figured it was worth sharing here in case somebody else runs across it,
and in case we ever do a v2.14.4 release.

-- >8 --
In clear_packed_ref_cache(), we assert that we're not
currently holding the packed-refs lock. But in each of the
three code paths that can hit this, the assertion is either
a noop or actively does the wrong thing:

 1. in rollback_packed_refs(), we will have just released
the lock before calling the function, and so the
assertion can never trigger.

 2. get_packed_ref_cache() can reach this assertion via
validate_packed_ref_cache(). But it calls the validate
function only when it knows that we're not holding the
lock, so again, the assertion can never trigger.

 3. lock_packed_refs() also calls validate_packed_ref_cache().
In this case we're _always_ holding the lock, which
means any time the validate function has to clear the
cache, we'll trigger this assertion and die.

This doesn't happen often in practice because the
validate function clears the cache only if we find that
somebody else has racily rewritten the packed-refs file
between the time we read it and the time we took the lock.

So most of the time we don't reach the assertion at all
(nobody has racily written the file so there's no need
to clear the cache). And when we do, it is not actually
indicative of a bug; clearing the cache while holding
the lock is the right thing to do here.

This final case is relatively new, being triggerd by the
extra validation added in fed6ebebf1 (lock_packed_refs():
fix cache validity check, 2017-06-12).

Signed-off-by: Jeff King 
---
 refs/files-backend.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index f21a954ce7..dd41e1d382 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -99,8 +99,6 @@ static void clear_packed_ref_cache(struct files_ref_store 
*refs)
if (refs->packed) {
struct packed_ref_cache *packed_refs = refs->packed;
 
-   if (is_lock_file_locked(>packed_refs_lock))
-   die("BUG: packed-ref cache cleared while locked");
refs->packed = NULL;
release_packed_ref_cache(packed_refs);
}
-- 
2.15.1.659.g8bd2eae3ea


[PATCH v2 1/4] test-lib: silence "-x" cleanup under bash

2017-12-08 Thread Jeff King
When the test suite's "-x" option is used with bash, we end
up seeing cleanup cruft in the output:

  $ bash t0001-init.sh -x
  [...]
  ++ diff -u expected actual
  + test_eval_ret_=0
  + want_trace
  + test t = t
  + test t = t
  + set +x
  ok 42 - re-init from a linked worktree

This ranges from mildly annoying (for a successful test) to
downright confusing (when we say "last command exited with
error", but it's really 5 commands back).

We normally are able to suppress this cleanup. As the
in-code comment explains, we can't convince the shell not to
print it, but we can redirect its stderr elsewhere.

But since d88785e424 (test-lib: set BASH_XTRACEFD
automatically, 2016-05-11), that doesn't hold for bash. It
sends the "set -x" output directly to descriptor 4, not to
stderr.

We can fix this by also redirecting descriptor 4, and
paying close attention to which commands redirected and
which are not (see the updated comment).

Two alternatives I considered and rejected:

  - unsetting and setting BASH_XTRACEFD; doing so closes the
descriptor, which we must avoid

  - we could keep everything in a single block as before,
redirect 4>/dev/null there, but retain 5>&4 as a copy.
And then selectively restore 4>&5 for commands which
should be allowed to trace. This would work, but the
descriptor swapping seems unnecessarily confusing.

Signed-off-by: Jeff King 
---
 t/test-lib.sh | 34 --
 1 file changed, 24 insertions(+), 10 deletions(-)

diff --git a/t/test-lib.sh b/t/test-lib.sh
index 116bd6a70c..7914453a3b 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -601,26 +601,40 @@ test_eval_inner_ () {
 }
 
 test_eval_ () {
-   # We run this block with stderr redirected to avoid extra cruft
-   # during a "-x" trace. Once in "set -x" mode, we cannot prevent
+   # If "-x" tracing is in effect, then we want to avoid polluting stderr
+   # with non-test commands. But once in "set -x" mode, we cannot prevent
# the shell from printing the "set +x" to turn it off (nor the saving
# of $? before that). But we can make sure that the output goes to
# /dev/null.
#
-   # The test itself is run with stderr put back to &4 (so either to
-   # /dev/null, or to the original stderr if --verbose was used).
+   # There are a few subtleties here:
+   #
+   #   - we have to redirect descriptor 4 in addition to 2, to cover
+   # BASH_XTRACEFD
+   #
+   #   - the actual eval has to come before the redirection block (since
+   # it needs to see descriptor 4 to set up its stderr)
+   #
+   #   - likewise, any error message we print must be outside the block to
+   # access descriptor 4
+   #
+   #   - checking $? has to come immediately after the eval, but it must
+   # be _inside_ the block to avoid polluting the "set -x" output
+   #
+
+   test_eval_inner_ "$@" &3 2>&4
{
-   test_eval_inner_ "$@" &3 2>&4
test_eval_ret_=$?
if want_trace
then
set +x
-   if test "$test_eval_ret_" != 0
-   then
-   say_color error >&4 "error: last command exited 
with \$?=$test_eval_ret_"
-   fi
fi
-   } 2>/dev/null
+   } 2>/dev/null 4>&2
+
+   if test "$test_eval_ret_" != 0 && want_trace
+   then
+   say_color error >&4 "error: last command exited with 
\$?=$test_eval_ret_"
+   fi
return $test_eval_ret_
 }
 
-- 
2.15.1.659.g8bd2eae3ea



[PATCH v2 4/4] t/Makefile: introduce TEST_SHELL_PATH

2017-12-08 Thread Jeff King
You may want to run the test suite with a different shell
than you use to build Git. For instance, you may build with
SHELL_PATH=/bin/sh (because it's faster, or it's what you
expect to exist on systems where the build will be used) but
want to run the test suite with bash (e.g., since that
allows using "-x" reliably across the whole test suite).
There's currently no good way to do this.

You might think that doing two separate make invocations,
like:

  make &&
  make -C t SHELL_PATH=/bin/bash

would work. And it _almost_ does. The second make will see
our bash SHELL_PATH, and we'll use that to run the
individual test scripts (or tell prove to use it to do so).
So far so good.

But this breaks down when "--tee" or "--verbose-log" is
used. Those options cause the test script to actually
re-exec itself using $SHELL_PATH. But wait, wouldn't our
second make invocation have set SHELL_PATH correctly in the
environment?

Yes, but test-lib.sh sources GIT-BUILD-OPTIONS, which we
built during the first "make". And that overrides the
environment, giving us the original SHELL_PATH again.

Let's introduce a new variable that lets you specify a
specific shell to be run for the test scripts. Note that we
have to touch both the main and t/ Makefiles, since we have
to record it in GIT-BUILD-OPTIONS in one, and use it in the
latter.

Signed-off-by: Jeff King 
---
 Makefile  | 8 
 t/Makefile| 6 --
 t/test-lib.sh | 2 +-
 3 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/Makefile b/Makefile
index fef9c8d272..8a21c4d8f1 100644
--- a/Makefile
+++ b/Makefile
@@ -425,6 +425,10 @@ all::
 #
 # to say "export LESS=FRX (and LV=-c) if the environment variable
 # LESS (and LV) is not set, respectively".
+#
+# Define TEST_SHELL_PATH if you want to use a shell besides SHELL_PATH for
+# running the test scripts (e.g., bash has better support for "set -x"
+# tracing).
 
 GIT-VERSION-FILE: FORCE
@$(SHELL_PATH) ./GIT-VERSION-GEN
@@ -729,6 +733,8 @@ endif
 export PERL_PATH
 export PYTHON_PATH
 
+TEST_SHELL_PATH = $(SHELL_PATH)
+
 LIB_FILE = libgit.a
 XDIFF_LIB = xdiff/lib.a
 VCSSVN_LIB = vcs-svn/lib.a
@@ -1725,6 +1731,7 @@ prefix_SQ = $(subst ','\'',$(prefix))
 gitwebdir_SQ = $(subst ','\'',$(gitwebdir))
 
 SHELL_PATH_SQ = $(subst ','\'',$(SHELL_PATH))
+TEST_SHELL_PATH_SQ = $(subst ','\'',$(TEST_SHELL_PATH))
 PERL_PATH_SQ = $(subst ','\'',$(PERL_PATH))
 PYTHON_PATH_SQ = $(subst ','\'',$(PYTHON_PATH))
 TCLTK_PATH_SQ = $(subst ','\'',$(TCLTK_PATH))
@@ -2355,6 +2362,7 @@ GIT-LDFLAGS: FORCE
 # and the first level quoting from the shell that runs "echo".
 GIT-BUILD-OPTIONS: FORCE
@echo SHELL_PATH=\''$(subst ','\'',$(SHELL_PATH_SQ))'\' >$@+
+   @echo TEST_SHELL_PATH=\''$(subst ','\'',$(TEST_SHELL_PATH_SQ))'\' >>$@+
@echo PERL_PATH=\''$(subst ','\'',$(PERL_PATH_SQ))'\' >>$@+
@echo DIFF=\''$(subst ','\'',$(subst ','\'',$(DIFF)))'\' >>$@+
@echo PYTHON_PATH=\''$(subst ','\'',$(PYTHON_PATH_SQ))'\' >>$@+
diff --git a/t/Makefile b/t/Makefile
index 1bb06c36f2..96317a35f4 100644
--- a/t/Makefile
+++ b/t/Makefile
@@ -8,6 +8,7 @@
 
 #GIT_TEST_OPTS = --verbose --debug
 SHELL_PATH ?= $(SHELL)
+TEST_SHELL_PATH ?= $(SHELL_PATH)
 PERL_PATH ?= /usr/bin/perl
 TAR ?= $(TAR)
 RM ?= rm -f
@@ -23,6 +24,7 @@ endif
 
 # Shell quote;
 SHELL_PATH_SQ = $(subst ','\'',$(SHELL_PATH))
+TEST_SHELL_PATH_SQ = $(subst ','\'',$(TEST_SHELL_PATH))
 PERL_PATH_SQ = $(subst ','\'',$(PERL_PATH))
 TEST_RESULTS_DIRECTORY_SQ = $(subst ','\'',$(TEST_RESULTS_DIRECTORY))
 
@@ -42,11 +44,11 @@ failed:
test -z "$$failed" || $(MAKE) $$failed
 
 prove: pre-clean $(TEST_LINT)
-   @echo "*** prove ***"; $(PROVE) --exec '$(SHELL_PATH_SQ)' 
$(GIT_PROVE_OPTS) $(T) :: $(GIT_TEST_OPTS)
+   @echo "*** prove ***"; $(PROVE) --exec '$(TEST_SHELL_PATH_SQ)' 
$(GIT_PROVE_OPTS) $(T) :: $(GIT_TEST_OPTS)
$(MAKE) clean-except-prove-cache
 
 $(T):
-   @echo "*** $@ ***"; '$(SHELL_PATH_SQ)' $@ $(GIT_TEST_OPTS)
+   @echo "*** $@ ***"; '$(TEST_SHELL_PATH_SQ)' $@ $(GIT_TEST_OPTS)
 
 pre-clean:
$(RM) -r '$(TEST_RESULTS_DIRECTORY_SQ)'
diff --git a/t/test-lib.sh b/t/test-lib.sh
index b8dd5e79ac..1ae0fc02d0 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -80,7 +80,7 @@ done,*)
# from any previous runs.
>"$GIT_TEST_TEE_OUTPUT_FILE"
 
-   (GIT_TEST_TEE_STARTED=done ${SHELL_PATH} "$0" "$@" 2>&1;
+   (GIT_TEST_TEE_STARTED=done ${TEST_SHELL_PATH} "$0" "$@" 2>&1;
 echo $? >"$BASE.exit") | tee -a "$GIT_TEST_TEE_OUTPUT_FILE"
test "$(cat "$BASE.exit")" = 0
exit
-- 
2.15.1.659.g8bd2eae3ea


[PATCH v2 3/4] test-lib: make "-x" work with "--verbose-log"

2017-12-08 Thread Jeff King
The "-x" tracing option implies "--verbose". This is a
problem when running under a TAP harness like "prove", where
we need to use "--verbose-log" instead. Instead, let's
handle this the same way we do for --valgrind, including the
recent fix from 88c6e9d31c (test-lib: --valgrind should not
override --verbose-log, 2017-09-05). Namely, let's enable
--verbose only when we know there isn't a more specific
verbosity option indicated.

Note that we also have to tweak `want_trace` to turn it on
(previously we just lumped $verbose_log in with $verbose,
but now we don't necessarily auto-set the latter).

Signed-off-by: Jeff King 
---
 t/test-lib.sh | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/t/test-lib.sh b/t/test-lib.sh
index 7914453a3b..b8dd5e79ac 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -264,7 +264,6 @@ do
shift ;;
-x)
trace=t
-   verbose=t
shift ;;
--verbose-log)
verbose_log=t
@@ -283,6 +282,11 @@ then
test -z "$verbose_log" && verbose=t
 fi
 
+if test -n "$trace" && test -z "$verbose_log"
+then
+   verbose=t
+fi
+
 if test -n "$color"
 then
# Save the color control sequences now rather than run tput
@@ -586,7 +590,9 @@ maybe_setup_valgrind () {
 }
 
 want_trace () {
-   test "$trace" = t && test "$verbose" = t
+   test "$trace" = t && {
+   test "$verbose" = t || test "$verbose_log" = t
+   }
 }
 
 # This is a separate function because some tests use
-- 
2.15.1.659.g8bd2eae3ea



[PATCH v2 2/4] t5615: avoid re-using descriptor 4

2017-12-08 Thread Jeff King
File descriptors 3 and 4 are special in our test suite, as
they link back to the test script's original stdout and
stderr. Normally this isn't something tests need to worry
about: they are free to clobber these descriptors for
sub-commands without affecting the overall script.

But there's one very special thing about descriptor 4: since
d88785e424 (test-lib: set BASH_XTRACEFD automatically,
2016-05-11), we ask bash to output "set -x" output to it by
number. This goes to _any_ descriptor 4, even if it no
longer points to the place it did when we set BASH_XTRACEFD.

But in t5615, we run a shell loop with descriptor 4
redirected.  As a result, t5615 works with non-bash shells
even with "-x". And it works with bash without "-x". But the
combination of "bash t5615-alternate-env.sh -x" gets a test
failure (because our "set -x" output pollutes one of the
files).

We can fix this by using any descriptor _except_ the magical
4. So let's switch arbitrarily to using 5/6 in this loop,
not 3/4.

Another alternative is to use a different descriptor for
BASH_XTRACEFD. But picking an unused one turns out to be
hard. Most shells limit us to 9 numbered descriptors. Bash
can handle more, but:

  - while the BASH_XTRACEFD is specific to bash, GIT_TRACE=4
has a similar problem, and would affect all shells

  - constructs like "999>/dev/null" are synticatically
invalid to non-bash shells. So we have to actually bury
it inside an eval, which creates more complications.

Of the numbers 1-9, you might think that "9" would be less
used than "4". But it's not; many of our scripts use
descriptors 8 and 9 (probably under the assumption that they
are high and therefore unused). The least-used descriptor is
currently "7". We could switch to that, but we're just
trading one magic number for another.

Signed-off-by: Jeff King 
---
 t/t5615-alternate-env.sh | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/t/t5615-alternate-env.sh b/t/t5615-alternate-env.sh
index d2d883f3a1..b4905b822c 100755
--- a/t/t5615-alternate-env.sh
+++ b/t/t5615-alternate-env.sh
@@ -7,9 +7,9 @@ check_obj () {
alt=$1; shift
while read obj expect
do
-   echo "$obj" >&3 &&
-   echo "$obj $expect" >&4
-   done 3>input 4>expect &&
+   echo "$obj" >&5 &&
+   echo "$obj $expect" >&6
+   done 5>input 6>expect &&
GIT_ALTERNATE_OBJECT_DIRECTORIES=$alt \
git "$@" cat-file --batch-check='%(objectname) %(objecttype)' \
actual &&
-- 
2.15.1.659.g8bd2eae3ea



[PATCH v2 0/4] making test-suite tracing more useful

2017-12-08 Thread Jeff King
This series fixes some rough edges in the "-x" feature of the test
suite. With it, it should be possible to turn on tracing for CI runs.

This is mostly a repost of v1 at:

  
https://public-inbox.org/git/20171019210140.64lb52cqtgdh2...@sigill.intra.peff.net

which had some discussion, but wasn't picked up.

I fixed a few typos pointed out by reviewers, and I tried to summarize
the discussion around "magic descriptor 4" in the commit message of
patch 2.  My conclusion there was that none of the options is
particularly good.  The only thing thing that might make sense is making
"7" the magical descriptor. But given that "4" only had one troubling
call, I'm inclined to just leave it as-is and see if this ever comes up
again (especially if we start using "-x" in the Travis builds, then we'd
catch any problems).

  [1/4]: test-lib: silence "-x" cleanup under bash
  [2/4]: t5615: avoid re-using descriptor 4
  [3/4]: test-lib: make "-x" work with "--verbose-log"
  [4/4]: t/Makefile: introduce TEST_SHELL_PATH

 Makefile |  8 
 t/Makefile   |  6 --
 t/t5615-alternate-env.sh |  6 +++---
 t/test-lib.sh| 46 +-
 4 files changed, 48 insertions(+), 18 deletions(-)

-Peff


Re: [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog()

2017-12-08 Thread Jeff King
On Thu, Dec 07, 2017 at 01:47:14PM -0800, Junio C Hamano wrote:

> > diff --git a/builtin/fmt-merge-msg.c b/builtin/fmt-merge-msg.c
> > index 22034f87e7..8e8a15ea4a 100644
> > --- a/builtin/fmt-merge-msg.c
> > +++ b/builtin/fmt-merge-msg.c
> > @@ -377,7 +377,8 @@ static void shortlog(const char *name,
> > string_list_append(,
> >oid_to_hex(>object.oid));
> > else
> > -   string_list_append(, strbuf_detach(, NULL));
> > +   string_list_append_nodup(,
> > +strbuf_detach(, NULL));
> > }
> >  
> > if (opts->credit_people)
> 
> What is leaked comes from strbuf, so the title is not a lie, but I
> tend to think that this leak is caused by a somewhat strange
> string_list API.  The subjects string-list is initialized as a "dup"
> kind, but a caller that wants to avoid leaking can (and should) use
> _nodup() call to add a string without duping.  It all feels a bit
> too convoluted.

I'm not sure it's string-list's fault. Many callers (including this one)
have _some_ entries whose strings must be duplicated and others which do
not.

So either:

  1. The list gets marked as "nodup", and we add an extra xstrdup() to the
 oid_to_hex call above. And also need to remember to free() the
 strings later, since the list does not own them.

or

  2. We mark it as "dup" and incur an extra allocation and copy, like:

   string_list_append(, sb.buf);
   strbuf_release();

So I'd really blame the caller, which doesn't want to do (2) out of a
sense of optimization. It could also perhaps write it as:

  while (commit = get_revision(rev)) {
strbuf_reset();
... maybe put some stuff in sb ...
if (!sb.len)
string_list_append(, oid_to_hex(obj));
else
string_list_append(, sb.buf);
  }
  strbuf_release();

which at least avoids the extra allocations.

By the way, I think there's another quite subtle leak in this function.
We do this:

  format_commit_message(commit, "%s", , );
  strbuf_ltrim();

and then only use "sb" if sb.len is non-zero. But we may have actually
allocated to create our zero-length string (e.g., if we had a strbuf
full of spaces and trimmed them all off). Since we reuse "sb" over and
over as we loop, this will actually only leak once for the whole loop,
not once per iteration. So it's probably not a big deal, but writing it
with the explicit reset/release pattern fixes that (and is more
idiomatic for our code base, I think).

-Peff


Re: [PATCH] setup.c: fix comment about order of .git directory discovery

2017-12-08 Thread Jeff King
On Thu, Dec 07, 2017 at 09:59:49AM +0100, SZEDER Gábor wrote:

> Since gitfiles were introduced in b44ebb19e (Add platform-independent
> .git "symlink", 2008-02-20) the order of checks during .git directory
> discovery is: gitfile, gitdir, bare repo.  However, that commit did
> only partially update the in-code comment describing this order,
> missing the last line which still puts gitdir before gitfile.

I was confused at first, because your patch only changes to ".git/" to
".git". So there is no ordering change, only a swapping out of one for
the other. But that pushes done the ".git/" into the "etc." which is
below.

So I think this makes sense.

Thanks.

-Peff


[PATCH] cvsimport: apply shell-quoting regex globally

2017-12-08 Thread Jeff King
Commit 5b4efea666 (cvsimport: shell-quote variable used in
backticks, 2017-09-11) tried to shell-quote a variable, but
forgot to use the "/g" modifier to apply the quoting to the
whole variable. This means we'd miss any embedded
single-quotes after the first one.

Reported-by: 
Signed-off-by: Jeff King 
---
 git-cvsimport.perl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/git-cvsimport.perl b/git-cvsimport.perl
index 36929921ea..2d8df83172 100755
--- a/git-cvsimport.perl
+++ b/git-cvsimport.perl
@@ -642,7 +642,7 @@ sub is_sha1 {
 
 sub get_headref ($) {
my $name = shift;
-   $name =~ s/'/'\\''/;
+   $name =~ s/'/'\\''/g;
my $r = `git rev-parse --verify '$name' 2>/dev/null`;
return undef unless $? == 0;
chomp $r;
-- 
2.15.1.659.g8bd2eae3ea


Re: [PATCH] decorate: clean up and document API

2017-12-08 Thread Jeff King
On Thu, Dec 07, 2017 at 04:14:24PM -0800, Jonathan Tan wrote:

> Improve the names of the identifiers in decorate.h, document them, and
> add an example of how to use these functions.
> 
> The example is compiled and run as part of the test suite.
> 
> Signed-off-by: Jonathan Tan 
> ---
> This patch contains some example code in a test helper.
> 
> There is a discussion at [1] about where example code for hashmap should
> go. For something relatively simple, like decorate, perhaps example code
> isn't necessary; but if we include example code anyway, I think that we
> might as well make it compiled and tested, so this patch contains that
> example code in a test helper.

I have mixed feelings. On the one hand, compiling and running the code
ensures that those things actually work. On the other hand, I expect you
can make a much clearer example if instead of having running code, you
show snippets of almost-code.

E.g.:

  struct decoration d = { NULL };

  add_decoration(, obj, "foo");
  ...
  str = lookup_decoration(obj);

pretty much gives the relevant overview, with very little boilerplate.
Yes, it omits things like the return value of add_decoration(), but
those sorts of details are probably better left to the function
docstrings.

Other than that philosophical point, the documentation you added looks
pretty good to me. Two possible improvements to the API we could do on
top:

  1. Should there be a DECORATION_INIT macro (possibly taking the "name"
 as an argument)? (Actually, the whole name thing seems like a
 confusing and bad API design in the first place).

  2. This is really just an oidmap to a void pointer. I wonder if we
 ought to be wrapping that code (I think we still want some
 interface so that the caller doesn't have to declare their own
 structs).

-Peff


Re: Unfortunate interaction between git diff-index and fakeroot

2017-12-08 Thread Elazar Leibovich


On 12/07/2017 05:31 PM, Junio C Hamano wrote:

Correct.  fakeroot would report that the files that are actually
owned by the user who is running fakeroot are owned by root; the
cached stat information in the index would be "corrected" to say
that they are owned by root.  So once the index is refreshed like
so, things will become consistent.

I still fail to understand, what's the benefit of storing the owner ID in
the index, where no one seems to use this information.
I mean, one could store many other meta information in the index, such as
ACL lists,  number of blocks allocated, etc, but the code seems to
ignore unused information, such as commit 
2cb45e95438c113871fbbea5b4f629f9463034e7

which ignores st_dev, because it doesn't actually matter, or
commit e44794706eeb57f2ee38ed1604821aa38b8ad9d2 which ignores
mode changes not relevant to the owner.

In which situation does anyone care about the owner UID?
What's the difference between that, and, e.g., st_dev?

If you are using "diff-index" for the "is the tree dirty?" check
without running "update-index --refresh", then you are not using
the command correctly.

Just want to clarify. It seems like the linux kernel is using diff-index
to do just that, in scripts/setlocalversion

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/scripts/setlocalversion#n77

Do I understand correctly that linux should update the index
first, or better, use porcelain "git diff --name-only" instead?

It seems to be the case previously, but in commit 
cdf2bc632ebc9ef512345fe8e6015edfd367e256

git update-index was removed, to prevent error messages on
read-only linux tree.

Do I understand correctly that the more correct solution
for modern git, is to use git diff --name-only instead? It should
work on read-only as well as on read-write trees, and
would not break if anyone uses git under fakeroot.

Did I understand you correctly?


Re: [PATCH 1/1] diffcore: add a filter to find a specific blob

2017-12-08 Thread Jeff King
On Thu, Dec 07, 2017 at 04:24:47PM -0800, Stefan Beller wrote:

> Sometimes users are given a hash of an object and they want to
> identify it further (ex.: Use verify-pack to find the largest blobs,
> but what are these? or [1])
> 
> One might be tempted to extend git-describe to also work with blobs,
> such that `git describe ` gives a description as
> ':'.  This was implemented at [2]; as seen by the sheer
> number of responses (>110), it turns out this is tricky to get right.
> The hard part to get right is picking the correct 'commit-ish' as that
> could be the commit that (re-)introduced the blob or the blob that
> removed the blob; the blob could exist in different branches.

Neat. I didn't follow the original thread very closely, but I think this
is a good idea (and is definitely something I've simulated manually with
"git log --raw | grep" before).

Bear with me while I pontificate for a moment.

We do something similar at GitHub to report on too-large objects
during a push (we identify the object during index-pack, but then want
to hand back a human-readable pathname).

We do it with a patch to "rev-list", so that you can use the normal
traversal options to limit the set of visited commits. Which effectively
makes it "traverse and find a place where this object is referenced, and
then tell me the path at which you found it (or tell me if it was not
referenced at all)".

That sidesteps the "describe" problem, because now it is the user's
responsibility to tell you which commits to look in. :)

But the rev-list solution only finds _a_ commit that contains the
object, and which likely has nothing to do with the object at all. Your
solution here finds the introduction and removal of the object, which
are interesting for a wider variety of searches.

So I wondered if this could replace my rev-list patch (which I've been
meaning to upstream for a while). But I think the answer is "no", there
is room for both features. What you're finding here is much more
specific and useful data, but it's also much more expensive to generate.
For my uses cases, it was enough to ask "show me a plausible path
quickly" or even "is this object reachable" (which you can answer out of
bitmaps!).

> Junio hinted at a different approach of solving this problem, which this
> patch implements. Teach the diff machinery another flag for restricting
> the information to what is shown. For example:
> 
>   $ ./git log --oneline --blobfind=v2.0.0:Makefile
>   b2feb64309 Revert the whole "ask curl-config" topic for now
>   47fbfded53 i18n: only extract comments marked with "TRANSLATORS:"
> 
> we observe that the Makefile as shipped with 2.0 was introduced in
> v1.9.2-471-g47fbfded53 and replaced in v2.0.0-rc1-5-gb2feb64309 by
> a different blob.

So I like this very much as an addition to Git's toolbox. But does it
really need to be limited to blobs? We should be able to ask for trees
(both root trees and sub-trees), too. That's going to be less common, I
think, but I have definitely done

  git log --raw -t --pretty=raw --no-abbrev |
  less +/$tree_sha1

to debug things (corrupted repos, funny merges, etc).

You could even do it for submodule commits. Which demonstrates that this
feature does not even need to actually have a resolvable object; you're
just looking for the textual sha1.

You do your filtering on the diff queue, which I think will have the
entries you need if "-t" is specified (and should always have submodule
commits, I think). So you'd just need to relax the incoming object
check, like:

diff --git a/diff.c b/diff.c
index e4571df3bf..0007bb0a09 100644
--- a/diff.c
+++ b/diff.c
@@ -4490,8 +4490,12 @@ static int parse_blobfind_opt(struct diff_options *opt, 
const char *arg)
 {
struct object_id oid;
 
-   if (get_oid_blob(arg, ) || sha1_object_info(oid.hash, NULL) != 
OBJ_BLOB)
-   return error("object '%s' is not a blob", arg);
+   /*
+* We don't even need to have the object, and 40-hex sha1s will
+* just resolve here.
+*/
+   if (get_oid_blob(arg, ))
+   return error("unable to resolve '%s'", arg);
 
if (!opt->blobfind)
opt->blobfind = xcalloc(1, sizeof(*opt->blobfind));

At which point:

  ./git log --oneline -t --blobfind=v2.0.0:Documentation

just works (the results are kind of interesting, too; you see that tree
"go away" in a lot of different commits because many independent
branches touched the directory).

Also interesting:

  ./git log --oneline --blobfind=HEAD:sha1collisiondetection

Well, the output for this particular case isn't that interesting. But it
shows that we can find submodules, too.

-Peff


Re: [PATCH v2 5/7] diff: use skip-to-optional-val in parsing --relative

2017-12-08 Thread Jeff King
On Thu, Dec 07, 2017 at 02:31:38PM -0800, Junio C Hamano wrote:

> If this goes on top as a standalone patch, then the reason why it is
> separate from the other users of _default() is not because the way
> it uses the null return is special, but because it was written by a
> different author, I would think.

Mostly I was just concerned that we missed a somewhat subtle bug in the
first conversion, and I think it's worth calling out in the commit
message why that callsite must be converted in a particular way. Whether
that happens in a separate commit or squashed, I don't care too much.

But I dunno. Maybe it is obvious now that the correct code is in there.
;)

-Peff


Re: [PATCH] hashmap: adjust documentation to reflect reality

2017-12-08 Thread Jeff King
On Thu, Dec 07, 2017 at 10:47:43PM +0100, Johannes Schindelin wrote:

> > We could add that example to the test helper as then we have a good (tested)
> > example for that case, too.
> 
> What we could *also* do, and what would probably make *even more* sense,
> is to simplify the example drastically, to a point where testing it in
> test-hashmap is pointless, and where a reader can gather *very* quickly
> what it takes to use the hashmap routines.

Yes, I'd be in favor of that, too.

> In any case, I would really like to see my patch applied first, as it is a
> separate concern from what you gentle people talk about: I simply want
> that incorrect documentation fixed. The earlier, the better.

Definitely. I think it is in "next" already.

-Peff


RE: URGENT PLEASE

2017-12-08 Thread Office Contact


Re: [PATCH] docs/pretty-formats: mention commas in %(trailers) syntax

2017-12-08 Thread Jeff King
On Fri, Dec 08, 2017 at 03:10:34AM -0500, Eric Sunshine wrote:

> On Fri, Dec 8, 2017 at 12:16 AM, Jeff King  wrote:
> > Commit 84ff053d47 (pretty.c: delimit "%(trailers)" arguments
> > with ",", 2017-10-01) switched the syntax of the trailers
> > placeholder, but forgot to update the documentation in
> > pretty-formats.txt.
> >
> > There's need to mention the old syntax; it was never in a
> 
> I suppose you mean: s/need/no need/

Yes, indeed. Thanks.

-Peff


Re: [PATCH] docs/pretty-formats: mention commas in %(trailers) syntax

2017-12-08 Thread Eric Sunshine
On Fri, Dec 8, 2017 at 12:16 AM, Jeff King  wrote:
> Commit 84ff053d47 (pretty.c: delimit "%(trailers)" arguments
> with ",", 2017-10-01) switched the syntax of the trailers
> placeholder, but forgot to update the documentation in
> pretty-formats.txt.
>
> There's need to mention the old syntax; it was never in a

I suppose you mean: s/need/no need/

> released version of Git.
>
> Signed-off-by: Jeff King 


<    1   2