Re: What's cooking in git.git (Sep 2016, #08; Tue, 27)

2016-09-30 Thread Stefan Beller
>
>
> * jc/attr (2016-05-25) 18 commits
>  - attr: support quoting pathname patterns in C style
>  - attr: expose validity check for attribute names
>  - attr: add counted string version of git_attr()
>  - attr: add counted string version of git_check_attr()
>  - attr: retire git_check_attrs() API
>  - attr: convert git_check_attrs() callers to use the new API
>  - attr: convert git_all_attrs() to use "struct git_attr_check"
>  - attr: (re)introduce git_check_attr() and struct git_attr_check
>  - attr: rename function and struct related to checking attributes
>  - attr.c: plug small leak in parse_attr_line()
>  - attr.c: tighten constness around "git_attr" structure
>  - attr.c: simplify macroexpand_one()
>  - attr.c: mark where #if DEBUG ends more clearly
>  - attr.c: complete a sentence in a comment
>  - attr.c: explain the lack of attr-name syntax check in parse_attr()
>  - attr.c: update a stale comment on "struct match_attr"
>  - attr.c: use strchrnul() to scan for one line
>  - commit.c: use strchrnul() to scan for one line
>  (this branch is used by jc/attr-more, sb/pathspec-label and 
> sb/submodule-default-paths.)
>
>  The attributes API has been updated so that it can later be
>  optimized using the knowledge of which attributes are queried.
>
>  I wanted to polish this topic further to make the attribute
>  subsystem thread-ready, but because other topics depend on this
>  topic and they do not (yet) need it to be thread-ready.
>
>  As the authors of topics that depend on this seem not in a hurry,
>  let's discard this and dependent topics and restart them some other
>  day.
>
>  Will discard.

So I just realized this is a big hint for me to pick up that topic; I assumed
you'd want to tackle the attr subsystem eventually, so all I was doing, was
waiting for your motivation to look at attr stuff to come back.

So what is the actual lacking stuff here?

Thanks,
Stefan


Re: [PATCH v3 2/5] ref-filter: add function to print single ref_array_item

2016-09-30 Thread Junio C Hamano
Jakub Narębski  writes:

>> Expose a format_ref function to create, pretty print and free individual
>> ref-items.
>
> It's now pretty_print_ref, not format_ref (old version in commit message).

Good eyes.  Will tweak locally.

Thanks.


[PATCH 1/3] abbrev: add FALLBACK_DEFAULT_ABBREV to prepare for auto sizing

2016-09-30 Thread Junio C Hamano
We'll be introducing a new way to decide the default abbreviation
length by initialising DEFAULT_ABBREV to -1 to signal the first call
to "find unique abbreviation" codepath to compute a reasonable value
based on the number of objects we have to avoid collisions.

We have long relied on DEFAULT_ABBREV being a positive concrete
value that is used as the abbreviation length when no extra
configuration or command line option has overridden it.  Some
codepaths wants to use such a positive concrete default value
even before making their first request to actually trigger the
computation for the auto sized default.

Introduce FALLBACK_DEFAULT_ABBREV and use it to the code that
attempts to align the report from "git fetch".  For now, this
macro is also used to initialize the default_abbrev variable,
but the auto-sizing code will use -1 and then use the value of
FALLBACK_DEFAULT_ABBREV as the starting point of auto-sizing.

Signed-off-by: Junio C Hamano 
---
 builtin/fetch.c | 3 +++
 cache.h | 3 +++
 environment.c   | 2 +-
 transport.h | 3 +--
 4 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 164623bb6f..a9f12cc5cf 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -17,6 +17,9 @@
 #include "argv-array.h"
 #include "utf8.h"
 
+#define TRANSPORT_SUMMARY(x) \
+   (int)(TRANSPORT_SUMMARY_WIDTH + strlen(x) - gettext_width(x)), (x)
+
 static const char * const builtin_fetch_usage[] = {
N_("git fetch [] [ [...]]"),
N_("git fetch [] "),
diff --git a/cache.h b/cache.h
index f346c01708..5a651b8435 100644
--- a/cache.h
+++ b/cache.h
@@ -1183,6 +1183,9 @@ static inline int hex2chr(const char *s)
 #define MINIMUM_ABBREV minimum_abbrev
 #define DEFAULT_ABBREV default_abbrev
 
+/* used when the code does not know or care what the default abbrev is */
+#define FALLBACK_DEFAULT_ABBREV 7
+
 struct object_context {
unsigned char tree[20];
char path[PATH_MAX];
diff --git a/environment.c b/environment.c
index cd5aa57179..44fb107b8a 100644
--- a/environment.c
+++ b/environment.c
@@ -16,7 +16,7 @@ int trust_executable_bit = 1;
 int trust_ctime = 1;
 int check_stat = 1;
 int has_symlinks = 1;
-int minimum_abbrev = 4, default_abbrev = 7;
+int minimum_abbrev = 4, default_abbrev = FALLBACK_DEFAULT_ABBREV;
 int ignore_case;
 int assume_unchanged;
 int prefer_symlink_refs;
diff --git a/transport.h b/transport.h
index 6fe3485325..e783377e40 100644
--- a/transport.h
+++ b/transport.h
@@ -142,8 +142,7 @@ struct transport {
 #define TRANSPORT_PUSH_ATOMIC 8192
 #define TRANSPORT_PUSH_OPTIONS 16384
 
-#define TRANSPORT_SUMMARY_WIDTH (2 * DEFAULT_ABBREV + 3)
-#define TRANSPORT_SUMMARY(x) (int)(TRANSPORT_SUMMARY_WIDTH + strlen(x) - 
gettext_width(x)), (x)
+#define TRANSPORT_SUMMARY_WIDTH (2 * FALLBACK_DEFAULT_ABBREV + 3)
 
 /* Returns a transport suitable for the url */
 struct transport *transport_get(struct remote *, const char *);
-- 
2.10.0-622-g05f606bbb0



[PATCH 0/3] auto-sizing default abbreviation length

2016-09-30 Thread Junio C Hamano
So here is what I queued in 'pu' tonight after back and forth with
Linus and Peff.  The third step from Linus needs to be signed off
and also a meaningful log message for it needs to be written, and
also may need to be updated to include what Linus did in [*1*], but
otherwise I think these are in good enough shape for people to start
playing with them.

They apply on top of Peff's jk/ambiguous-short-object-names topic
that ends at 5b33cb1fd7 ("get_short_sha1: make default
disambiguation configurable", 2016-09-27).


*1*

http://public-inbox.org/git/


Junio C Hamano (2):
  abbrev: add FALLBACK_DEFAULT_ABBREV to prepare for auto sizing
  abbrev: prepare for new world order

Linus Torvalds (1):
  abbrev: auto size the default abbreviation

 builtin/fetch.c |  3 +++
 builtin/rev-parse.c |  5 +++--
 cache.h |  4 
 diff.c  | 25 +++--
 environment.c   |  2 +-
 sha1_name.c | 28 +++-
 transport.h |  3 +--
 7 files changed, 62 insertions(+), 8 deletions(-)

-- 
2.10.0-622-g05f606bbb0



[PATCH 3/3] abbrev: auto size the default abbreviation

2016-09-30 Thread Junio C Hamano
From: Linus Torvalds 

In fairly early days we somehow decided to abbreviate object names
down to 7-hexdigits, but as projects grow, it is becoming more and
more likely to see such a short object names made in earlier days
and recorded in the log messages no longer unique.

Currently the Linux kernel project needs 11 to 12 hexdigits, while
Git itself needs 10 hexdigits to uniquely identify the objects they
have, while many smaller projects may still be fine with the
original 7-hexdigit default.  One-size does not fit all projects.

Introduce a mechanism, where we estimate the number of objects in
the repository upon the first request to abbreviate an object name
with the default setting and come up with a sane default for the
repository.  Based on the expectation that we would see collision in
a repository with 2^(2N) objects when using object names shortened
to first N bits, use sufficient number of hexdigits to cover the
number of objects in the repository.  Each hexdigit (4-bits) we add
to the shortened name allows us to have four times (2-bits) as many
objects in the repository.

---
 cache.h   |  1 +
 environment.c |  2 +-
 sha1_name.c   | 28 +++-
 3 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/cache.h b/cache.h
index 5a651b8435..0e2a0595e5 100644
--- a/cache.h
+++ b/cache.h
@@ -1204,6 +1204,7 @@ struct object_context {
 #define GET_SHA1_TREEISH  020
 #define GET_SHA1_BLOB 040
 #define GET_SHA1_FOLLOW_SYMLINKS 0100
+#define GET_SHA1_AUTOMATIC  0200
 #define GET_SHA1_ONLY_TO_DIE04000
 
 #define GET_SHA1_DISAMBIGUATORS \
diff --git a/environment.c b/environment.c
index 44fb107b8a..6f9d290563 100644
--- a/environment.c
+++ b/environment.c
@@ -16,7 +16,7 @@ int trust_executable_bit = 1;
 int trust_ctime = 1;
 int check_stat = 1;
 int has_symlinks = 1;
-int minimum_abbrev = 4, default_abbrev = FALLBACK_DEFAULT_ABBREV;
+int minimum_abbrev = 4, default_abbrev = -1;
 int ignore_case;
 int assume_unchanged;
 int prefer_symlink_refs;
diff --git a/sha1_name.c b/sha1_name.c
index 3b647fd7cf..beb7ab588b 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -15,6 +15,7 @@ typedef int (*disambiguate_hint_fn)(const unsigned char *, 
void *);
 
 struct disambiguate_state {
int len; /* length of prefix in hex chars */
+   unsigned int nrobjects;
char hex_pfx[GIT_SHA1_HEXSZ + 1];
unsigned char bin_pfx[GIT_SHA1_RAWSZ];
 
@@ -118,6 +119,14 @@ static void find_short_object_filename(struct 
disambiguate_state *ds)
 
if (strlen(de->d_name) != 38)
continue;
+
+   /*
+* We only look at the one subdirectory, and we assume
+* each subdirectory is roughly similar, so each
+* object we find probably has 255 other objects in
+* the other fan-out directories.
+*/
+   ds->nrobjects += 256;
if (memcmp(de->d_name, ds->hex_pfx + 2, ds->len - 2))
continue;
memcpy(hex + 2, de->d_name, 38);
@@ -151,6 +160,7 @@ static void unique_in_pack(struct packed_git *p,
 
open_pack_index(p);
num = p->num_objects;
+   ds->nrobjects += num;
last = num;
while (first < last) {
uint32_t mid = (first + last) / 2;
@@ -380,6 +390,9 @@ static int show_ambiguous_object(const unsigned char *sha1, 
void *data)
return 0;
 }
 
+/* start from our historical default before the automatic abbreviation */
+static int default_automatic_abbrev = FALLBACK_DEFAULT_ABBREV;
+
 static int get_short_sha1(const char *name, int len, unsigned char *sha1,
  unsigned flags)
 {
@@ -426,6 +439,14 @@ static int get_short_sha1(const char *name, int len, 
unsigned char *sha1,
for_each_abbrev(ds.hex_pfx, show_ambiguous_object, );
}
 
+   if (len < 16 && !status && (flags & GET_SHA1_AUTOMATIC)) {
+   unsigned int expect_collision = 1 << (len * 2);
+   if (ds.nrobjects > expect_collision) {
+   default_automatic_abbrev = len+1;
+   return SHORT_NAME_AMBIGUOUS;
+   }
+   }
+
return status;
 }
 
@@ -458,14 +479,19 @@ int for_each_abbrev(const char *prefix, each_abbrev_fn 
fn, void *cb_data)
 int find_unique_abbrev_r(char *hex, const unsigned char *sha1, int len)
 {
int status, exists;
+   int flags = GET_SHA1_QUIETLY;
 
+   if (len < 0) {
+   flags |= GET_SHA1_AUTOMATIC;
+   len = default_automatic_abbrev;
+   }
sha1_to_hex_r(hex, sha1);
if (len == 40 || !len)
return 40;
exists = has_sha1_file(sha1);
while (len < 40) {
unsigned char sha1_ret[20];
-   status = 

[PATCH 2/3] abbrev: prepare for new world order

2016-09-30 Thread Junio C Hamano
The code that sets custom abbreviation length, in response to
command line argument, often does something like this:

if (skip_prefix(arg, "--abbrev=", ))
abbrev = atoi(arg);
else if (!strcmp("--abbrev", ))
abbrev = DEFAULT_ABBREV;
/* make the value sane */
if (abbrev < 0 || 40 < abbrev)
abbrev = ... some sane value ...

However, it is pointless to sanity-check and tweak the value
obtained from DEFAULT_ABBREV.  We are going to allow it to be
initially set to -1 to signal that the default abbreviation length
must be auto sized upon the first request to abbreviate, based on
the number of objects in the repository, and when that happens,
rejecting or tweaking a negative value to a "saner" one will
negatively interfere with the auto sizing.  The codepaths for

git rev-parse --short 
git diff --raw --abbrev

do exactly that; allow them to pass possibly negative abbrevs
intact, that will come from DEFAULT_ABBREV in the future.

Signed-off-by: Junio C Hamano 
---
 builtin/rev-parse.c | 5 +++--
 diff.c  | 2 +-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/builtin/rev-parse.c b/builtin/rev-parse.c
index 76cf05e2ad..17cbfabdde 100644
--- a/builtin/rev-parse.c
+++ b/builtin/rev-parse.c
@@ -643,8 +643,9 @@ int cmd_rev_parse(int argc, const char **argv, const char 
*prefix)
filter &= ~(DO_FLAGS|DO_NOREV);
verify = 1;
abbrev = DEFAULT_ABBREV;
-   if (arg[7] == '=')
-   abbrev = strtoul(arg + 8, NULL, 10);
+   if (!arg[7])
+   continue;
+   abbrev = strtoul(arg + 8, NULL, 10);
if (abbrev < MINIMUM_ABBREV)
abbrev = MINIMUM_ABBREV;
else if (40 <= abbrev)
diff --git a/diff.c b/diff.c
index c6da383c56..cefc13eb8e 100644
--- a/diff.c
+++ b/diff.c
@@ -3399,7 +3399,7 @@ void diff_setup_done(struct diff_options *options)
 */
read_cache();
}
-   if (options->abbrev <= 0 || 40 < options->abbrev)
+   if (40 < options->abbrev)
options->abbrev = 40; /* full */
 
/*
-- 
2.10.0-622-g05f606bbb0



Re: [PATCH v3 0/5] Add --format to tag verification

2016-09-30 Thread Junio C Hamano
What is in the patch series looked more or less good to me.  Lukas's
sirname was still P in [3/5], a patch in [4/5] added an unnecssary
blank line before git_verify_tag_config() and also a local variable
declaration for "char *fmt_pretty" was indented funnily, but none of
these were something I couldn't fix up locally.

I however notice that there is no new tests to protect these two new
features from future breakages.  Perhaps you want to add some in
[6/5]?

Thanks.





RE: [PATCH 3/6] tmp-objdir: introduce API for temporary object directories

2016-09-30 Thread David Turner
Thanks.  The rest all look good too.

> -Original Message-
> From: Jeff King [mailto:p...@peff.net]
> Sent: Friday, September 30, 2016 6:45 PM
> To: David Turner
> Cc: git@vger.kernel.org
> Subject: Re: [PATCH 3/6] tmp-objdir: introduce API for temporary object
> directories
> 
> On Fri, Sep 30, 2016 at 09:32:04PM +, David Turner wrote:
> 
> > > +static void env_append(struct argv_array *env, const char *key,
> > > +const char *val) {
> > > + const char *old = getenv(key);
> > > +
> > > + if (!old)
> > > + argv_array_pushf(env, "%s=%s", key, val);
> > > + else
> > > + argv_array_pushf(env, "%s=%s%c%s", key, old, PATH_SEP,
> > > val);
> > >+}
> >
> > I would like a comment explaining this function.
> 
> I'll squash in:
> 
> diff --git a/tmp-objdir.c b/tmp-objdir.c index c92e6cc..a98c246 100644
> --- a/tmp-objdir.c
> +++ b/tmp-objdir.c
> @@ -70,6 +70,13 @@ static void remove_tmp_objdir_on_signal(int signo)
>   raise(signo);
>  }
> 
> +/*
> + * These env_* functions are for setting up the child environment; the
> + * "replace" variant overrides the value of any existing variable with
> +that
> + * "key". The "append" variant puts our new value at the end of a list,
> + * separated by PATH_SEP (which is what separate values in
> + * GIT_ALTERNATE_OBJECT_DIRECTORIES).
> + */
>  static void env_append(struct argv_array *env, const char *key, const char
> *val)  {
>   const char *old = getenv(key);
> 
> > > + * Finalize a temporary object directory by migrating its objects
> > > +into the main
> > > + * object database.
> > > + */
> >
> > This should mention that it frees its argument.
> 
> And:
> 
> diff --git a/tmp-objdir.h b/tmp-objdir.h index aa47aa9..b1e45b4 100644
> --- a/tmp-objdir.h
> +++ b/tmp-objdir.h
> @@ -35,7 +35,8 @@ const char **tmp_objdir_env(const struct tmp_objdir
> *);
> 
>  /*
>   * Finalize a temporary object directory by migrating its objects into the 
> main
> - * object database.
> + * object database, removing the temporary directory, and freeing any
> + * associated resources.
>   */
>  int tmp_objdir_migrate(struct tmp_objdir *);
> 
> 
> -Peff


Re: [PATCH v3 2/5] ref-filter: add function to print single ref_array_item

2016-09-30 Thread Jakub Narębski
W dniu 01.10.2016 o 00:18, santi...@nyu.edu pisze:
> From: Lukas Puehringer 
> 
> ref-filter functions are useful for printing git object information
> using a format specifier. However, some other modules may not want to use
> this functionality on a ref-array but only print a single item.
> 
> Expose a format_ref function to create, pretty print and free individual
> ref-items.

It's now pretty_print_ref, not format_ref (old version in commit message).

[...]
> +void pretty_print_ref(const char *name, const unsigned char *sha1,
> + const char *format, unsigned kind)



Re: [PATCH 3/6] tmp-objdir: introduce API for temporary object directories

2016-09-30 Thread Jeff King
On Fri, Sep 30, 2016 at 09:32:04PM +, David Turner wrote:

> > +static void env_append(struct argv_array *env, const char *key, const
> > +char *val) {
> > +   const char *old = getenv(key);
> > +
> > +   if (!old)
> > +   argv_array_pushf(env, "%s=%s", key, val);
> > +   else
> > +   argv_array_pushf(env, "%s=%s%c%s", key, old, PATH_SEP,
> > val); 
> >+}
> 
> I would like a comment explaining this function. 

I'll squash in:

diff --git a/tmp-objdir.c b/tmp-objdir.c
index c92e6cc..a98c246 100644
--- a/tmp-objdir.c
+++ b/tmp-objdir.c
@@ -70,6 +70,13 @@ static void remove_tmp_objdir_on_signal(int signo)
raise(signo);
 }
 
+/*
+ * These env_* functions are for setting up the child environment; the
+ * "replace" variant overrides the value of any existing variable with that
+ * "key". The "append" variant puts our new value at the end of a list,
+ * separated by PATH_SEP (which is what separate values in
+ * GIT_ALTERNATE_OBJECT_DIRECTORIES).
+ */
 static void env_append(struct argv_array *env, const char *key, const char 
*val)
 {
const char *old = getenv(key);

> > + * Finalize a temporary object directory by migrating its objects into
> > +the main
> > + * object database.
> > + */
> 
> This should mention that it frees its argument.

And:

diff --git a/tmp-objdir.h b/tmp-objdir.h
index aa47aa9..b1e45b4 100644
--- a/tmp-objdir.h
+++ b/tmp-objdir.h
@@ -35,7 +35,8 @@ const char **tmp_objdir_env(const struct tmp_objdir *);
 
 /*
  * Finalize a temporary object directory by migrating its objects into the main
- * object database.
+ * object database, removing the temporary directory, and freeing any
+ * associated resources.
  */
 int tmp_objdir_migrate(struct tmp_objdir *);
 

-Peff


Re: "Purposes, Concepts,Misfits, and a Redesign of Git" (a research paper)

2016-09-30 Thread Jakub Narębski
W dniu 30.09.2016 o 18:14, Konstantin Khomoutov pisze:

> The "It Will Never Work in Theory" blog has just posted a summary of a
> study which tried to identify shortcomings in the design of Git.
> 
> In the hope it might be interesting, I post this summary here.
> URL: http://neverworkintheory.org/2016/09/30/rethinking-git.html

I will comment on the article itself, not just on the summary.

| 2.2 Git
[...]
| But tracked files cannot be ignored; to ignore a tracked file
| one has to mark it as “assume unchanged.” This “assume
| unchanged” file will not be recognized by add; to make it
| tracked again this marking has to be removed.

WRONG!  Git has tracked files, untracked unignored files, and
untracked ignored files (mostly considered unimportant).

The "assume unchanged" bit is _performance_ optimization. It is not,
and cannot be a 'ignore tracked files' bit - here lies lost work!!!
You can use (imperfectly) "prefer worktree" bit hack instead.

You can say, if 'ignoring change to tracked files' is motivation,
or purpose, it lacks direct concept.

[...]
| As a result, when a user switches branches, files may be
| unexpectedly overwritten. 

This is possible _only_ if there are uncommitted changes. If they
are there, and they do not conflict with switching a branch, they
are "floated" to a newly checked out branch.

| Git fails with an error if there are any conflicting changes,
| effectively preventing the user from switching in this case.
| To mitigate this problem, Git provides a way to save versions
| of files to another storage area, called the “stash,” using
| a special command issued prior to the branch switch.

Or you can try to merge uncommitted changes with changes between
two branches: current and switched to.

Or you can forcibly discard your changes.

Or (with modern Git), you can put each branch in a separate
working area (with "git worktree"), though the article may predate
this feature.

[...]
| *Syncing with Other Repositories* Crucial to the understanding
| of how syncing with other repositories work is the notion
| of a “remote branch.” This is a branch (pointer to a commit)
| that (asynchronously) reflects the state of a branch in another
| repository. It is updated whenever there is some network
| communication (e.g., a push or fetch).

It is called "remote-tracking branch", rather than "remote branch".
At least in Git documentation. This branch is in local repository,
not in remote one. The remote-tracking branch for example
`origin/master` follows (tracks) branch `master` in remote
repository `origin`.

| The notion of a “remote branch” must not be confused
| with that of an “upstream branch.” An upstream branch is
| just a convenience for users: after the user assigns it to some
| branch, commands like pull and push default to use that
| branch for fetching and pushing changes if no branch is given
| as input.

Actually "upstream branch" (and related "upstream repository")
are a concept, not only a convenience. They denote a branch
(usually in remote repository) which is intended to ultimately
include changes in given branch. Note that "upstream branch"
can be set separately for any given local branch.

One thing that can enormously help recovering from errors, and
is not covered in the list of concepts is REFLOG.


[...]
| 3. Operational Misfits
[...]
| *Saving Changes* Suppose you are in the middle of a long
| task and want to save your changes, so that they can be later
| retrieved in case of failure. How would you do that?

You would use `git stash` or `git stash --include-untracked`!
In more complicated situations (during long-running operation
like resolving merge conflicts, interactive rebase, or finding
bugs with bisect) with modern Git you can create a new separate
working area with `git worktree`.

This also applies to the "*Switching branches*" problem (which
is more involved, as `git stash` would often not work, and if
it does it is harder to restore state - note however that stash
description includes the branch it was on).

So I would say that *Saving Changes* is solved with stash,
while *Switching Branches* remains a misfit.

| *Detached Head* Suppose you are working on some branch
| and realize that the last few commits you did are wrong, so
| you decide to go back to an old commit to start over again.
| You checkout that old commit and keep working creating
| commits. You might be surprised to discover that these new
| commits you’ve been working on belong to no branch at all.
| To avoid losing them you need to create a new branch or reset
| an existing one to point to the last commit.

It would be hard to be surprised unless one is in habit of
disregarding multi-line warning from Git... ;-)

I think it might be more of an UX problem, namely that the
`git checkout` command does too many things, which include
checking out revision (detaching HEAD, or landing on unnamed
branch, unless we create a new branch at the same time with
the '-b ' option), and checking out a 

[PATCH v3 3/5] tag: add format specifier to gpg_verify_tag

2016-09-30 Thread santiago
From: Lukas P 

Calling functions for gpg_verify_tag() may desire to print relevant
information about the header for further verification. Add an optional
format argument to print any desired information after GPG verification.

Signed-off-by: Lukas Puehringer 
---
 builtin/tag.c|  2 +-
 builtin/verify-tag.c |  2 +-
 tag.c| 17 +++--
 tag.h|  4 ++--
 4 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/builtin/tag.c b/builtin/tag.c
index 50e4ae5..14f3b48 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -105,7 +105,7 @@ static int delete_tag(const char *name, const char *ref,
 static int verify_tag(const char *name, const char *ref,
const unsigned char *sha1)
 {
-   return gpg_verify_tag(sha1, name, GPG_VERIFY_VERBOSE);
+   return verify_and_format_tag(sha1, name, NULL, GPG_VERIFY_VERBOSE);
 }
 
 static int do_sign(struct strbuf *buffer)
diff --git a/builtin/verify-tag.c b/builtin/verify-tag.c
index 99f8148..de10198 100644
--- a/builtin/verify-tag.c
+++ b/builtin/verify-tag.c
@@ -51,7 +51,7 @@ int cmd_verify_tag(int argc, const char **argv, const char 
*prefix)
const char *name = argv[i++];
if (get_sha1(name, sha1))
had_error = !!error("tag '%s' not found.", name);
-   else if (gpg_verify_tag(sha1, name, flags))
+   else if (verify_and_format_tag(sha1, name, NULL, flags))
had_error = 1;
}
return had_error;
diff --git a/tag.c b/tag.c
index 291073f..d3512c0 100644
--- a/tag.c
+++ b/tag.c
@@ -4,6 +4,7 @@
 #include "tree.h"
 #include "blob.h"
 #include "gpg-interface.h"
+#include "ref-filter.h"
 
 const char *tag_type = "tag";
 
@@ -33,8 +34,8 @@ static int run_gpg_verify(const char *buf, unsigned long 
size, unsigned flags)
return ret;
 }
 
-int gpg_verify_tag(const unsigned char *sha1, const char *name_to_report,
-   unsigned flags)
+int verify_and_format_tag(const unsigned char *sha1, const char *name,
+   const char *fmt_pretty, unsigned flags)
 {
enum object_type type;
char *buf;
@@ -44,21 +45,25 @@ int gpg_verify_tag(const unsigned char *sha1, const char 
*name_to_report,
type = sha1_object_info(sha1, NULL);
if (type != OBJ_TAG)
return error("%s: cannot verify a non-tag object of type %s.",
-   name_to_report ?
-   name_to_report :
+   name ?
+   name :
find_unique_abbrev(sha1, DEFAULT_ABBREV),
typename(type));
 
buf = read_sha1_file(sha1, , );
if (!buf)
return error("%s: unable to read file.",
-   name_to_report ?
-   name_to_report :
+   name ?
+   name :
find_unique_abbrev(sha1, DEFAULT_ABBREV));
 
ret = run_gpg_verify(buf, size, flags);
 
free(buf);
+
+   if (fmt_pretty)
+   pretty_print_ref(name, sha1, fmt_pretty, FILTER_REFS_TAGS);
+
return ret;
 }
 
diff --git a/tag.h b/tag.h
index a5721b6..896b9c2 100644
--- a/tag.h
+++ b/tag.h
@@ -17,7 +17,7 @@ extern int parse_tag_buffer(struct tag *item, const void 
*data, unsigned long si
 extern int parse_tag(struct tag *item);
 extern struct object *deref_tag(struct object *, const char *, int);
 extern struct object *deref_tag_noverify(struct object *);
-extern int gpg_verify_tag(const unsigned char *sha1,
-   const char *name_to_report, unsigned flags);
+extern int verify_and_format_tag(const unsigned char *sha1, const char *name,
+   const char *fmt_pretty, unsigned flags);
 
 #endif /* TAG_H */
-- 
2.10.0



[PATCH v3 5/5] builtin/tag: add --format argument for tag -v

2016-09-30 Thread santiago
From: Lukas Puehringer 

Adding --format to git tag -v mutes the default output of the GPG
verification and instead prints the formatted tag object.
This allows callers to cross-check the tagname from refs/tags with
the tagname from the tag object header upon GPG verification.

The callback function for for_each_tag_name() didn't allow callers to
pass custom data to their callback functions. Add a new opaque pointer
to each_tag_name_fn's parameter to allow this.

Signed-off-by: Lukas Puehringer 
---
 Documentation/git-tag.txt |  2 +-
 builtin/tag.c | 34 +++---
 2 files changed, 24 insertions(+), 12 deletions(-)

diff --git a/Documentation/git-tag.txt b/Documentation/git-tag.txt
index 7ecca8e..3bb5e3c 100644
--- a/Documentation/git-tag.txt
+++ b/Documentation/git-tag.txt
@@ -15,7 +15,7 @@ SYNOPSIS
 'git tag' [-n[]] -l [--contains ] [--points-at ]
[--column[=] | --no-column] [--create-reflog] [--sort=]
[--format=] [--[no-]merged []] [...]
-'git tag' -v ...
+'git tag' -v [--format=] ...
 
 DESCRIPTION
 ---
diff --git a/builtin/tag.c b/builtin/tag.c
index 14f3b48..7730fd0 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -24,7 +24,7 @@ static const char * const git_tag_usage[] = {
N_("git tag -d ..."),
N_("git tag -l [-n[]] [--contains ] [--points-at ]"
"\n\t\t[--format=] [--[no-]merged []] 
[...]"),
-   N_("git tag -v ..."),
+   N_("git tag -v [--format=] ..."),
NULL
 };
 
@@ -66,15 +66,17 @@ static int list_tags(struct ref_filter *filter, struct 
ref_sorting *sorting, con
 }
 
 typedef int (*each_tag_name_fn)(const char *name, const char *ref,
-   const unsigned char *sha1);
+   const unsigned char *sha1, void *cb_data);
 
-static int for_each_tag_name(const char **argv, each_tag_name_fn fn)
+static int for_each_tag_name(const char **argv, each_tag_name_fn fn,
+   void *cb_data)
 {
const char **p;
char ref[PATH_MAX];
int had_error = 0;
unsigned char sha1[20];
 
+
for (p = argv; *p; p++) {
if (snprintf(ref, sizeof(ref), "refs/tags/%s", *p)
>= sizeof(ref)) {
@@ -87,14 +89,14 @@ static int for_each_tag_name(const char **argv, 
each_tag_name_fn fn)
had_error = 1;
continue;
}
-   if (fn(*p, ref, sha1))
+   if (fn(*p, ref, sha1, cb_data))
had_error = 1;
}
return had_error;
 }
 
 static int delete_tag(const char *name, const char *ref,
-   const unsigned char *sha1)
+   const unsigned char *sha1, void *cb_data)
 {
if (delete_ref(ref, sha1, 0))
return 1;
@@ -103,9 +105,16 @@ static int delete_tag(const char *name, const char *ref,
 }
 
 static int verify_tag(const char *name, const char *ref,
-   const unsigned char *sha1)
+   const unsigned char *sha1, void *cb_data)
 {
-   return verify_and_format_tag(sha1, name, NULL, GPG_VERIFY_VERBOSE);
+   int flags;
+char *fmt_pretty = cb_data;
+   flags = GPG_VERIFY_VERBOSE;
+
+   if (fmt_pretty)
+   flags = GPG_VERIFY_QUIET;
+
+   return verify_and_format_tag(sha1, name, fmt_pretty, flags);
 }
 
 static int do_sign(struct strbuf *buffer)
@@ -334,7 +343,7 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
struct strbuf err = STRBUF_INIT;
struct ref_filter filter;
static struct ref_sorting *sorting = NULL, **sorting_tail = 
-   const char *format = NULL;
+   char *format = NULL;
struct option options[] = {
OPT_CMDMODE('l', "list", , N_("list tag names"), 'l'),
{ OPTION_INTEGER, 'n', NULL, , N_("n"),
@@ -424,9 +433,12 @@ int cmd_tag(int argc, const char **argv, const char 
*prefix)
if (filter.merge_commit)
die(_("--merged and --no-merged option are only allowed with 
-l"));
if (cmdmode == 'd')
-   return for_each_tag_name(argv, delete_tag);
-   if (cmdmode == 'v')
-   return for_each_tag_name(argv, verify_tag);
+   return for_each_tag_name(argv, delete_tag, NULL);
+   if (cmdmode == 'v') {
+   if (format)
+   verify_ref_format(format);
+   return for_each_tag_name(argv, verify_tag, format);
+   }
 
if (msg.given || msgfile) {
if (msg.given && msgfile)
-- 
2.10.0



[PATCH v3 2/5] ref-filter: add function to print single ref_array_item

2016-09-30 Thread santiago
From: Lukas Puehringer 

ref-filter functions are useful for printing git object information
using a format specifier. However, some other modules may not want to use
this functionality on a ref-array but only print a single item.

Expose a format_ref function to create, pretty print and free individual
ref-items.

Signed-off-by: Lukas Puehringer 
---
 ref-filter.c | 10 ++
 ref-filter.h |  3 +++
 2 files changed, 13 insertions(+)

diff --git a/ref-filter.c b/ref-filter.c
index bc551a7..ee3ed67 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1655,6 +1655,16 @@ void show_ref_array_item(struct ref_array_item *info, 
const char *format, int qu
putchar('\n');
 }
 
+void pretty_print_ref(const char *name, const unsigned char *sha1,
+   const char *format, unsigned kind)
+{
+   struct ref_array_item *ref_item;
+   ref_item = new_ref_array_item(name, sha1, 0);
+   ref_item->kind = kind;
+   show_ref_array_item(ref_item, format, 0);
+   free_array_item(ref_item);
+}
+
 /*  If no sorting option is given, use refname to sort as default */
 struct ref_sorting *ref_default_sorting(void)
 {
diff --git a/ref-filter.h b/ref-filter.h
index 14d435e..3d23090 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -107,4 +107,7 @@ struct ref_sorting *ref_default_sorting(void);
 /*  Function to parse --merged and --no-merged options */
 int parse_opt_merge_filter(const struct option *opt, const char *arg, int 
unset);
 
+void pretty_print_ref(const char *name, const unsigned char *sha1,
+   const char *format, unsigned kind);
+
 #endif /*  REF_FILTER_H  */
-- 
2.10.0



[PATCH v3 4/5] builtin/verify-tag: add --format to verify-tag

2016-09-30 Thread santiago
From: Santiago Torres 

Callers of verify-tag may want to cross-check the tagname from refs/tags
with the tagname from the tag object header upon GPG verification. This
is to avoid tag refs that point to an incorrect object.

Add a --format parameter to git verify-tag to print the formatted tag
object header in addition to or instead of the --verbose or --raw GPG
verification output.

Signed-off-by: Santiago Torres 
---
 Documentation/git-verify-tag.txt |  2 +-
 builtin/verify-tag.c | 13 +++--
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-verify-tag.txt b/Documentation/git-verify-tag.txt
index d590edc..0b8075d 100644
--- a/Documentation/git-verify-tag.txt
+++ b/Documentation/git-verify-tag.txt
@@ -8,7 +8,7 @@ git-verify-tag - Check the GPG signature of tags
 SYNOPSIS
 
 [verse]
-'git verify-tag' ...
+'git verify-tag' [--format=] ...
 
 DESCRIPTION
 ---
diff --git a/builtin/verify-tag.c b/builtin/verify-tag.c
index de10198..745b6a6 100644
--- a/builtin/verify-tag.c
+++ b/builtin/verify-tag.c
@@ -12,12 +12,14 @@
 #include 
 #include "parse-options.h"
 #include "gpg-interface.h"
+#include "ref-filter.h"
 
 static const char * const verify_tag_usage[] = {
-   N_("git verify-tag [-v | --verbose] ..."),
+   N_("git verify-tag [-v | --verbose] [--format=] 
..."),
NULL
 };
 
+
 static int git_verify_tag_config(const char *var, const char *value, void *cb)
 {
int status = git_gpg_config(var, value, cb);
@@ -30,9 +32,11 @@ int cmd_verify_tag(int argc, const char **argv, const char 
*prefix)
 {
int i = 1, verbose = 0, had_error = 0;
unsigned flags = 0;
+char *fmt_pretty;
const struct option verify_tag_options[] = {
OPT__VERBOSE(, N_("print tag contents")),
OPT_BIT(0, "raw", , N_("print raw gpg status output"), 
GPG_VERIFY_RAW),
+   OPT_STRING(  0 , "format", _pretty, N_("format"), 
N_("format to use for the output")),
OPT_END()
};
 
@@ -46,12 +50,17 @@ int cmd_verify_tag(int argc, const char **argv, const char 
*prefix)
if (verbose)
flags |= GPG_VERIFY_VERBOSE;
 
+   if (fmt_pretty) {
+   verify_ref_format(fmt_pretty);
+   flags |= GPG_VERIFY_QUIET;
+   }
+
while (i < argc) {
unsigned char sha1[20];
const char *name = argv[i++];
if (get_sha1(name, sha1))
had_error = !!error("tag '%s' not found.", name);
-   else if (verify_and_format_tag(sha1, name, NULL, flags))
+   else if (verify_and_format_tag(sha1, name, fmt_pretty, flags))
had_error = 1;
}
return had_error;
-- 
2.10.0



[PATCH v3 0/5] Add --format to tag verification

2016-09-30 Thread santiago
From: Santiago Torres 

This is the third iteration of [1][2], and as a result of the discussion
in [3].

In this re-roll we:

* Fixed all the signed-off-by's

[0002]
* Renamed the function format_ref to pretty_print_ref instead, which
  is a more descriptive name 

[0004] 
* Added the respective line for the new --format parameter in the
  documentation.

[0005] 
* Added mention of the --format flag in the documentation files. 
* Fixed the function signatures, now they take an opaque void *cb_data pointer
  so it can be used in a more general way (by e.g., delete_tag).

This patch applies to 2.10.0 and master.

[1] http://public-inbox.org/git/20160922185317.349-1-santi...@nyu.edu/
[2] http://public-inbox.org/git/20160926224233.32702-1-santi...@nyu.edu/
[3] http://public-inbox.org/git/20160607195608.16643-1-santi...@nyu.edu/

Lukas Puehringer (4):
  gpg-interface, tag: add GPG_VERIFY_QUIET flag
  ref-filter: add function to print single ref_array_item
  tag: add format specifier to gpg_verify_tag
  builtin/tag: add --format argument for tag -v

Santiago Torres (1):
  builtin/verify-tag: add --format to verify-tag

 Documentation/git-tag.txt|  2 +-
 Documentation/git-verify-tag.txt |  2 +-
 builtin/tag.c| 34 +++---
 builtin/verify-tag.c | 13 +++--
 gpg-interface.h  |  1 +
 ref-filter.c | 10 ++
 ref-filter.h |  3 +++
 tag.c| 22 +++---
 tag.h|  4 ++--
 9 files changed, 67 insertions(+), 24 deletions(-)

-- 
2.10.0



[PATCH v3 1/5] gpg-interface, tag: add GPG_VERIFY_QUIET flag

2016-09-30 Thread santiago
From: Lukas Puehringer 

Functions that print git object information may require that the
gpg-interface functions be silent. Add GPG_VERIFY_QUIET flag and prevent
print_signature_buffer from being called if flag is set.

Signed-off-by: Lukas Puehringer 
---
 gpg-interface.h | 1 +
 tag.c   | 5 -
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/gpg-interface.h b/gpg-interface.h
index ea68885..85dc982 100644
--- a/gpg-interface.h
+++ b/gpg-interface.h
@@ -3,6 +3,7 @@
 
 #define GPG_VERIFY_VERBOSE 1
 #define GPG_VERIFY_RAW 2
+#define GPG_VERIFY_QUIET   4
 
 struct signature_check {
char *payload;
diff --git a/tag.c b/tag.c
index d1dcd18..291073f 100644
--- a/tag.c
+++ b/tag.c
@@ -3,6 +3,7 @@
 #include "commit.h"
 #include "tree.h"
 #include "blob.h"
+#include "gpg-interface.h"
 
 const char *tag_type = "tag";
 
@@ -24,7 +25,9 @@ static int run_gpg_verify(const char *buf, unsigned long 
size, unsigned flags)
 
ret = check_signature(buf, payload_size, buf + payload_size,
size - payload_size, );
-   print_signature_buffer(, flags);
+
+   if (!(flags & GPG_VERIFY_QUIET))
+   print_signature_buffer(, flags);
 
signature_check_clear();
return ret;
-- 
2.10.0



Re: [PATCH 3/6] tmp-objdir: introduce API for temporary object directories

2016-09-30 Thread Jeff King
On Fri, Sep 30, 2016 at 02:25:43PM -0700, Junio C Hamano wrote:

> > +void add_to_alternates_internal(const char *reference)
> > +{
> > +   prepare_alt_odb();
> > +   link_alt_odb_entries(reference, strlen(reference), '\n', NULL, 0);
> > +}
> > +
> 
> A function _internal being extern felt a bit funny.  We are only
> appending so the first one does not have to be reprepare.

It's a match for add_to_alternates_file(). Suggestions for a better word
are welcome.

We do need to prepare_alt_odb(), as that is what sets up the
alt_odb_tail pointer. And also, a later prepare() call would overwrite
our entry.  We could refactor the alt_odb code, but it seemed simplest
to just make sure we don't add to an unprepared list.

> > +   t = xmalloc(sizeof(*t));
> > +   strbuf_init(>path, 0);
> > +   argv_array_init(>env);
> > +
> > +   strbuf_addf(>path, "%s/incoming-XX", get_object_directory());
> 
> I was wondering where you would put this in.  Inside .git/objects/
> sounds good.

The name "incoming" is kind of arbitrary and related to the fact that
this is used for receive-pack (though if we were to use it on the
fetching side, I think it would be equally correct). I don't think it
really matters in practice.

> > +static int pack_copy_priority(const char *name)
> > +{
> > +   if (!starts_with(name, "pack"))
> > +   return 0;
> > +   if (ends_with(name, ".keep"))
> > +   return 1;
> > +   if (ends_with(name, ".pack"))
> > +   return 2;
> > +   if (ends_with(name, ".idx"))
> > +   return 3;
> > +   return 4;
> > +}
> 
> Thanks for being careful.  A blind "cp -r" would have ruined the
> day.
> 
> We do not do bitmaps upon receiving, I guess.

But we don't, but they (and anything else) would just sort at the end,
which is OK.

> > + * struct tmp_objdir *t = tmp_objdir_create();
> > + * if (!run_command_v_opt_cd_env(cmd, 0, NULL, tmp_objdir_env(t)) &&
> > + * !tmp_objdir_migrate(t))
> > + * printf("success!\n");
> > + * else
> > + * die("failed...tmp_objdir will clean up for us");
> 
> Made me briefly wonder if a caller might want to use appropriate
> environment to use the tmp-objdir given by the API in addition to
> its own, but then such a caller just needs to prepare its own argv-array
> and concatenate tmp_objdir_env() before making the opt_cd_env call,
> so this is perfectly fine.

Yep, and that's exactly what happens in one spot of the next patch.
My original had just open-coded, but I was happy to see we have
argv_array_pushv() these days, so it's a one-liner.

In the very original version, the receive-pack process did not need to
access the new objects at all (not until ref update time anyway, at
which point they've been migrated). And that's why the environment is
intentionally kept separate, and the caller can feed it to whichever
sub-programs it chooses. But a later version of git that handled shallow
pushes required receive-pack to actually look at the objects, and I
added the add_to_alternates_internal() call you see here.

At that point, it does make me wonder if a better interface would be for
tmp_objdir to just set up the environment variables in the parent
process in the first place, and then restore them upon
tmp_objdir_destroy(). It makes things a bit more automatic, which makes
me hesitate, but I think it would be fine for receive-pack.

I dunno. I mostly left it alone because I did it this way long ago, and
it wasn't broke. Polishing for upstream is an opportunity to fix old
oddities, but I think there is some value in applying a more
battle-tested patch.

-Peff



Re: [PATCH 1/5] pretty: allow formatting DATE_SHORT

2016-09-30 Thread Jacob Keller
On Fri, Sep 30, 2016 at 3:56 AM, SZEDER Gábor  wrote:
>> On Thu, Sep 29, 2016 at 1:33 AM, Jeff King  wrote:
>> > There's no way to do this short of "%ad" and --date=short,
>> > but that limits you to having a single date format in the
>> > output.
>> >
>> > This would possibly be better done with something more like
>> > "%ad(short)".
>> >
>> > Signed-off-by: Jeff King 
>> > ---
>> >  pretty.c | 3 +++
>> >  1 file changed, 3 insertions(+)
>> >
>> > diff --git a/pretty.c b/pretty.c
>> > index 493edb0..c532c17 100644
>> > --- a/pretty.c
>> > +++ b/pretty.c
>> > @@ -727,6 +727,9 @@ static size_t format_person_part(struct strbuf *sb, 
>> > char part,
>> > case 'I':   /* date, ISO 8601 strict */
>> > strbuf_addstr(sb, show_ident_date(, 
>> > DATE_MODE(ISO8601_STRICT)));
>> > return placeholder_len;
>> > +   case 's':
>> > +   strbuf_addstr(sb, show_ident_date(, DATE_MODE(SHORT)));
>> > +   return placeholder_len;
>> > }
>> >
>> >  skip:
>> > --
>> > 2.10.0.566.g5365f87
>> >
>>
>> Nice. I use date=short in some of my aliases and switching to this is
>> nicer. I assume this turns into "%(as)"?
>>
>> What about documenting this in  pretty-formats.txt?
>
> Here you go :)
>
>   
> http://public-inbox.org/git/1444235305-8718-1-git-send-email-sze...@ira.uka.de/
>

Nice, thanks!

Regards,
Jake


RE: [PATCH 3/6] tmp-objdir: introduce API for temporary object directories

2016-09-30 Thread David Turner
> +static void env_append(struct argv_array *env, const char *key, const
> +char *val) {
> + const char *old = getenv(key);
> +
> + if (!old)
> + argv_array_pushf(env, "%s=%s", key, val);
> + else
> + argv_array_pushf(env, "%s=%s%c%s", key, old, PATH_SEP,
> val); 
>+}

I would like a comment explaining this function. 

> + * Finalize a temporary object directory by migrating its objects into
> +the main
> + * object database.
> + */

This should mention that it frees its argument.



Re: [PATCH 3/6] tmp-objdir: introduce API for temporary object directories

2016-09-30 Thread Junio C Hamano
Jeff King  writes:

> diff --git a/sha1_file.c b/sha1_file.c
> index 9a79c19..65deaf9 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -421,6 +421,12 @@ void add_to_alternates_file(const char *reference)
>   free(alts);
>  }
>  
> +void add_to_alternates_internal(const char *reference)
> +{
> + prepare_alt_odb();
> + link_alt_odb_entries(reference, strlen(reference), '\n', NULL, 0);
> +}
> +

A function _internal being extern felt a bit funny.  We are only
appending so the first one does not have to be reprepare.

> +static int tmp_objdir_destroy_1(struct tmp_objdir *t, int on_signal)
> +{
> + int err;
> +
> + if (!t)
> + return 0;
> +
> + if (t == the_tmp_objdir)
> + the_tmp_objdir = NULL;
> +
> + /*
> +  * This may use malloc via strbuf_grow(), but we should
> +  * have pre-grown t->path sufficiently so that this
> +  * doesn't happen in practice.
> +  */
> + err = remove_dir_recursively(>path, 0);
> +
> + /*
> +  * When we are cleaning up due to a signal, we won't bother
> +  * freeing memory; it may cause a deadlock if the signal
> +  * arrived while libc's allocator lock is held.
> +  */
> + if (!on_signal)
> + tmp_objdir_free(t);
> + return err;
> +}
> +
> +int tmp_objdir_destroy(struct tmp_objdir *t)
> +{
> + return tmp_objdir_destroy_1(t, 0);
> +}

Looks sensible.

> + t = xmalloc(sizeof(*t));
> + strbuf_init(>path, 0);
> + argv_array_init(>env);
> +
> + strbuf_addf(>path, "%s/incoming-XX", get_object_directory());

I was wondering where you would put this in.  Inside .git/objects/
sounds good.

> +/*
> + * Make sure we copy packfiles and their associated metafiles in the correct
> + * order. All of these ends_with checks are slightly expensive to do in
> + * the midst of a sorting routine, but in practice it shouldn't matter.
> + * We will have a relatively small number of packfiles to order, and loose
> + * objects exit early in the first line.
> + */
> +static int pack_copy_priority(const char *name)
> +{
> + if (!starts_with(name, "pack"))
> + return 0;
> + if (ends_with(name, ".keep"))
> + return 1;
> + if (ends_with(name, ".pack"))
> + return 2;
> + if (ends_with(name, ".idx"))
> + return 3;
> + return 4;
> +}

Thanks for being careful.  A blind "cp -r" would have ruined the
day.

We do not do bitmaps upon receiving, I guess.

> + *   struct tmp_objdir *t = tmp_objdir_create();
> + *   if (!run_command_v_opt_cd_env(cmd, 0, NULL, tmp_objdir_env(t)) &&
> + *   !tmp_objdir_migrate(t))
> + *   printf("success!\n");
> + *   else
> + *   die("failed...tmp_objdir will clean up for us");

Made me briefly wonder if a caller might want to use appropriate
environment to use the tmp-objdir given by the API in addition to
its own, but then such a caller just needs to prepare its own argv-array
and concatenate tmp_objdir_env() before making the opt_cd_env call,
so this is perfectly fine.



Re: [RFC/PATCH 0/2] place cherry pick line below commit title

2016-09-30 Thread Junio C Hamano
Junio C Hamano  writes:

> Jonathan Tan  writes:
>
>>> I vaguely recall that there were some discussion on the definition
>>> of "what's a trailer line" with folks from the kernel land, perhaps
>>> while discussing the interpret-trailers topic.  IIRC, when somebody
>>> passes an improved version along, the resulting message's trailer
>>> block may look like this:
>>>
>>> Signed-off-by: Original Author 
>>> [fixed typo in the variable names]
>>> Signed-off-by: Somebhody Else 
>>>
>>> and an obvious "wish" of theirs was to treat not just RFC2822-like
>>> "a line that begins with token followed by a colon" but also these
>>> short comments as part of the trailer block.  Your original wish in
>>> [*1*] is to also treat "a line that begin with a whitespace that
>>> follows a line that begins with token followed by a colon" as part
>>> of the trailer block and I personally think that is a reasonable
>>> thing to wish for, too.
>>
>> If we allowed arbitrary lines in the trailer block, this would solve
>> my original problem, yes.

Here is an experiment I ran during my lunch break.  The script
(attached) is meant to run in the kernel repository and
for each log messages of each non-merge commit:

 * find its last paragraph, where the definition of paragraph is
   simply "a blank/empty line";

 * inspect if there is at least one RFC2822-header-looking line, or
   a line that begins with "(cherry picked from";

 * dump the ones that do not pass the above criteria.

My cursory look of the output did not spot a legitimate trailer
block that we should have identified.  The output lines shown were
ones that are not signed off at all (e.g. af8c34ce6ae32add that says
"Linux 4.7-rc2"), ones that has three-dash line "---" in them
(e.g. 133d558216d9), ones that has diffstat that should have been
after "---" (e.g. 259307074bfcf1f).

The story is the same if you run it in git.git; the "do we have at
least one rfc2822-header-looking line or '(cherry picked from' line
in the last paragraph? if so, then that is an existing trailer
block" seems to be a good heuristics to cover many cases like
these:

d0196c8d5d3057c5c21a82f3d0113ca8e501033b
Signed-off-by: Arnd Bergmann 
[tomi.valkei...@ti.com: resolved conflicts]
Signed-off-by: Tomi Valkeinen 

59f0aa9480cfef9173a648cec4537addc5f3ad94
Link 1: https://bugzilla.kernel.org/show_bug.cgi?id=9916
http://bugzilla.kernel.org/show_bug.cgi?id=10100
https://lkml.org/lkml/2008/2/25/282
Link 2: https://bugzilla.kernel.org/show_bug.cgi?id=9399
https://bugzilla.kernel.org/show_bug.cgi?id=12461
https://bugzilla.kernel.org/show_bug.cgi?id=11880
Link 3: https://bugzilla.kernel.org/show_bug.cgi?id=11884
https://bugzilla.kernel.org/show_bug.cgi?id=14081
https://bugzilla.kernel.org/show_bug.cgi?id=14086
https://bugzilla.kernel.org/show_bug.cgi?id=14446
Link 4: https://bugzilla.kernel.org/show_bug.cgi?id=112911
Signed-off-by: Lv Zheng 
Tested-by: Chris Bainbridge 
Signed-off-by: Rafael J. Wysocki 

-- >8 --
#!/bin/sh

git log --no-merges |
perl -e '
sub flush {
my ($commit, @lines) = @_;
my $seen_good = 0;
for (@lines) {
if (/^[-A-Za-z0-9]+: / ||
/^\(cherry picked from/) {
$seen_good = 1;
last;
}
}
if (!$seen_good) {
print "\n$commit\n";
for (@lines) {
print;
}
}
}

my (@lines, $this);
while (<>) {
if (/^commit (.*)$/) {
my $next = $1;
flush($this, @lines);
@lines = ();
$this = $next;
}
if (s/^//) {
if (/^\s*$/) {
@lines = ();
} else {
push @lines, $_;
}
}
}
if (@lines && $this) {
flush($this, @lines);
}
'


Re: [RFC/PATCH 0/2] place cherry pick line below commit title

2016-09-30 Thread Jonathan Tan

On 09/30/2016 12:34 PM, Junio C Hamano wrote:

2) The Linux kernel's repository has some "commit ... upstream." lines
in this position (below the commit title) - for example, in commit
dacc0987fd2e.


"A group of people seem to prefer it there" does not lead to
"therefore let's move it there for everybody".  It does open a
possibility that we may want to add a new option to put it there,
but does not justify changing what existing "-x" option does.


To clarify, my patch adds the new option you described (to place it 
below the title instead of at the bottom of the commit message). The 
default is still the current behavior.


Re: [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits

2016-09-30 Thread Junio C Hamano
Linus Torvalds  writes:

> On Fri, Sep 30, 2016 at 10:54 AM, Linus Torvalds
>  wrote:
>>
>>> So IMHO, the best combination is the init_default_abbrev() you posted in
>>> [1], but initialized at the top of find_unique_abbrev(). And cached
>>> there, obviously, in a similar way.
>>
>> That's certainly possible, but I'm really not happy with how the
>> counting function looks.  And nobody actually stood up to say "yeah,
>> that gets alternate loose objects right" or "if you have tons of those
>> alternate loose objects you have other issues anyway". I think
>> somebody would have to "own" that counting function, the advantage of
>> just putting it into disambiguate_state is that we just get the
>> counting for free..
>
> Side note: maybe we can mix the two approaches, and keep the counting
> in the disambiguation state, and just make the counting function do
>
> init_object_disambiguation();
> find_short_object_filename();
> find_short_packed_object();
> finish_object_disambiguation(, sha1);
>
> and then just use "ds.nrobjects". So the counting would still be done
> by the disambiguation code, it just woudln't be in get_short_sha1().
>
> So here's another version that takes that approach. And if somebody
> (hint hint) wants to do the counting differently, they can perhaps
> send an incremental patch to do that.
>
> (This patch also contains the few setup issues Junio found with the
> new "default_abbrev is negative" model)

Sorry, but I do not quite see the point in the difference between
this one and your original that had a hook in get_short_sha1(), as
it seemed to me that Peff's objection was about the counting done in
find_short_object_filename() and find_short_packed_object(), which
is (understandably) still here.





Re: [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits

2016-09-30 Thread Ævar Arnfjörð Bjarmason
On Fri, Sep 30, 2016 at 3:01 AM, Linus Torvalds
 wrote:
> On Thu, Sep 29, 2016 at 5:56 PM, Mike Hommey  wrote:
>>
>> OTOH, how often does one refer to trees or blobs with abbreviated sha1s?
>> Most of the time, you'd use abbreviated sha1s for commits. And the number
>> of commits in git and the kernel repositories are much lower than the
>> number of overall objects.
>
> See that whole other discussion about this. I agree. If we only ever
> worried about just commits, the abbreviation length wouldn't need to
> be grown nearly as aggressively. The current default would still be
> wrong for the kernel, but it wouldn't be as noticeably wrong, and
> updating it to 8 or 9 would be fine.
>
> That said, people argued against that too. We *do* end up having
> abbreviated SHA1's for blobs in the diff index. When I said that _I_
> neer use it, somebody piped up to say that they do.
>
> So I'd rather just keep the existing semantics (a hash is a hash is a
> hash), and just abbreviate at a sufficient point that we don't have to
> worry too much about disambiguating further by object type.

I work on a repo that's around the size of linux.git in every way
(commits, objects etc.), and growing twice as fast.

So I also see 8 or 9 digit abbreviations on a daily basis, even with
the current defaults core.abbrev, but I still think growing it so
aggressively is the wrong thing to do.

The fact that we have a core.abbrev option at all and nobody's talking
about getting rid of it entirely means we all acknowledge the UX
convenience of short SHA1s.

I don't think it's a good idea for such UX options to have defaults
that really only make sense for repositories at the very far end of
the bell curve, which is the case with linux.git and the repo I work
on.

Either way you're going to waste somebody's time. I think it's a
better trade-off that some kernel dev occasionally has to look at
Peff's new disambiguation output, than have the wast hordes of
everyday Git users have less screen real estate, need to recite longer
sha1s over the phone during outages (people do that), and any number
of other every day use cases.

I think if anything we should be talking about making the default
shorter & then have some clever auto-scaling by repository size as has
been discussed in this thread to deal with the repositories at the far
end of the bell curve.


Re: [PATCH v8 11/11] convert: add filter..process option

2016-09-30 Thread Lars Schneider

> On 27 Sep 2016, at 17:37, Jakub Narębski  wrote:
> 
> Part second of the review of 11/11.
> 
> W dniu 20.09.2016 o 21:02, larsxschnei...@gmail.com pisze:
> 
>> +
>> +if (!drv->process && (CAP_CLEAN & wanted_capability) && drv->clean)
> 
> This is just a very minor nitpicking, but wouldn't it be easier
> to read with those checks reordered?
> 
>  +if ((wanted_capability & CAP_CLEAN) && !drv->process && drv->clean)

OK


>> +
>> +if (start_command(process)) {
>> +error("cannot fork to run external filter '%s'", cmd);
>> +kill_multi_file_filter(hashmap, entry);
>> +return NULL;
>> +}
> 
> I guess there is a reason why we init hashmap entry, try to start
> external process, then kill entry of unable to start, instead of
> trying to start external process, and adding hashmap entry when
> we succeed?

Yes. This way I can reuse the kill_multi_file_filter() function.


>> +
>> +sigchain_push(SIGPIPE, SIG_IGN);
> 
> I guess that this is here to handle errors writing to filter
> by ourself, isn't it?

Yes.


>> +error("external filter '%s' does not support long running 
>> filter protocol", cmd);
> 
> We could have described the error here better.
> 
>  +error("external filter '%s' does not support filter protocol 
> version 2", cmd);

OK


>> +static void read_multi_file_filter_values(int fd, struct strbuf *status) {
> 
> This is more
> 
>  +static void read_multi_file_filter_status(int fd, struct strbuf *status) {
> 
> It doesn't read arbitrary values, it examines 'metadata' from
> filter for "status=" lines.

True!


>> +if (pair[0] && pair[0]->len && pair[1]) {
>> +if (!strcmp(pair[0]->buf, "status=")) {
>> +strbuf_reset(status);
>> +strbuf_addbuf(status, pair[1]);
>> +}
> 
> So it is last status= line wins behavior?

Correct.


> 
>> +}
> 
> Shouldn't we free 'struct strbuf **pair', maybe allocated by the
> strbuf_split_str() function, and reset to NULL?

True. strbuf_list_free() should be enough.


>> 
>> +fflush(NULL);
> 
> Why this fflush(NULL) is needed here?

This flushes all open output streams. The single filter does the same.


>> 
>> +if (fd >= 0 && !src) {
>> +if (fstat(fd, _stat) == -1)
>> +return 0;
>> +len = xsize_t(file_stat.st_size);
>> +}
> 
> Errr... is it necessary?  The protocol no longer provides size=
> hint, and neither uses such hint if provided.

We require the size in write_packetized_from_buf() later.


>> +
>> +err = strlen(filter_type) > PKTLINE_DATA_MAXLEN;
>> +if (err)
>> +goto done;
> 
> Errr... this should never happen.  We control which capabilities
> we pass, it can be only "clean" or "smudge", nothing else. Those
> would always be shorter than PKTLINE_DATA_MAXLEN.
> 
> Never mind that that is "command=smudge\n" etc. that needs to
> be shorter that PKTLINE_DATA_MAXLEN!
> 
> So, IMHO it should be at most assert, and needs to be corrected
> anyway.

OK!


> This should never happen, PATH_MAX everywhere is much shorter
> than PKTLINE_DATA_MAXLEN / LARGE_PACKET_MAX.  Or is it?
> 
> Anyway, we should probably explain or warn
> 
>   error("path name too long: '%s'", path);

OK


>> +/*
>> + * Something went wrong with the protocol filter.
>> + * Force shutdown and restart if another blob requires 
>> filtering!
> 
> Is this exclamation mark '!' here necessary?
> 

No.


Thanks,
Lars



[PATCH 6/6] tmp-objdir: do not migrate files starting with '.'

2016-09-30 Thread Jeff King
This avoids "." and "..", as we already do, but also leaves
room for index-pack to store extra data in the quarantine
area (e.g., for passing back any analysis to be read by the
pre-receive hook).

Signed-off-by: Jeff King 
---
 tmp-objdir.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tmp-objdir.c b/tmp-objdir.c
index 9f53238..2181a42 100644
--- a/tmp-objdir.c
+++ b/tmp-objdir.c
@@ -181,7 +181,7 @@ static int read_dir_paths(struct string_list *out, const 
char *path)
return -1;
 
while ((de = readdir(dh)))
-   if (!is_dot_or_dotdot(de->d_name))
+   if (de->d_name[0] != '.')
string_list_append(out, de->d_name);
 
closedir(dh);
-- 
2.10.0.618.g82cc264


[PATCH 5/6] tmp-objdir: put quarantine information in the environment

2016-09-30 Thread Jeff King
The presence of the GIT_QUARANTINE_PATH variable lets any
called programs know that they're operating in a temporary
object directory (and where that directory is).

Signed-off-by: Jeff King 
---
 cache.h  | 1 +
 tmp-objdir.c | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/cache.h b/cache.h
index 607c9b5..fd81a6c 100644
--- a/cache.h
+++ b/cache.h
@@ -433,6 +433,7 @@ static inline enum object_type object_type(unsigned int 
mode)
 #define GIT_GLOB_PATHSPECS_ENVIRONMENT "GIT_GLOB_PATHSPECS"
 #define GIT_NOGLOB_PATHSPECS_ENVIRONMENT "GIT_NOGLOB_PATHSPECS"
 #define GIT_ICASE_PATHSPECS_ENVIRONMENT "GIT_ICASE_PATHSPECS"
+#define GIT_QUARANTINE_ENVIRONMENT "GIT_QUARANTINE_PATH"
 
 /*
  * This environment variable is expected to contain a boolean indicating
diff --git a/tmp-objdir.c b/tmp-objdir.c
index c92e6cc..9f53238 100644
--- a/tmp-objdir.c
+++ b/tmp-objdir.c
@@ -140,6 +140,8 @@ struct tmp_objdir *tmp_objdir_create(void)
env_append(>env, ALTERNATE_DB_ENVIRONMENT,
   absolute_path(get_object_directory()));
env_replace(>env, DB_ENVIRONMENT, absolute_path(t->path.buf));
+   env_replace(>env, GIT_QUARANTINE_ENVIRONMENT,
+   absolute_path(t->path.buf));
 
return t;
 }
-- 
2.10.0.618.g82cc264



[PATCH 4/6] receive-pack: quarantine objects until pre-receive accepts

2016-09-30 Thread Jeff King
When a client pushes objects to us, index-pack checks the
objects themselves and then installs them into place. If we
then reject the push due to a pre-receive hook, we cannot
just delete the packfile; other processes may be depending
on it. We have to do a normal reachability check at this
point via `git gc`.

But such objects may hang around for weeks due to the
gc.pruneExpire grace period. And worse, during that time
they may be exploded from the pack into inefficient loose
objects.

Instead, this patch teaches receive-pack to put the new
objects into a "quarantine" temporary directory. We make
these objects available to the connectivity check and to the
pre-receive hook, and then install them into place only if
it is successful (and otherwise remove them as tempfiles).

Signed-off-by: Jeff King 
---
 builtin/receive-pack.c | 41 -
 t/t5547-push-quarantine.sh | 36 
 2 files changed, 76 insertions(+), 1 deletion(-)
 create mode 100755 t/t5547-push-quarantine.sh

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 896b16f..04ad909 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -20,6 +20,7 @@
 #include "gpg-interface.h"
 #include "sigchain.h"
 #include "fsck.h"
+#include "tmp-objdir.h"
 
 static const char * const receive_pack_usage[] = {
N_("git receive-pack "),
@@ -86,6 +87,8 @@ static enum {
 } use_keepalive;
 static int keepalive_in_sec = 5;
 
+struct tmp_objdir *tmp_objdir;
+
 static enum deny_action parse_deny_action(const char *var, const char *value)
 {
if (value) {
@@ -663,6 +666,9 @@ static int run_and_feed_hook(const char *hook_name, feed_fn 
feed,
} else
argv_array_pushf(_array, "GIT_PUSH_OPTION_COUNT");
 
+   if (tmp_objdir)
+   argv_array_pushv(_array, tmp_objdir_env(tmp_objdir));
+
if (use_sideband) {
memset(, 0, sizeof(muxer));
muxer.proc = copy_to_sideband;
@@ -762,6 +768,7 @@ static int run_update_hook(struct command *cmd)
proc.stdout_to_stderr = 1;
proc.err = use_sideband ? -1 : 0;
proc.argv = argv;
+   proc.env = tmp_objdir_env(tmp_objdir);
 
code = start_command();
if (code)
@@ -833,6 +840,7 @@ static int update_shallow_ref(struct command *cmd, struct 
shallow_info *si)
!delayed_reachability_test(si, i))
sha1_array_append(, si->shallow->sha1[i]);
 
+   opt.env = tmp_objdir_env(tmp_objdir);
setup_alternate_shallow(_lock, _file, );
if (check_connected(command_singleton_iterator, cmd, )) {
rollback_lock_file(_lock);
@@ -1240,12 +1248,17 @@ static void set_connectivity_errors(struct command 
*commands,
 
for (cmd = commands; cmd; cmd = cmd->next) {
struct command *singleton = cmd;
+   struct check_connected_options opt = CHECK_CONNECTED_INIT;
+
if (shallow_update && si->shallow_ref[cmd->index])
/* to be checked in update_shallow_ref() */
continue;
+
+   opt.env = tmp_objdir_env(tmp_objdir);
if (!check_connected(command_singleton_iterator, ,
-NULL))
+))
continue;
+
cmd->error_string = "missing necessary objects";
}
 }
@@ -1428,6 +1441,7 @@ static void execute_commands(struct command *commands,
data.si = si;
opt.err_fd = err_fd;
opt.progress = err_fd && !quiet;
+   opt.env = tmp_objdir_env(tmp_objdir);
if (check_connected(iterate_receive_command_list, , ))
set_connectivity_errors(commands, si);
 
@@ -1444,6 +1458,19 @@ static void execute_commands(struct command *commands,
return;
}
 
+   /*
+* Now we'll start writing out refs, which means the objects need
+* to be in their final positions so that other processes can see them.
+*/
+   if (tmp_objdir_migrate(tmp_objdir) < 0) {
+   for (cmd = commands; cmd; cmd = cmd->next) {
+   if (!cmd->error_string)
+   cmd->error_string = "unable to migrate objects 
to permanent storage";
+   }
+   return;
+   }
+   tmp_objdir = NULL;
+
check_aliased_updates(commands);
 
free(head_name_to_free);
@@ -1639,6 +1666,18 @@ static const char *unpack(int err_fd, struct 
shallow_info *si)
argv_array_push(, alt_shallow_file);
}
 
+   tmp_objdir = tmp_objdir_create();
+   if (!tmp_objdir)
+   return "unable to create temporary object directory";
+   child.env = tmp_objdir_env(tmp_objdir);
+
+   /*
+* Normally we just pass the tmp_objdir environment to the child
+* processes that do the heavy 

[PATCH 3/6] tmp-objdir: introduce API for temporary object directories

2016-09-30 Thread Jeff King
Once objects are added to the object database by a process,
they cannot easily be deleted, as we don't know what other
processes may have started referencing them. We have to
clean them up with git-gc, which will apply the usual
reachability and grace-period checks.

This patch provides an alternative: it helps callers create
a temporary directory inside the object directory, and a
temporary environment which can be passed to sub-programs to
ask them to write there (the original object directory
remains accessible as an alternate of the temporary one).

See tmp-objdir.h for details on the API.

Signed-off-by: Jeff King 
---
 Makefile |   1 +
 cache.h  |   1 +
 sha1_file.c  |   6 ++
 tmp-objdir.c | 266 +++
 tmp-objdir.h |  53 
 5 files changed, 327 insertions(+)
 create mode 100644 tmp-objdir.c
 create mode 100644 tmp-objdir.h

diff --git a/Makefile b/Makefile
index 1aad150..4e3becb 100644
--- a/Makefile
+++ b/Makefile
@@ -831,6 +831,7 @@ LIB_OBJS += submodule-config.o
 LIB_OBJS += symlinks.o
 LIB_OBJS += tag.o
 LIB_OBJS += tempfile.o
+LIB_OBJS += tmp-objdir.o
 LIB_OBJS += trace.o
 LIB_OBJS += trailer.o
 LIB_OBJS += transport.o
diff --git a/cache.h b/cache.h
index ed3d5df..607c9b5 100644
--- a/cache.h
+++ b/cache.h
@@ -1389,6 +1389,7 @@ extern void prepare_alt_odb(void);
 extern void read_info_alternates(const char * relative_base, int depth);
 extern char *compute_alternate_path(const char *path, struct strbuf *err);
 extern void add_to_alternates_file(const char *reference);
+extern void add_to_alternates_internal(const char *reference);
 typedef int alt_odb_fn(struct alternate_object_database *, void *);
 extern int foreach_alt_odb(alt_odb_fn, void*);
 
diff --git a/sha1_file.c b/sha1_file.c
index 9a79c19..65deaf9 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -421,6 +421,12 @@ void add_to_alternates_file(const char *reference)
free(alts);
 }
 
+void add_to_alternates_internal(const char *reference)
+{
+   prepare_alt_odb();
+   link_alt_odb_entries(reference, strlen(reference), '\n', NULL, 0);
+}
+
 /*
  * Compute the exact path an alternate is at and returns it. In case of
  * error NULL is returned and the human readable error is added to `err`
diff --git a/tmp-objdir.c b/tmp-objdir.c
new file mode 100644
index 000..c92e6cc
--- /dev/null
+++ b/tmp-objdir.c
@@ -0,0 +1,266 @@
+#include "cache.h"
+#include "tmp-objdir.h"
+#include "dir.h"
+#include "sigchain.h"
+#include "string-list.h"
+#include "strbuf.h"
+#include "argv-array.h"
+
+struct tmp_objdir {
+   struct strbuf path;
+   struct argv_array env;
+};
+
+/*
+ * Allow only one tmp_objdir at a time in a running process, which simplifies
+ * our signal/atexit cleanup routines.  It's doubtful callers will ever need
+ * more than one, and we can expand later if so.  You can have many such
+ * tmp_objdirs simultaneously in many processes, of course.
+ */
+struct tmp_objdir *the_tmp_objdir;
+
+static void tmp_objdir_free(struct tmp_objdir *t)
+{
+   strbuf_release(>path);
+   argv_array_clear(>env);
+   free(t);
+}
+
+static int tmp_objdir_destroy_1(struct tmp_objdir *t, int on_signal)
+{
+   int err;
+
+   if (!t)
+   return 0;
+
+   if (t == the_tmp_objdir)
+   the_tmp_objdir = NULL;
+
+   /*
+* This may use malloc via strbuf_grow(), but we should
+* have pre-grown t->path sufficiently so that this
+* doesn't happen in practice.
+*/
+   err = remove_dir_recursively(>path, 0);
+
+   /*
+* When we are cleaning up due to a signal, we won't bother
+* freeing memory; it may cause a deadlock if the signal
+* arrived while libc's allocator lock is held.
+*/
+   if (!on_signal)
+   tmp_objdir_free(t);
+   return err;
+}
+
+int tmp_objdir_destroy(struct tmp_objdir *t)
+{
+   return tmp_objdir_destroy_1(t, 0);
+}
+
+static void remove_tmp_objdir(void)
+{
+   tmp_objdir_destroy(the_tmp_objdir);
+}
+
+static void remove_tmp_objdir_on_signal(int signo)
+{
+   tmp_objdir_destroy_1(the_tmp_objdir, 1);
+   sigchain_pop(signo);
+   raise(signo);
+}
+
+static void env_append(struct argv_array *env, const char *key, const char 
*val)
+{
+   const char *old = getenv(key);
+
+   if (!old)
+   argv_array_pushf(env, "%s=%s", key, val);
+   else
+   argv_array_pushf(env, "%s=%s%c%s", key, old, PATH_SEP, val);
+}
+
+static void env_replace(struct argv_array *env, const char *key, const char 
*val)
+{
+   argv_array_pushf(env, "%s=%s", key, val);
+}
+
+static int setup_tmp_objdir(const char *root)
+{
+   char *path;
+   int ret = 0;
+
+   path = xstrfmt("%s/pack", root);
+   ret = mkdir(path, 0777);
+   free(path);
+
+   return ret;
+}
+
+struct tmp_objdir *tmp_objdir_create(void)
+{
+   static int installed_handlers;
+   struct 

[PATCH 2/6] sha1_file: always allow relative paths to alternates

2016-09-30 Thread Jeff King
We recursively expand alternates repositories, so that if A
borrows from B which borrows from C, A can see all objects.

For the root object database, we allow relative paths, so A
can point to B as "../B/objects". However, we currently do
not allow relative paths when recursing, so B must use an
absolute path to reach C.

That is an ancient protection from c2f493a (Transitively
read alternatives, 2006-05-07) that tries to avoid adding
the same alternate through two different paths. But since
5bdf0a8 (sha1_file: normalize alt_odb path before comparing
and storing, 2011-09-07), we use a normalized absolute path
for each alt_odb entry.

So this protection is no longer necessary; we will detect
the duplicate no matter how we got there.  And it's a good
idea to get rid of it, as it creates an unnecessary
complication when setting up recursive alternates (B has to
know that A is going to borrow from it and make sure to use
an absolute path).

We adjust the test script here to demonstrate that this now
works. Unfortunately, we can't demonstrate that the
duplicate is suppressed, since it has no user-visible
behavior (it's just one less place for our object lookups to
go). But you can verify it manually via gdb, with something
like:

for i in a b c; do
git init --bare $i
blob=$(echo $i | git -C $i hash-object -w --stdin)
done
echo "../../b/objects" >a/objects/info/alternates
echo "../../c/objects" >>a/objects/info/alternates
echo "../../c/objects" >b/objects/info/alternates
gdb --args git cat-file -e $blob

After prepare_alt_odb() runs, we have only a single copy of
"/path/to/c/objects/" in the alt_odb list.

Signed-off-by: Jeff King 
---
 sha1_file.c   | 7 +--
 t/t5613-info-alternate.sh | 4 ++--
 2 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index b9c1fa3..9a79c19 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -343,12 +343,7 @@ static void link_alt_odb_entries(const char *alt, int len, 
int sep,
const char *entry = entries.items[i].string;
if (entry[0] == '\0' || entry[0] == '#')
continue;
-   if (!is_absolute_path(entry) && depth) {
-   error("%s: ignoring relative alternate object store %s",
-   relative_base, entry);
-   } else {
-   link_alt_odb_entry(entry, relative_base, depth, 
objdirbuf.buf);
-   }
+   link_alt_odb_entry(entry, relative_base, depth, objdirbuf.buf);
}
string_list_clear(, 0);
free(alt_copy);
diff --git a/t/t5613-info-alternate.sh b/t/t5613-info-alternate.sh
index 9cd2626..b429707 100755
--- a/t/t5613-info-alternate.sh
+++ b/t/t5613-info-alternate.sh
@@ -102,9 +102,9 @@ test_valid_repo'
 cd "$base_dir"
 
 test_expect_success \
-'that relative alternate is only possible for current dir' '
+'that relative alternate is recursive' '
 cd D &&
-! (test_valid_repo)
+test_valid_repo
 '
 
 cd "$base_dir"
-- 
2.10.0.618.g82cc264



[PATCH 1/6] check_connected: accept an env argument

2016-09-30 Thread Jeff King
This lets callers influence the environment seen by
rev-list, which will be useful when we start providing
quarantined objects.

Signed-off-by: Jeff King 
---
 connected.c | 1 +
 connected.h | 5 +
 2 files changed, 6 insertions(+)

diff --git a/connected.c b/connected.c
index 8e3e4b1..136c2ac 100644
--- a/connected.c
+++ b/connected.c
@@ -63,6 +63,7 @@ int check_connected(sha1_iterate_fn fn, void *cb_data,
 _("Checking connectivity"));
 
rev_list.git_cmd = 1;
+   rev_list.env = opt->env;
rev_list.in = -1;
rev_list.no_stdout = 1;
if (opt->err_fd)
diff --git a/connected.h b/connected.h
index afa48cc..4ca325f 100644
--- a/connected.h
+++ b/connected.h
@@ -33,6 +33,11 @@ struct check_connected_options {
 
/* If non-zero, show progress as we traverse the objects. */
int progress;
+
+   /*
+* Insert these variables into the environment of the child process.
+*/
+   const char **env;
 };
 
 #define CHECK_CONNECTED_INIT { 0 }
-- 
2.10.0.618.g82cc264



[PATCH 0/6] receive-pack: quarantine pushed objects

2016-09-30 Thread Jeff King
I've mentioned before on the list that GitHub "quarantines" objects
while the pre-receive hook runs. Here are the patches to implement
that.

The basic problem is that as-is, index-pack admits pushed objects into
the main object database immediately, before the pre-receive hook runs.
It _has_ to, since the hook needs to be able to actually look at the
objects. However, this means that if the pre-receive hook rejects the
push, we still end up with the objects in the repository. We can't just
delete them as temporary files, because we don't know what other
processes might have started referencing them.

The solution here is to push into a "quarantine" directory that is
accessible only to pre-receive, check_connected(), etc, and only
move the objects into the main object database after we've finished
those basic checks.

One of the things we use it for at GitHub is object-size policy, which
we implement via a pre-receive hook (sort of; see below). This scheme
has been in use for about 2 years, though I did do a fair bit of
tweaking to make it ready for upstream (squashing bugfixes and merges
from upstream that came later, along with polishing a few rough edges I
saw while doing so). So I may have introduced new bugs. :)

The patches are:

  [1/6]: check_connected: accept an env argument
  [2/6]: sha1_file: always allow relative paths to alternates

These two are preparatory.

  [3/6]: tmp-objdir: introduce API for temporary object directories
  [4/6]: receive-pack: quarantine objects until pre-receive accepts

This is the interesting part.

  [5/6]: tmp-objdir: put quarantine information in the environment
  [6/6]: tmp-objdir: do not migrate files starting with '.'

These are two changes that I ended up doing later to support another
series. They're not strictly necessary here, but I think they're
worth including now, as they change the visible behavior in minor
ways. It seems like a good idea to start with what I think should be
the final behavior.

The other series is basically an optimization for the object-size
policy. Without it, you are stuck walking the graph again in the
pre-receive hook to find the new objects and check their sizes.

But index-pack can do that for you very cheaply; it has the size of
each object already. But it _doesn't_ produce nice error messages;
it has no idea at what path the objects are found, and it doesn't
know what kind of advice it should give the user.

So what we can do is ask index-pack to make a note of any objects
larger than N bytes, and write their sha1 and size into a file in
the quarantine path. Then the pre-receive hook can look in that log
and generate any nice message it wants. In the common case, the log
is empty, and it does not have to do any work at all.

These two patches set that up by letting index-pack and pre-receive
know that quarantine path and use it to store arbitrary files that
_don't_ get migrated to the main object database (i.e., the log file
mentioned above).

-Peff


Re: [RFC/PATCH 0/2] place cherry pick line below commit title

2016-09-30 Thread Junio C Hamano
Jonathan Tan  writes:

>> I vaguely recall that there were some discussion on the definition
>> of "what's a trailer line" with folks from the kernel land, perhaps
>> while discussing the interpret-trailers topic.  IIRC, when somebody
>> passes an improved version along, the resulting message's trailer
>> block may look like this:
>>
>> Signed-off-by: Original Author 
>> [fixed typo in the variable names]
>> Signed-off-by: Somebhody Else 
>>
>> and an obvious "wish" of theirs was to treat not just RFC2822-like
>> "a line that begins with token followed by a colon" but also these
>> short comments as part of the trailer block.  Your original wish in
>> [*1*] is to also treat "a line that begin with a whitespace that
>> follows a line that begins with token followed by a colon" as part
>> of the trailer block and I personally think that is a reasonable
>> thing to wish for, too.
>
> If we allowed arbitrary lines in the trailer block, this would solve
> my original problem, yes.

OK.

> Looking at that, it seems that sequencer.c started interpreting the
> last paragraph of the commit message as a footer and adding an
> exception for "cherry picked from" in commit b971e04 ("sequencer.c:
> always separate "(cherry picked from" from commit body",
> 2013-02-12). So the interpretations of sequencer.c and
> interpret-trailers were already divergent, but I should have probably
> at least discussed that.

It is not too late to discuss it.  I still think it is a good longer
term plan to try to unify the definition of what a trailer block is
and the implementation of the code that determines the boundary
between the log message proper and the trailer block and that allows
us to manipulate the trailer block, that currently is scattered
across multiple places into one.  Historically, "commit -s" had one
(because it needed to decide if it needs to see if the last sign-off
is already the one it is adding, and to decide if a blank line is
needed before the sing-off being added), "am -s" had another, and
"cherry-pick" probably had one, too.  "interpret-trailers" was, at
least originally, envisioned as an effort to develop a unified
machinery that can be called from these codepaths, and to aid the
development and encourage its use, it also had its own end-user
facing command.  Your interest in the "trailer" topic may be a good
trigger for us to further that original vision.

> As for a reason:
>
> 1) I do not have a specific reason for placing it in that exact
> position, but I would like to be able to place the "cherry picked
> from" line without affecting the last paragraph (specifically, without
> making the "cherry picked from" line the only line in the last
> paragraph).
> ...
> 1a) (Avoiding the footer might also be a good way of more clearly
> defining what the footer is. For example, currently, "cherry picked
> from" is treated as a special case in sequencer.c but not in
> trailer.c, as far as I can tell. If we consistently avoided the
> footer, we wouldn't need such a special case anywhere.)

That is one of the numerous shortcomings of the "interpret-trailers"
that is still not finished, I would say.

> 2) The Linux kernel's repository has some "commit ... upstream." lines
> in this position (below the commit title) - for example, in commit
> dacc0987fd2e.

"A group of people seem to prefer it there" does not lead to
"therefore let's move it there for everybody".  It does open a
possibility that we may want to add a new option to put it there,
but does not justify changing what existing "-x" option does.


Re: [PATCH] diff_unique_abbrev(): document its assumtion and limitation

2016-09-30 Thread Junio C Hamano
Jeff King  writes:

> ... Now that function _would_
> want to be updated as a result of the other conversation (it would need
> to do something sensible with "-1", like turning it into "7", or
> whatever else is deemed reasonable outside of a repository).
>
> Anyway. I just wonder if you want to give it a better name while you are
> at it.

I'd say the patch to introduce the new function that makes the old
name potentially confusing is a good one to do the rename.  Until
then I do not think there is no need to rename the existing one ;-)

Related tangent about "like turning it into", I am thinking adding
something like this as a preparatory step to Linus's auto-sizing
serires.  That way, we do not have to spell "7"

Having to spell FALLBACK_DEFAULT_ABBREV over and over again would be
more irritating than having to spell "7" often, but I think it would
be a sign of a deeper problem if it turns out we have to repeat this
constant in many places, so an irritatingly long name may serve as a
canary in the coalmine ;-)

-- >8 --
Subject: abbrev: add FALLBACK_DEFAULT_ABBREV to prepare for auto sizing

We'll be introducing a new way to decide the default abbreviation
length by initialising DEFAULT_ABBREV to -1 to signal the first call
to "find unique abbreviation" codepath to compute a reasonable value
based on the number of objects we have to avoid collisions.

We have long relied on DEFAULT_ABBREV being a positive concrete
value that is used as the abbreviation length when no extra
configuration or command line option has overridden it.  Some
codepaths wants to use such a positive concrete default value
even before making their first request to actually trigger the
computation for the auto sized default.

Introduce FALLBACK_DEFAULT_ABBREV and use it to the code that
attempts to align the report from "git fetch".  For now, this
macro is also used to initialize the default_abbrev variable,
but the auto-sizing code will use -1 and then use the value of
FALLBACK_DEFAULT_ABBREV as the starting point of auto-sizing.

Signed-off-by: Junio C Hamano 
---

 builtin/fetch.c | 3 +++
 cache.h | 3 +++
 environment.c   | 2 +-
 transport.h | 3 +--
 4 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index e4639d8eb1..5d6994d8e7 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -16,6 +16,9 @@
 #include "connected.h"
 #include "argv-array.h"
 
+#define TRANSPORT_SUMMARY(x) \
+   (int)(TRANSPORT_SUMMARY_WIDTH + strlen(x) - gettext_width(x)), (x)
+
 static const char * const builtin_fetch_usage[] = {
N_("git fetch [] [ [...]]"),
N_("git fetch [] "),
diff --git a/cache.h b/cache.h
index 4ff196c259..677554c59f 100644
--- a/cache.h
+++ b/cache.h
@@ -1133,6 +1133,9 @@ static inline unsigned int hexval(unsigned char c)
 #define MINIMUM_ABBREV minimum_abbrev
 #define DEFAULT_ABBREV default_abbrev
 
+/* used when the code does not know or care what the default abbrev is */
+#define FALLBACK_DEFAULT_ABBREV 7
+
 struct object_context {
unsigned char tree[20];
char path[PATH_MAX];
diff --git a/environment.c b/environment.c
index 96160a75a5..c8860f722d 100644
--- a/environment.c
+++ b/environment.c
@@ -16,7 +16,7 @@ int trust_executable_bit = 1;
 int trust_ctime = 1;
 int check_stat = 1;
 int has_symlinks = 1;
-int minimum_abbrev = 4, default_abbrev = 7;
+int minimum_abbrev = 4, default_abbrev = FALLBACK_DEFAULT_ABBREV;
 int ignore_case;
 int assume_unchanged;
 int prefer_symlink_refs;
diff --git a/transport.h b/transport.h
index c68140892c..ea25e42317 100644
--- a/transport.h
+++ b/transport.h
@@ -135,8 +135,7 @@ struct transport {
 #define TRANSPORT_PUSH_CERT_IF_ASKED 4096
 #define TRANSPORT_PUSH_ATOMIC 8192
 
-#define TRANSPORT_SUMMARY_WIDTH (2 * DEFAULT_ABBREV + 3)
-#define TRANSPORT_SUMMARY(x) (int)(TRANSPORT_SUMMARY_WIDTH + strlen(x) - 
gettext_width(x)), (x)
+#define TRANSPORT_SUMMARY_WIDTH (2 * FALLBACK_DEFAULT_ABBREV + 3)
 
 /* Returns a transport suitable for the url */
 struct transport *transport_get(struct remote *, const char *);


Re: [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits

2016-09-30 Thread Junio C Hamano
Linus Torvalds  writes:

> Considering that TRANSPORT_SUMMARY and TRANSPORT_SUMMARY_WIDTH are
> both used in exactly one place each, I'd suggest getting rid of that
> crazy macro, and just expanding it in those places to avoid these
> kinds of crazy "hiding variables inside complex defines thning".
>
> And maybe just deciding to hardcode TRANSPORT_SUMMARY_WIDTH to 17
> (which was it's original default value and presumably is what the test
> is effectively hardcoded for too), and avoiding that complexity
> entirely.

For all fairness, when the WIDTH thing was introduced, there were
two places that needed reference it at f1863d0d16 ("refactor
duplicated code in builtin-send-pack.c and transport.c",
2010-02-16).  But that is no longer the case, and it makes sense to
hardcode it as 17 (or something derived from a symbolic constant
that gives the new "default to default").

What TRANSPORT_SUMMARY() does is even more crazy and it really
shouldn't be exposed as a public interface.  Let's move it to its
single calling place.


Re: [PATCH v8 11/11] convert: add filter..process option

2016-09-30 Thread Lars Schneider

> On 27 Sep 2016, at 00:41, Jakub Narębski  wrote:
> 
> Part first of the review of 11/11.
> 
> W dniu 20.09.2016 o 21:02, larsxschnei...@gmail.com pisze:
>> From: Lars Schneider 
>> 
>> diff --git a/Documentation/gitattributes.txt 
>> b/Documentation/gitattributes.txt
>> index 7aff940..946dcad 100644
>> --- a/Documentation/gitattributes.txt
>> +++ b/Documentation/gitattributes.txt
>> @@ -293,7 +293,13 @@ checkout, when the `smudge` command is specified, the 
>> command is
>> fed the blob object from its standard input, and its standard
>> output is used to update the worktree file.  Similarly, the
>> `clean` command is used to convert the contents of worktree file
>> -upon checkin.
>> +upon checkin. By default these commands process only a single
>> +blob and terminate.  If a long running `process` filter is used
>   
> 
> Should we use this terminology here?  I have not read the preceding
> part of documentation, so I don't know if it talks about "blobs" or
> if it uses "files" and/or "file contents".

I used that because it was used in the paragraph above already.


>> +Long Running Filter Process
>> +^^^
>> +
>> +If the filter command (a string value) is defined via
>> +`filter..process` then Git can process all blobs with a
>> +single filter invocation for the entire life of a single Git
>> +command. This is achieved by using a packet format (pkt-line,
>> +see technical/protocol-common.txt) based protocol over standard
>> +input and standard output as follows. All packets are considered
>> +text and therefore are terminated by an LF. Exceptions are the
>> +"*CONTENT" packets and the flush packet.
> 
> I guess that reasoning here is that all but CONTENT packets are
> metadata, and thus to aid debuggability of the protocol are "text",
> as considered by pkt-line.
> 
> Perhaps a bit more readable would be the following (but current is
> just fine; I am nitpicking):
> 
>  All packets, except for the "{star}CONTENT" packets and the ""
>  flush packer, are considered text and therefore are terminated by
>  a LF.

OK, I use that!


> I think it might be a good idea to describe what flush packet is
> somewhere in this document; on the other hand referring (especially
> if hyperlinked) to pkt-line technical documentation might be good
> enough / better.  I'm unsure, but I tend on the side that referring
> to technical documentation is better.

I have this line in the first paragraph of the Long Running Filter process:
"packet format (pkt-line, see technical/protocol-common.txt) based protocol"

> 
>> +to read a welcome response message ("git-filter-server") and exactly
>> +one protocol version number from the previously sent list. All further
> 
> I guess that is to provide forward-compatibility, isn't it?  Also,
> "Git expects..." probably means filter process MUST send, in the
> RFC2119 (https://tools.ietf.org/html/rfc2119) meaning.

True. I feel "expects" reads better but I am happy to change it if
you feel strong about it.


>> +
>> +After the version negotiation Git sends a list of supported capabilities
>> +and a flush packet.
> 
> Is it that Git SHOULD send list of ALL supported capabilities, or is
> it that Git SHOULD NOT send capabilities it does not support, and that
> it MAY send only those capabilities it needs (so for example if command
> uses only `smudge`, it may not send `clean`, so that filter driver doesn't
> need to initialize data it would not need).

"After the version negotiation Git sends a list of all capabilities that
it supports and a flush packet."

Better?


> I wonder why it is "=true", and not "capability=".
> Is there a case where we would want to send "=false".  Or
> is it to allow configurable / value based capabilities?  Isn't it going
> a bit too far: is there even a hind of an idea for parametrize-able
> capability? YAGNI is a thing...

Peff suggested that format and I think it is OK:
http://public-inbox.org/git/20160803224619.bwtbvmslhuicx...@sigill.intra.peff.net/


> A few new capabilities that we might want to support in the near future
> is "size", "stream", which are options describing how to communicate,
> and "cleanFromFile", "smudgeToFile", which are new types of operations...
> but neither needs any parameter.
> 
> I guess that adding new capabilities doesn't require having to come up
> with the new version of the protocol, isn't it.

Correct.


>> +packet:  git< git-filter-server
>> +packet:  git< version=2
>> +packet:  git> clean=true
>> +packet:  git> smudge=true
>> +packet:  git> not-yet-invented=true
> 
> Hmmm... should we hint at the use of kebab-case versus snake_case
> or camelCase for new capabilities?

I personally prefer kebab-case but I think that is a discussion for
future contributions ;-)


>> +
>> +packet:  git> command=smudge
>> +packet:  git> pathname=path/testfile.dat
>> +packet:  

Re: [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits

2016-09-30 Thread Linus Torvalds
On Fri, Sep 30, 2016 at 11:40 AM, Junio C Hamano  wrote:
>
> There is another instance buried deep in an obscure macro.  A
> minimum fix may look like this, but I really hope somebody else
> finds a better approach.

Heh. Yeah, that's just ugly. I assume this is why the odd git fetch
pretty-printing test was off by one column..

Considering that TRANSPORT_SUMMARY and TRANSPORT_SUMMARY_WIDTH are
both used in exactly one place each, I'd suggest getting rid of that
crazy macro, and just expanding it in those places to avoid these
kinds of crazy "hiding variables inside complex defines thning".

And maybe just deciding to hardcode TRANSPORT_SUMMARY_WIDTH to 17
(which was it's original default value and presumably is what the test
is effectively hardcoded for too), and avoiding that complexity
entirely.

Linus


Re: [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits

2016-09-30 Thread Junio C Hamano
Junio C Hamano  writes:

> From: Junio C Hamano 
> Date: Thu, 29 Sep 2016 21:19:20 -0700
> Subject: [PATCH] abbrev: adjust to the new world order
>
> The default_abbrev used to be a concrete value usable as the default
> abbreviation length.  The code that sets custom abbreviation length,
> in response to command line argument, often did something like:
>
>   if (skip_prefix(arg, "--abbrev=", ))
>   abbrev = atoi(arg);
>   else if (!strcmp("--abbrev", ))
>   abbrev = DEFAULT_ABBREV;
>   /* make the value sane */
>   if (abbrev < 0 || 40 < abbrev)
>   abbrev = ... some sane value ...
>
> The new world order however is that the default_abbrev is a negative
> value that signals find_unique_abbrev() that it needs to dynamically
> find out a good default value.  We shouldn't coerce a negative value
> into a random positive value like the above sample code.
>
> Signed-off-by: Junio C Hamano 

There is another instance buried deep in an obscure macro.  A
minimum fix may look like this, but I really hope somebody else
finds a better approach.  Peff alluded to "when it is still -1
substituting it with a reasonable value like 7" in a separate
thread, and we probably would want a way to allow accessing that
"reasonable value like 7" without triggering auto sizing logic
too early.

With this and the patch in the message I am responding to, your
patch from the last night seems to pass all the tests for me.

 transport.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/transport.h b/transport.h
index 6fe3485325..8a96e22bb0 100644
--- a/transport.h
+++ b/transport.h
@@ -142,7 +142,7 @@ struct transport {
 #define TRANSPORT_PUSH_ATOMIC 8192
 #define TRANSPORT_PUSH_OPTIONS 16384
 
-#define TRANSPORT_SUMMARY_WIDTH (2 * DEFAULT_ABBREV + 3)
+#define TRANSPORT_SUMMARY_WIDTH (2 * (DEFAULT_ABBREV < 0 ? 7 : DEFAULT_ABBREV) 
+ 3)
 #define TRANSPORT_SUMMARY(x) (int)(TRANSPORT_SUMMARY_WIDTH + strlen(x) - 
gettext_width(x)), (x)
 
 /* Returns a transport suitable for the url */


Re: [RFC/PATCH 0/2] place cherry pick line below commit title

2016-09-30 Thread Jonathan Tan

On 09/29/2016 02:56 PM, Junio C Hamano wrote:

Jonathan Tan  writes:


This is somewhat of a follow-up to my previous e-mail with subject
"[PATCH] sequencer: support folding in rfc2822 footer" [1], in which I
proposed relaxing the definition of a commit message footer to allow
multiple-line field bodies (as described in RFC2822), but its strictness
was deemed deliberate.


It does not necessarily mean we can never change it when we did
something deliberately, though.  With a good enough justification,
and with a transitition plan if the backward incompatibility is
severe enough to warrant one, we can change things.

I vaguely recall that there were some discussion on the definition
of "what's a trailer line" with folks from the kernel land, perhaps
while discussing the interpret-trailers topic.  IIRC, when somebody
passes an improved version along, the resulting message's trailer
block may look like this:

Signed-off-by: Original Author 
[fixed typo in the variable names]
Signed-off-by: Somebhody Else 

and an obvious "wish" of theirs was to treat not just RFC2822-like
"a line that begins with token followed by a colon" but also these
short comments as part of the trailer block.  Your original wish in
[*1*] is to also treat "a line that begin with a whitespace that
follows a line that begins with token followed by a colon" as part
of the trailer block and I personally think that is a reasonable
thing to wish for, too.


If we allowed arbitrary lines in the trailer block, this would solve my 
original problem, yes.



I recall that I was somewhat surprised and dissapointed to see no
change to interpret-trailers when you tried [*1*], which was really
about improving the definition of what the trailer block is, by the
way.


Sorry, I had missed that.

Looking at that, it seems that sequencer.c started interpreting the last 
paragraph of the commit message as a footer and adding an exception for 
"cherry picked from" in commit b971e04 ("sequencer.c: always separate 
"(cherry picked from" from commit body", 2013-02-12). So the 
interpretations of sequencer.c and interpret-trailers were already 
divergent, but I should have probably at least discussed that.



Below is a patch set that allows placing the "cherry picked from" line
without taking into account the definition of a commit message footer.
For example, "git cherry-pick -x" (with the appropriate configuration
variable or argument) would, to this commit message:

  commit title

  This is an explanatory paragraph.

  Footer: foo

place the "(cherry picked from ...)" line below "commit title".

Would this be better?


It is not immediately obvious what such a change buys us.  Wouldn't
the current code place that line below "Footer: foo"?  I cannot
think of any reason why anybody would want to place "cherry-picked
from" immediately below the title and before the first line of the
body.


Yes, the current code would place it below "Footer: foo" without a blank 
line before it, but if it thinks that the so-called footer is not 
actually a footer, it would insert a blank line before that line.


As for a reason:

1) I do not have a specific reason for placing it in that exact 
position, but I would like to be able to place the "cherry picked from" 
line without affecting the last paragraph (specifically, without making 
the "cherry picked from" line the only line in the last paragraph). It 
seems to me that placing it below the title was the most straightforward 
way to do that - this way, Git can have its own idea of what a footer 
constitutes, and the user can treat it as completely separate from the 
"cherry picked from" line mechanism.


1a) (Avoiding the footer might also be a good way of more clearly 
defining what the footer is. For example, currently, "cherry picked 
from" is treated as a special case in sequencer.c but not in trailer.c, 
as far as I can tell. If we consistently avoided the footer, we wouldn't 
need such a special case anywhere.)


2) The Linux kernel's repository has some "commit ... upstream." lines 
in this position (below the commit title) - for example, in commit 
dacc0987fd2e.


[1] 
https://kernel.googlesource.com/pub/scm/linux/kernel/git/stable/linux-stable/+/dacc0987fd2e25a8b4b8c19778862ba12ce76d0a


Re: [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits

2016-09-30 Thread Linus Torvalds
On Fri, Sep 30, 2016 at 10:54 AM, Linus Torvalds
 wrote:
>
>> So IMHO, the best combination is the init_default_abbrev() you posted in
>> [1], but initialized at the top of find_unique_abbrev(). And cached
>> there, obviously, in a similar way.
>
> That's certainly possible, but I'm really not happy with how the
> counting function looks.  And nobody actually stood up to say "yeah,
> that gets alternate loose objects right" or "if you have tons of those
> alternate loose objects you have other issues anyway". I think
> somebody would have to "own" that counting function, the advantage of
> just putting it into disambiguate_state is that we just get the
> counting for free..

Side note: maybe we can mix the two approaches, and keep the counting
in the disambiguation state, and just make the counting function do

init_object_disambiguation();
find_short_object_filename();
find_short_packed_object();
finish_object_disambiguation(, sha1);

and then just use "ds.nrobjects". So the counting would still be done
by the disambiguation code, it just woudln't be in get_short_sha1().

So here's another version that takes that approach. And if somebody
(hint hint) wants to do the counting differently, they can perhaps
send an incremental patch to do that.

(This patch also contains the few setup issues Junio found with the
new "default_abbrev is negative" model)

  Linus
 builtin/rev-parse.c |  5 +++--
 diff.c  |  2 +-
 environment.c   |  2 +-
 sha1_name.c | 39 ++-
 4 files changed, 43 insertions(+), 5 deletions(-)

diff --git a/builtin/rev-parse.c b/builtin/rev-parse.c
index 4da1f1da2..cfb0f1510 100644
--- a/builtin/rev-parse.c
+++ b/builtin/rev-parse.c
@@ -671,8 +671,9 @@ int cmd_rev_parse(int argc, const char **argv, const char 
*prefix)
filter &= ~(DO_FLAGS|DO_NOREV);
verify = 1;
abbrev = DEFAULT_ABBREV;
-   if (arg[7] == '=')
-   abbrev = strtoul(arg + 8, NULL, 10);
+   if (!arg[7])
+   continue;
+   abbrev = strtoul(arg + 8, NULL, 10);
if (abbrev < MINIMUM_ABBREV)
abbrev = MINIMUM_ABBREV;
else if (40 <= abbrev)
diff --git a/diff.c b/diff.c
index 59920747d..c6d445915 100644
--- a/diff.c
+++ b/diff.c
@@ -3421,7 +3421,7 @@ void diff_setup_done(struct diff_options *options)
 */
read_cache();
}
-   if (options->abbrev <= 0 || 40 < options->abbrev)
+   if (options->abbrev > 40)
options->abbrev = 40; /* full */
 
/*
diff --git a/environment.c b/environment.c
index c1442df9a..fd6681e46 100644
--- a/environment.c
+++ b/environment.c
@@ -16,7 +16,7 @@ int trust_executable_bit = 1;
 int trust_ctime = 1;
 int check_stat = 1;
 int has_symlinks = 1;
-int minimum_abbrev = 4, default_abbrev = 7;
+int minimum_abbrev = 4, default_abbrev = -1;
 int ignore_case;
 int assume_unchanged;
 int prefer_symlink_refs;
diff --git a/sha1_name.c b/sha1_name.c
index 3b647fd7c..684b36dba 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -15,6 +15,7 @@ typedef int (*disambiguate_hint_fn)(const unsigned char *, 
void *);
 
 struct disambiguate_state {
int len; /* length of prefix in hex chars */
+   unsigned int nrobjects;
char hex_pfx[GIT_SHA1_HEXSZ + 1];
unsigned char bin_pfx[GIT_SHA1_RAWSZ];
 
@@ -118,6 +119,12 @@ static void find_short_object_filename(struct 
disambiguate_state *ds)
 
if (strlen(de->d_name) != 38)
continue;
+
+   // We only look at the one subdirectory, and we assume
+   // each subdirectory is roughly similar, so each object
+   // we find probably has 255 other objects in the other
+   // fan-out directories
+   ds->nrobjects += 256;
if (memcmp(de->d_name, ds->hex_pfx + 2, ds->len - 2))
continue;
memcpy(hex + 2, de->d_name, 38);
@@ -151,6 +158,7 @@ static void unique_in_pack(struct packed_git *p,
 
open_pack_index(p);
num = p->num_objects;
+   ds->nrobjects += num;
last = num;
while (first < last) {
uint32_t mid = (first + last) / 2;
@@ -455,17 +463,46 @@ int for_each_abbrev(const char *prefix, each_abbrev_fn 
fn, void *cb_data)
return ret;
 }
 
+static int get_automatic_abbrev(const char *hex)
+{
+   static int len;
+   struct disambiguate_state ds;
+
+   if (init_object_disambiguation(hex, 7, ) < 0)
+   return 7;
+
+   

Re: [PATCH] diff_unique_abbrev(): document its assumtion and limitation

2016-09-30 Thread Jeff King
On Fri, Sep 30, 2016 at 10:54:53AM -0700, Junio C Hamano wrote:

> This function is used to add "..." to displayed object names in
> "diff --raw --abbrev[=]" output.  It bases its behaviour on an
> untold assumption that the abbreviation length requested by the
> caller is "reasonble", i.e. most of the objects will abbreviate
> within the requested length and the resulting length would never
> exceed it by more than a few hexdigits (otherwise the resulting
> columns would not align).  Explain that in a comment.

Heh, I have actually have a similar patch that renames it to
diff_aligned_abbrev(). Because I wanted to add another function:

  static const char *diff_abbrev_oid(const struct object_id *oid,
 int abbrev)
  {
if (startup_info->have-repository)
return find_unique_abbrev(oid->hash, abbrev);
else {
char *hex = oid_to_hex(oid);
if (abbrev < 0) || abbrev > GIT_SHA1_HEXSZ)
die("BUG: oid abbreviation out of range: %d", abbrev);
hex[abbrev] = '\0';
return hex;
}
  }

and I didn't want people to confuse the two. Now that function _would_
want to be updated as a result of the other conversation (it would need
to do something sensible with "-1", like turning it into "7", or
whatever else is deemed reasonable outside of a repository).

Anyway. I just wonder if you want to give it a better name while you are
at it.

-Peff


Re: [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits

2016-09-30 Thread Jeff King
On Fri, Sep 30, 2016 at 10:54:16AM -0700, Linus Torvalds wrote:

> On Fri, Sep 30, 2016 at 1:06 AM, Jeff King  wrote:
> >
> > I agree that this deals with the performance concerns by caching the
> > default_abbrev_len and starting there. I still think it's unnecessarily
> > invasive to touch get_short_sha1() at all, which is otherwise only a
> > reading function.
> 
> So the reason that d oesn't work is that the "disambiguate_state" data
> where we keep the number of objects is only visible within
> get_short_sha1().
> 
> So outside that function, you don't have any sane way to figure out
> how many objects. So then you have to do the extra counting function..

Right. I think you should do the extra counting function. It's a few
more lines, but the design is way less tangled.

> > So IMHO, the best combination is the init_default_abbrev() you posted in
> > [1], but initialized at the top of find_unique_abbrev(). And cached
> > there, obviously, in a similar way.
> 
> That's certainly possible, but I'm really not happy with how the
> counting function looks.  And nobody actually stood up to say "yeah,
> that gets alternate loose objects right" or "if you have tons of those
> alternate loose objects you have other issues anyway". I think
> somebody would have to "own" that counting function, the advantage of
> just putting it into disambiguate_state is that we just get the
> counting for free..

I don't think you _need_ get the alternate loose objects right. In fact,
I don't think you need to care about loose objects at all. For the
scales we're talking about, they're a rounding error. I would have done
it like this:

diff --git a/sha1_file.c b/sha1_file.c
index 65deaf9..1845502 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1382,6 +1382,32 @@ static void prepare_packed_git_one(char *objdir, int 
local)
strbuf_release();
 }
 
+static int approximate_object_count_valid;
+
+/*
+ * Give a fast, rough count of the number of objects in the repository. This
+ * ignores loose objects completely. If you have a lot of them, then either
+ * you should repack because your performance will be awful, or they are
+ * all unreachable objects about to be pruned, in which case they're not really
+ * interesting as a measure of repo size in the first place.
+ */
+unsigned long approximate_object_count(void)
+{
+   static unsigned long count;
+   if (!approximate_object_count_valid) {
+   struct packed_git *p;
+
+   prepare_packed_git();
+   count = 0;
+   for (p = packed_git; p; p = p->next) {
+   if (open_pack_index(p))
+   continue;
+   count += p->num_objects;
+   }
+   }
+   return count;
+}
+
 static void *get_next_packed_git(const void *p)
 {
return ((const struct packed_git *)p)->next;
@@ -1456,6 +1482,7 @@ void prepare_packed_git(void)
 
 void reprepare_packed_git(void)
 {
+   approximate_object_count_valid = 0;
prepare_packed_git_run_once = 0;
prepare_packed_git();
 }


Re: [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits

2016-09-30 Thread Junio C Hamano
Jeff King  writes:

> I agree that this deals with the performance concerns by caching the
> default_abbrev_len and starting there. I still think it's unnecessarily
> invasive to touch get_short_sha1() at all, which is otherwise only a
> reading function.
>
> So IMHO, the best combination is the init_default_abbrev() you posted in
> [1], but initialized at the top of find_unique_abbrev(). And cached
> there, obviously, in a similar way.

Hmm. I am undecided; both approaches look OK to me.



[PATCH] diff_unique_abbrev(): document its assumtion and limitation

2016-09-30 Thread Junio C Hamano
This function is used to add "..." to displayed object names in
"diff --raw --abbrev[=]" output.  It bases its behaviour on an
untold assumption that the abbreviation length requested by the
caller is "reasonble", i.e. most of the objects will abbreviate
within the requested length and the resulting length would never
exceed it by more than a few hexdigits (otherwise the resulting
columns would not align).  Explain that in a comment.

Signed-off-by: Junio C Hamano 
---

 * I had to scratch my head wondering what impact Linus's
   auto-abbrev change will have on this code, which I wrote many
   years ago in 47dd0d59 ("diff: --abbrev option", 2005-12-13).

 diff.c | 23 ++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/diff.c b/diff.c
index cefc13eb8e..428ed4f4c9 100644
--- a/diff.c
+++ b/diff.c
@@ -4108,7 +4108,8 @@ void diff_free_filepair(struct diff_filepair *p)
free(p);
 }
 
-/* This is different from find_unique_abbrev() in that
+/*
+ * This is different from find_unique_abbrev() in that
  * it stuffs the result with dots for alignment.
  */
 const char *diff_unique_abbrev(const unsigned char *sha1, int len)
@@ -4120,6 +4121,26 @@ const char *diff_unique_abbrev(const unsigned char 
*sha1, int len)
 
abbrev = find_unique_abbrev(sha1, len);
abblen = strlen(abbrev);
+
+   /*
+* In well-behaved cases, where the abbbreviated result is the
+* same as the requested length, append three dots after the
+* abbreviation (hence the whole logic is limited to the case
+* where abblen < 37); when the actual abbreviated result is a
+* bit longer than the requested length, we reduce the number
+* of dots so that they match the well-behaved ones.  However,
+* if the actual abbreviation is longer than the requested
+* length by more than three, we give up on aligning, and add
+* three dots anyway, to indicate that the output is not the
+* full object name.  Yes, this may be suboptimal, but this
+* appears only in "diff --raw --abbrev" output and it is not
+* worth the effort to change it now.  Note that this would
+* likely to work fine when the automatic sizing of default
+* abbreviation length is used--we would be fed -1 in "len" in
+* that case, and will end up always appending three-dots, but
+* the automatic sizing is supposed to give abblen that ensures
+* uniqueness across all objects (statistically speaking).
+*/
if (abblen < 37) {
static char hex[41];
if (len < abblen && abblen <= len + 2)
-- 
2.10.0-612-g22341905f2



Re: [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits

2016-09-30 Thread Linus Torvalds
On Fri, Sep 30, 2016 at 1:06 AM, Jeff King  wrote:
>
> I agree that this deals with the performance concerns by caching the
> default_abbrev_len and starting there. I still think it's unnecessarily
> invasive to touch get_short_sha1() at all, which is otherwise only a
> reading function.

So the reason that d oesn't work is that the "disambiguate_state" data
where we keep the number of objects is only visible within
get_short_sha1().

So outside that function, you don't have any sane way to figure out
how many objects. So then you have to do the extra counting function..

> So IMHO, the best combination is the init_default_abbrev() you posted in
> [1], but initialized at the top of find_unique_abbrev(). And cached
> there, obviously, in a similar way.

That's certainly possible, but I'm really not happy with how the
counting function looks.  And nobody actually stood up to say "yeah,
that gets alternate loose objects right" or "if you have tons of those
alternate loose objects you have other issues anyway". I think
somebody would have to "own" that counting function, the advantage of
just putting it into disambiguate_state is that we just get the
counting for free..

 Linus


Re: [PATCH v2 03/11] i18n: add--interactive: mark strings with interpolation for translation

2016-09-30 Thread Jakub Narębski
W dniu 31.08.2016 o 14:31, Vasco Almeida pisze:

> Use of sprintf following die or error_msg is necessary for placeholder
> substitution take place.

No, it is not.  Though I don't think that we have in out Git::I18N
the support for Perl i18n placeholder substitution.

>From gettext manual:
https://www.gnu.org/software/gettext/manual/gettext.html#perl_002dformat

  15.3.16 Perl Format Strings

  There are two kinds format strings in Perl: those acceptable to the Perl
  built-in function printf, labelled as ‘perl-format’, and those acceptable
  to the libintl-perl function __x, labelled as ‘perl-brace-format’.

  Perl printf format strings are described in the sprintf section of
  ‘man perlfunc’.

  Perl brace format strings are described in the Locale::TextDomain(3pm)
  manual page of the CPAN package libintl-perl. In brief, Perl format uses
  placeholders put between braces (‘{’ and ‘}’). The placeholder must have
  the syntax of simple identifiers.
 
Git doesn't use Locale::TextDomain, from what I understand, to provide
fallback in no-gettext case.  Also, Locale::TextDomain is not in core.

The syntax, with the help of shorthand helper function, looks like this:
http://search.cpan.org/dist/libintl-perl/lib/Locale/TextDomain.pm#EXPORTED_FUNCTIONS
https://metacpan.org/pod/Locale::TextDomain#EXPORTED-FUNCTIONS

  __x MSGID, ID1 => VAL1, ID2 => VAL2, ...
  
  One of the nicest features in Perl is its capability to interpolate
  variables into strings:

print "This is the $color $thing.\n";

  This nice feature might con you into thinking that you could now write

print __"This is the $color $thing.\n";

  [But this doesn't work...]

  [...] The Perl backend to GNU gettext has defined an alternative format
  [to using printf / sprintf] for interpolatable strings:

"This is the {color} {thing}.\n";

  Instead of Perl variables you use place-holders (legal Perl variables
  are also legal place-holders) in curly braces, and then you call

print __x ("This is the {color} {thing}.\n", 
   thing => $thang,
   color => $color);

> Signed-off-by: Vasco Almeida 
> ---
>  git-add--interactive.perl | 26 ++
>  1 file changed, 14 insertions(+), 12 deletions(-)
> 
> diff --git a/git-add--interactive.perl b/git-add--interactive.perl
> index e11a33d..4e1e857 100755
> --- a/git-add--interactive.perl
> +++ b/git-add--interactive.perl
> @@ -612,12 +612,12 @@ sub list_and_choose {
>   else {
>   $bottom = $top = find_unique($choice, @stuff);
>   if (!defined $bottom) {
> - error_msg "Huh ($choice)?\n";
> + error_msg sprintf(__("Huh (%s)?\n"), 
> $choice);

So this would be, self explained without need of comment
for translators:

  + error_msg __x ("Huh ({choice})?\n"), 
choice => $choice);


>   next TOPLOOP;
>   }

Though this is probably more work that you wanted to do.
The __x might be defined like this (borrowing from Locale::TextDomain),
which needs to be put into perl/Git/I18N.pm

  sub __ ($);
  sub __expand ($%);

  # With interpolation.
  sub __x ($@)
  {
my ($msgid, %vars) = @_;

return __expand (__($msgid), %vars);
  }
  
  sub __expand ($%)
  {
my ($translation, %args) = @_;

my $re = join '|', map { quotemeta $_ } keys %args;
$translation =~ s/\{($re)\}/defined $args{$1} ? $args{$1} : "{$1}"/ge;

return $translation;
  }



Best regards,
-- 
Jakub Narębski


Re: [PATCH v2 02/11] i18n: add--interactive: mark simple here documents for translation

2016-09-30 Thread Jakub Narębski
W dniu 31.08.2016 o 14:31, Vasco Almeida pisze:
> Mark messages in here document without interpolation for translation.
> 
> Marking for translation by removing here documents this way, rather than
> take advantage of "print __ << EOF" way, makes other instances of help
> messages in clean.c match the first two in this file.  Otherwise,
> reusing here document would add a trailer newline to the message, making
> them not match 100%, hence creating two entries in pot template for
> translation rather than a single entry.

This is good catch, but I think it goes backwards with the solution.

If the text to be translated is multi-line, and it must end with newline,
why is this final newline not included in the msgid?  This would involve
turning printf_ln into printf, and adding trailing newline in final
entry for builtin/clean.c:295, etc. - I think it is better solution than
uglyifing git-add--interactive.perl

Though it is not much of uglifying thanks to Perl support for embedded
newlines in double-quoted strings.

> 
> Signed-off-by: Vasco Almeida 
> ---
>  git-add--interactive.perl | 24 
>  1 file changed, 12 insertions(+), 12 deletions(-)
> 
> diff --git a/git-add--interactive.perl b/git-add--interactive.perl
> index fb8e5de..e11a33d 100755
> --- a/git-add--interactive.perl
> +++ b/git-add--interactive.perl
> @@ -636,25 +636,25 @@ sub list_and_choose {
>  }
>  
>  sub singleton_prompt_help_cmd {
> - print colored $help_color, <<\EOF ;
> -Prompt help:
> + print colored $help_color, __(
> +"Prompt help:
>  1  - select a numbered item
>  foo- select item based on unique prefix
> -   - (empty) select nothing
> -EOF
> +   - (empty) select nothing"),
> +"\n";
>  }
[... enough for information ...]

Regards,
-- 
Jakub Narębski




Re: [PATCH v6 3/4] ls-files: pass through safe options for --recurse-submodules

2016-09-30 Thread Brandon Williams
Pass through some known-safe options when recursing into submodules.
(--cached, -v, -t, -z, --debug, --eol)

If other unsafe options are given the caller will be errored out.

Signed-off-by: Brandon Williams 
---

Something more like this correct? I ditched the extra parameters and
reworded the commit msg to reflect this.

 builtin/ls-files.c | 30 +++---
 t/t3007-ls-files-recurse-submodules.sh | 16 
 2 files changed, 39 insertions(+), 7 deletions(-)

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 63befed..b6144a5 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -30,6 +30,7 @@ static int line_terminator = '\n';
 static int debug_mode;
 static int show_eol;
 static int recurse_submodules;
+static struct argv_array submodules_options = ARGV_ARRAY_INIT;
 
 static const char *prefix;
 static const char *super_prefix;
@@ -168,6 +169,25 @@ static void show_killed_files(struct dir_struct *dir)
}
 }
 
+/*
+ * Compile an argv_array with all of the options supported by 
--recurse_submodules
+ */
+static void compile_submodule_options(const struct dir_struct *dir, int 
show_tag)
+{
+   if (line_terminator == '\0')
+   argv_array_push(_options, "-z");
+   if (show_tag)
+   argv_array_push(_options, "-t");
+   if (show_valid_bit)
+   argv_array_push(_options, "-v");
+   if (show_cached)
+   argv_array_push(_options, "--cached");
+   if (show_eol)
+   argv_array_push(_options, "--eol");
+   if (debug_mode)
+   argv_array_push(_options, "--debug");
+}
+
 /**
  * Recursively call ls-files on a submodule
  */
@@ -182,6 +202,9 @@ static void show_gitlink(const struct cache_entry *ce)
argv_array_push(, "ls-files");
argv_array_push(, "--recurse-submodules");
 
+   /* add supported options */
+   argv_array_pushv(, submodules_options.argv);
+
cp.git_cmd = 1;
cp.dir = ce->name;
status = run_command();
@@ -567,11 +590,12 @@ int cmd_ls_files(int argc, const char **argv, const char 
*cmd_prefix)
if (require_work_tree && !is_inside_work_tree())
setup_work_tree();
 
+   if (recurse_submodules)
+   compile_submodule_options(, show_tag);
+
if (recurse_submodules &&
(show_stage || show_deleted || show_others || show_unmerged ||
-show_killed || show_modified || show_resolve_undo ||
-show_valid_bit || show_tag || show_eol || with_tree ||
-(line_terminator == '\0')))
+show_killed || show_modified || show_resolve_undo || with_tree))
die("ls-files --recurse-submodules unsupported mode");
 
if (recurse_submodules && error_unmatch)
diff --git a/t/t3007-ls-files-recurse-submodules.sh 
b/t/t3007-ls-files-recurse-submodules.sh
index b5a53c3..33a2ea7 100755
--- a/t/t3007-ls-files-recurse-submodules.sh
+++ b/t/t3007-ls-files-recurse-submodules.sh
@@ -34,6 +34,18 @@ test_expect_success 'ls-files correctly outputs files in 
submodule' '
test_cmp expect actual
 '
 
+test_expect_success 'ls-files correctly outputs files in submodule with -z' '
+   lf_to_nul >expect <<-\EOF &&
+   .gitmodules
+   a
+   b/b
+   submodule/c
+   EOF
+
+   git ls-files --recurse-submodules -z >actual &&
+   test_cmp expect actual
+'
+
 test_expect_success 'ls-files does not output files not added to a repo' '
cat >expect <<-\EOF &&
.gitmodules
@@ -86,15 +98,11 @@ test_incompatible_with_recurse_submodules () {
"
 }
 
-test_incompatible_with_recurse_submodules -z
-test_incompatible_with_recurse_submodules -v
-test_incompatible_with_recurse_submodules -t
 test_incompatible_with_recurse_submodules --deleted
 test_incompatible_with_recurse_submodules --modified
 test_incompatible_with_recurse_submodules --others
 test_incompatible_with_recurse_submodules --stage
 test_incompatible_with_recurse_submodules --killed
 test_incompatible_with_recurse_submodules --unmerged
-test_incompatible_with_recurse_submodules --eol
 
 test_done
-- 
2.10.0



Re: [PATCH v6 3/4] ls-files: pass through safe options for --recurse-submodules

2016-09-30 Thread Brandon Williams
On 09/29, Junio C Hamano wrote:
> Brandon Williams  writes:
> 
> > +static void compile_submodule_options(const struct dir_struct *dir, int 
> > show_tag)
> > +{
> > +   if (line_terminator == '\0')
> > +   argv_array_push(_options, "-z");
> > +   if (show_tag)
> > +   argv_array_push(_options, "-t");
> > +   if (show_valid_bit)
> > +   argv_array_push(_options, "-v");
> > +   if (show_cached)
> > +   argv_array_push(_options, "--cached");
> > +   if (show_deleted)
> > +   argv_array_push(_options, "--deleted");
> > +   if (show_modified)
> > +   argv_array_push(_options, "--modified");
> > +   if (show_others)
> > +   argv_array_push(_options, "--others");
> > +   if (dir->flags & DIR_SHOW_IGNORED)
> > +   argv_array_push(_options, "--ignored");
> > +   if (show_stage)
> > +   argv_array_push(_options, "--stage");
> > +   if (show_killed)
> > +   argv_array_push(_options, "--killed");
> > +   if (dir->flags & DIR_SHOW_OTHER_DIRECTORIES)
> > +   argv_array_push(_options, "--directory");
> > +   if (!(dir->flags & DIR_SHOW_OTHER_DIRECTORIES))
> > +   argv_array_push(_options, "--empty-directory");
> > +   if (show_unmerged)
> > +   argv_array_push(_options, "--unmerged");
> > +   if (show_resolve_undo)
> > +   argv_array_push(_options, "--resolve-undo");
> > +   if (show_eol)
> > +   argv_array_push(_options, "--eol");
> > +   if (debug_mode)
> > +   argv_array_push(_options, "--debug");
> > +}
> 
> With this and 4/4 applied, the documentation still says "--cached"
> is the only supported option.
> 
> Does it really make sense to pass all of these?  I understand "-z"
> and I suspect things like "-t" and "-v" that affect "how" things are
> shown may also happen to work, but I am not sure how much it makes
> sense for options that affect "what" things are shown.
> 
> What does it even mean to ask for say "--unmerged" to be shown, for
> example, from the superproject?  Recurse into submodules whose cache
> entries in the index of the superproject are unmerged, or something
> else?
> 
> I am inclined to say that it is probably better to keep the
> "--cached only" as documented, at least on the "what are shown"
> side.
> 
> Thanks.

You're right that probably makes the most sense for now.

-- 
Brandon Williams


"Purposes, Concepts,Misfits, and a Redesign of Git" (a research paper)

2016-09-30 Thread Konstantin Khomoutov
The "It Will Never Work in Theory" blog has just posted a summary of a
study which tried to identify shortcomings in the design of Git.

In the hope it might be interesting, I post this summary here.
URL: http://neverworkintheory.org/2016/09/30/rethinking-git.html

The except from that resource written by Greg Wilson, the blog author:
>8
Santiago Perez De Rosso and Daniel Jackson: "[Purposes, Concepts,
Misfits, and a Redesign of Git]
(http://people.csail.mit.edu/sperezde/pre-print-oopsla16.pdf)", _SPLASH
2016_. 

> Git is a widely used version control system that is powerful but
> complicated. Its complexity may not be an inevitable consequence of
> its power but rather evidence of flaws in its design. To explore this
> hypothesis, we analyzed the design of Git using a theory that
> identifies concepts, purposes, and misfits. Some well-known
> difficulties with Git are described, and explained as misfits in
> which underlying concepts fail to meet their intended purpose. Based
> on this analysis, we designed a reworking of Git (called Gitless)
> that attempts to remedy these flaws. 
> 
> To correlate misfits with issues reported by users, we conducted a
> study of Stack Overflow questions. And to determine whether users
> experienced fewer complications using Gitless in place of Git, we
> conducted a small user study. Results suggest our approach can be
> profitable in identifying, analyzing, and fixing design problems. 

This paper presents a detailed, well-founded critique of one of the
most powerful, but frustrating, tools in widespread use today. A
follow-up to earlier work published in 2013, it is distinguished from
most other discussion of software design by three things: 

  1. It clearly describes its design paradigm, which comprises
_concepts_ (the major elements of the user's mental model of the
system), _purposes_ (which motivate the concepts), and _misfits_ (which
are instances where concepts do not satisfy purposes, or contradict one
another). 

  2. It lays out Git's concepts and purposes, analyzes its main
features in terms of them, and uses that analysis to identify
mis-matches. 

  3. Crucially, it then analyzes independent discussion of Git (on
Stack Overflow) to see if users are stumbling over the misfits
identified in step 2. 

That would count as a major contribution on its own, but the authors go
further. They have designed a tool called Gitless that directly
addresses the shortcomings they have identified, and the penultimate
section of this paper presents a usability study that compares it to
standard Git. Overall, subjects found Gitles more satisfying and less
frustrating than Git, even though there was no big difference in
efficiency, difficulty, or confusion. Quoting the paper, "This apparent
contradiction might be due to the fact that all of the participants had
used Git before but were encountering Gitless for the first time
without any substantive training. Some participants (2 regular, 1
expert) commented that indeed their problems with Gitless were mostly
due to their lack of practice using it." 

This paper is one of the best examples I have ever seen of how software
designs ought to be critiqued. It combines an explicit, coherent
conceptual base, detailed analysis of a specific system, design
grounded in that analysis, and an empirical check of that design.
Sadly, nothing shows the actual state of our profession more clearly
than the way this work has been greeted: 

> In some respects, this project has been a fool's errand. We picked a
> product that was popular and widely used so as not to be investing
> effort in analyzing a strawman design; we thought that its popularity
> would mean that a larger audience would be interested in our
> experiment. In sharing our research with colleagues, however, we have
> discovered a significant polarization. Experts, who are deeply
> familiar with the product, have learned its many intricacies,
> developed complex, customized workflows, and regularly exploit its
> most elaborate features, are often defensive and resistant to the
> suggestion that the design has flaws. In contrast, less intensive
> users, who have given up on understanding the product, and rely on
> only a handful of memorized commands, are so frustrated by their
> experience that an analysis like ours seems to them belaboring the
> obvious.
>8
(This text is Copyright © Never Work in Theory, under the CC license.)


Re: [PATCH 1/5] pretty: allow formatting DATE_SHORT

2016-09-30 Thread SZEDER Gábor
> On Thu, Sep 29, 2016 at 1:33 AM, Jeff King  wrote:
> > There's no way to do this short of "%ad" and --date=short,
> > but that limits you to having a single date format in the
> > output.
> >
> > This would possibly be better done with something more like
> > "%ad(short)".
> >
> > Signed-off-by: Jeff King 
> > ---
> >  pretty.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/pretty.c b/pretty.c
> > index 493edb0..c532c17 100644
> > --- a/pretty.c
> > +++ b/pretty.c
> > @@ -727,6 +727,9 @@ static size_t format_person_part(struct strbuf *sb, 
> > char part,
> > case 'I':   /* date, ISO 8601 strict */
> > strbuf_addstr(sb, show_ident_date(, 
> > DATE_MODE(ISO8601_STRICT)));
> > return placeholder_len;
> > +   case 's':
> > +   strbuf_addstr(sb, show_ident_date(, DATE_MODE(SHORT)));
> > +   return placeholder_len;
> > }
> >
> >  skip:
> > --
> > 2.10.0.566.g5365f87
> >
> 
> Nice. I use date=short in some of my aliases and switching to this is
> nicer. I assume this turns into "%(as)"?
> 
> What about documenting this in  pretty-formats.txt?

Here you go :)

  
http://public-inbox.org/git/1444235305-8718-1-git-send-email-sze...@ira.uka.de/



Re: [PATCH v2] gpg-interface: use more status letters

2016-09-30 Thread Michael J Gruber
Ramsay Jones venit, vidit, dixit 28.09.2016 23:09:
> 
> 
> On 28/09/16 20:59, Junio C Hamano wrote:
>> Michael J Gruber  writes:
>  
>>> +  "X" for a good expired signature, or good signature made by an expired 
>>> key,
>>
>> As an attempt to clarify that we cover both EXPSIG and EXPKEYSIG
>> cases, I think this is good enough.  I may have phrased the former
>> slightly differently, though: "a good signature that has expired".
>>
>> I have no strong opinion if we want to stress that we cover both
>> cases, though, which is I think what Ramsay's comment was about.
> 
> Kinda! ;-)
> 
> I'm not sure that it is a good idea to mash both EXPSIG and EXPKEYSIG
> into one status letter, but I was also fishing for some information
> about EXPSIG. I was only vaguely aware that a signature could expire
> _independently_ of the key used to do the signing. Also, according to
> https://www.gnupg.org/documentation/manuals/gnupg/Automated-signature-checking.html
> for the EXPSIG case 'Note, that this case is currently not implemented.'

A key can have an expiration date.

A signature can have an expiration date.

The "goodness" of a signature is independent of the expiraton dates.

Signature expiration is implemented, I tested that (gpg 1 aka "classic").

> Hmm, I guess these are so closely related that a single status letter
> is OK, but I think I would prefer your phrasing; namely:
> 
>  "X" for a good signature that has expired, or a good signature made with an 
> expired key,
> 

I'm open to whatever phrasing you deem clearer.

Also, I'm open to using another letter for EXPKEYSIG but couldn't decide
between 'Y', 'Z', 'K'. 'K' could be confused with REVKEYSIG, I'm afraid.
'Y' is next to 'X' and contained in 'KEY', it would be my first choice.

Cheers,
Michael



Re: [PATCH v2] gpg-interface: use more status letters

2016-09-30 Thread Michael J Gruber
Junio C Hamano venit, vidit, dixit 28.09.2016 21:59:
> Michael J Gruber  writes:
> 
>> - Use GNUPGHOME="$HOME/gnupg-home-not-used" just like in other tests (lib).
> 
> If you are not using /dev/null, I expected you to do
> 
>   . ./test-lib.sh
>   GNUPGHOME_saved=$GNPGHOME
> . "$TEST_DIRECTORY/lib-gpg.sh"
> 
> and then use
> 
>   GNUPGHOME="$GNUPGHOME_saved" git log -1 ...
> 
> in the test.
> 
> Otherwise, you are not futureproofing your use and only adding to
> maintenance burden.  The gnupg-home-not-used hack may turn out to be
> a problematic and test-lib.sh may update to point to somewhere else,
> which will leave your copy still pointing at the old problematic
> place).

Well, I understood you told me to do what test-lib.sh does.

You obviously wanted me to piggy-bak on test-lib.sh's behavior instead.

I don't know what's more likely to break - the latter relies on
test-lib.sh's setting GNUPGHOME to a non existing gpg home, which is
something funny to do if you don't even plan to use gpg.

>> - Do not parse for signer UID in the ERRSIG case (and test that we do not).
> 
> Good.
> 
>> - Retreat "rather" addition from the doc: good/valid are terms that we use
>>   differently from gpg anyways.
> 
> OK.
> 
>> +  "X" for a good expired signature, or good signature made by an expired 
>> key,
> 
> As an attempt to clarify that we cover both EXPSIG and EXPKEYSIG
> cases, I think this is good enough.  I may have phrased the former
> slightly differently, though: "a good signature that has expired".
> 
> I have no strong opinion if we want to stress that we cover both
> cases, though, which is I think what Ramsay's comment was about.

I'll comment in the reply to his 2nd e-mail

Michael




Re: [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits

2016-09-30 Thread Jeff King
On Thu, Sep 29, 2016 at 06:18:03PM -0700, Linus Torvalds wrote:

> On Thu, Sep 29, 2016 at 5:57 PM, Linus Torvalds
>  wrote:
> >
> > Actually, all the other cases seem to be "parse a SHA1 with a known
> > length", so they really don't have a negative length.  So this seems
> > ok, and is easier to verify than the "what all contexts might use
> > DEFAULT_ABBREV" thing. There's only a few callers, and it's a static
> > function so it's easy to check it locally in sha1_name.c.
> 
> Here's my original patch with just a tiny change that instead of
> starting the automatic guessing at 7 each time, it starts at
> "default_automatic_abbrev", which is initialized to 7.
> 
> The difference is that if we decide that "oh, that was too small, need
> to repeat", we also update that "default_automatic_abbrev" value, so
> that we won't start at the number that we now know was too small.
> 
> So it still loops over the abbrev values, but now it only loops a
> couple of times.
> 
> I actually verified the performance impact by doing
> 
>   time git rev-list --abbrev-commit HEAD > /dev/null
> 
> on the kernel git tree, and it does actually matter. With my original
> patch, we wasted a noticeable amount of time on just the extra
> looping, with this it's down to the same performance as just doing it
> once at init time (it's about 12s vs 9s on my laptop).

I agree that this deals with the performance concerns by caching the
default_abbrev_len and starting there. I still think it's unnecessarily
invasive to touch get_short_sha1() at all, which is otherwise only a
reading function.

So IMHO, the best combination is the init_default_abbrev() you posted in
[1], but initialized at the top of find_unique_abbrev(). And cached
there, obviously, in a similar way.

-Peff

[1] 
http://public-inbox.org/git/CA+55aFyVEQ+8TBBUm5KG9APtd9wy8cp_mRO=3nj12dxznla...@mail.gmail.com/


Re: [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits

2016-09-30 Thread Jeff King
On Thu, Sep 29, 2016 at 04:13:38PM -0700, Junio C Hamano wrote:

> There are very early ones in the program startup sequence in the
> following functions, but I do not think of a reason why our new and
> early call to prepare_packed_git() might be problematic, given that
> all of them require us to have an access to the repository (i.e.
> this change cannot introduce a regression where a command used to
> work outside a repository but barf when prepare_packed_git() is
> called early):
> 
>  - builtin/describe.c
>  - builtin/rev-list.c
>  - builtin/rev-parse.c
> 
> I thought that the one in diff.c might be problematic when the "git
> diff" command is run outside a repository with the "--no-index"
> option, but it appears that init_default_abbrev() seems to be OK
> when run outside a repository.

Actually, "diff --no-index" is currently buggy in this regard. In the
followup series to jk/setup-sequence-update (which I mentioned but
haven't posted yet), I teach get_object_dir() not to blindly default to
".git", and found that "diff --no-index" is perfectly happy to look in
".git/objects" for find_unique_abbrev(), even when we know there's no
repository (or it has an unknown vintage).

I fixed it there by just using the default abbrev value for out-of-repo
diffs, and skip calling find_unique_abbrev() at all. That would here,
too.

But if we add object-store initialization at other times, it's a
potential conflict. IMHO this should stay inside find_unique_abbrev(),
where we know we already must look at the object store.

-Peff


Re: [PATCH 1/5] pretty: allow formatting DATE_SHORT

2016-09-30 Thread Jacob Keller
On Thu, Sep 29, 2016 at 1:33 AM, Jeff King  wrote:
> There's no way to do this short of "%ad" and --date=short,
> but that limits you to having a single date format in the
> output.
>
> This would possibly be better done with something more like
> "%ad(short)".
>
> Signed-off-by: Jeff King 
> ---
>  pretty.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/pretty.c b/pretty.c
> index 493edb0..c532c17 100644
> --- a/pretty.c
> +++ b/pretty.c
> @@ -727,6 +727,9 @@ static size_t format_person_part(struct strbuf *sb, char 
> part,
> case 'I':   /* date, ISO 8601 strict */
> strbuf_addstr(sb, show_ident_date(, 
> DATE_MODE(ISO8601_STRICT)));
> return placeholder_len;
> +   case 's':
> +   strbuf_addstr(sb, show_ident_date(, DATE_MODE(SHORT)));
> +   return placeholder_len;
> }
>
>  skip:
> --
> 2.10.0.566.g5365f87
>

Nice. I use date=short in some of my aliases and switching to this is
nicer. I assume this turns into "%(as)"?

What about documenting this in  pretty-formats.txt?

Thanks,
Jake