Re: [PATCH] format-patch: Add --rfc for the common case of [RFC PATCH]

2016-09-16 Thread Josh Triplett
On Fri, Sep 16, 2016 at 05:49:22PM -0700, Jeff King wrote:
> On Fri, Sep 16, 2016 at 10:27:45AM -0700, Josh Triplett wrote:
> 
> > By far, the most common subject-prefix I've seen other than "PATCH" is
> > "RFC PATCH" (or occasionally "PATCH RFC").  Seems worth optimizing for
> > the common case, to avoid having to spell it out the long way as
> > --subject-prefix='RFC PATCH'.
> 
> "RFC" is the most common one for me, too. And if it ends here, I'm OK
> with it. But I'm a little worried with ending up with a proliferation of
> options.

I haven't seen a significant number of variations on subject prefixes; I
can't think of any other prefix I've seen often enough to suggest an
option.

> If we had a short-option for --subject-prefix, then:
> 
>   -P RFC
> 
> is not so bad compared to "--rfc". But if you want to spell it as "RFC
> PATCH" that's getting a bit longer. We could have a short option for
> "tag this in the subject prefix _in addition_ to writing PATCH". And
> then you could do:
> 
>   -T RFC
> 
> I dunno. One other thing to consider is that format-patch takes
> arbitrary diff options, so we'd want to avoid stomping on them with any
> short options (which is why I used "-T" instead of "-t", though I find
> it unlikely that many people use the latter with format-patch). That's a
> point in favor of --rfc, I think.

I agree; the short option space seems more contentious.  And in any
case, I find --rfc more ergonomic than "-T RFC". :)

> >  builtin/log.c   | 10 ++
> >  t/t4014-format-patch.sh |  9 +
> >  2 files changed, 19 insertions(+), 0 deletions(-)
> 
> Documentation?

Oops, thanks.  I'll send v2 shortly.

> > +static int rfc_callback(const struct option *opt, const char *arg, int 
> > unset)
> > +{
> > +   subject_prefix = 1;
> > +   ((struct rev_info *)opt->value)->subject_prefix = xstrdup("RFC PATCH");
> > +   return 0;
> > +}
> 
> I was going to complain that you don't free() the previous value, but
> actually the other callers do not xstrdup() in the first place (and we
> do not need to do so here, either, as it's a string literal). We
> actually _do_ allocate a new copy when reading the value from config,
> but it's probably not a big deal in practice to leak that.
> 
> I also wonder if you could implement this as just:
> 
>   return subject_prefix_callback(opt, "RFC PATCH", unset);
> 
> And then if you write the documentation as:
> 
>   --rfc::
>   Behave as if --subject-prefix="RFC PATCH" was specified.
> 
> then it will be trivially correct. :)

Nice idea; will do.

> > +cat >expect <<'EOF'
> > +Subject: [RFC PATCH 1/1] header with . in it
> > +EOF
> > +test_expect_success '--rfc' '
> > +   git format-patch -n -1 --stdout --rfc >patch &&
> > +   grep ^Subject: patch >actual &&
> > +   test_cmp expect actual
> > +'
> 
> Our usual style these days is to set up expectations inside the test
> blocks (and use "<<-" to get nice indentation; we also typically use
> "\EOF" but that's purely style).

I copied this from a test immediately above it. :)

I can change it easily enough, though.

- Josh Triplett


Re: [RFC/PATCH 2/3] mailinfo: correct malformed test example

2016-09-16 Thread Junio C Hamano
Jonathan Tan  writes:

>   From: bogosity
> - a list
> - of stuff
>
> Unchanged, the subsequent patch would break this test because it would
> interpret that as a multi-line "From" in-body header when in-body
> headers are *not* disabled.

Yes, that is totally expected.  So I would be perfectly fine if your
patch changed the test vector for that case, saying "Allowing a
folded in-body header means the expected result for the above three
lines has to change".



Re: [PATCH] ls-files: add pathspec matching for submodules

2016-09-16 Thread Junio C Hamano
Brandon Williams  writes:

> ...
>   [--full-name] [--recurse-submodules]
> - [--output-path-prefix=]
> + [--submodule-prefix=]
>   [--abbrev] [--] [...]
>  
> ---output-path-prefix=::
> +--submodule-prefix=::
>   Prepend the provided path to the output of each file
> ...
>  static int show_eol;
> -static const char *output_path_prefix;
> +static const char *submodule_prefix;
>  static int recurse_submodules;
> ...
>   static struct strbuf full_name = STRBUF_INIT;
> - if (output_path_prefix && *output_path_prefix) {
> + if (submodule_prefix && *submodule_prefix) {
>   strbuf_reset(_name);
> - strbuf_addstr(_name, output_path_prefix);
> + strbuf_addstr(_name, submodule_prefix);
>   strbuf_addstr(_name, name);

As the previous one that used a wrong (sorry) argument is not even
in 'next' yet, let's pretend that it never happened.  It is OK to
still keep it and this patch as two separate steps, i.e. a topic
with two patches in it.

> + /* Add pathspec args */
> + argv_array_push(, "--");
> + for (i = 0; i < pathspec.nr; ++i)
> + argv_array_push(, pathspec.items[i].original);

OK, so as discussed previously with Heiko and Stefan, the idea is to

 - pass the original pathspec as-is,

 - when --submodule-prefix is given, a path discovered in a
   submodule repository is first prefixed with that string before
   getting checked to see if it matches the original pathspec.

And this loop is about relaying the original pathspec.

> @@ -192,57 +210,63 @@ static void show_gitlink(const struct cache_entry *ce)
>  
>  static void show_ce_entry(const char *tag, const struct cache_entry *ce)
>  {
> + struct strbuf name = STRBUF_INIT;
>   int len = max_prefix_len;
> + if (submodule_prefix)
> + strbuf_addstr(, submodule_prefix);
> + strbuf_addstr(, ce->name);
>  
>   if (len >= ce_namelen(ce))
> - die("git ls-files: internal error - cache entry not superset of 
> prefix");
> + die("git ls-files: internal error - cache entry not "
> + "superset of prefix");

This is not such a great thing to do.  Upon a bug report, we can no
longer do

git grep 'cache entry not superset'

to see where the error message is coming from.

> - if (!match_pathspec(, ce->name, ce_namelen(ce),
> - len, ps_matched,
> - S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode)))
> - return;
> - if (recurse_submodules && S_ISGITLINK(ce->ce_mode)) {
> + if (recurse_submodules && S_ISGITLINK(ce->ce_mode) &&
> + submodule_path_match(, name.buf, ps_matched)) {
>   show_gitlink(ce);
> - return;
> - }
> + } else if (match_pathspec(, name.buf, name.len,
> +   len, ps_matched,
> +   S_ISDIR(ce->ce_mode) ||
> +   S_ISGITLINK(ce->ce_mode))) {
> + if (tag && *tag && show_valid_bit &&
> + ...

Argh.  If we had a preparatory clean-up step, would it have helped
to avoid this big re-indentation that makes the patch harder to read
than necessary, I wonder?

Another way would have been to "goto" from the end of this block

> + if (recurse_submodules && S_ISGITLINK(ce->ce_mode) &&
> + submodule_path_match(, name.buf, ps_matched)) {

where we used to "return" out to the central clean-up location, i.e.
here.

> + strbuf_release();
>  }


>   parse_pathspec(, 0,
>  PATHSPEC_PREFER_CWD |
>  PATHSPEC_STRIP_SUBMODULE_SLASH_CHEAP,
>  prefix, argv);
>  
> - /* Find common prefix for all pathspec's */
> - max_prefix = common_prefix();
> + /*
> +  * Find common prefix for all pathspec's
> +  * This is used as a performance optimization which violates correctness
> +  * in the recurse_submodules mode
> +  */

The two new lines phrase it overly negatively and also misleading.
I thought you were saying "We do this as optimization anyway; damn
the correctness in the submodule case!" in my first reading before
reading the statements the comment talks about.  "This optimization
unfortunately cannot be done when recursing into submodules" would
have been better.

> + if (recurse_submodules)
> + max_prefix = NULL;
> + else
> + max_prefix = common_prefix();
>   max_prefix_len = max_prefix ? strlen(max_prefix) : 0;

> diff --git a/dir.c b/dir.c
> index 0ea235f..630dc7a 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -63,6 +63,30 @@ int fspathncmp(const char *a, const char *b, size_t count)
>   return ignore_case ? strncasecmp(a, b, count) : strncmp(a, b, count);
>  }
>  
> +static int prefix_fnmatch(const struct pathspec_item *item,
> +const char *pattern, const char *string,
> +int prefix)
> 

[PATCH] ls-files: add pathspec matching for submodules

2016-09-16 Thread Brandon Williams
Pathspecs can be a bit tricky when trying to apply them to submodules.
This change permits the pathspec logic to perform a prefix match against
submodules since a pathspec could refer to a file inside of a submodule.

Signed-off-by: Brandon Williams 
---
 Documentation/git-ls-files.txt |   4 +-
 builtin/ls-files.c | 143 +++--
 dir.c  |  62 +-
 dir.h  |   4 +
 t/t3007-ls-files-recurse-submodules.sh |  66 +--
 5 files changed, 208 insertions(+), 71 deletions(-)

diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt
index a623ebf..09e4449 100644
--- a/Documentation/git-ls-files.txt
+++ b/Documentation/git-ls-files.txt
@@ -19,7 +19,7 @@ SYNOPSIS
[--exclude-standard]
[--error-unmatch] [--with-tree=]
[--full-name] [--recurse-submodules]
-   [--output-path-prefix=]
+   [--submodule-prefix=]
[--abbrev] [--] [...]
 
 DESCRIPTION
@@ -143,7 +143,7 @@ a space) at the start of each line:
Recursively calls ls-files on each submodule in the repository.
Currently there is only support for the --cached mode.
 
---output-path-prefix=::
+--submodule-prefix=::
Prepend the provided path to the output of each file
 
 --abbrev[=]::
diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 687e475..dc1e076 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -29,7 +29,7 @@ static int show_valid_bit;
 static int line_terminator = '\n';
 static int debug_mode;
 static int show_eol;
-static const char *output_path_prefix;
+static const char *submodule_prefix;
 static int recurse_submodules;
 
 static const char *prefix;
@@ -78,9 +78,9 @@ static void write_name(const char *name)
 * churn.
 */
static struct strbuf full_name = STRBUF_INIT;
-   if (output_path_prefix && *output_path_prefix) {
+   if (submodule_prefix && *submodule_prefix) {
strbuf_reset(_name);
-   strbuf_addstr(_name, output_path_prefix);
+   strbuf_addstr(_name, submodule_prefix);
strbuf_addstr(_name, name);
name = full_name.buf;
}
@@ -177,12 +177,30 @@ static void show_gitlink(const struct cache_entry *ce)
 {
struct child_process cp = CHILD_PROCESS_INIT;
int status;
+   int i;
 
argv_array_push(, "ls-files");
argv_array_push(, "--recurse-submodules");
-   argv_array_pushf(, "--output-path-prefix=%s%s/",
-output_path_prefix ? output_path_prefix : "",
+   argv_array_pushf(, "--submodule-prefix=%s%s/",
+submodule_prefix ? submodule_prefix : "",
 ce->name);
+   /* add options */
+   if (show_eol)
+   argv_array_push(, "--eol");
+   if (show_valid_bit)
+   argv_array_push(, "-v");
+   if (show_stage)
+   argv_array_push(, "--stage");
+   if (show_cached)
+   argv_array_push(, "--cached");
+   if (debug_mode)
+   argv_array_push(, "--debug");
+
+   /* Add pathspec args */
+   argv_array_push(, "--");
+   for (i = 0; i < pathspec.nr; ++i)
+   argv_array_push(, pathspec.items[i].original);
+
cp.git_cmd = 1;
cp.dir = ce->name;
status = run_command();
@@ -192,57 +210,63 @@ static void show_gitlink(const struct cache_entry *ce)
 
 static void show_ce_entry(const char *tag, const struct cache_entry *ce)
 {
+   struct strbuf name = STRBUF_INIT;
int len = max_prefix_len;
+   if (submodule_prefix)
+   strbuf_addstr(, submodule_prefix);
+   strbuf_addstr(, ce->name);
 
if (len >= ce_namelen(ce))
-   die("git ls-files: internal error - cache entry not superset of 
prefix");
+   die("git ls-files: internal error - cache entry not "
+   "superset of prefix");
 
-   if (!match_pathspec(, ce->name, ce_namelen(ce),
-   len, ps_matched,
-   S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode)))
-   return;
-   if (recurse_submodules && S_ISGITLINK(ce->ce_mode)) {
+   if (recurse_submodules && S_ISGITLINK(ce->ce_mode) &&
+   submodule_path_match(, name.buf, ps_matched)) {
show_gitlink(ce);
-   return;
-   }
+   } else if (match_pathspec(, name.buf, name.len,
+ len, ps_matched,
+ S_ISDIR(ce->ce_mode) ||
+ S_ISGITLINK(ce->ce_mode))) {
+   if (tag && *tag && show_valid_bit &&
+   (ce->ce_flags & CE_VALID)) {
+   static char alttag[4];
+   memcpy(alttag, tag, 3);
+   if 

Re: [PATCH] format-patch: Add --rfc for the common case of [RFC PATCH]

2016-09-16 Thread Jeff King
On Fri, Sep 16, 2016 at 10:27:45AM -0700, Josh Triplett wrote:

> By far, the most common subject-prefix I've seen other than "PATCH" is
> "RFC PATCH" (or occasionally "PATCH RFC").  Seems worth optimizing for
> the common case, to avoid having to spell it out the long way as
> --subject-prefix='RFC PATCH'.

"RFC" is the most common one for me, too. And if it ends here, I'm OK
with it. But I'm a little worried with ending up with a proliferation of
options.

If we had a short-option for --subject-prefix, then:

  -P RFC

is not so bad compared to "--rfc". But if you want to spell it as "RFC
PATCH" that's getting a bit longer. We could have a short option for
"tag this in the subject prefix _in addition_ to writing PATCH". And
then you could do:

  -T RFC

I dunno. One other thing to consider is that format-patch takes
arbitrary diff options, so we'd want to avoid stomping on them with any
short options (which is why I used "-T" instead of "-t", though I find
it unlikely that many people use the latter with format-patch). That's a
point in favor of --rfc, I think.

>  builtin/log.c   | 10 ++
>  t/t4014-format-patch.sh |  9 +
>  2 files changed, 19 insertions(+), 0 deletions(-)

Documentation?

> +static int rfc_callback(const struct option *opt, const char *arg, int unset)
> +{
> + subject_prefix = 1;
> + ((struct rev_info *)opt->value)->subject_prefix = xstrdup("RFC PATCH");
> + return 0;
> +}

I was going to complain that you don't free() the previous value, but
actually the other callers do not xstrdup() in the first place (and we
do not need to do so here, either, as it's a string literal). We
actually _do_ allocate a new copy when reading the value from config,
but it's probably not a big deal in practice to leak that.

I also wonder if you could implement this as just:

  return subject_prefix_callback(opt, "RFC PATCH", unset);

And then if you write the documentation as:

  --rfc::
Behave as if --subject-prefix="RFC PATCH" was specified.

then it will be trivially correct. :)

> +cat >expect <<'EOF'
> +Subject: [RFC PATCH 1/1] header with . in it
> +EOF
> +test_expect_success '--rfc' '
> + git format-patch -n -1 --stdout --rfc >patch &&
> + grep ^Subject: patch >actual &&
> + test_cmp expect actual
> +'

Our usual style these days is to set up expectations inside the test
blocks (and use "<<-" to get nice indentation; we also typically use
"\EOF" but that's purely style).

-Peff


Re: [RFC/PATCH 2/3] mailinfo: correct malformed test example

2016-09-16 Thread Jonathan Tan

On 09/16/2016 03:55 PM, Junio C Hamano wrote:

Hmph, these:

 t/t5100/info0008--no-inbody-headers  | 5 +
 t/t5100/msg0008--no-inbody-headers   | 6 ++
 t/t5100/msg0015--no-inbody-headers   | 1 +

have --no-inbody-headers in their names; wouldn't that an indication
that they are expected output when mailinfo is run while in-body
header feature disabled?


Yes, that's correct (they are the test data for when the in-body header 
feature is disabled).



I would have expected that it would make more sense to make no
change to sample.mbox and have updated expectation to outputs in the
case where in-body header feature is enabled.


The sample.mbox file contains the following:

  From nobody Mon Sep 17 00:00:00 2001
  From: A U Thor 
  Subject: check bogus body header (from)
  Date: Fri, 9 Jun 2006 00:44:16 -0700

  From: bogosity
- a list
- of stuff

Unchanged, the subsequent patch would break this test because it would 
interpret that as a multi-line "From" in-body header when in-body 
headers are *not* disabled.


Besides changing sample.mbox, the other way to make sure that this test 
passes is to suppress the test when in-body headers are *not* disabled, 
but looking at t5100* (directory and script), it seemed more 
straightforward to modify sample.mbox.


The patch I sent added a blank line after "From: bogosity", but removing 
the spaces before "- a list" and "- of stuff" would work too.



To make sure this new feature will not break in the future, we would
want a brand new message with a folded in-body header added to the
sample.mbox, and see how it is parsed by mailinfo with in-body
header feature enabled (and disabled).


OK, I'll add this test. (The subsequent patch already has the brand new 
message, but not the test where in-body headers are disabled.)


Re: [RFC/PATCH 3/3] mailinfo: handle in-body header continuations

2016-09-16 Thread Jonathan Tan

On 09/16/2016 04:04 PM, Junio C Hamano wrote:

Jonathan Tan  writes:

I'm concerned about what happens if check_header fails - we would then
have some lines which need to be treated as log messages. (At least,
they are currently treated that way.)


I actually think we should refactor check_header() further so that
in-body header processing does not even see things that shouldn't be
changed.  The current check_header() should be used only for real
mail headers, and then a reduced version of check_header() that is
called for in-body will _ONLY_ handle the header lines that are
handled by the first "search for the interesting parts" loop.

And of course we would update your "does it look like rfc2822?" to
match what are handled by the "interesting parts" loop.  That I
think would match the current behaviour much better, I would think.


There would be a bit of code duplication in that this "does it look like 
rfc2822" function would also need to account for duplicate headers (e.g. 
2 "Subject:" lines in the in-body headers) because check_header would 
reject the 2nd one, but that is minor. (Alternatively, we could just 
allow duplicate headers in the in-body headers.)



The ">From " and "[PATCH]" cases in check_header() should not even
be there.  We should handle them inside handle_commit_msg(), as
these two cases should never appear in the real header part of a
message.



And if we clean it up like that, I do not think we would ever need
to worry about "ah, it looked like a header but it is not after
all".  And not having to worry about it is a good thing and should
be one of the primary goals in this conversion, I whould think.


Yes, this makes sense. I'll go ahead and make a patch set implementing 
this (unless anyone has any objections).


Re: [PATCH 11/11] Resumable clone: implement primer logic in git-clone

2016-09-16 Thread Junio C Hamano
Kevin Wern  writes:

> Use transport_download_primer and transport_prime_clone in git clone.
> This only supports a fully connected packfile.
>
> transport_prime_clone and transport_download_primer are executed
> completely independent of transport_(get|fetch)_remote_refs, et al.
> transport_download_primer is executed based on the existence of an
> alt_resource. The idea is that the "prime clone" execution should be
> able to attempt retrieving an alternate resource without dying, as
> opposed to depending on the result of upload pack's "capabilities" to
> indicate whether or not the client can attempt it.
>
> If a resumable resource is available, execute a codepath with the
> following modular components:
> - downloading resource to a specific directory
> - using the resource (for pack, indexing and generating the bundle
>   file)
> - cleaning up the resource (if the download or use fails)
> - cleaning up the resource (if the download or use succeeds)
>
> If resume is interrupted on the client side, the alternate resource
> info is written to the RESUMABLE file in the git directory.
>
> On resume, the required info is extracted by parsing the created
> config file, and that info is used to determine the work and git
> directories. If these cannot be determined, the program exits.
> The writing of the refspec and determination of the initial git
> directories are skipped, along with transport_prime_clone.
>
> The main purpose of this series of patches is to flesh out a codepath
> for automatic resuming, manual resuming, and leaving a resumable
> directory on exit--the logic for when to do these still needs more
> work.
>
> Signed-off-by: Kevin Wern 
> ---
>  Documentation/git-clone.txt |  16 ++
>  builtin/clone.c | 590 
> +---
>  t/t9904-git-prime-clone.sh  | 181 ++
>  3 files changed, 698 insertions(+), 89 deletions(-)
>  create mode 100755 t/t9904-git-prime-clone.sh
>
> diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
> index b7c467a..5934bb6 100644
> --- a/Documentation/git-clone.txt
> +++ b/Documentation/git-clone.txt
> @@ -16,6 +16,7 @@ SYNOPSIS
> [--depth ] [--[no-]single-branch]
> [--recursive | --recurse-submodules] [--] 
> []
> +'git clone --resume '
>  
>  DESCRIPTION
>  ---
> @@ -172,6 +173,12 @@ objects from the source repository into a pack in the 
> cloned repository.
>   via ssh, this specifies a non-default path for the command
>   run on the other end.
>  
> +--prime-clone ::
> +-p ::

Not many other options have single letter shorthand.  Is it expected
that it is worth to let this option squat on a short-and-sweet "-p",
perhaps because it is so frequently used?

> +--resume::
> + Resume a partially cloned repo in a "resumable" state. This
> + can only be specified with a single local directory ( + dir>). This is incompatible with all other options.
> +
> +::
> + The directory of the partial clone. This could be either the
> + work directory or the git directory.

I think these should be described this way:

--resume ::
description if what resume option does and how resumable_dir
is used by the option.

in a single bullet point.

> diff --git a/builtin/clone.c b/builtin/clone.c
> index 9ac6c01..d9a13dc 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -8,7 +8,9 @@
>   * Clone a repository into a different directory that does not yet exist.
>   */
>  
> +#include "cache.h"
>  #include "builtin.h"

I do not think you need to include cache.h if you are including
builtin.h; Documentation/CodingGuidelines says:

 - The first #include in C files, except in platform specific compat/
   implementations, must be either "git-compat-util.h", "cache.h" or
   "builtin.h".  You do not have to include more than one of these.

> @@ -40,17 +42,20 @@ static const char * const builtin_clone_usage[] = {
>  
>  static int option_no_checkout, option_bare, option_mirror, 
> option_single_branch = -1;
>  static int option_local = -1, option_no_hardlinks, option_shared, 
> option_recursive;
> +static int option_resume;
>  static char *option_template, *option_depth;
> -static char *option_origin = NULL;
> +static const char *option_origin = NULL;

Is this change related to anything you are doing here?

If you are fixing things while at it, please don't ;-) If you really
want to, please also remove " = NULL", from this line and also from
the next line.  Also do not add " = NULL" at the end of alt_res.

>  static char *option_branch = NULL;
>  ...
> +static const struct alt_resource *alt_res = NULL;

> +static char *get_filename(const char *dir)
> +{
> + char *dir_copy = xstrdup(dir);
> + strip_trailing_slashes(dir_copy);
> + char *filename, *final = NULL;
> +
> + filename = find_last_dir_sep(dir);
> +
> + if (filename && *(++filename))
> + final = 

Re: [PATCH 10/11] run command: add RUN_COMMAND_NO_STDOUT

2016-09-16 Thread Junio C Hamano
Kevin Wern  writes:

> Add option RUN_COMMAND_NO_STDOUT, which sets no_stdout on a child
> process.
>
> This will be used by git clone when calling index-pack on a downloaded
> packfile.

If it is just one caller, would't it make more sense for that caller
set no_stdout explicitly itself?


Re: [RFC/PATCH 3/3] mailinfo: handle in-body header continuations

2016-09-16 Thread Junio C Hamano
Jonathan Tan  writes:

> On 09/16/2016 01:59 PM, Junio C Hamano wrote:
>> if (mi->in_line_header->len) {
>> /* we have read the beginning of one in-line header */
>> if (line->len && isspace(*line->buf) &&
>> !(mi->use_scissors && is_scissors_line(line))) {
>
> Minor note: this means that the scissors check appears twice in the
> code, once here and once below (for the non-header case).

Yes.  I actually was wondering if it is even more sensible to always
have the scissors check at the very beginning.  Even if we saw a
half-written in-body header already in the message, when we see a
scissors line, we clear the slate and restart as if the line after
the scissors is the first line in the body of the message.

>> append to mi->in_line_header strbuf;
>> return 0;
>> }
>> /* otherwise we know mi->in_line_header is now complete */
>> check_header(mi, mi->in_line_header, ...);
>
> (Sorry - should have also noticed this in your original e-mail.)
>
> I'm concerned about what happens if check_header fails - we would then
> have some lines which need to be treated as log messages. (At least,
> they are currently treated that way.)

I actually think we should refactor check_header() further so that
in-body header processing does not even see things that shouldn't be
changed.  The current check_header() should be used only for real
mail headers, and then a reduced version of check_header() that is
called for in-body will _ONLY_ handle the header lines that are
handled by the first "search for the interesting parts" loop.

And of course we would update your "does it look like rfc2822?" to
match what are handled by the "interesting parts" loop.  That I
think would match the current behaviour much better, I would think.

The ">From " and "[PATCH]" cases in check_header() should not even
be there.  We should handle them inside handle_commit_msg(), as
these two cases should never appear in the real header part of a
message.

And if we clean it up like that, I do not think we would ever need
to worry about "ah, it looked like a header but it is not after
all".  And not having to worry about it is a good thing and should
be one of the primary goals in this conversion, I whould think.

Thanks.




Re: [PATCH 03/10] diff.c: drop tautologous condition in emit_line_0

2016-09-16 Thread Stefan Beller
On Mon, Sep 12, 2016 at 4:53 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> diff --git a/diff.c b/diff.c
>> index 156c2aa..9d2e704 100644
>> --- a/diff.c
>> +++ b/diff.c
>> @@ -460,8 +460,7 @@ static void emit_line_0(struct diff_options *o, const 
>> char *set, const char *res
>>
>>   if (len == 0) {
>>   has_trailing_newline = (first == '\n');
>> - has_trailing_carriage_return = (!has_trailing_newline &&
>> - (first == '\r'));
>> + has_trailing_carriage_return = (first == '\r');
>>   nofirst = has_trailing_newline || has_trailing_carriage_return;
>>   } else {
>>   has_trailing_newline = (len > 0 && line[len-1] == '\n');
>
> Interesting.
>
> This may be a mis-conversion at 250f7993 ("diff.c: split emit_line()
> from the first char and the rest of the line", 2009-09-14), I
> suspect.  The original took line[] with length and peeked for '\n',
> and when it saw one, it decremented length before checking
> line[len-1] for '\r'.
>
> But of course if there is only one byte on the line (i.e. len == 0
> after first is stripped off), it cannot be both '\n' or '\r' at the
> same time.
>
> Thanks for spotting.

Oh, right, it used to be possible to remove \r\n completely and that information
was then kept as has_trailing_newline = has_trailing_carriage_return = 1;
and the resulting line is kept completely without ending line.

After some thought I don't think I can use this mis-conversion
to trigger a bug though, because the len=0 can only ever happen
if first is '\n' alone essentially.

Another thing I noticed when playing around with diffs:

$ printf "\r\n" >crlf
$ git commit crlf -m "add file crlf, empty line"
$ printf "non zero length\r\n" >crlf
$ diff --git a/crlf b/crlf
$ index d3f5a12..ece7140 100644
--- a/crlf
+++ b/crlf
@@ -1 +1 @@
-
+non zero length^M
$ # The - line is missing a ^M ?


Re: [RFC/PATCH 2/3] mailinfo: correct malformed test example

2016-09-16 Thread Junio C Hamano
Jonathan Tan  writes:

> On 09/16/2016 12:19 PM, Junio C Hamano wrote:
>> Jonathan Tan  writes:
>>
>>> An existing sample message (0015) in the tests for mailinfo contains an
>>> indented line immediately after an in-body header (without any
>>> intervening blank line).
>>
>> This comes from d25e5159 ("git am/mailinfo: Don't look at in-body
>> headers when rebasing", 2009-11-20), where we want to make sure that
>> a "From: bogosity" that isn't meant to be an in-body header is not
>> identified as such, even when it is immediately followed by a
>> non-blank line.  "From: bogosity" is for msg0015 but the same
>> applies to the header-looking block for msg0008.
>>
>> Adding a blank line there will defeat the whole point of the test,
>> which is to make sure we don't do anything funky when --no-inbody-headers
>> is asked for, no?
>
> Before I revise the patch set...I think that the point of 0015 would
> be handled by 0008 (after this patch is applied), but if you prefer
> that 0015 retain its purpose, I can unindent the bullet list in 0015
> instead of adding the extra line (and then dropping all 0008
> changes). Would that be better? (0015 needs to be changed somehow,
> because its indented line would be interpreted as a continuation line
> after RFC/PATCH 3/3 is applied.)

Hmph, these:

 t/t5100/info0008--no-inbody-headers  | 5 +
 t/t5100/msg0008--no-inbody-headers   | 6 ++
 t/t5100/msg0015--no-inbody-headers   | 1 +

have --no-inbody-headers in their names; wouldn't that an indication
that they are expected output when mailinfo is run while in-body
header feature disabled?

I would have expected that it would make more sense to make no
change to sample.mbox and have updated expectation to outputs in the
case where in-body header feature is enabled.

To make sure this new feature will not break in the future, we would
want a brand new message with a folded in-body header added to the
sample.mbox, and see how it is parsed by mailinfo with in-body
header feature enabled (and disabled).



Re: [PATCH 07/11] Resumable clone: add resumable download to http/curl

2016-09-16 Thread Junio C Hamano
Kevin Wern  writes:

> +int http_download_primer(const char *url, const char *out_file)
> +{
> + int ret = 0, try_count = HTTP_TRY_COUNT;
> + struct http_get_options options = {0};
> + options.progress = 1;
> +
> + if (file_exists(out_file)) {
> + fprintf(stderr,
> + "File already downloaded: '%s', skipping...\n",
> + out_file);
> + return ret;
> + }
> +
> + do {
> + if (try_count != HTTP_TRY_COUNT) {
> + fprintf(stderr, "Connection interrupted for some "
> + "reason, retrying (%d attempts left)\n",
> + try_count);
> + struct timeval time = {10, 0}; // 1s

We do not use // comment.

> + select(0, NULL, NULL, NULL, );
> + }
> + ret = http_get_file(url, out_file, );

I didn't realize that http_get_file() -> http_request() codepath,
when it is the output file, already can do the "ftell and request
the reminder".  Very nice.

> @@ -1136,7 +1138,10 @@ static int handle_curl_result(struct slot_results 
> *results)
>   curl_easy_strerror(results->curl_result),
>   sizeof(curl_errorstr));
>  #endif
> - return HTTP_ERROR;
> + if (results->http_code >= 400)
> + return HTTP_ERROR;
> + else
> + return HTTP_ERROR_RESUMABLE;
>   }
>  }

Hmm, is "anything below 400" a good definition of resumable errors?


Re: [RFC/PATCH 2/3] mailinfo: correct malformed test example

2016-09-16 Thread Jonathan Tan

On 09/16/2016 12:19 PM, Junio C Hamano wrote:

Jonathan Tan  writes:


An existing sample message (0015) in the tests for mailinfo contains an
indented line immediately after an in-body header (without any
intervening blank line).


This comes from d25e5159 ("git am/mailinfo: Don't look at in-body
headers when rebasing", 2009-11-20), where we want to make sure that
a "From: bogosity" that isn't meant to be an in-body header is not
identified as such, even when it is immediately followed by a
non-blank line.  "From: bogosity" is for msg0015 but the same
applies to the header-looking block for msg0008.

Adding a blank line there will defeat the whole point of the test,
which is to make sure we don't do anything funky when --no-inbody-headers
is asked for, no?


Before I revise the patch set...I think that the point of 0015 would be 
handled by 0008 (after this patch is applied), but if you prefer that 
0015 retain its purpose, I can unindent the bullet list in 0015 instead 
of adding the extra line (and then dropping all 0008 changes). Would 
that be better? (0015 needs to be changed somehow, because its indented 
line would be interpreted as a continuation line after RFC/PATCH 3/3 is 
applied.)


git-subtree pull issue

2016-09-16 Thread Alexander Inyukhin
When git-subtree is pulling data using the tag reference,
it writes the tag's sha1 into a metadata.
It could be a problem next time, since this commit object
is not a part of main tree, and could be lost.

Steps to reproduce:

# add some stuff to a existing tree
# history will refer to v0.1 tag object instead of v0.1^{commit}
git subtree add --squash -P dir/ repo v0.1
# prune all dangling objects including external v0.1 tag
git gc --aggressive --prune=all
# this will fail
git subtree pull --squash -P dir/ repo v0.2


Re: [RFC/PATCH 3/3] mailinfo: handle in-body header continuations

2016-09-16 Thread Jonathan Tan

On 09/16/2016 01:59 PM, Junio C Hamano wrote:

if (mi->in_line_header->len) {
/* we have read the beginning of one in-line header */
if (line->len && isspace(*line->buf) &&
!(mi->use_scissors && is_scissors_line(line))) {


Minor note: this means that the scissors check appears twice in the 
code, once here and once below (for the non-header case).



append to mi->in_line_header strbuf;
return 0;
}
/* otherwise we know mi->in_line_header is now complete */
check_header(mi, mi->in_line_header, ...);


(Sorry - should have also noticed this in your original e-mail.)

I'm concerned about what happens if check_header fails - we would then 
have some lines which need to be treated as log messages. (At least, 
they are currently treated that way.)


To treat them as log messages, we would need to convert them into UTF-8, 
which may possibly fail, so we would have to figure out how to clean up 
(we have to clean up because we cannot `die` immediately, at least to 
preserve the current behavior). Also, we are likely to detect such a 
failure only while processing a subsequent line - this non-"fail fast" 
currently is fine, but I'm concerned that it will hinder future 
development (especially when debugging).


Minor note: the buffer would also need to be more complicated (instead 
of the current single buffer), either:


o store newlines in that buffer (and we would need to remove all
  newlines before passing to check_header), or
o 2 buffers: one with newlines (for log messages) and one without (for
  check_header).

In light of the above (multiple scissors checks, late detection of 
failure, more complicated buffer), it seems clearer to me to just change 
the order of the checks (as in RFC/PATCH 1/3). This necessitates holding 
on to the old un-decoded buf and len, but this seems easier to me than 
the above.



strbuf_reset(>in_line_header);
}
...


[wishlist] disable boring messages

2016-09-16 Thread Alexander Inyukhin
Hi,

is it possible to make git silent, when nothing interesting
is happening?

I have a lot of repos and a batch script to update them all,
and I want to get rid of 'Fetching origin' and 'Already up-to-date.'
messages leaving only new refs and tags.


Re: [PATCH] mailinfo: unescape quoted-pair in header fields

2016-09-16 Thread Jeff King
On Fri, Sep 16, 2016 at 11:02:04PM +0200, Kevin Daudt wrote:

> rfc2822 has provisions for quoted strings in structured header fields,
> but also allows for escaping these with so-called quoted-pairs.
> 
> The only thing git currently does is removing exterior quotes, but
> quotes within are left alone.
> 
> Tell mailinfo to remove exterior quotes and remove escape characters from the
> author so that they don't show up in the commits author field.
> 
> Signed-off-by: Kevin Daudt 
> ---
> The only thing I could not easily fix is the prevent git am from
> removing any quotes around the author. This is done in fmt_ident,
> which calls `strbuf_addstr_without_crud`. 

Ah, OK. I was wondering where that stripping was being done. That makes
sense, and makes me doubly confident this is the right place to be doing
it, since the other quote-stripping was not even intentional, but just a
side effect of the low-level routines.

I think it is OK to leave it in place. If you really want your name to
be:

  "My Name is Always in Quotes"

then tough luck. Git does not support it via git-am, but nor does it via
git-commit, etc.

>  mailinfo.c | 54 
> ++
>  t/t5100-mailinfo.sh|  6 ++
>  t/t5100/quoted-pair.expect |  5 +
>  t/t5100/quoted-pair.in |  9 
>  4 files changed, 74 insertions(+)
>  create mode 100644 t/t5100/quoted-pair.expect
>  create mode 100644 t/t5100/quoted-pair.in
> 
> diff --git a/mailinfo.c b/mailinfo.c
> index e19abe3..04036f3 100644
> --- a/mailinfo.c
> +++ b/mailinfo.c
> @@ -54,15 +54,69 @@ static void parse_bogus_from(struct mailinfo *mi, const 
> struct strbuf *line)
>   get_sane_name(>name, >name, >email);
>  }
>  
> +static int unquote_quoted_string(struct strbuf *line)
> +{
> + struct strbuf outbuf;
> + const char *in = line->buf;
> + int c, take_next_literally = 0;
> + int found_error = 0;
> + char escape_context=0;

Style: whitespace around "=".

I had to wonder why we needed both escape_context and
take_next_literally; shouldn't we just need a single state bit. But
escape_context is not "escape the next character", it is "we are
currently in a mode where we should be escaping".

Could we give it a more descriptive name? I guess it is more than just
"we are in a mode", but rather "here is the character that will end the
escaped mode". Maybe a comment would be more appropriate.

> + while ((c = *in++) != 0) {
> + if (take_next_literally) {
> + take_next_literally = 0;
> + } else {

OK, so that means the previous one was backslash-quoted, and we don't do
any other cleverness. Good.

> + switch (c) {
> + case '"':
> + if (!escape_context)
> + escape_context = '"';
> + else if (escape_context == '"')
> + escape_context = 0;
> + continue;

And here we open or close the quoted portion, depending. Makes sense.

> + case '\\':
> + if (escape_context) {
> + take_next_literally = 1;
> + continue;
> + }
> + break;

I didn't look in the RFC. Is:

  From: my \"name\" 

really the same as:

  From: "my \\\"name\\\"" 

? That seems weird, but I think it may be that the former is simply
bogus (you are not supposed to use backslashes outside of the quoted
section at all).

> + case '(':
> + if (!escape_context)
> + escape_context = '(';
> + else if (escape_context == '(')
> + found_error = 1;
> + break;

Hmm. Is:

  From: Name (Comment with (another comment))

really disallowed? RFC2822 seems to say that "comment" can contain
"ccontent", which can itself be a comment.

This is obviously getting pretty silly, but if we are going to follow
the RFC, I think you actually have to do a recursive parse, and keep
track of an arbitrary depth of context.

I dunno. This method probably covers most cases in practice, and it's
easy to reason about.

> + case ')':
> + if (escape_context == '(')
> + escape_context = 0;
> + break;
> + }
> + }
> +
> + strbuf_addch(, c);
> + }
> +
> + strbuf_reset(line);
> + strbuf_addbuf(line, );
> + strbuf_release();

I think you can use strbuf_swap() here to avoid copying the line an
extra time, like:

  strbuf_swap(line, );
  strbuf_release();

Another option would be to just:


Re: [PATCH 03/11] pkt-line: create gentle packet_read_line functions

2016-09-16 Thread Junio C Hamano
Kevin Wern  writes:

>   /* And complain if we didn't get enough bytes to satisfy the read. */
>   if (ret < size) {
> - if (options & PACKET_READ_GENTLE_ON_EOF)
> + if (options & (PACKET_READ_GENTLE_ON_EOF | 
> PACKET_READ_GENTLE_ALL))
>   return -1;

The name _ALL suggested to me that there may be multiple "under this
condition, be gentle", "under that condition, be gentle", and _ALL
is used as a catch-all "under any condition, be gentle".  If you
defined _ALL symbol to have all GENTLE bits on, this line could have
become

if (options & PACKET_READ_GENTLE_ALL)

> @@ -205,15 +209,23 @@ int packet_read(int fd, char **src_buf, size_t *src_len,
>   if (ret < 0)
>   return ret;
>   len = packet_length(linelen);
> - if (len < 0)
> + if (len < 0) {
> + if (options & PACKET_READ_GENTLE_ALL)
> + return -1;

On the other hand, however, you do want to die here when only
GENTLE_ON_EOF is set.

Taking the above two observations together, I'd have to say that
_ALL is probably a misnomer.  I agree with a need for a flag with
the behaviour you defined in this patch, though.

>   die("protocol error: bad line length character: %.4s", linelen);

>  static char *packet_read_line_generic(int fd,
> char **src, size_t *src_len,
> -   int *dst_len)
> +   int *dst_len, int flags)

The original one is called options, not flags, and it would be
easier to follow if it is consistently called options, instead of
requiring the reader to keep track of "ah, it is called flags here
but the callee renames it to options".

> +/*
> + * Same as packet_read_line, but does not die on any errors (uses
> + * PACKET_READ_GENTLE_ALL).
> + */
> +char *packet_read_line_gentle(int fd, int *len_p);
> +
> +/*
> + * Same as packet_read_line_buf, but does not die on any errors (uses
> + * PACKET_READ_GENTLE_ALL).
> + */
> +char *packet_read_line_buf_gentle(char **src_buf, size_t *src_len, int 
> *size);

I think most if not all "do the same thing as do_something() but
report errors instead of dying" variant of functions are named
do_something_gently(), not do_something_gentle().



Re: [RFC/PATCH 3/3] mailinfo: handle in-body header continuations

2016-09-16 Thread Jeff King
On Fri, Sep 16, 2016 at 10:37:24AM -0700, Jonathan Tan wrote:

> Mailinfo currently handles multi-line headers, but it does not handle
> multi-line in-body headers. Teach it to handle such headers, for
> example, for this input:
> 
>   Subject: a very long
>broken line
> 
>   Subject: another very long
>broken line
> 
> interpret the in-body subject to be "another very long broken line"
> instead of "another very long".

This puzzled me; we should stop parsing in-body headers after the first
blank line. But then I realized you probably meant the first "Subject"
to be the real mail header.

I wonder if it would be more obvious with an example like:

  From: ...
  Date: ...
  Subject: the actual mail subject

  Subject: a very long
broken line

Or something.

-Peff


Re: [RFC/PATCH 1/3] mailinfo: refactor commit message processing

2016-09-16 Thread Jeff King
On Fri, Sep 16, 2016 at 12:12:50PM -0700, Junio C Hamano wrote:

> > +static int check_header_raw(struct mailinfo *mi,
> > +   char *buf, size_t len,
> > +   struct strbuf *hdr_data[], int overwrite) {
> > +   const struct strbuf sb = {0, len, buf};
> > +   return check_header(mi, , hdr_data, overwrite);
> > +}
> 
> IIUC, this is a helper for callers that do not have a strbuf but
> instead have  pair to perform the same check_header() the
> callers that have strbuf can do.
> 
> As check_header() uses the strbuf as a read-only entity, wrapping
> the  pair in a temporary strbuf like this is safe.
> 
> The incoming  should conceptually be "const char *", but it's
> OK.

I think the "right" way to do this would be to continue taking a "char
*", and then strbuf_attach() it. That saves us from unexpectedly
violating any strbuf invariants.

If our assumption that check_header() does not touch the
contents turns out to be wrong, neither strategy would inform our
caller, though. I think you'd want something like:

  assert(sb.buf == buf);

after check_header() returns (though I guess we are in theory protected
by the "const").

That being said...

> If check_header() didn't call any helper function that gets passed
>  as a strbuf, or if convertiong the helper function to take a
>  pair instead, I would actually suggest refactoring this
> the other way around, though.  That is, move the implementation of
> check_header() to this function, updating its reference to line->buf
> and line->len to reference to  and , and then make
> check_header() a thin wrapper that does
> 
> check_header(mi, const struct strbuf *line,
>  struct strbuf *hdr_data[], int overwrite)
> {
> return check_header_raw(mi, line->buf, line->len,
> hdr_data, overwrite);
> }

This is _way_ better, and it looks like check_header() could handle it
easily. Looking at it, I also suspect the cascading if in that function
could be made more pleasant by modeling cmp_header()'s interface after
skip_prefix_mem(), but that is totally orthogonal and optional.

-Peff


[PATCH] mailinfo: unescape quoted-pair in header fields

2016-09-16 Thread Kevin Daudt
rfc2822 has provisions for quoted strings in structured header fields,
but also allows for escaping these with so-called quoted-pairs.

The only thing git currently does is removing exterior quotes, but
quotes within are left alone.

Tell mailinfo to remove exterior quotes and remove escape characters from the
author so that they don't show up in the commits author field.

Signed-off-by: Kevin Daudt 
---
The only thing I could not easily fix is the prevent git am from removing any 
quotes around the author. This is done in fmt_ident, which calls 
`strbuf_addstr_without_crud`. 

 mailinfo.c | 54 ++
 t/t5100-mailinfo.sh|  6 ++
 t/t5100/quoted-pair.expect |  5 +
 t/t5100/quoted-pair.in |  9 
 4 files changed, 74 insertions(+)
 create mode 100644 t/t5100/quoted-pair.expect
 create mode 100644 t/t5100/quoted-pair.in

diff --git a/mailinfo.c b/mailinfo.c
index e19abe3..04036f3 100644
--- a/mailinfo.c
+++ b/mailinfo.c
@@ -54,15 +54,69 @@ static void parse_bogus_from(struct mailinfo *mi, const 
struct strbuf *line)
get_sane_name(>name, >name, >email);
 }
 
+static int unquote_quoted_string(struct strbuf *line)
+{
+   struct strbuf outbuf;
+   const char *in = line->buf;
+   int c, take_next_literally = 0;
+   int found_error = 0;
+   char escape_context=0;
+
+   strbuf_init(, line->len);
+
+   while ((c = *in++) != 0) {
+   if (take_next_literally) {
+   take_next_literally = 0;
+   } else {
+   switch (c) {
+   case '"':
+   if (!escape_context)
+   escape_context = '"';
+   else if (escape_context == '"')
+   escape_context = 0;
+   continue;
+   case '\\':
+   if (escape_context) {
+   take_next_literally = 1;
+   continue;
+   }
+   break;
+   case '(':
+   if (!escape_context)
+   escape_context = '(';
+   else if (escape_context == '(')
+   found_error = 1;
+   break;
+   case ')':
+   if (escape_context == '(')
+   escape_context = 0;
+   break;
+   }
+   }
+
+   strbuf_addch(, c);
+   }
+
+   strbuf_reset(line);
+   strbuf_addbuf(line, );
+   strbuf_release();
+
+   return found_error;
+
+}
+
 static void handle_from(struct mailinfo *mi, const struct strbuf *from)
 {
char *at;
size_t el;
struct strbuf f;
 
+
strbuf_init(, from->len);
strbuf_addbuf(, from);
 
+   unquote_quoted_string();
+
at = strchr(f.buf, '@');
if (!at) {
parse_bogus_from(mi, from);
diff --git a/t/t5100-mailinfo.sh b/t/t5100-mailinfo.sh
index 1a5a546..d0c21fc 100755
--- a/t/t5100-mailinfo.sh
+++ b/t/t5100-mailinfo.sh
@@ -142,4 +142,10 @@ test_expect_success 'mailinfo unescapes with --mboxrd' '
test_cmp expect mboxrd/msg
 '
 
+test_expect_success 'mailinfo unescapes rfc2822 quoted-string' '
+mkdir quoted-pair &&
+git mailinfo /dev/null /dev/null <"$TEST_DIRECTORY"/t5100/quoted-pair.in 
>quoted-pair/info &&
+test_cmp "$TEST_DIRECTORY"/t5100/quoted-pair.expect quoted-pair/info
+'
+
 test_done
diff --git a/t/t5100/quoted-pair.expect b/t/t5100/quoted-pair.expect
new file mode 100644
index 000..cab1bce
--- /dev/null
+++ b/t/t5100/quoted-pair.expect
@@ -0,0 +1,5 @@
+Author: Author "The Author" Name
+Email: someb...@example.com
+Subject: testing quoted-pair
+Date: Sun, 25 May 2008 00:38:18 -0700
+
diff --git a/t/t5100/quoted-pair.in b/t/t5100/quoted-pair.in
new file mode 100644
index 000..e2e627a
--- /dev/null
+++ b/t/t5100/quoted-pair.in
@@ -0,0 +1,9 @@
+From 1234567890123456789012345678901234567890 Mon Sep 17 00:00:00 2001
+From: "Author \"The Author\" Name" 
+Date: Sun, 25 May 2008 00:38:18 -0700
+Subject: [PATCH] testing quoted-pair
+
+
+
+---
+patch
-- 
2.10.0.86.g6ffa4f1.dirty



Re: [RFC/PATCH 3/3] mailinfo: handle in-body header continuations

2016-09-16 Thread Junio C Hamano
Jonathan Tan  writes:

>> handle_commit_msg(...)
>> {
>>  if (mi->in_line_header->len) {
>>  /* we have read the beginning of one in-line header */
>>  if (line->len && isspace(*line->buf))
>
> This would mean that a message like the following:
>
>   From: Me 
> -- 8< -- this scissors line will be treated as part of "From"
>
> would have its scissors line treated as a header.
>
> The main reason why I reordered the checks (in RFC/PATCH 1/3) is to
> avoid this (treating a scissors line with an initial space immediately
> following an in-body header as part of a header).
>
> (If this is not a concern then yes, I agree that the way you described
> is simpler and better.)

Ahh, OK.  I do not think anybody sane would do the "From:" thing,
but with the "does it look like 2822 header" check to decide if the
first header-looking line should be queued, another failure mode may
be:

any-random-alpha-and-dash-string: 
 -- >8 -- cut here -- >8 --
Subject: real subject

The first line of the real message

I personally do not think it matters that much, but if we wanted to
protect us from it we could easily do

if (mi->in_line_header->len) {
/* we have read the beginning of one in-line header */
if (line->len && isspace(*line->buf) &&
!(mi->use_scissors && is_scissors_line(line))) {
append to mi->in_line_header strbuf;
return 0;
}
/* otherwise we know mi->in_line_header is now complete */
check_header(mi, mi->in_line_header, ...);
strbuf_reset(>in_line_header);
}
...

instead, I think.


Re: [PATCH 01/11] Resumable clone: create service git-prime-clone

2016-09-16 Thread Junio C Hamano
Kevin Wern  writes:

> Create git-prime-clone, a program to be executed on the server that
> returns the location and type of static resource to download before
> performing the rest of a clone.
>
> Additionally, as this executable's location will be configurable (see:
> upload-pack and receive-pack), add the program to
> BINDIR_PROGRAMS_NEED_X, in addition to the usual builtin places. Add
> git-prime-clone executable to gitignore, as well
>
> Signed-off-by: Kevin Wern 
> ---

I wonder if we even need a separate service like this.

Wouldn't a new protocol capability that is advertised from
upload-pack sufficient to tell the "git clone" that it can
and should consider priming from this static resource?

> +static void prime_clone(void)
> +{
> + if (!enabled) {
> + fprintf(stderr, _("prime-clone not enabled\n"));
> + }
> + else if (url && filetype){
> + packet_write(1, "%s %s\n", filetype, url);
> + }
> + else if (url || filetype) {
> + if (filetype)
> + fprintf(stderr, _("prime-clone not properly "
> +   "configured: missing url\n"));
> + else if (url)
> + fprintf(stderr, _("prime-clone not properly "
> +   "configured: missing filetype\n"));
> + }
> + packet_flush(1);
> +}

Two minor comments:

 - For whom are you going to localize these strings?  This program
   is running on the server side and we do not know the locale
   preferred by the end-user who is sitting on the other end of the
   connection, no?

 - Turn "}\n\s+else " into "} else ", please.



Re: [RFC/PATCH 3/3] mailinfo: handle in-body header continuations

2016-09-16 Thread Jonathan Tan

On 09/16/2016 01:17 PM, Junio C Hamano wrote:

In other words, wouldn't something like the illustration at the end
of this message sufficient?  If the body consists solely of in-body
header without any patch or patchbreak, we may reach EOF with
something in mi->in_line_header buffer and nothing in
mi->log_message and without this function getting any chance to
return 1, so a careful caller may want to flush in_line_header, but
the overall result of the mailinfo subsystem in such a case would be
an error ("you didn't have any patch or a message?"), so it may not
matter too much.


Noted. (This was one of my concerns - that the caller should, but did 
not, flush.)



What am I missing?

handle_commit_msg(...)
{
if (mi->in_line_header->len) {
/* we have read the beginning of one in-line header */
if (line->len && isspace(*line->buf))


This would mean that a message like the following:

  From: Me 
-- 8< -- this scissors line will be treated as part of "From"

would have its scissors line treated as a header.

The main reason why I reordered the checks (in RFC/PATCH 1/3) is to 
avoid this (treating a scissors line with an initial space immediately 
following an in-body header as part of a header).


(If this is not a concern then yes, I agree that the way you described 
is simpler and better.)



append to mi->in_line_header strbuf;
return 0;
/* otherwise we know mi->in_line_header is now complete */
check_header(mi, mi->in_line_header, ...);
strbuf_reset(>in_line_header);
}

if (mi->header_stage && (it is a blank line))
return 0;

if (mi->use_inbody_headers && mi->header_stage &&
(the line looks like beginning of 2822 header)) {
strbuf_addbuf(>in_line_header, line);
return 0;
}
/* otherwise we are no longer looking at headers */
mi->header_stage = 0;

/* normalize the log message to UTF-8. */
if (convert_to_utf8(mi, line, mi->charset.buf))
return 0; /* mi->input_error already set */

if (mi->use_scissors && is_scissors_line(line)) {
int i;

strbuf_setlen(>log_message, 0);
mi->header_stage = 1;

/*
 * We may have already read "secondary headers"; purge
 * them to give ourselves a clean restart.
 */
for (i = 0; header[i]; i++) {
if (mi->s_hdr_data[i])
strbuf_release(mi->s_hdr_data[i]);
mi->s_hdr_data[i] = NULL;
}
return 0;
}

if (patchbreak(line)) {
if (mi->message_id)
strbuf_addf(>log_message,
"Message-Id: %s\n", mi->message_id);
return 1;
}

strbuf_addbuf(>log_message, line);
return 0;
}




Re: [PATCH 00/11] Resumable clone

2016-09-16 Thread Junio C Hamano
Kevin Wern  writes:

> It's been a while (sent a very short patch in May), but I've
> still been working on the resumable clone feature and checking up on
> the mailing list for any updates. After submitting the prime-clone
> service alone, I figured implementing the whole thing would be the best
> way to understand the full scope of the problem (this is my first real
> contribution here, and learning while working on such an involved
> feature has not been easy). 

It may not have been easy but I hope it has been a fun journey for
you ;-)

> On the client side, the transport_prime_clone and
> transport_download_primer APIs are built to be more robust (i.e. read
> messages without dying due to protocol errors), so that git clone can
> always try them without being dependent on the capability output of
> git-upload-pack. transport_download_primer is dependent on the success
> of transport_prime_clone, but transport_prime_clone is always run on an
> initial clone. Part of achieving this robustness involves adding
> *_gentle functions to pkt_line, so that prime_clone can fail silently
> without dying.

OK.

> Right now, a manually resumable directory is left behind only if the
> *client* is interrupted while a new junk mode, JUNK_LEAVE_RESUMABLE,
> is set (right before the download). For an initial clone, if the
> connection fails after automatic resuming, the client erases the
> partial resources and falls through to a normal clone. However, once a
> resumable directory is left behind by the program, it is NEVER
> deleted/abandoned after it is continued with --resume.

Sounds like you made a sensible design decision here.

>   - When running with ssh and a password, the credentials are
> prompted for twice. I don't know if there is a way to
> preserve credentials between executions. I couldn't find any
> examples in git's source.

We leave credentail reuse to keyring services like ssh-agent.



Re: [RFC/PATCH 3/3] mailinfo: handle in-body header continuations

2016-09-16 Thread Junio C Hamano
Jonathan Tan  writes:

> Instead of repeatedly calling "check_header" (as in this patch), one
> alternate method to accomplish this would be to have a buffer of
> potential header text in struct mailinfo to be flushed whenever a header
> is known to end (for example, if we detect the start of a new header),
> but this makes the logic more complicated - for example, the flushing
> would not only invoke check_header but would also need to reconstruct
> the original lines, possibly decode them into UTF-8, and store them in
> log_message, and any failures would be noticed a few "lines" away from
> the original failure point. Also, care would need to be taken to flush
> the buffer at all appropriate places.

I am not sure how much the UTF-8 decoding argument above matters.

The current way handle_commit_msg() is structured (before any of
your patches) is for it to take one raw line at a time and:

- If we haven't seen a non-header line (i.e. at the beginning,
  or we were reading in-body headers), return without doing
  anything.

- If we are told to honor in-body headers and if we haven't seen
  a non-header line, see if the line itself looks like a header
  and if so, handle it as an in-body header and return.  If that
  line is not an in-body header, continue processing.

- If the processing reaches at this point, we are done with the
  headers (i.e. mi->header_stage is set to 0).

- Make sure the line is in utf8.

- If it is a scissors line and we are told to honor scissors
  lines, ignore what we have read so far and go back to "we
  haven't seen a non-header line" state and return.

- If it is a patch break, return and signal the caller we are
  done with the log message.

- Otherwise accumulate the line as part of the log message.

The bug we want to address is in the second step.  We only look at
the first line of folded in-body header line, because we are fed one
line at a time.

If we keep the location of UTF8 conversion, and buffered the in-body
header in "struct mailinfo *mi" (like you seem to do in this patch),
what we will queue there will be _before_ conversion.  We'd call
check_header() on it once we know one logical line of a header is
accumulated, and check_header() would do the right conversion via
decode_header() etc., so I do not see why you need to worry about
the encoding issues at all.

I wonder if the simplest would be to introduce another state in the
state machine that is "we know we are processing in-body header, and
we have read early part of an in-body header line that may not be
complete".

In other words, wouldn't something like the illustration at the end
of this message sufficient?  If the body consists solely of in-body
header without any patch or patchbreak, we may reach EOF with
something in mi->in_line_header buffer and nothing in
mi->log_message and without this function getting any chance to
return 1, so a careful caller may want to flush in_line_header, but
the overall result of the mailinfo subsystem in such a case would be
an error ("you didn't have any patch or a message?"), so it may not
matter too much.

What am I missing?

handle_commit_msg(...)
{
if (mi->in_line_header->len) {
/* we have read the beginning of one in-line header */
if (line->len && isspace(*line->buf))
append to mi->in_line_header strbuf;
return 0;
/* otherwise we know mi->in_line_header is now complete */
check_header(mi, mi->in_line_header, ...);
strbuf_reset(>in_line_header);
}

if (mi->header_stage && (it is a blank line))
return 0;

if (mi->use_inbody_headers && mi->header_stage &&
(the line looks like beginning of 2822 header)) {
strbuf_addbuf(>in_line_header, line);
return 0;
}
/* otherwise we are no longer looking at headers */
mi->header_stage = 0;

/* normalize the log message to UTF-8. */
if (convert_to_utf8(mi, line, mi->charset.buf))
return 0; /* mi->input_error already set */

if (mi->use_scissors && is_scissors_line(line)) {
int i;

strbuf_setlen(>log_message, 0);
mi->header_stage = 1;

/*
 * We may have already read "secondary headers"; purge
 * them to give ourselves a clean restart.
 */
for (i = 0; header[i]; i++) {
if (mi->s_hdr_data[i])
strbuf_release(mi->s_hdr_data[i]);
mi->s_hdr_data[i] = NULL;
}
return 0;
}

if (patchbreak(line)) {
if (mi->message_id)
strbuf_addf(>log_message,

Re: [RFC/PATCH 2/3] mailinfo: correct malformed test example

2016-09-16 Thread Junio C Hamano
Jonathan Tan  writes:

> An existing sample message (0015) in the tests for mailinfo contains an
> indented line immediately after an in-body header (without any
> intervening blank line).

This comes from d25e5159 ("git am/mailinfo: Don't look at in-body
headers when rebasing", 2009-11-20), where we want to make sure that
a "From: bogosity" that isn't meant to be an in-body header is not
identified as such, even when it is immediately followed by a
non-blank line.  "From: bogosity" is for msg0015 but the same
applies to the header-looking block for msg0008.

Adding a blank line there will defeat the whole point of the test,
which is to make sure we don't do anything funky when --no-inbody-headers
is asked for, no?

> diff --git a/t/t5100/info0008--no-inbody-headers 
> b/t/t5100/info0008--no-inbody-headers
> new file mode 100644
> index 000..e8a2951
> --- /dev/null
> +++ b/t/t5100/info0008--no-inbody-headers
> @@ -0,0 +1,5 @@
> +Author: Junio C Hamano
> +Email: ju...@kernel.org
> +Subject: another patch
> +Date: Fri, 9 Jun 2006 00:44:16 -0700
> +
> diff --git a/t/t5100/msg0008--no-inbody-headers 
> b/t/t5100/msg0008--no-inbody-headers
> new file mode 100644
> index 000..d6e950e
> --- /dev/null
> +++ b/t/t5100/msg0008--no-inbody-headers
> @@ -0,0 +1,6 @@
> +From: A U Thor 
> +Subject: [PATCH] another patch
> +>Here is an empty patch from A U Thor.
> +
> +Hey you forgot the patch!
> +
> diff --git a/t/t5100/msg0015--no-inbody-headers 
> b/t/t5100/msg0015--no-inbody-headers
> index be5115b..44a6ce7 100644
> --- a/t/t5100/msg0015--no-inbody-headers
> +++ b/t/t5100/msg0015--no-inbody-headers
> @@ -1,3 +1,4 @@
>  From: bogosity
> +
>- a list
>- of stuff
> diff --git a/t/t5100/patch0008--no-inbody-headers 
> b/t/t5100/patch0008--no-inbody-headers
> new file mode 100644
> index 000..e69de29
> diff --git a/t/t5100/sample.mbox b/t/t5100/sample.mbox
> index 8b2ae06..ba8b208 100644
> --- a/t/t5100/sample.mbox
> +++ b/t/t5100/sample.mbox
> @@ -656,6 +656,7 @@ Subject: check bogus body header (from)
>  Date: Fri, 9 Jun 2006 00:44:16 -0700
>  
>  From: bogosity
> +
>- a list
>- of stuff
>  ---


Re: [RFC/PATCH 1/3] mailinfo: refactor commit message processing

2016-09-16 Thread Junio C Hamano
Jonathan Tan  writes:

> Within the processing of the commit message, check for a scissors line
> or a patchbreak line first (before checking for in-body headers) so that
> a subsequent patch modifying the processing of in-body headers would not
> cause a scissors line or patchbreak line to be misidentified.
>
> If a line could be both an in-body header and a scissors line (for
> example, "From: -- >8 --"), this is considered a fatal error
> (previously, it would be interpreted as an in-body header).

The scissors line is designed to allow garbage other than scissors
and perforation marks to be on the same line, i.e.

/*
 * The mark must be at least 8 bytes long (e.g. "-- >8 --").
 * Even though there can be arbitrary cruft on the same line
 * (e.g. "cut here"), in order to avoid misidentification, the
 * perforation must occupy more than a third of the visible
 * width of the line, and dashes and scissors must occupy more
 * than half of the perforation.
 */

Even though it is not likely for people to do so, it would probably
be nicer if we can treat

From: -- >8 -- cut -- >8 -- >8 -- here -- >8 --

as a scissors line instead of making it a fatal error, by treating
that "From:" as just a random garbage.

But this is a minor point.  It is not worth to make it work like so
if the resulting code will become messier.

> The following enumeration shows that processing is the same except (as
> described above) the in-body header + scissors line case.
>
> o in-body header (check_header OK)
>   o passes UTF-8 conversion
> o [described above] is scissors line
> o [not possible] is patchbreak line
> o [not possible] is blank line
> o is none of the above - processed as header
>   o fails UTF-8 conversion - processed as header
> o not in-body header
>   o passes UTF-8 conversion
> o is scissors line - processed as scissors
> o is patchbreak line - processed as patchbreak
> o is blank line - ignored if in header_stage
> o is none of the above - log message
>   o fails UTF-8 conversion - input error
>
> As for the result left in "line" (after the invocation of
> handle_commit_msg), it is unused (by its caller, handle_filter, and by
> handle_filter's callers, handle_boundary and handle_body) unless this
> line is a patchbreak line, in which case handle_patch is subsequently
> called (in handle_filter) on "line". In this case, "line" must have
> passed UTF-8 conversion both before and after this patch, so the result
> is still the same overall.
>
> Signed-off-by: Jonathan Tan 
> ---
>  mailinfo.c | 145 
> -
>  1 file changed, 115 insertions(+), 30 deletions(-)
>
> diff --git a/mailinfo.c b/mailinfo.c
> index e19abe3..23a56c2 100644
> --- a/mailinfo.c
> +++ b/mailinfo.c
> @@ -340,23 +340,56 @@ static struct strbuf *decode_b_segment(const struct 
> strbuf *b_seg)
>   return out;
>  }
>  
> -static int convert_to_utf8(struct mailinfo *mi,
> -struct strbuf *line, const char *charset)
> +/*
> + * Attempts to convert line into UTF-8, storing the result in line.
> + *
> + * This differs from convert_to_utf8 in that conversion non-success is not
> + * considered an error case - mi->input_error is not set, and no error 
> message
> + * is printed.
> + *
> + * If the conversion is unnecessary, returns 0 and stores NULL in old_buf (if
> + * old_buf is not NULL).
> + *
> + * If the conversion is successful, returns 0 and stores the unconverted 
> string
> + * in old_buf and old_len (if they are respectively not NULL).
> + *
> + * If the conversion is unsuccessful, returns -1.
> + */
> +static int try_convert_to_utf8(const struct mailinfo *mi, struct strbuf 
> *line,
> +const char *charset, char **old_buf,
> +size_t *old_len)
>  {
> - char *out;
> + char *utf8;
>  
> - if (!mi->metainfo_charset || !charset || !*charset)
> + if (!mi->metainfo_charset || !charset || !*charset ||
> + same_encoding(mi->metainfo_charset, charset)) {
> + if (old_buf)
> + *old_buf = NULL;
>   return 0;
> + }
>  
> - if (same_encoding(mi->metainfo_charset, charset))
> + utf8 = reencode_string(line->buf, mi->metainfo_charset, charset);
> + if (utf8) {
> + char *temp = strbuf_detach(line, old_len);
> + if (old_buf)
> + *old_buf = temp;
> + strbuf_attach(line, utf8, strlen(utf8), strlen(utf8));
>   return 0;
> - out = reencode_string(line->buf, mi->metainfo_charset, charset);
> - if (!out) {
> + }
> + return -1;
> +}
> +
> +/*
> + * Converts line into UTF-8, setting mi->input_error to -1 upon failure.
> + */
> +static int convert_to_utf8(struct mailinfo *mi,
> +struct strbuf 

Re: [RFC] extending pathspec support to submodules

2016-09-16 Thread Brandon Williams
On Thu, Sep 15, 2016 at 3:08 PM, Junio C Hamano  wrote:
>
>  * Your program that runs in the top-level superproject still needs
>to be able to say "this pathspec from the top cannot possibly
>match anything in the submodule, so let's not even bother
>descending into it".
>

Yes, we would need to first check if the submodule is a prefix match to the
pathspec.  ie a submodule 'sub' would need to match the pathspec 'sub/somedir'
or '*.txt' but not the pathspecs 'subdirectory' or 'otherdir'

> > >So we may have to rethink what this option name should be.  "You
> > >are running in a repository that is used as a submodule in a
> > >larger context, which has the submodule at this path" is what the
> > >option tells the command; if any existing command already has
> > >such an option, we should use it.  If we are inventing one,
> > >perhaps "--submodule-path" (I didn't check if there are existing
> > >options that sound similar to it and mean completely different
> > >things, in which case that name is not usable)?
> >
> > Would it make sense to add the '--submodule-path' to a more generic
> > part of the code? It's not just ls-files/grep that have to solve exactly 
> > this
> > problem. Up to now we just did not go for those commands, though.
>
> Yes I think so, since it should also handle starting from a submodule
> with a pathspec to the superproject or other submodule. In case we
> go with my above suggestion I would suggest a more generic name since
> the option could also be passed to processes handling the superproject.
> E.g. something like --module-prefix or --repository-prefix comes to my
> mind, not checked though.

Yeah we may want to come up with a more descriptive option name now which can
be generally applied, especially if we are going to continue adding submodule
support for other commands.

-Brandon


Re: [RFC/PATCH 0/3] handle multiline in-body headers

2016-09-16 Thread Junio C Hamano
Jonathan Tan  writes:

> Thanks, Peff, for the explanation and the method to reproduce the issue.
>
> The issue seems to be in mailinfo.c - this patch set addresses that, and I 
> have
> also included a test for "git am" in t/t4150-am.sh to show the effect of this
> patch set on that command.

Thanks, will take a look.

> Jonathan Tan (3):
>   mailinfo: refactor commit message processing
>   mailinfo: correct malformed test example
>   mailinfo: handle in-body header continuations
>
>  mailinfo.c   | 165 
> ---
>  mailinfo.h   |   1 +
>  t/t4150-am.sh|  23 +
>  t/t5100-mailinfo.sh  |   4 +-
>  t/t5100/info0008--no-inbody-headers  |   5 ++
>  t/t5100/info0018 |   5 ++
>  t/t5100/msg0008--no-inbody-headers   |   6 ++
>  t/t5100/msg0015--no-inbody-headers   |   1 +
>  t/t5100/msg0018  |   2 +
>  t/t5100/patch0008--no-inbody-headers |   0
>  t/t5100/patch0018|   6 ++
>  t/t5100/sample.mbox  |  20 +
>  12 files changed, 206 insertions(+), 32 deletions(-)
>  create mode 100644 t/t5100/info0008--no-inbody-headers
>  create mode 100644 t/t5100/info0018
>  create mode 100644 t/t5100/msg0008--no-inbody-headers
>  create mode 100644 t/t5100/msg0018
>  create mode 100644 t/t5100/patch0008--no-inbody-headers
>  create mode 100644 t/t5100/patch0018


Re: [wishlist?] make submodule commands robust to having non-submodule Subprojects

2016-09-16 Thread Junio C Hamano
Heiko Voigt  writes:

> On Thu, Sep 15, 2016 at 11:27:54AM -0700, Junio C Hamano wrote:
>
>> If the trend in Git community collectively these days is to make
>> usage of submodules easier and smoother, I'd imagine that you would
>> want to teach "git add" that was given a submodule to "git submodule
>> add" instead by default, with an option "git add --no-gitmodules
>> sub" to disable it, or something like that.
>> 
>> >  $ git submodule add --fixup-modules-file ./sub sub
>> >  Adding .gitmodule entry only for `sub` to use `git -C remote
>> > show origin` as URL.
>> 
>> I agree that a feature like this is needed regardless of what
>> happens at "git add" time.
>
> How about just
>
>git submodule add 

When I said "a feature like this is needed", I didn't care about
exact syntax.  I am not sure how often people need the "fixup", what
kind of causes there are that they need the "fixup", and what the
distribution of vaious causes would be like.  If the _ONLY_ kind of
fixup necessary is "I meant to say 'git submodule add ./path path'
but I said 'git add path' instead", then I think it makes sense to
teach "submodule add" that the form "git submodule add " is a
short-pand for "git submodule add ./ ".  I am not sure
if we want to _ignore_ a gitlink that is already in the index
unconditionally, i.e. if it is a good idea to let the second one
override the first one

git submodule add $URL sub &&
git submodule add sub

in this sequence, though.

> ? I remember back in the days when I started with submodules thats the
> way I imagined submodules would work:
>
> 1. clone the submodule into a directory
> 2. git submodule add it
> 3. git commit everything
>
> Because that how you basically work with files.  So instead of adding
> another option I would rather like to autodetect that:
>
>  * its a relative path inside this repo that is passed to
>'git submodule add'
>  * there is no .gitmodules entry
>  * and no .git/config
> ==> create those from a remote in the submodule

In other words, I agree with the general direction but I'd add
another condition to the above three, i.e.

   * and there is no gitlink for that path in the index yet.



Re: [PATCH 3/2] batch check whether submodule needs pushing into one call

2016-09-16 Thread Junio C Hamano
Heiko Voigt  writes:

> On Fri, Sep 16, 2016 at 11:40:19AM +0200, Heiko Voigt wrote:
>> > By the way, with the two new patches, 'pu' seems to start failing
>> > some tests, e.g. 5533 5404 5405.
>> 
>> Ah ok I did only test on master, will look into those.
>
> Ok I had a look into these and the reason t5533 fails is because on pu
> --recurse-submodules is enabled by default and I missed the case when
> overwriting a ref. In that case we get the sha1 from the remote side as
> old. So we could catch that and fall back to all revisions there, but...
>
> ... tl;dr: The solution to use the old revisions from the remote side
> was too simple and does not make matters better but actually worse for
> some typical usecases. Its only in the last patch.

You may not even have the old one in your copy of the remote
repository if you haven't fetched from them and you are forcing your
push.  "rev-list  --not " may fail in such a case,
not producing the list of new commits.  You'd need to exclude old ones
you learned over the wire that you do not yet have locally.

> The most exact solution would be to use all actual remote refs available
> (not sure if we have them at this point in the process?) another
> solution would be to still append the --remotes= option as a
> fallback to reduce the revisions.

I'd say --remotes= is the least problematic thing to do.



Re: [PATCH 3/2] batch check whether submodule needs pushing into one call

2016-09-16 Thread Junio C Hamano
Heiko Voigt  writes:

> +static void append_hash_to_argv(const unsigned char sha1[20], void *data)
>  {
> - if (add_submodule_odb(path) || !lookup_commit_reference(sha1))
> + struct argv_array *argv = (struct argv_array *) data;
> + argv_array_push(argv, sha1_to_hex(sha1));
> +}

Hmph, why do I think I've seen this before in the previous patch?

... scans through this patch and finds that a similar one is
removed ;-)

OK.  This makes sense.

> +static void check_has_hash(const unsigned char sha1[20], void *data)
> +{
> + int *has_hash = (int *) data;
> +
> + if (!lookup_commit_reference(sha1))
> + *has_hash = 0;
> +}
> +
> +static int submodule_has_hashes(const char *path, struct sha1_array *hashes)
> +{
> + int has_hash = 1;
> +
> + if (add_submodule_odb(path))
> + return 0;
> +
> + sha1_array_for_each_unique(hashes, check_has_hash, _hash);
> + return has_hash;
> +}
> +
> +static int submodule_needs_pushing(const char *path, struct sha1_array 
> *hashes)
> +{
> + if (!submodule_has_hashes(path, hashes))
>   return 0;

I think you meant well, but this optimization is wrong.  A mere
presence of an object does not mean that the current tip can reach
that object.  Imagine you pushed commit A earlier to them at the
tip, then pushed commit A~ to them at the tip, which is the current
state of the remote of the submodule, and since them they may have
GC'ed.  They no longer have the commit A.

For that matter, because you are doing this check by pretending as
if all the submodule objects are in the object store of the current
superproject you are working in, and saying "it exists there in the
submodule repository" when the only thing you know is it exists in
an object store of either the submodule repository, the superproject
repository, or any of the other submodule repositories, you really
cannot tell much from a mere presence of an object.  Not just the
remote of the submodule repository you are interested in, but the
submodule repository you are interested in itself, may not have that
object.

Drop the previous two helper functions and this short-cut.

>   if (for_each_remote_ref_submodule(path, has_remote, NULL) > 0) {
>   struct child_process cp = CHILD_PROCESS_INIT;
> - const char *argv[] = {"rev-list", NULL, "--not", "--remotes", 
> "-n", "1" , NULL};
> +
> + argv_array_push(, "rev-list");
> + sha1_array_for_each_unique(hashes, append_hash_to_argv, 
> );
> + argv_array_pushl(, "--not", "--remotes", "-n", "1" , 
> NULL);
> +
>   struct strbuf buf = STRBUF_INIT;
>   int needs_pushing = 0;
>  
> - argv[1] = sha1_to_hex(sha1);
> - cp.argv = argv;
>   prepare_submodule_repo_env(_array);
>   cp.git_cmd = 1;
>   cp.no_stdin = 1;
>   cp.out = -1;
>   cp.dir = path;
>   if (start_command())
> - die("Could not run 'git rev-list %s --not --remotes -n 
> 1' command in submodule %s",
> - sha1_to_hex(sha1), path);
> + die("Could not run 'git rev-list  --not 
> --remotes -n 1' command in submodule %s",
> + path);
>   if (strbuf_read(, cp.out, 41))
>   needs_pushing = 1;
>   finish_command();
> @@ -601,21 +628,6 @@ static void find_unpushed_submodule_commits(struct 
> commit *commit,
>   diff_tree_combined_merge(commit, 1, );
>  }

Good.  This is the optimization I alluded to in the review of the
first one in the series.


Re: [PATCH 2/2] serialize collection of refs that contain submodule changes

2016-09-16 Thread Junio C Hamano
Heiko Voigt  writes:

> diff --git a/submodule.c b/submodule.c
> index b04c066..a15e346 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -627,24 +627,31 @@ static void free_submodules_sha1s(struct string_list 
> *submodules)
>   string_list_clear(submodules, 1);
>  }
>  
> -int find_unpushed_submodules(unsigned char new_sha1[20],
> +static void append_hash_to_argv(const unsigned char sha1[20],
> + void *data)
> +{
> + struct argv_array *argv = (struct argv_array *) data;
> + argv_array_push(argv, sha1_to_hex(sha1));
> +}
> +
> +int find_unpushed_submodules(struct sha1_array *hashes,
>   const char *remotes_name, struct string_list *needs_pushing)
>  {
>   struct rev_info rev;
>   struct commit *commit;
> - const char *argv[] = {NULL, NULL, "--not", "NULL", NULL};
> - int argc = ARRAY_SIZE(argv) - 1, i;
> - char *sha1_copy;
> + int i;
>   struct string_list submodules = STRING_LIST_INIT_DUP;
> + struct argv_array argv = ARGV_ARRAY_INIT;
>  
> - struct strbuf remotes_arg = STRBUF_INIT;
> -
> - strbuf_addf(_arg, "--remotes=%s", remotes_name);
>   init_revisions(, NULL);
> - sha1_copy = xstrdup(sha1_to_hex(new_sha1));
> - argv[1] = sha1_copy;
> - argv[3] = remotes_arg.buf;
> - setup_revisions(argc, argv, , NULL);
> +
> + /* argv.argv[0] will be ignored by setup_revisions */
> + argv_array_push(, "find_unpushed_submodules");
> + sha1_array_for_each_unique(hashes, append_hash_to_argv, );
> + argv_array_push(, "--not");
> + argv_array_pushf(, "--remotes=%s", remotes_name);
> +
> + setup_revisions(argv.argc, argv.argv, , NULL);

Yes, its about time to for us to lose that fixed-size argv[] and
replace it with an argv-array ;-).

>   if (prepare_revision_walk())
>   die("revision walk setup failed");

So this one used to get a single commit at the tip of what we pushed
in the superproject and was asked "Look at the history we just
pushed leading to the tip commit, and tell me if any of the ones new
to the remote requires submodule commits the remote does not yet
have".  Now the caller collects all the tip commits and asks us
once: "Here are the new tips we just pushed; in the history leading
to them, is there a commit that the remote did not have that requires
submodule history the remote does not yet have?".

Makes sort-of sense.

I speculated that you would be doing the same kind of optimization
to feed all positive commits to rev-list at once in each submodule
repository in the review of the previous one, but you didn't do it
here.  You did the same for superproject in this step.  Perhaps 3 or
4 would do so in the submodule repository.

One thing that makes me worried is how the ref cache layer interacts
with this.  I see you first call push_unpushed_submodules() when
ON_DEMAND is set, which would result in pushes in submodule
repositories, updating their remote tracking branches.  At that
point, before you make another call to find_unpushed_submodules(),
is our cached ref layer knows that the remote tracking branches
are now up to date (otherwise, we would incorrectly judge that these
submodules need pushing based on stale information)?

> diff --git a/transport.c b/transport.c
> index 94d6dc3..76e1daf 100644
> --- a/transport.c
> +++ b/transport.c
> @@ -903,23 +903,29 @@ int transport_push(struct transport *transport,
>  
>   if ((flags & TRANSPORT_RECURSE_SUBMODULES_ON_DEMAND) && 
> !is_bare_repository()) {
>   struct ref *ref = remote_refs;
> + struct sha1_array hashes = SHA1_ARRAY_INIT;
> +
>   for (; ref; ref = ref->next)
> - if (!is_null_oid(>new_oid) &&
> - !push_unpushed_submodules(ref->new_oid.hash,
> - transport->remote->name))
> - die ("Failed to push all needed 
> submodules!");
> + if (!is_null_oid(>new_oid))
> + sha1_array_append(, 
> ref->new_oid.hash);
> +
> + if (!push_unpushed_submodules(, 
> transport->remote->name))
> + die ("Failed to push all needed submodules!");

Do we leak the contents of hashes here?

>   }
>  
>   if ((flags & (TRANSPORT_RECURSE_SUBMODULES_ON_DEMAND |
> TRANSPORT_RECURSE_SUBMODULES_CHECK)) && 
> !is_bare_repository()) {
>   struct ref *ref = remote_refs;
>   struct string_list needs_pushing = STRING_LIST_INIT_DUP;
> + struct sha1_array hashes = SHA1_ARRAY_INIT;
>  
>   for (; ref; ref = ref->next)
> - if (!is_null_oid(>new_oid) &&
> - find_unpushed_submodules(ref->new_oid.hash,
> -   

Re: [PATCH v2 1/1] git-p4: Add --checkpoint-period option to sync/clone

2016-09-16 Thread Ori Rawlings
On Fri, Sep 16, 2016 at 11:19 AM, Lars Schneider
 wrote:
>
>
> This looks interesting! I ran into the same issue and added a parameter to 
> the p4 commands to retry (patch not yet proposed to the mailing list):
> https://github.com/autodesk-forks/git/commit/fcfc96a7814935ee6cefb9d69e44def30a90eabb

I was unaware of the retry flag to the p4 command, that seems like a
useful trick too. I think both approaches might pair nicely together
(p4 optimistically retrying, but still falling back to the latest git
checkpoint if we exhaust our N retry attempts).

> Would it make sense to print the "git-p4 resume command" in case an error 
> happens and checkpoints are written?

I was thinking something like this would be a good idea and would
certainly aide in usability. Resuming a sync is fairly
straight-forward (just re-execute the same command). Resuming a clone
is a bit more problematic, today if a depot path argument is provided
to the sync or clone command (and it is always required for clone), no
attempt is made to examine the existing git branches and limit to only
Perforce changes missing from git.

There is a lingering TODO in the script where we check the presence of
the depot path argument, with a suggestion that we should always make
an attempt to continue building upon existing history when it is
available. I think there might be a few edge cases around this
behavior that I'd need to think through. But, if I'm able to address
the TODO, then printing the command to resume the import should be
pretty straight-forward. I'll continue working on that next week.


[RFC/PATCH 3/3] mailinfo: handle in-body header continuations

2016-09-16 Thread Jonathan Tan
Mailinfo currently handles multi-line headers, but it does not handle
multi-line in-body headers. Teach it to handle such headers, for
example, for this input:

  Subject: a very long
   broken line

  Subject: another very long
   broken line

interpret the in-body subject to be "another very long broken line"
instead of "another very long".

Instead of repeatedly calling "check_header" (as in this patch), one
alternate method to accomplish this would be to have a buffer of
potential header text in struct mailinfo to be flushed whenever a header
is known to end (for example, if we detect the start of a new header),
but this makes the logic more complicated - for example, the flushing
would not only invoke check_header but would also need to reconstruct
the original lines, possibly decode them into UTF-8, and store them in
log_message, and any failures would be noticed a few "lines" away from
the original failure point. Also, care would need to be taken to flush
the buffer at all appropriate places.

Another alternate would be to modify "read_one_header_line" to accept
strings of lines instead of reading its own from a FILE pointer, but
this would also require a buffer, with the same issues.

Signed-off-by: Jonathan Tan 
---
 mailinfo.c  | 24 ++--
 mailinfo.h  |  1 +
 t/t4150-am.sh   | 23 +++
 t/t5100-mailinfo.sh |  4 ++--
 t/t5100/info0018|  5 +
 t/t5100/msg0018 |  2 ++
 t/t5100/patch0018   |  6 ++
 t/t5100/sample.mbox | 19 +++
 8 files changed, 80 insertions(+), 4 deletions(-)
 create mode 100644 t/t5100/info0018
 create mode 100644 t/t5100/msg0018
 create mode 100644 t/t5100/patch0018

diff --git a/mailinfo.c b/mailinfo.c
index 23a56c2..3bbdf74 100644
--- a/mailinfo.c
+++ b/mailinfo.c
@@ -729,8 +729,10 @@ static int handle_commit_msg(struct mailinfo *mi, struct 
strbuf *line)
 
if (mi->header_stage) {
char *buf = old_buf ? old_buf : line->buf;
-   if (buf[0] == 0 || (buf[0] == '\n' && buf[1] == 0))
+   if (buf[0] == 0 || (buf[0] == '\n' && buf[1] == 0)) {
+   strbuf_reset(>last_inbody_header);
goto handle_commit_msg_out;
+   }
}
 
if (mi->use_inbody_headers && mi->header_stage) {
@@ -738,8 +740,24 @@ static int handle_commit_msg(struct mailinfo *mi, struct 
strbuf *line)
size_t len = old_buf ? old_len : line->len;
mi->header_stage = check_header_raw(mi, buf, len,
mi->s_hdr_data, 0);
-   if (mi->header_stage)
+   if (mi->header_stage) {
+   strbuf_reset(>last_inbody_header);
+   strbuf_add(>last_inbody_header, buf, len);
goto handle_commit_msg_out;
+   }
+
+   if (mi->last_inbody_header.len &&
+   (buf[0] == ' ' || buf[0] == '\t')) {
+   strbuf_strip_suffix(>last_inbody_header, "\n");
+   strbuf_add(>last_inbody_header, buf, len);
+   mi->header_stage = check_header(mi,
+   >last_inbody_header,
+   mi->s_hdr_data, 1);
+   if (mi->header_stage)
+   goto handle_commit_msg_out;
+   }
+
+   mi->header_stage = 0;
} else
/* Only trim the first (blank) line of the commit message
 * when ignoring in-body headers.
@@ -1086,6 +1104,7 @@ void setup_mailinfo(struct mailinfo *mi)
strbuf_init(>email, 0);
strbuf_init(>charset, 0);
strbuf_init(>log_message, 0);
+   strbuf_init(>last_inbody_header, 0);
mi->header_stage = 1;
mi->use_inbody_headers = 1;
mi->content_top = mi->content;
@@ -1099,6 +1118,7 @@ void clear_mailinfo(struct mailinfo *mi)
strbuf_release(>name);
strbuf_release(>email);
strbuf_release(>charset);
+   strbuf_release(>last_inbody_header);
free(mi->message_id);
 
for (i = 0; mi->p_hdr_data[i]; i++)
diff --git a/mailinfo.h b/mailinfo.h
index 93776a7..ab2d0dd 100644
--- a/mailinfo.h
+++ b/mailinfo.h
@@ -27,6 +27,7 @@ struct mailinfo {
int patch_lines;
int filter_stage; /* still reading log or are we copying patch? */
int header_stage; /* still checking in-body headers? */
+   struct strbuf last_inbody_header;
struct strbuf **p_hdr_data;
struct strbuf **s_hdr_data;
 
diff --git a/t/t4150-am.sh b/t/t4150-am.sh
index 9ce9424..89a5bac 100755
--- a/t/t4150-am.sh
+++ b/t/t4150-am.sh
@@ -977,4 +977,27 @@ test_expect_success 'am --patch-format=mboxrd handles 
mboxrd' '
test_cmp msg out
 '
 
+test_expect_success 'am works with multi-line in-body headers' '
+ 

[RFC/PATCH 2/3] mailinfo: correct malformed test example

2016-09-16 Thread Jonathan Tan
An existing sample message (0015) in the tests for mailinfo contains an
indented line immediately after an in-body header (without any
intervening blank line). Correct this by adding the blank line, in
preparation for a subsequent patch that will treat such indented lines
as RFC 2822 continuation lines (instead of as part of the commit
message).

To ensure that non-indented lines immediately after an in-body header
are still treated correctly (now and in the future), 0008 has been
updated to test both the case when in-body headers are used and the case
when they are not used.

Signed-off-by: Jonathan Tan 
---
 t/t5100/info0008--no-inbody-headers  | 5 +
 t/t5100/msg0008--no-inbody-headers   | 6 ++
 t/t5100/msg0015--no-inbody-headers   | 1 +
 t/t5100/patch0008--no-inbody-headers | 0
 t/t5100/sample.mbox  | 1 +
 5 files changed, 13 insertions(+)
 create mode 100644 t/t5100/info0008--no-inbody-headers
 create mode 100644 t/t5100/msg0008--no-inbody-headers
 create mode 100644 t/t5100/patch0008--no-inbody-headers

diff --git a/t/t5100/info0008--no-inbody-headers 
b/t/t5100/info0008--no-inbody-headers
new file mode 100644
index 000..e8a2951
--- /dev/null
+++ b/t/t5100/info0008--no-inbody-headers
@@ -0,0 +1,5 @@
+Author: Junio C Hamano
+Email: ju...@kernel.org
+Subject: another patch
+Date: Fri, 9 Jun 2006 00:44:16 -0700
+
diff --git a/t/t5100/msg0008--no-inbody-headers 
b/t/t5100/msg0008--no-inbody-headers
new file mode 100644
index 000..d6e950e
--- /dev/null
+++ b/t/t5100/msg0008--no-inbody-headers
@@ -0,0 +1,6 @@
+From: A U Thor 
+Subject: [PATCH] another patch
+>Here is an empty patch from A U Thor.
+
+Hey you forgot the patch!
+
diff --git a/t/t5100/msg0015--no-inbody-headers 
b/t/t5100/msg0015--no-inbody-headers
index be5115b..44a6ce7 100644
--- a/t/t5100/msg0015--no-inbody-headers
+++ b/t/t5100/msg0015--no-inbody-headers
@@ -1,3 +1,4 @@
 From: bogosity
+
   - a list
   - of stuff
diff --git a/t/t5100/patch0008--no-inbody-headers 
b/t/t5100/patch0008--no-inbody-headers
new file mode 100644
index 000..e69de29
diff --git a/t/t5100/sample.mbox b/t/t5100/sample.mbox
index 8b2ae06..ba8b208 100644
--- a/t/t5100/sample.mbox
+++ b/t/t5100/sample.mbox
@@ -656,6 +656,7 @@ Subject: check bogus body header (from)
 Date: Fri, 9 Jun 2006 00:44:16 -0700
 
 From: bogosity
+
   - a list
   - of stuff
 ---
-- 
2.10.0.rc2.20.g5b18e70



[RFC/PATCH 1/3] mailinfo: refactor commit message processing

2016-09-16 Thread Jonathan Tan
Within the processing of the commit message, check for a scissors line
or a patchbreak line first (before checking for in-body headers) so that
a subsequent patch modifying the processing of in-body headers would not
cause a scissors line or patchbreak line to be misidentified.

If a line could be both an in-body header and a scissors line (for
example, "From: -- >8 --"), this is considered a fatal error
(previously, it would be interpreted as an in-body header).  (It is not
possible for a line to be both an in-body header and a patchbreak line,
since both require different prefixes.)

The following enumeration shows that processing is the same except (as
described above) the in-body header + scissors line case.

o in-body header (check_header OK)
  o passes UTF-8 conversion
o [described above] is scissors line
o [not possible] is patchbreak line
o [not possible] is blank line
o is none of the above - processed as header
  o fails UTF-8 conversion - processed as header
o not in-body header
  o passes UTF-8 conversion
o is scissors line - processed as scissors
o is patchbreak line - processed as patchbreak
o is blank line - ignored if in header_stage
o is none of the above - log message
  o fails UTF-8 conversion - input error

As for the result left in "line" (after the invocation of
handle_commit_msg), it is unused (by its caller, handle_filter, and by
handle_filter's callers, handle_boundary and handle_body) unless this
line is a patchbreak line, in which case handle_patch is subsequently
called (in handle_filter) on "line". In this case, "line" must have
passed UTF-8 conversion both before and after this patch, so the result
is still the same overall.

Signed-off-by: Jonathan Tan 
---
 mailinfo.c | 145 -
 1 file changed, 115 insertions(+), 30 deletions(-)

diff --git a/mailinfo.c b/mailinfo.c
index e19abe3..23a56c2 100644
--- a/mailinfo.c
+++ b/mailinfo.c
@@ -340,23 +340,56 @@ static struct strbuf *decode_b_segment(const struct 
strbuf *b_seg)
return out;
 }
 
-static int convert_to_utf8(struct mailinfo *mi,
-  struct strbuf *line, const char *charset)
+/*
+ * Attempts to convert line into UTF-8, storing the result in line.
+ *
+ * This differs from convert_to_utf8 in that conversion non-success is not
+ * considered an error case - mi->input_error is not set, and no error message
+ * is printed.
+ *
+ * If the conversion is unnecessary, returns 0 and stores NULL in old_buf (if
+ * old_buf is not NULL).
+ *
+ * If the conversion is successful, returns 0 and stores the unconverted string
+ * in old_buf and old_len (if they are respectively not NULL).
+ *
+ * If the conversion is unsuccessful, returns -1.
+ */
+static int try_convert_to_utf8(const struct mailinfo *mi, struct strbuf *line,
+  const char *charset, char **old_buf,
+  size_t *old_len)
 {
-   char *out;
+   char *utf8;
 
-   if (!mi->metainfo_charset || !charset || !*charset)
+   if (!mi->metainfo_charset || !charset || !*charset ||
+   same_encoding(mi->metainfo_charset, charset)) {
+   if (old_buf)
+   *old_buf = NULL;
return 0;
+   }
 
-   if (same_encoding(mi->metainfo_charset, charset))
+   utf8 = reencode_string(line->buf, mi->metainfo_charset, charset);
+   if (utf8) {
+   char *temp = strbuf_detach(line, old_len);
+   if (old_buf)
+   *old_buf = temp;
+   strbuf_attach(line, utf8, strlen(utf8), strlen(utf8));
return 0;
-   out = reencode_string(line->buf, mi->metainfo_charset, charset);
-   if (!out) {
+   }
+   return -1;
+}
+
+/*
+ * Converts line into UTF-8, setting mi->input_error to -1 upon failure.
+ */
+static int convert_to_utf8(struct mailinfo *mi,
+  struct strbuf *line, const char *charset)
+{
+   if (try_convert_to_utf8(mi, line, charset, NULL, NULL)) {
mi->input_error = -1;
return error("cannot convert from %s to %s",
 charset, mi->metainfo_charset);
}
-   strbuf_attach(line, out, strlen(out), strlen(out));
return 0;
 }
 
@@ -515,6 +548,13 @@ static int check_header(struct mailinfo *mi,
return ret;
 }
 
+static int check_header_raw(struct mailinfo *mi,
+   char *buf, size_t len,
+   struct strbuf *hdr_data[], int overwrite) {
+   const struct strbuf sb = {0, len, buf};
+   return check_header(mi, , hdr_data, overwrite);
+}
+
 static void decode_transfer_encoding(struct mailinfo *mi, struct strbuf *line)
 {
struct strbuf *ret;
@@ -623,32 +663,48 @@ static int is_scissors_line(const struct strbuf *line)
gap * 2 < perforation);
 }
 
-static int 

[RFC/PATCH 0/3] handle multiline in-body headers

2016-09-16 Thread Jonathan Tan
Thanks, Peff, for the explanation and the method to reproduce the issue.

The issue seems to be in mailinfo.c - this patch set addresses that, and I have
also included a test for "git am" in t/t4150-am.sh to show the effect of this
patch set on that command.

Jonathan Tan (3):
  mailinfo: refactor commit message processing
  mailinfo: correct malformed test example
  mailinfo: handle in-body header continuations

 mailinfo.c   | 165 ---
 mailinfo.h   |   1 +
 t/t4150-am.sh|  23 +
 t/t5100-mailinfo.sh  |   4 +-
 t/t5100/info0008--no-inbody-headers  |   5 ++
 t/t5100/info0018 |   5 ++
 t/t5100/msg0008--no-inbody-headers   |   6 ++
 t/t5100/msg0015--no-inbody-headers   |   1 +
 t/t5100/msg0018  |   2 +
 t/t5100/patch0008--no-inbody-headers |   0
 t/t5100/patch0018|   6 ++
 t/t5100/sample.mbox  |  20 +
 12 files changed, 206 insertions(+), 32 deletions(-)
 create mode 100644 t/t5100/info0008--no-inbody-headers
 create mode 100644 t/t5100/info0018
 create mode 100644 t/t5100/msg0008--no-inbody-headers
 create mode 100644 t/t5100/msg0018
 create mode 100644 t/t5100/patch0008--no-inbody-headers
 create mode 100644 t/t5100/patch0018

-- 
2.10.0.rc2.20.g5b18e70



Re: [PATCH 1/2] serialize collection of changed submodules

2016-09-16 Thread Junio C Hamano
Heiko Voigt  writes:

> +static struct sha1_array *get_sha1s_from_list(struct string_list *submodules,
> + const char *path)
> +{
> + struct string_list_item *item;
> + struct sha1_array *hashes;
> +
> + item = string_list_insert(submodules, path);
> + if (item->util)
> + return (struct sha1_array *) item->util;
> +
> + hashes = (struct sha1_array *) xmalloc(sizeof(struct sha1_array));
> + /* NEEDSWORK: should we add an initializer function for
> +  * sha1_array ? */
> + memset(hashes, 0, sizeof(struct sha1_array));
> + item->util = hashes;


/* NEEDSWORK: should we have SHA1_ARRAY_INIT etc.? */
item->util = xcalloc(1, sizeof(struct sha1_array));

>  static void collect_submodules_from_diff(struct diff_queue_struct *q,
>struct diff_options *options,
>void *data)
>  {
>   int i;
> - struct string_list *needs_pushing = data;
> + struct string_list *submodules = data;
>  
>   for (i = 0; i < q->nr; i++) {
>   struct diff_filepair *p = q->queue[i];
> + struct sha1_array *hashes;
>   if (!S_ISGITLINK(p->two->mode))
>   continue;
> - if (submodule_needs_pushing(p->two->path, p->two->oid.hash))
> - string_list_insert(needs_pushing, p->two->path);
> + hashes = get_sha1s_from_list(submodules, p->two->path);
> + sha1_array_append(hashes, p->two->oid.hash);
>   }
>  }

So the idea at this step is still let each commit in the top-level
history inspected for any submodule change, but the result is
collected in a mapping (submodule -> [ list of submodule commits ]).
As we do not expect too many "oops, the old commit was better, so
let's revert and rebind the old one from the submodule" in the
history of the top-level, appending and then running for-each-unique
is an efficient way, instead of first checking if we already have
it and then inserting new ones to maintain the uniqueness.

Makes sense.

> @@ -582,14 +601,41 @@ static void find_unpushed_submodule_commits(struct 
> commit *commit,
>   diff_tree_combined_merge(commit, 1, );
>  }
>  
> +struct collect_submodule_from_sha1s_data {
> + char *submodule_path;
> + struct string_list *needs_pushing;
> +};
> +
> +static void collect_submodules_from_sha1s(const unsigned char sha1[20],
> + void *data)
> +{
> + struct collect_submodule_from_sha1s_data *me =
> + (struct collect_submodule_from_sha1s_data *) data;
> +
> + if (submodule_needs_pushing(me->submodule_path, sha1))
> + string_list_insert(me->needs_pushing, me->submodule_path);
> +}

This is called from sha1_array_for_each_unique() that iterates over
the submodule commit object names for one submodule and then ends up
calling submodule_needs_pushing() number of times, which smells less
efficient than it could be.  You can ask

rev-list  --not --remotes

just once in the submodule repository.  I imagine that is what you'll
do in the next patch.

An obvious but much less efficient way to optimize this part would
be to see if me->needs_pushing already has me->submodule_path and
skip the check for submodule_needs_pushing(), but if you drop the
call by find_unpushed_submodule to sha1_array_for_each_unique() to
walk new submodule commits one by one, that would become irrelevant.

> +static void free_submodules_sha1s(struct string_list *submodules)
> +{
> + int i;
> + for (i = 0; i < submodules->nr; i++) {
> + struct string_list_item *item = >items[i];
> + struct sha1_array *hashes = (struct sha1_array *) item->util;
> + sha1_array_clear(hashes);
> + }
> + string_list_clear(submodules, 1);
> +}
> +
>  int find_unpushed_submodules(unsigned char new_sha1[20],
>   const char *remotes_name, struct string_list *needs_pushing)
>  {
>   struct rev_info rev;
>   struct commit *commit;
>   const char *argv[] = {NULL, NULL, "--not", "NULL", NULL};
> - int argc = ARRAY_SIZE(argv) - 1;
> + int argc = ARRAY_SIZE(argv) - 1, i;
>   char *sha1_copy;
> + struct string_list submodules = STRING_LIST_INIT_DUP;
>  
>   struct strbuf remotes_arg = STRBUF_INIT;
>  
> @@ -603,12 +649,23 @@ int find_unpushed_submodules(unsigned char new_sha1[20],
>   die("revision walk setup failed");
>  
>   while ((commit = get_revision()) != NULL)
> - find_unpushed_submodule_commits(commit, needs_pushing);
> + find_unpushed_submodule_commits(commit, );
>  
>   reset_revision_walk();
>   free(sha1_copy);
>   strbuf_release(_arg);
>  
> + for (i = 0; i < submodules.nr; i++) {
> + struct string_list_item *item = [i];
> + struct collect_submodule_from_sha1s_data data;
> + data.submodule_path = item->string;
> +   

[PATCH] format-patch: Add --rfc for the common case of [RFC PATCH]

2016-09-16 Thread Josh Triplett
This provides a shorter and more convenient alias for
--subject-prefix='RFC PATCH'.

Add a test covering --rfc.

Signed-off-by: Josh Triplett 
---

By far, the most common subject-prefix I've seen other than "PATCH" is
"RFC PATCH" (or occasionally "PATCH RFC").  Seems worth optimizing for
the common case, to avoid having to spell it out the long way as
--subject-prefix='RFC PATCH'.

 builtin/log.c   | 10 ++
 t/t4014-format-patch.sh |  9 +
 2 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/builtin/log.c b/builtin/log.c
index 92dc34d..48d6a38 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -1112,6 +1112,13 @@ static int subject_prefix_callback(const struct option 
*opt, const char *arg,
return 0;
 }
 
+static int rfc_callback(const struct option *opt, const char *arg, int unset)
+{
+   subject_prefix = 1;
+   ((struct rev_info *)opt->value)->subject_prefix = xstrdup("RFC PATCH");
+   return 0;
+}
+
 static int numbered_cmdline_opt = 0;
 
 static int numbered_callback(const struct option *opt, const char *arg,
@@ -1419,6 +1426,9 @@ int cmd_format_patch(int argc, const char **argv, const 
char *prefix)
N_("start numbering patches at  instead of 1")),
OPT_INTEGER('v', "reroll-count", _count,
N_("mark the series as Nth re-roll")),
+   { OPTION_CALLBACK, 0, "rfc", , NULL,
+   N_("Use [RFC PATCH] instead of [PATCH]"),
+   PARSE_OPT_NOARG | PARSE_OPT_NONEG, rfc_callback },
{ OPTION_CALLBACK, 0, "subject-prefix", , N_("prefix"),
N_("Use [] instead of [PATCH]"),
PARSE_OPT_NONEG, subject_prefix_callback },
diff --git a/t/t4014-format-patch.sh b/t/t4014-format-patch.sh
index b0579dd..81b0498 100755
--- a/t/t4014-format-patch.sh
+++ b/t/t4014-format-patch.sh
@@ -1073,6 +1073,15 @@ test_expect_success 'empty subject prefix does not have 
extra space' '
test_cmp expect actual
 '
 
+cat >expect <<'EOF'
+Subject: [RFC PATCH 1/1] header with . in it
+EOF
+test_expect_success '--rfc' '
+   git format-patch -n -1 --stdout --rfc >patch &&
+   grep ^Subject: patch >actual &&
+   test_cmp expect actual
+'
+
 test_expect_success '--from=ident notices bogus ident' '
test_must_fail git format-patch -1 --stdout --from=foo >patch
 '

base-commit: 6ebdac1bab966b720d776aa43ca188fe378b1f4b
-- 
git-series 0.8.10


Re: [PATCH v2 1/1] git-p4: Add --checkpoint-period option to sync/clone

2016-09-16 Thread Lars Schneider

> On 15 Sep 2016, at 23:17, Ori Rawlings  wrote:
> 
> Importing a long history from Perforce into git using the git-p4 tool
> can be especially challenging. The `git p4 clone` operation is based
> on an all-or-nothing transactionality guarantee. Under real-world
> conditions like network unreliability or a busy Perforce server,
> `git p4 clone` and  `git p4 sync` operations can easily fail, forcing a
> user to restart the import process from the beginning. The longer the
> history being imported, the more likely a fault occurs during the
> process. Long enough imports thus become statistically unlikely to ever
> succeed.
> 
> The underlying git fast-import protocol supports an explicit checkpoint
> command. The idea here is to optionally allow the user to force an
> explicit checkpoint every  seconds. If the sync/clone operation fails
> branches are left updated at the appropriate commit available during the
> latest checkpoint. This allows a user to resume importing Perforce
> history while only having to repeat at most approximately  seconds
> worth of import activity.

This looks interesting! I ran into the same issue and added a parameter to the 
p4 commands to retry (patch not yet proposed to the mailing list):
https://github.com/autodesk-forks/git/commit/fcfc96a7814935ee6cefb9d69e44def30a90eabb

Would it make sense to print the "git-p4 resume command" in case an error 
happens and checkpoints are written?

Cheers,
Lars

Re: [wishlist?] make submodule commands robust to having non-submodule Subprojects

2016-09-16 Thread Jacob Keller
On Fri, Sep 16, 2016 at 7:11 AM, Heiko Voigt  wrote:
> How about just
>
>git submodule add 
>
> ? I remember back in the days when I started with submodules thats the
> way I imagined submodules would work:
>
> 1. clone the submodule into a directory
> 2. git submodule add it
> 3. git commit everything
>
> Because that how you basically work with files.  So instead of adding
> another option I would rather like to autodetect that:
>
>  * its a relative path inside this repo that is passed to
>'git submodule add'
>  * there is no .gitmodules entry
>  * and no .git/config
> ==> create those from a remote in the submodule
>
> Corner cases:
>
>  * If there is more than one remote we could tell the user to use an
>option to specify which one to use.
>  * Barf in case there is no remote (not adding the submodule except -f
>is used).
>  * If the gitlink is already there but no .gitmodules entry, 'git
>submodule add' will just add the entry as if it was initially added.
>
> Instead of giving an error message that the submodule is already added
> we could actually be nicer to the user and try to fix things for him
> instead.
>

This makes sense to me. Possibly we could warn in this case, so that
the user knows that something was "off" but I don't think we should be
failing here...

Regards,
Jake

> Cheers Heiko


Re: [wishlist?] make submodule commands robust to having non-submodule Subprojects

2016-09-16 Thread Heiko Voigt
On Thu, Sep 15, 2016 at 11:27:54AM -0700, Junio C Hamano wrote:
> Stefan Beller  writes:
> > So how about this fictional work flow:
> >
> >  $ git init top
> >  $ cd top
> >  $ git commit --allow-empty -m 'initial in top'
> >  $ git init sub
> >  $ git -C sub commit --allow-empty -m 'initial in sub'
> >  $ git add sub
> > You added a gitlink, but no corresponding entry in
> > .gitmodules is found. This is fine for gits core functionality, but
> > the submodule command gets confused by this unless you add 'sub'
> > to your .gitmodules via `git submodule add --already-in-tree \
> > --reuse-submodules-origin-as-URL sub`. Alternatively you can make 
> > this
> > message disappear by configuring advice.gitlinkPitfalls.
> 
> I am not sure if I agree with that direction.
> 
> If the trend in Git community collectively these days is to make
> usage of submodules easier and smoother, I'd imagine that you would
> want to teach "git add" that was given a submodule to "git submodule
> add" instead by default, with an option "git add --no-gitmodules
> sub" to disable it, or something like that.
> 
> >  $ git submodule add --fixup-modules-file ./sub sub
> >  Adding .gitmodule entry only for `sub` to use `git -C remote
> > show origin` as URL.
> 
> I agree that a feature like this is needed regardless of what
> happens at "git add" time.

How about just

   git submodule add 

? I remember back in the days when I started with submodules thats the
way I imagined submodules would work:

1. clone the submodule into a directory
2. git submodule add it
3. git commit everything

Because that how you basically work with files.  So instead of adding
another option I would rather like to autodetect that:

 * its a relative path inside this repo that is passed to
   'git submodule add'
 * there is no .gitmodules entry
 * and no .git/config
==> create those from a remote in the submodule

Corner cases:

 * If there is more than one remote we could tell the user to use an
   option to specify which one to use.
 * Barf in case there is no remote (not adding the submodule except -f
   is used).
 * If the gitlink is already there but no .gitmodules entry, 'git
   submodule add' will just add the entry as if it was initially added.

Instead of giving an error message that the submodule is already added
we could actually be nicer to the user and try to fix things for him
instead.

Cheers Heiko


Re: [PATCH 3/2] batch check whether submodule needs pushing into one call

2016-09-16 Thread Heiko Voigt
On Fri, Sep 16, 2016 at 11:40:19AM +0200, Heiko Voigt wrote:
> > By the way, with the two new patches, 'pu' seems to start failing
> > some tests, e.g. 5533 5404 5405.
> 
> Ah ok I did only test on master, will look into those.

Ok I had a look into these and the reason t5533 fails is because on pu
--recurse-submodules is enabled by default and I missed the case when
overwriting a ref. In that case we get the sha1 from the remote side as
old. So we could catch that and fall back to all revisions there, but...

... tl;dr: The solution to use the old revisions from the remote side
was too simple and does not make matters better but actually worse for
some typical usecases. Its only in the last patch.

... that lead me to further thinking about the previous solution (using
the locally cached remote refs) which might actually be a good default
for the non-fastforward cases like creating new ref or overwriting a
ref.

My current patch would handle the --mirror case nicer, since it gets a
lot of old revs to reduce the revisions to check. For the typical one
branch push it would most likely be worse. Except when the user is
updating (fast-forwarding) the remote ref we would scan all revs of a
ref until the root because we do not get enough valid revs that already
exist on the remote.

The most exact solution would be to use all actual remote refs available
(not sure if we have them at this point in the process?) another
solution would be to still append the --remotes= option as a
fallback to reduce the revisions.

What do others think? Will leave this for a little bit further thinking.
Its just the last patch ("use actual start hashes for submodule push
check instead of local refs") that needs to go back to the drawing
board.

Cheers Heiko


Re: [PATCH 3/2] batch check whether submodule needs pushing into one call

2016-09-16 Thread Heiko Voigt
On Thu, Sep 15, 2016 at 02:08:58PM -0700, Junio C Hamano wrote:
> Heiko Voigt  writes:
> 
> > if (for_each_remote_ref_submodule(path, has_remote, NULL) > 0) {
> > struct child_process cp = CHILD_PROCESS_INIT;
> > -   const char *argv[] = {"rev-list", NULL, "--not", "--remotes", 
> > "-n", "1" , NULL};
> > +
> > +   argv_array_push(, "rev-list");
> > +   sha1_array_for_each_unique(hashes, append_hash_to_argv, 
> > );
> > +   argv_array_pushl(, "--not", "--remotes", "-n", "1" , 
> > NULL);
> > +
> > struct strbuf buf = STRBUF_INIT;
> > int needs_pushing = 0;
> 
> These two become decl-after-stmt; move your new lines a bit lower,
> perhaps?

Thanks, missed those. Will do.

> > -   argv[1] = sha1_to_hex(sha1);
> > -   cp.argv = argv;
> > prepare_submodule_repo_env(_array);
> 
> By the way, with the two new patches, 'pu' seems to start failing
> some tests, e.g. 5533 5404 5405.

Ah ok I did only test on master, will look into those.

Cheers Heiko


Re: [RFC] extending pathspec support to submodules

2016-09-16 Thread Heiko Voigt
Hi,

On Thu, Sep 15, 2016 at 03:28:21PM -0700, Stefan Beller wrote:
> On Thu, Sep 15, 2016 at 3:08 PM, Junio C Hamano  wrote:
> > Brandon Williams  writes:
> >
> >> You're right that seems like the best course of action and it already falls
> >> inline with what I did with a first patch to ls-files to support 
> >> submodules.
> >> In that patch I did exactly as you suggest and pass in the prefix to the
> >> submodule and make the child responsible for prepending the prefix to all 
> >> of
> >> its output.  This way we can simply pass through the whole pathspec (as 
> >> apposed
> >> to my original idea of stripping the prefix off the pathspec prior to 
> >> passing
> >> it to the child...which can get complicated with wild characters) to the
> >> childprocess and when checking if a file matches the pathspec we can check 
> >> if
> >> the prefix + file path matches.
> >
> > That's brilliant.  A few observations.
> >
> >  * With that change to tell the command that is spawned in a
> >submodule directory where the submodule repository is in the
> >context of the top-level superproject _and_ require it to take a
> >pathspec as relative to the top-level superproject, you no longer
> >worry about having to find where to cut the pathspec given at the
> >top-level to adjust it for the submodule's context.  That may
> >simplify things.
> 
> I wonder how this plays together with the prefix in the superproject, e.g.
> 
> cd super/unrelated-path
> # when invoking a git command the internal prefix is "unrelated-path/"
> git ls-files ../submodule-*
> # a submodule in submodule-A would be run in  submodule-A
> # with a superproject prefix of super/ ? but additionally we nned
> to know we're
> # not at the root of the superproject.

Do we need to know that? The internal prefix is internal to each
repository and can be treated as such. I would expect that the prefix is
only prefixed when needed. E.g. when we display output to the user,
match files, ...

How about "../submodule-A" as the submodule prefix in the situation you
describe? The wildcard would be resolved by the superproject since the
directory is still in its domain.

I would think of the submodule prefix as the path relative to the
command that started everything. E.g. if we have a tree like this:

*--subA
   /
super *--subB--subsubB
   \
*--dirC

where subA, subB and subsubB are submodules and dirC is just a
directory inside super.

We would get the following prefixes when issuing a command in dirC that
has a pathspec for subsubB:

  subB: ../subB
  subsubB: ../subB/subsubB

An interesting case is when we issue a command in subA:

  super:   ..
  subB:../subB
  subsubB: ../subB/subsubB

A rule for the prefix option could be: Always specified when crossing a
repository boundary with the pathspec (including upwards).

I have not completely thought this through though so just take this as
some food for thought. Since I am not sure what Junio's rationale behind
making the prefix relative to the toplevel superproject was, but I guess
finding it could be a challenge in some situations. I.e. is the
repository in home directory tracking all the dot-files really the
superproject or was it that other one I found before?

> >So we may have to rethink what this option name should be.  "You
> >are running in a repository that is used as a submodule in a
> >larger context, which has the submodule at this path" is what the
> >option tells the command; if any existing command already has
> >such an option, we should use it.  If we are inventing one,
> >perhaps "--submodule-path" (I didn't check if there are existing
> >options that sound similar to it and mean completely different
> >things, in which case that name is not usable)?
> 
> Would it make sense to add the '--submodule-path' to a more generic
> part of the code? It's not just ls-files/grep that have to solve exactly this
> problem. Up to now we just did not go for those commands, though.

Yes I think so, since it should also handle starting from a submodule
with a pathspec to the superproject or other submodule. In case we
go with my above suggestion I would suggest a more generic name since
the option could also be passed to processes handling the superproject.
E.g. something like --module-prefix or --repository-prefix comes to my
mind, not checked though.

Cheers Heiko


[PATCH] Documentation/config: default for color.* is color.ui

2016-09-16 Thread Matthieu Moy
Since 4c7f181 (make color.ui default to 'auto', 2013-06-10), the
default for color.* when nothing is set is 'auto' and we still claimed
that the default was 'false'. Be more precise by saying explicitly
that the default is to follow color.ui, and recall that the default is
'auto' to avoid one indirection for the reader.

Signed-off-by: Matthieu Moy 
---
 Documentation/config.txt | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 32f065c..66429fb 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -953,7 +953,8 @@ color.branch::
A boolean to enable/disable color in the output of
linkgit:git-branch[1]. May be set to `always`,
`false` (or `never`) or `auto` (or `true`), in which case colors are 
used
-   only when the output is to a terminal. Defaults to false.
+   only when the output is to a terminal. If unset, then the
+   value of `color.ui` is used (`auto` by default).
 
 color.branch.::
Use customized color for branch coloration. `` is one of
@@ -968,7 +969,8 @@ color.diff::
linkgit:git-log[1], and linkgit:git-show[1] will use color
for all patches.  If it is set to `true` or `auto`, those
commands will only use color when output is to the terminal.
-   Defaults to false.
+   If unset, then the value of `color.ui` is used (`auto` by
+   default).
 +
 This does not affect linkgit:git-format-patch[1] or the
 'git-diff-{asterisk}' plumbing commands.  Can be overridden on the
@@ -991,7 +993,8 @@ color.decorate.::
 color.grep::
When set to `always`, always highlight matches.  When `false` (or
`never`), never.  When set to `true` or `auto`, use color only
-   when the output is written to the terminal.  Defaults to `false`.
+   when the output is written to the terminal.  If unset, then the
+   value of `color.ui` is used (`auto` by default).
 
 color.grep.::
Use customized color for grep colorization.  `` specifies which
@@ -1024,7 +1027,8 @@ color.interactive::
and displays (such as those used by "git-add --interactive" and
"git-clean --interactive"). When false (or `never`), never.
When set to `true` or `auto`, use colors only when the output is
-   to the terminal. Defaults to false.
+   to the terminal. If unset, then the value of `color.ui` is
+   used (`auto` by default).
 
 color.interactive.::
Use customized color for 'git add --interactive' and 'git clean
@@ -1040,13 +1044,15 @@ color.showBranch::
A boolean to enable/disable color in the output of
linkgit:git-show-branch[1]. May be set to `always`,
`false` (or `never`) or `auto` (or `true`), in which case colors are 
used
-   only when the output is to a terminal. Defaults to false.
+   only when the output is to a terminal. If unset, then the
+   value of `color.ui` is used (`auto` by default).
 
 color.status::
A boolean to enable/disable color in the output of
linkgit:git-status[1]. May be set to `always`,
`false` (or `never`) or `auto` (or `true`), in which case colors are 
used
-   only when the output is to a terminal. Defaults to false.
+   only when the output is to a terminal. If unset, then the
+   value of `color.ui` is used (`auto` by default).
 
 color.status.::
Use customized color for status colorization. `` is
-- 
2.10.0.rc0.1.g07c9292



Re: Potentially misleading color.* defaults explanation in git-config(1)

2016-09-16 Thread Matthieu Moy
Anatoly Borodin  writes:

> Hi All!
>
> git-config(1) says:
>
>color.branch
>A boolean to enable/disable color in the output of git-branch(1).
>May be set to always, false (or never) or auto (or true), in which
>case colors are used only when the output is to a terminal.

So far, so good.

>Defaults to false.

The truth is: Defaults to following color.ui, which used to default to
false but now defaults to auto.

My bad, I forgot to update these parts of the docs when changing the
default for color.ui (a while back already). Patch follows.

> (2)   git config color.branch false ; git branch

Unrelated from the question, but you could write

git -c color.branch=false git branch

to set a configuration value just for one command.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/


Ich werde warten, von Ihnen zu hören.

2016-09-16 Thread Andrew Hau Chan



--
Lieber Freund,

Wie geht es Ihnen heute? Ich habe eine Investitionsmöglichkeit mit Ihnen 
zu teilen, die die Übertragung einer großen Geldsumme zum gegenseitigen 
Nutzen für beide von uns betreffen.


Mein Name ist Andrew Hau Chung, ich in einem Finanzinstitut arbeiten 
hier in Hong Kong.


Wenn Sie interessiert sind, kontaktieren Sie mich durch meine private 
E-Mail-Adresse:


Ihre prompte Antwort wird geschätzt.

Dein,
Andrew Hau Chung



Re: Gitattributes file is not respected when switching between branches

2016-09-16 Thread Виталий Ищенко
Sorry for delay.

".gitattributes" indeed is not present in "master", but this is intentionally
It is placed only in following 2 branches:
feature-branch
unix-feature-branch

This is how flow looks on windows
$ git --version
git version 2.9.3.windows.1

vitalii.ishchenko@DESKTOP-9TC9UPB MINGW64 /c/work/repos/gitattributes (master)
$ git ls-files --eol
i/lfw/crlf  attr/   testfile-crlf.txt

vitalii.ishchenko@DESKTOP-9TC9UPB MINGW64 /c/work/repos/gitattributes (master)
$ git checkout feature-branch
Switched to branch 'feature-branch'
Your branch is up-to-date with 'origin/feature-branch'.

vitalii.ishchenko@DESKTOP-9TC9UPB MINGW64 /c/work/repos/gitattributes
(feature-branch)
$ git ls-files --eol
i/lfw/lfattr/text eol=lf.gitattributes
i/lfw/crlf  attr/text eol=lftestfile-crlf.txt



On Mon, Sep 12, 2016 at 10:42 PM, Torsten Bögershausen  wrote:
> On 12.09.16 21:35, Torsten Bögershausen wrote:
>> On 12.09.16 14:55, Виталий Ищенко wrote:
>>> Good day
>>>
>>> I faced following issue with gitattributes file (at least eol setting)
>>> when was trying to force `lf` mode on windows.
>>>
>>> We have 2 branches: master & dev. With master set as HEAD in repository
>>>
>>> I've added `.gitattributes` with following content to `dev` branch
>>>
>>> ```
>>> * text eol=lf
>>> ```
>>>
>>> Now when you clone this repo on other machine and checkout dev branch,
>>> eol setting is not respected.
>>> As a workaround you can rm all files except .git folder and do hard reset.
>>>
>>> Issue is reproducible on windows & unix versions. Test repo can be
>>> found on github
>>> https://github.com/betalb/gitattributes-issue
>>>
>>> master branch - one file without gitattributes
>>> feature-branch - .gitattributes added with eol=lf
>>> unix-feature-branch - .gitattributes added with eol=crlf
>>>
>>> Thanks,
>>> Vitalii
>> Some more information may be needed, to help to debug.
>>
>> Which version of Git are you using ?
>> What does
>>
>> git ls-files --eol
>>
>> say ?
> Obs, All information was in the email.
>
> tb@xxx:/tmp/gitattributes-issue> git ls-files --eol
> i/lfw/lfattr/   testfile-crlf.txt
> tb@xxx:/tmp/gitattributes-issue> ls -al
> total 8
> drwxr-xr-x   4 tbwheel  136 Sep 12 21:38 .
> drwxrwxrwt  19 root  wheel  646 Sep 12 21:38 ..
> drwxr-xr-x  13 tbwheel  442 Sep 12 21:38 .git
> -rw-r--r--   1 tbwheel   60 Sep 12 21:38 testfile-crlf.txt
> tb@xxx:/tmp/gitattributes-issue>
>
> Could it be that you didn't commit the file ".gitattributes" ?
> This could help:
> git add .gitattributes && git commit -m "Add .gitattributes"
>
>
>
>
>
>
>