Re: [PATCH] format-patch: Add --rfc for the common case of [RFC PATCH]
On Fri, Sep 16, 2016 at 05:49:22PM -0700, Jeff King wrote: > On Fri, Sep 16, 2016 at 10:27:45AM -0700, Josh Triplett wrote: > > > By far, the most common subject-prefix I've seen other than "PATCH" is > > "RFC PATCH" (or occasionally "PATCH RFC"). Seems worth optimizing for > > the common case, to avoid having to spell it out the long way as > > --subject-prefix='RFC PATCH'. > > "RFC" is the most common one for me, too. And if it ends here, I'm OK > with it. But I'm a little worried with ending up with a proliferation of > options. I haven't seen a significant number of variations on subject prefixes; I can't think of any other prefix I've seen often enough to suggest an option. > If we had a short-option for --subject-prefix, then: > > -P RFC > > is not so bad compared to "--rfc". But if you want to spell it as "RFC > PATCH" that's getting a bit longer. We could have a short option for > "tag this in the subject prefix _in addition_ to writing PATCH". And > then you could do: > > -T RFC > > I dunno. One other thing to consider is that format-patch takes > arbitrary diff options, so we'd want to avoid stomping on them with any > short options (which is why I used "-T" instead of "-t", though I find > it unlikely that many people use the latter with format-patch). That's a > point in favor of --rfc, I think. I agree; the short option space seems more contentious. And in any case, I find --rfc more ergonomic than "-T RFC". :) > > builtin/log.c | 10 ++ > > t/t4014-format-patch.sh | 9 + > > 2 files changed, 19 insertions(+), 0 deletions(-) > > Documentation? Oops, thanks. I'll send v2 shortly. > > +static int rfc_callback(const struct option *opt, const char *arg, int > > unset) > > +{ > > + subject_prefix = 1; > > + ((struct rev_info *)opt->value)->subject_prefix = xstrdup("RFC PATCH"); > > + return 0; > > +} > > I was going to complain that you don't free() the previous value, but > actually the other callers do not xstrdup() in the first place (and we > do not need to do so here, either, as it's a string literal). We > actually _do_ allocate a new copy when reading the value from config, > but it's probably not a big deal in practice to leak that. > > I also wonder if you could implement this as just: > > return subject_prefix_callback(opt, "RFC PATCH", unset); > > And then if you write the documentation as: > > --rfc:: > Behave as if --subject-prefix="RFC PATCH" was specified. > > then it will be trivially correct. :) Nice idea; will do. > > +cat >expect <<'EOF' > > +Subject: [RFC PATCH 1/1] header with . in it > > +EOF > > +test_expect_success '--rfc' ' > > + git format-patch -n -1 --stdout --rfc >patch && > > + grep ^Subject: patch >actual && > > + test_cmp expect actual > > +' > > Our usual style these days is to set up expectations inside the test > blocks (and use "<<-" to get nice indentation; we also typically use > "\EOF" but that's purely style). I copied this from a test immediately above it. :) I can change it easily enough, though. - Josh Triplett
Re: [RFC/PATCH 2/3] mailinfo: correct malformed test example
Jonathan Tanwrites: > From: bogosity > - a list > - of stuff > > Unchanged, the subsequent patch would break this test because it would > interpret that as a multi-line "From" in-body header when in-body > headers are *not* disabled. Yes, that is totally expected. So I would be perfectly fine if your patch changed the test vector for that case, saying "Allowing a folded in-body header means the expected result for the above three lines has to change".
Re: [PATCH] ls-files: add pathspec matching for submodules
Brandon Williamswrites: > ... > [--full-name] [--recurse-submodules] > - [--output-path-prefix=] > + [--submodule-prefix=] > [--abbrev] [--] [...] > > ---output-path-prefix=:: > +--submodule-prefix=:: > Prepend the provided path to the output of each file > ... > static int show_eol; > -static const char *output_path_prefix; > +static const char *submodule_prefix; > static int recurse_submodules; > ... > static struct strbuf full_name = STRBUF_INIT; > - if (output_path_prefix && *output_path_prefix) { > + if (submodule_prefix && *submodule_prefix) { > strbuf_reset(_name); > - strbuf_addstr(_name, output_path_prefix); > + strbuf_addstr(_name, submodule_prefix); > strbuf_addstr(_name, name); As the previous one that used a wrong (sorry) argument is not even in 'next' yet, let's pretend that it never happened. It is OK to still keep it and this patch as two separate steps, i.e. a topic with two patches in it. > + /* Add pathspec args */ > + argv_array_push(, "--"); > + for (i = 0; i < pathspec.nr; ++i) > + argv_array_push(, pathspec.items[i].original); OK, so as discussed previously with Heiko and Stefan, the idea is to - pass the original pathspec as-is, - when --submodule-prefix is given, a path discovered in a submodule repository is first prefixed with that string before getting checked to see if it matches the original pathspec. And this loop is about relaying the original pathspec. > @@ -192,57 +210,63 @@ static void show_gitlink(const struct cache_entry *ce) > > static void show_ce_entry(const char *tag, const struct cache_entry *ce) > { > + struct strbuf name = STRBUF_INIT; > int len = max_prefix_len; > + if (submodule_prefix) > + strbuf_addstr(, submodule_prefix); > + strbuf_addstr(, ce->name); > > if (len >= ce_namelen(ce)) > - die("git ls-files: internal error - cache entry not superset of > prefix"); > + die("git ls-files: internal error - cache entry not " > + "superset of prefix"); This is not such a great thing to do. Upon a bug report, we can no longer do git grep 'cache entry not superset' to see where the error message is coming from. > - if (!match_pathspec(, ce->name, ce_namelen(ce), > - len, ps_matched, > - S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode))) > - return; > - if (recurse_submodules && S_ISGITLINK(ce->ce_mode)) { > + if (recurse_submodules && S_ISGITLINK(ce->ce_mode) && > + submodule_path_match(, name.buf, ps_matched)) { > show_gitlink(ce); > - return; > - } > + } else if (match_pathspec(, name.buf, name.len, > + len, ps_matched, > + S_ISDIR(ce->ce_mode) || > + S_ISGITLINK(ce->ce_mode))) { > + if (tag && *tag && show_valid_bit && > + ... Argh. If we had a preparatory clean-up step, would it have helped to avoid this big re-indentation that makes the patch harder to read than necessary, I wonder? Another way would have been to "goto" from the end of this block > + if (recurse_submodules && S_ISGITLINK(ce->ce_mode) && > + submodule_path_match(, name.buf, ps_matched)) { where we used to "return" out to the central clean-up location, i.e. here. > + strbuf_release(); > } > parse_pathspec(, 0, > PATHSPEC_PREFER_CWD | > PATHSPEC_STRIP_SUBMODULE_SLASH_CHEAP, > prefix, argv); > > - /* Find common prefix for all pathspec's */ > - max_prefix = common_prefix(); > + /* > + * Find common prefix for all pathspec's > + * This is used as a performance optimization which violates correctness > + * in the recurse_submodules mode > + */ The two new lines phrase it overly negatively and also misleading. I thought you were saying "We do this as optimization anyway; damn the correctness in the submodule case!" in my first reading before reading the statements the comment talks about. "This optimization unfortunately cannot be done when recursing into submodules" would have been better. > + if (recurse_submodules) > + max_prefix = NULL; > + else > + max_prefix = common_prefix(); > max_prefix_len = max_prefix ? strlen(max_prefix) : 0; > diff --git a/dir.c b/dir.c > index 0ea235f..630dc7a 100644 > --- a/dir.c > +++ b/dir.c > @@ -63,6 +63,30 @@ int fspathncmp(const char *a, const char *b, size_t count) > return ignore_case ? strncasecmp(a, b, count) : strncmp(a, b, count); > } > > +static int prefix_fnmatch(const struct pathspec_item *item, > +const char *pattern, const char *string, > +int prefix) >
[PATCH] ls-files: add pathspec matching for submodules
Pathspecs can be a bit tricky when trying to apply them to submodules. This change permits the pathspec logic to perform a prefix match against submodules since a pathspec could refer to a file inside of a submodule. Signed-off-by: Brandon Williams--- Documentation/git-ls-files.txt | 4 +- builtin/ls-files.c | 143 +++-- dir.c | 62 +- dir.h | 4 + t/t3007-ls-files-recurse-submodules.sh | 66 +-- 5 files changed, 208 insertions(+), 71 deletions(-) diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt index a623ebf..09e4449 100644 --- a/Documentation/git-ls-files.txt +++ b/Documentation/git-ls-files.txt @@ -19,7 +19,7 @@ SYNOPSIS [--exclude-standard] [--error-unmatch] [--with-tree=] [--full-name] [--recurse-submodules] - [--output-path-prefix=] + [--submodule-prefix=] [--abbrev] [--] [...] DESCRIPTION @@ -143,7 +143,7 @@ a space) at the start of each line: Recursively calls ls-files on each submodule in the repository. Currently there is only support for the --cached mode. ---output-path-prefix=:: +--submodule-prefix=:: Prepend the provided path to the output of each file --abbrev[=]:: diff --git a/builtin/ls-files.c b/builtin/ls-files.c index 687e475..dc1e076 100644 --- a/builtin/ls-files.c +++ b/builtin/ls-files.c @@ -29,7 +29,7 @@ static int show_valid_bit; static int line_terminator = '\n'; static int debug_mode; static int show_eol; -static const char *output_path_prefix; +static const char *submodule_prefix; static int recurse_submodules; static const char *prefix; @@ -78,9 +78,9 @@ static void write_name(const char *name) * churn. */ static struct strbuf full_name = STRBUF_INIT; - if (output_path_prefix && *output_path_prefix) { + if (submodule_prefix && *submodule_prefix) { strbuf_reset(_name); - strbuf_addstr(_name, output_path_prefix); + strbuf_addstr(_name, submodule_prefix); strbuf_addstr(_name, name); name = full_name.buf; } @@ -177,12 +177,30 @@ static void show_gitlink(const struct cache_entry *ce) { struct child_process cp = CHILD_PROCESS_INIT; int status; + int i; argv_array_push(, "ls-files"); argv_array_push(, "--recurse-submodules"); - argv_array_pushf(, "--output-path-prefix=%s%s/", -output_path_prefix ? output_path_prefix : "", + argv_array_pushf(, "--submodule-prefix=%s%s/", +submodule_prefix ? submodule_prefix : "", ce->name); + /* add options */ + if (show_eol) + argv_array_push(, "--eol"); + if (show_valid_bit) + argv_array_push(, "-v"); + if (show_stage) + argv_array_push(, "--stage"); + if (show_cached) + argv_array_push(, "--cached"); + if (debug_mode) + argv_array_push(, "--debug"); + + /* Add pathspec args */ + argv_array_push(, "--"); + for (i = 0; i < pathspec.nr; ++i) + argv_array_push(, pathspec.items[i].original); + cp.git_cmd = 1; cp.dir = ce->name; status = run_command(); @@ -192,57 +210,63 @@ static void show_gitlink(const struct cache_entry *ce) static void show_ce_entry(const char *tag, const struct cache_entry *ce) { + struct strbuf name = STRBUF_INIT; int len = max_prefix_len; + if (submodule_prefix) + strbuf_addstr(, submodule_prefix); + strbuf_addstr(, ce->name); if (len >= ce_namelen(ce)) - die("git ls-files: internal error - cache entry not superset of prefix"); + die("git ls-files: internal error - cache entry not " + "superset of prefix"); - if (!match_pathspec(, ce->name, ce_namelen(ce), - len, ps_matched, - S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode))) - return; - if (recurse_submodules && S_ISGITLINK(ce->ce_mode)) { + if (recurse_submodules && S_ISGITLINK(ce->ce_mode) && + submodule_path_match(, name.buf, ps_matched)) { show_gitlink(ce); - return; - } + } else if (match_pathspec(, name.buf, name.len, + len, ps_matched, + S_ISDIR(ce->ce_mode) || + S_ISGITLINK(ce->ce_mode))) { + if (tag && *tag && show_valid_bit && + (ce->ce_flags & CE_VALID)) { + static char alttag[4]; + memcpy(alttag, tag, 3); + if
Re: [PATCH] format-patch: Add --rfc for the common case of [RFC PATCH]
On Fri, Sep 16, 2016 at 10:27:45AM -0700, Josh Triplett wrote: > By far, the most common subject-prefix I've seen other than "PATCH" is > "RFC PATCH" (or occasionally "PATCH RFC"). Seems worth optimizing for > the common case, to avoid having to spell it out the long way as > --subject-prefix='RFC PATCH'. "RFC" is the most common one for me, too. And if it ends here, I'm OK with it. But I'm a little worried with ending up with a proliferation of options. If we had a short-option for --subject-prefix, then: -P RFC is not so bad compared to "--rfc". But if you want to spell it as "RFC PATCH" that's getting a bit longer. We could have a short option for "tag this in the subject prefix _in addition_ to writing PATCH". And then you could do: -T RFC I dunno. One other thing to consider is that format-patch takes arbitrary diff options, so we'd want to avoid stomping on them with any short options (which is why I used "-T" instead of "-t", though I find it unlikely that many people use the latter with format-patch). That's a point in favor of --rfc, I think. > builtin/log.c | 10 ++ > t/t4014-format-patch.sh | 9 + > 2 files changed, 19 insertions(+), 0 deletions(-) Documentation? > +static int rfc_callback(const struct option *opt, const char *arg, int unset) > +{ > + subject_prefix = 1; > + ((struct rev_info *)opt->value)->subject_prefix = xstrdup("RFC PATCH"); > + return 0; > +} I was going to complain that you don't free() the previous value, but actually the other callers do not xstrdup() in the first place (and we do not need to do so here, either, as it's a string literal). We actually _do_ allocate a new copy when reading the value from config, but it's probably not a big deal in practice to leak that. I also wonder if you could implement this as just: return subject_prefix_callback(opt, "RFC PATCH", unset); And then if you write the documentation as: --rfc:: Behave as if --subject-prefix="RFC PATCH" was specified. then it will be trivially correct. :) > +cat >expect <<'EOF' > +Subject: [RFC PATCH 1/1] header with . in it > +EOF > +test_expect_success '--rfc' ' > + git format-patch -n -1 --stdout --rfc >patch && > + grep ^Subject: patch >actual && > + test_cmp expect actual > +' Our usual style these days is to set up expectations inside the test blocks (and use "<<-" to get nice indentation; we also typically use "\EOF" but that's purely style). -Peff
Re: [RFC/PATCH 2/3] mailinfo: correct malformed test example
On 09/16/2016 03:55 PM, Junio C Hamano wrote: Hmph, these: t/t5100/info0008--no-inbody-headers | 5 + t/t5100/msg0008--no-inbody-headers | 6 ++ t/t5100/msg0015--no-inbody-headers | 1 + have --no-inbody-headers in their names; wouldn't that an indication that they are expected output when mailinfo is run while in-body header feature disabled? Yes, that's correct (they are the test data for when the in-body header feature is disabled). I would have expected that it would make more sense to make no change to sample.mbox and have updated expectation to outputs in the case where in-body header feature is enabled. The sample.mbox file contains the following: From nobody Mon Sep 17 00:00:00 2001 From: A U ThorSubject: check bogus body header (from) Date: Fri, 9 Jun 2006 00:44:16 -0700 From: bogosity - a list - of stuff Unchanged, the subsequent patch would break this test because it would interpret that as a multi-line "From" in-body header when in-body headers are *not* disabled. Besides changing sample.mbox, the other way to make sure that this test passes is to suppress the test when in-body headers are *not* disabled, but looking at t5100* (directory and script), it seemed more straightforward to modify sample.mbox. The patch I sent added a blank line after "From: bogosity", but removing the spaces before "- a list" and "- of stuff" would work too. To make sure this new feature will not break in the future, we would want a brand new message with a folded in-body header added to the sample.mbox, and see how it is parsed by mailinfo with in-body header feature enabled (and disabled). OK, I'll add this test. (The subsequent patch already has the brand new message, but not the test where in-body headers are disabled.)
Re: [RFC/PATCH 3/3] mailinfo: handle in-body header continuations
On 09/16/2016 04:04 PM, Junio C Hamano wrote: Jonathan Tanwrites: I'm concerned about what happens if check_header fails - we would then have some lines which need to be treated as log messages. (At least, they are currently treated that way.) I actually think we should refactor check_header() further so that in-body header processing does not even see things that shouldn't be changed. The current check_header() should be used only for real mail headers, and then a reduced version of check_header() that is called for in-body will _ONLY_ handle the header lines that are handled by the first "search for the interesting parts" loop. And of course we would update your "does it look like rfc2822?" to match what are handled by the "interesting parts" loop. That I think would match the current behaviour much better, I would think. There would be a bit of code duplication in that this "does it look like rfc2822" function would also need to account for duplicate headers (e.g. 2 "Subject:" lines in the in-body headers) because check_header would reject the 2nd one, but that is minor. (Alternatively, we could just allow duplicate headers in the in-body headers.) The ">From " and "[PATCH]" cases in check_header() should not even be there. We should handle them inside handle_commit_msg(), as these two cases should never appear in the real header part of a message. And if we clean it up like that, I do not think we would ever need to worry about "ah, it looked like a header but it is not after all". And not having to worry about it is a good thing and should be one of the primary goals in this conversion, I whould think. Yes, this makes sense. I'll go ahead and make a patch set implementing this (unless anyone has any objections).
Re: [PATCH 11/11] Resumable clone: implement primer logic in git-clone
Kevin Wernwrites: > Use transport_download_primer and transport_prime_clone in git clone. > This only supports a fully connected packfile. > > transport_prime_clone and transport_download_primer are executed > completely independent of transport_(get|fetch)_remote_refs, et al. > transport_download_primer is executed based on the existence of an > alt_resource. The idea is that the "prime clone" execution should be > able to attempt retrieving an alternate resource without dying, as > opposed to depending on the result of upload pack's "capabilities" to > indicate whether or not the client can attempt it. > > If a resumable resource is available, execute a codepath with the > following modular components: > - downloading resource to a specific directory > - using the resource (for pack, indexing and generating the bundle > file) > - cleaning up the resource (if the download or use fails) > - cleaning up the resource (if the download or use succeeds) > > If resume is interrupted on the client side, the alternate resource > info is written to the RESUMABLE file in the git directory. > > On resume, the required info is extracted by parsing the created > config file, and that info is used to determine the work and git > directories. If these cannot be determined, the program exits. > The writing of the refspec and determination of the initial git > directories are skipped, along with transport_prime_clone. > > The main purpose of this series of patches is to flesh out a codepath > for automatic resuming, manual resuming, and leaving a resumable > directory on exit--the logic for when to do these still needs more > work. > > Signed-off-by: Kevin Wern > --- > Documentation/git-clone.txt | 16 ++ > builtin/clone.c | 590 > +--- > t/t9904-git-prime-clone.sh | 181 ++ > 3 files changed, 698 insertions(+), 89 deletions(-) > create mode 100755 t/t9904-git-prime-clone.sh > > diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt > index b7c467a..5934bb6 100644 > --- a/Documentation/git-clone.txt > +++ b/Documentation/git-clone.txt > @@ -16,6 +16,7 @@ SYNOPSIS > [--depth ] [--[no-]single-branch] > [--recursive | --recurse-submodules] [--] > [] > +'git clone --resume ' > > DESCRIPTION > --- > @@ -172,6 +173,12 @@ objects from the source repository into a pack in the > cloned repository. > via ssh, this specifies a non-default path for the command > run on the other end. > > +--prime-clone :: > +-p :: Not many other options have single letter shorthand. Is it expected that it is worth to let this option squat on a short-and-sweet "-p", perhaps because it is so frequently used? > +--resume:: > + Resume a partially cloned repo in a "resumable" state. This > + can only be specified with a single local directory ( + dir>). This is incompatible with all other options. > + > +:: > + The directory of the partial clone. This could be either the > + work directory or the git directory. I think these should be described this way: --resume :: description if what resume option does and how resumable_dir is used by the option. in a single bullet point. > diff --git a/builtin/clone.c b/builtin/clone.c > index 9ac6c01..d9a13dc 100644 > --- a/builtin/clone.c > +++ b/builtin/clone.c > @@ -8,7 +8,9 @@ > * Clone a repository into a different directory that does not yet exist. > */ > > +#include "cache.h" > #include "builtin.h" I do not think you need to include cache.h if you are including builtin.h; Documentation/CodingGuidelines says: - The first #include in C files, except in platform specific compat/ implementations, must be either "git-compat-util.h", "cache.h" or "builtin.h". You do not have to include more than one of these. > @@ -40,17 +42,20 @@ static const char * const builtin_clone_usage[] = { > > static int option_no_checkout, option_bare, option_mirror, > option_single_branch = -1; > static int option_local = -1, option_no_hardlinks, option_shared, > option_recursive; > +static int option_resume; > static char *option_template, *option_depth; > -static char *option_origin = NULL; > +static const char *option_origin = NULL; Is this change related to anything you are doing here? If you are fixing things while at it, please don't ;-) If you really want to, please also remove " = NULL", from this line and also from the next line. Also do not add " = NULL" at the end of alt_res. > static char *option_branch = NULL; > ... > +static const struct alt_resource *alt_res = NULL; > +static char *get_filename(const char *dir) > +{ > + char *dir_copy = xstrdup(dir); > + strip_trailing_slashes(dir_copy); > + char *filename, *final = NULL; > + > + filename = find_last_dir_sep(dir); > + > + if (filename && *(++filename)) > + final =
Re: [PATCH 10/11] run command: add RUN_COMMAND_NO_STDOUT
Kevin Wernwrites: > Add option RUN_COMMAND_NO_STDOUT, which sets no_stdout on a child > process. > > This will be used by git clone when calling index-pack on a downloaded > packfile. If it is just one caller, would't it make more sense for that caller set no_stdout explicitly itself?
Re: [RFC/PATCH 3/3] mailinfo: handle in-body header continuations
Jonathan Tanwrites: > On 09/16/2016 01:59 PM, Junio C Hamano wrote: >> if (mi->in_line_header->len) { >> /* we have read the beginning of one in-line header */ >> if (line->len && isspace(*line->buf) && >> !(mi->use_scissors && is_scissors_line(line))) { > > Minor note: this means that the scissors check appears twice in the > code, once here and once below (for the non-header case). Yes. I actually was wondering if it is even more sensible to always have the scissors check at the very beginning. Even if we saw a half-written in-body header already in the message, when we see a scissors line, we clear the slate and restart as if the line after the scissors is the first line in the body of the message. >> append to mi->in_line_header strbuf; >> return 0; >> } >> /* otherwise we know mi->in_line_header is now complete */ >> check_header(mi, mi->in_line_header, ...); > > (Sorry - should have also noticed this in your original e-mail.) > > I'm concerned about what happens if check_header fails - we would then > have some lines which need to be treated as log messages. (At least, > they are currently treated that way.) I actually think we should refactor check_header() further so that in-body header processing does not even see things that shouldn't be changed. The current check_header() should be used only for real mail headers, and then a reduced version of check_header() that is called for in-body will _ONLY_ handle the header lines that are handled by the first "search for the interesting parts" loop. And of course we would update your "does it look like rfc2822?" to match what are handled by the "interesting parts" loop. That I think would match the current behaviour much better, I would think. The ">From " and "[PATCH]" cases in check_header() should not even be there. We should handle them inside handle_commit_msg(), as these two cases should never appear in the real header part of a message. And if we clean it up like that, I do not think we would ever need to worry about "ah, it looked like a header but it is not after all". And not having to worry about it is a good thing and should be one of the primary goals in this conversion, I whould think. Thanks.
Re: [PATCH 03/10] diff.c: drop tautologous condition in emit_line_0
On Mon, Sep 12, 2016 at 4:53 PM, Junio C Hamanowrote: > Stefan Beller writes: > >> diff --git a/diff.c b/diff.c >> index 156c2aa..9d2e704 100644 >> --- a/diff.c >> +++ b/diff.c >> @@ -460,8 +460,7 @@ static void emit_line_0(struct diff_options *o, const >> char *set, const char *res >> >> if (len == 0) { >> has_trailing_newline = (first == '\n'); >> - has_trailing_carriage_return = (!has_trailing_newline && >> - (first == '\r')); >> + has_trailing_carriage_return = (first == '\r'); >> nofirst = has_trailing_newline || has_trailing_carriage_return; >> } else { >> has_trailing_newline = (len > 0 && line[len-1] == '\n'); > > Interesting. > > This may be a mis-conversion at 250f7993 ("diff.c: split emit_line() > from the first char and the rest of the line", 2009-09-14), I > suspect. The original took line[] with length and peeked for '\n', > and when it saw one, it decremented length before checking > line[len-1] for '\r'. > > But of course if there is only one byte on the line (i.e. len == 0 > after first is stripped off), it cannot be both '\n' or '\r' at the > same time. > > Thanks for spotting. Oh, right, it used to be possible to remove \r\n completely and that information was then kept as has_trailing_newline = has_trailing_carriage_return = 1; and the resulting line is kept completely without ending line. After some thought I don't think I can use this mis-conversion to trigger a bug though, because the len=0 can only ever happen if first is '\n' alone essentially. Another thing I noticed when playing around with diffs: $ printf "\r\n" >crlf $ git commit crlf -m "add file crlf, empty line" $ printf "non zero length\r\n" >crlf $ diff --git a/crlf b/crlf $ index d3f5a12..ece7140 100644 --- a/crlf +++ b/crlf @@ -1 +1 @@ - +non zero length^M $ # The - line is missing a ^M ?
Re: [RFC/PATCH 2/3] mailinfo: correct malformed test example
Jonathan Tanwrites: > On 09/16/2016 12:19 PM, Junio C Hamano wrote: >> Jonathan Tan writes: >> >>> An existing sample message (0015) in the tests for mailinfo contains an >>> indented line immediately after an in-body header (without any >>> intervening blank line). >> >> This comes from d25e5159 ("git am/mailinfo: Don't look at in-body >> headers when rebasing", 2009-11-20), where we want to make sure that >> a "From: bogosity" that isn't meant to be an in-body header is not >> identified as such, even when it is immediately followed by a >> non-blank line. "From: bogosity" is for msg0015 but the same >> applies to the header-looking block for msg0008. >> >> Adding a blank line there will defeat the whole point of the test, >> which is to make sure we don't do anything funky when --no-inbody-headers >> is asked for, no? > > Before I revise the patch set...I think that the point of 0015 would > be handled by 0008 (after this patch is applied), but if you prefer > that 0015 retain its purpose, I can unindent the bullet list in 0015 > instead of adding the extra line (and then dropping all 0008 > changes). Would that be better? (0015 needs to be changed somehow, > because its indented line would be interpreted as a continuation line > after RFC/PATCH 3/3 is applied.) Hmph, these: t/t5100/info0008--no-inbody-headers | 5 + t/t5100/msg0008--no-inbody-headers | 6 ++ t/t5100/msg0015--no-inbody-headers | 1 + have --no-inbody-headers in their names; wouldn't that an indication that they are expected output when mailinfo is run while in-body header feature disabled? I would have expected that it would make more sense to make no change to sample.mbox and have updated expectation to outputs in the case where in-body header feature is enabled. To make sure this new feature will not break in the future, we would want a brand new message with a folded in-body header added to the sample.mbox, and see how it is parsed by mailinfo with in-body header feature enabled (and disabled).
Re: [PATCH 07/11] Resumable clone: add resumable download to http/curl
Kevin Wernwrites: > +int http_download_primer(const char *url, const char *out_file) > +{ > + int ret = 0, try_count = HTTP_TRY_COUNT; > + struct http_get_options options = {0}; > + options.progress = 1; > + > + if (file_exists(out_file)) { > + fprintf(stderr, > + "File already downloaded: '%s', skipping...\n", > + out_file); > + return ret; > + } > + > + do { > + if (try_count != HTTP_TRY_COUNT) { > + fprintf(stderr, "Connection interrupted for some " > + "reason, retrying (%d attempts left)\n", > + try_count); > + struct timeval time = {10, 0}; // 1s We do not use // comment. > + select(0, NULL, NULL, NULL, ); > + } > + ret = http_get_file(url, out_file, ); I didn't realize that http_get_file() -> http_request() codepath, when it is the output file, already can do the "ftell and request the reminder". Very nice. > @@ -1136,7 +1138,10 @@ static int handle_curl_result(struct slot_results > *results) > curl_easy_strerror(results->curl_result), > sizeof(curl_errorstr)); > #endif > - return HTTP_ERROR; > + if (results->http_code >= 400) > + return HTTP_ERROR; > + else > + return HTTP_ERROR_RESUMABLE; > } > } Hmm, is "anything below 400" a good definition of resumable errors?
Re: [RFC/PATCH 2/3] mailinfo: correct malformed test example
On 09/16/2016 12:19 PM, Junio C Hamano wrote: Jonathan Tanwrites: An existing sample message (0015) in the tests for mailinfo contains an indented line immediately after an in-body header (without any intervening blank line). This comes from d25e5159 ("git am/mailinfo: Don't look at in-body headers when rebasing", 2009-11-20), where we want to make sure that a "From: bogosity" that isn't meant to be an in-body header is not identified as such, even when it is immediately followed by a non-blank line. "From: bogosity" is for msg0015 but the same applies to the header-looking block for msg0008. Adding a blank line there will defeat the whole point of the test, which is to make sure we don't do anything funky when --no-inbody-headers is asked for, no? Before I revise the patch set...I think that the point of 0015 would be handled by 0008 (after this patch is applied), but if you prefer that 0015 retain its purpose, I can unindent the bullet list in 0015 instead of adding the extra line (and then dropping all 0008 changes). Would that be better? (0015 needs to be changed somehow, because its indented line would be interpreted as a continuation line after RFC/PATCH 3/3 is applied.)
git-subtree pull issue
When git-subtree is pulling data using the tag reference, it writes the tag's sha1 into a metadata. It could be a problem next time, since this commit object is not a part of main tree, and could be lost. Steps to reproduce: # add some stuff to a existing tree # history will refer to v0.1 tag object instead of v0.1^{commit} git subtree add --squash -P dir/ repo v0.1 # prune all dangling objects including external v0.1 tag git gc --aggressive --prune=all # this will fail git subtree pull --squash -P dir/ repo v0.2
Re: [RFC/PATCH 3/3] mailinfo: handle in-body header continuations
On 09/16/2016 01:59 PM, Junio C Hamano wrote: if (mi->in_line_header->len) { /* we have read the beginning of one in-line header */ if (line->len && isspace(*line->buf) && !(mi->use_scissors && is_scissors_line(line))) { Minor note: this means that the scissors check appears twice in the code, once here and once below (for the non-header case). append to mi->in_line_header strbuf; return 0; } /* otherwise we know mi->in_line_header is now complete */ check_header(mi, mi->in_line_header, ...); (Sorry - should have also noticed this in your original e-mail.) I'm concerned about what happens if check_header fails - we would then have some lines which need to be treated as log messages. (At least, they are currently treated that way.) To treat them as log messages, we would need to convert them into UTF-8, which may possibly fail, so we would have to figure out how to clean up (we have to clean up because we cannot `die` immediately, at least to preserve the current behavior). Also, we are likely to detect such a failure only while processing a subsequent line - this non-"fail fast" currently is fine, but I'm concerned that it will hinder future development (especially when debugging). Minor note: the buffer would also need to be more complicated (instead of the current single buffer), either: o store newlines in that buffer (and we would need to remove all newlines before passing to check_header), or o 2 buffers: one with newlines (for log messages) and one without (for check_header). In light of the above (multiple scissors checks, late detection of failure, more complicated buffer), it seems clearer to me to just change the order of the checks (as in RFC/PATCH 1/3). This necessitates holding on to the old un-decoded buf and len, but this seems easier to me than the above. strbuf_reset(>in_line_header); } ...
[wishlist] disable boring messages
Hi, is it possible to make git silent, when nothing interesting is happening? I have a lot of repos and a batch script to update them all, and I want to get rid of 'Fetching origin' and 'Already up-to-date.' messages leaving only new refs and tags.
Re: [PATCH] mailinfo: unescape quoted-pair in header fields
On Fri, Sep 16, 2016 at 11:02:04PM +0200, Kevin Daudt wrote: > rfc2822 has provisions for quoted strings in structured header fields, > but also allows for escaping these with so-called quoted-pairs. > > The only thing git currently does is removing exterior quotes, but > quotes within are left alone. > > Tell mailinfo to remove exterior quotes and remove escape characters from the > author so that they don't show up in the commits author field. > > Signed-off-by: Kevin Daudt> --- > The only thing I could not easily fix is the prevent git am from > removing any quotes around the author. This is done in fmt_ident, > which calls `strbuf_addstr_without_crud`. Ah, OK. I was wondering where that stripping was being done. That makes sense, and makes me doubly confident this is the right place to be doing it, since the other quote-stripping was not even intentional, but just a side effect of the low-level routines. I think it is OK to leave it in place. If you really want your name to be: "My Name is Always in Quotes" then tough luck. Git does not support it via git-am, but nor does it via git-commit, etc. > mailinfo.c | 54 > ++ > t/t5100-mailinfo.sh| 6 ++ > t/t5100/quoted-pair.expect | 5 + > t/t5100/quoted-pair.in | 9 > 4 files changed, 74 insertions(+) > create mode 100644 t/t5100/quoted-pair.expect > create mode 100644 t/t5100/quoted-pair.in > > diff --git a/mailinfo.c b/mailinfo.c > index e19abe3..04036f3 100644 > --- a/mailinfo.c > +++ b/mailinfo.c > @@ -54,15 +54,69 @@ static void parse_bogus_from(struct mailinfo *mi, const > struct strbuf *line) > get_sane_name(>name, >name, >email); > } > > +static int unquote_quoted_string(struct strbuf *line) > +{ > + struct strbuf outbuf; > + const char *in = line->buf; > + int c, take_next_literally = 0; > + int found_error = 0; > + char escape_context=0; Style: whitespace around "=". I had to wonder why we needed both escape_context and take_next_literally; shouldn't we just need a single state bit. But escape_context is not "escape the next character", it is "we are currently in a mode where we should be escaping". Could we give it a more descriptive name? I guess it is more than just "we are in a mode", but rather "here is the character that will end the escaped mode". Maybe a comment would be more appropriate. > + while ((c = *in++) != 0) { > + if (take_next_literally) { > + take_next_literally = 0; > + } else { OK, so that means the previous one was backslash-quoted, and we don't do any other cleverness. Good. > + switch (c) { > + case '"': > + if (!escape_context) > + escape_context = '"'; > + else if (escape_context == '"') > + escape_context = 0; > + continue; And here we open or close the quoted portion, depending. Makes sense. > + case '\\': > + if (escape_context) { > + take_next_literally = 1; > + continue; > + } > + break; I didn't look in the RFC. Is: From: my \"name\" really the same as: From: "my \\\"name\\\"" ? That seems weird, but I think it may be that the former is simply bogus (you are not supposed to use backslashes outside of the quoted section at all). > + case '(': > + if (!escape_context) > + escape_context = '('; > + else if (escape_context == '(') > + found_error = 1; > + break; Hmm. Is: From: Name (Comment with (another comment)) really disallowed? RFC2822 seems to say that "comment" can contain "ccontent", which can itself be a comment. This is obviously getting pretty silly, but if we are going to follow the RFC, I think you actually have to do a recursive parse, and keep track of an arbitrary depth of context. I dunno. This method probably covers most cases in practice, and it's easy to reason about. > + case ')': > + if (escape_context == '(') > + escape_context = 0; > + break; > + } > + } > + > + strbuf_addch(, c); > + } > + > + strbuf_reset(line); > + strbuf_addbuf(line, ); > + strbuf_release(); I think you can use strbuf_swap() here to avoid copying the line an extra time, like: strbuf_swap(line, ); strbuf_release(); Another option would be to just:
Re: [PATCH 03/11] pkt-line: create gentle packet_read_line functions
Kevin Wernwrites: > /* And complain if we didn't get enough bytes to satisfy the read. */ > if (ret < size) { > - if (options & PACKET_READ_GENTLE_ON_EOF) > + if (options & (PACKET_READ_GENTLE_ON_EOF | > PACKET_READ_GENTLE_ALL)) > return -1; The name _ALL suggested to me that there may be multiple "under this condition, be gentle", "under that condition, be gentle", and _ALL is used as a catch-all "under any condition, be gentle". If you defined _ALL symbol to have all GENTLE bits on, this line could have become if (options & PACKET_READ_GENTLE_ALL) > @@ -205,15 +209,23 @@ int packet_read(int fd, char **src_buf, size_t *src_len, > if (ret < 0) > return ret; > len = packet_length(linelen); > - if (len < 0) > + if (len < 0) { > + if (options & PACKET_READ_GENTLE_ALL) > + return -1; On the other hand, however, you do want to die here when only GENTLE_ON_EOF is set. Taking the above two observations together, I'd have to say that _ALL is probably a misnomer. I agree with a need for a flag with the behaviour you defined in this patch, though. > die("protocol error: bad line length character: %.4s", linelen); > static char *packet_read_line_generic(int fd, > char **src, size_t *src_len, > - int *dst_len) > + int *dst_len, int flags) The original one is called options, not flags, and it would be easier to follow if it is consistently called options, instead of requiring the reader to keep track of "ah, it is called flags here but the callee renames it to options". > +/* > + * Same as packet_read_line, but does not die on any errors (uses > + * PACKET_READ_GENTLE_ALL). > + */ > +char *packet_read_line_gentle(int fd, int *len_p); > + > +/* > + * Same as packet_read_line_buf, but does not die on any errors (uses > + * PACKET_READ_GENTLE_ALL). > + */ > +char *packet_read_line_buf_gentle(char **src_buf, size_t *src_len, int > *size); I think most if not all "do the same thing as do_something() but report errors instead of dying" variant of functions are named do_something_gently(), not do_something_gentle().
Re: [RFC/PATCH 3/3] mailinfo: handle in-body header continuations
On Fri, Sep 16, 2016 at 10:37:24AM -0700, Jonathan Tan wrote: > Mailinfo currently handles multi-line headers, but it does not handle > multi-line in-body headers. Teach it to handle such headers, for > example, for this input: > > Subject: a very long >broken line > > Subject: another very long >broken line > > interpret the in-body subject to be "another very long broken line" > instead of "another very long". This puzzled me; we should stop parsing in-body headers after the first blank line. But then I realized you probably meant the first "Subject" to be the real mail header. I wonder if it would be more obvious with an example like: From: ... Date: ... Subject: the actual mail subject Subject: a very long broken line Or something. -Peff
Re: [RFC/PATCH 1/3] mailinfo: refactor commit message processing
On Fri, Sep 16, 2016 at 12:12:50PM -0700, Junio C Hamano wrote: > > +static int check_header_raw(struct mailinfo *mi, > > + char *buf, size_t len, > > + struct strbuf *hdr_data[], int overwrite) { > > + const struct strbuf sb = {0, len, buf}; > > + return check_header(mi, , hdr_data, overwrite); > > +} > > IIUC, this is a helper for callers that do not have a strbuf but > instead havepair to perform the same check_header() the > callers that have strbuf can do. > > As check_header() uses the strbuf as a read-only entity, wrapping > the pair in a temporary strbuf like this is safe. > > The incoming should conceptually be "const char *", but it's > OK. I think the "right" way to do this would be to continue taking a "char *", and then strbuf_attach() it. That saves us from unexpectedly violating any strbuf invariants. If our assumption that check_header() does not touch the contents turns out to be wrong, neither strategy would inform our caller, though. I think you'd want something like: assert(sb.buf == buf); after check_header() returns (though I guess we are in theory protected by the "const"). That being said... > If check_header() didn't call any helper function that gets passed > as a strbuf, or if convertiong the helper function to take a > pair instead, I would actually suggest refactoring this > the other way around, though. That is, move the implementation of > check_header() to this function, updating its reference to line->buf > and line->len to reference to and , and then make > check_header() a thin wrapper that does > > check_header(mi, const struct strbuf *line, > struct strbuf *hdr_data[], int overwrite) > { > return check_header_raw(mi, line->buf, line->len, > hdr_data, overwrite); > } This is _way_ better, and it looks like check_header() could handle it easily. Looking at it, I also suspect the cascading if in that function could be made more pleasant by modeling cmp_header()'s interface after skip_prefix_mem(), but that is totally orthogonal and optional. -Peff
[PATCH] mailinfo: unescape quoted-pair in header fields
rfc2822 has provisions for quoted strings in structured header fields, but also allows for escaping these with so-called quoted-pairs. The only thing git currently does is removing exterior quotes, but quotes within are left alone. Tell mailinfo to remove exterior quotes and remove escape characters from the author so that they don't show up in the commits author field. Signed-off-by: Kevin Daudt--- The only thing I could not easily fix is the prevent git am from removing any quotes around the author. This is done in fmt_ident, which calls `strbuf_addstr_without_crud`. mailinfo.c | 54 ++ t/t5100-mailinfo.sh| 6 ++ t/t5100/quoted-pair.expect | 5 + t/t5100/quoted-pair.in | 9 4 files changed, 74 insertions(+) create mode 100644 t/t5100/quoted-pair.expect create mode 100644 t/t5100/quoted-pair.in diff --git a/mailinfo.c b/mailinfo.c index e19abe3..04036f3 100644 --- a/mailinfo.c +++ b/mailinfo.c @@ -54,15 +54,69 @@ static void parse_bogus_from(struct mailinfo *mi, const struct strbuf *line) get_sane_name(>name, >name, >email); } +static int unquote_quoted_string(struct strbuf *line) +{ + struct strbuf outbuf; + const char *in = line->buf; + int c, take_next_literally = 0; + int found_error = 0; + char escape_context=0; + + strbuf_init(, line->len); + + while ((c = *in++) != 0) { + if (take_next_literally) { + take_next_literally = 0; + } else { + switch (c) { + case '"': + if (!escape_context) + escape_context = '"'; + else if (escape_context == '"') + escape_context = 0; + continue; + case '\\': + if (escape_context) { + take_next_literally = 1; + continue; + } + break; + case '(': + if (!escape_context) + escape_context = '('; + else if (escape_context == '(') + found_error = 1; + break; + case ')': + if (escape_context == '(') + escape_context = 0; + break; + } + } + + strbuf_addch(, c); + } + + strbuf_reset(line); + strbuf_addbuf(line, ); + strbuf_release(); + + return found_error; + +} + static void handle_from(struct mailinfo *mi, const struct strbuf *from) { char *at; size_t el; struct strbuf f; + strbuf_init(, from->len); strbuf_addbuf(, from); + unquote_quoted_string(); + at = strchr(f.buf, '@'); if (!at) { parse_bogus_from(mi, from); diff --git a/t/t5100-mailinfo.sh b/t/t5100-mailinfo.sh index 1a5a546..d0c21fc 100755 --- a/t/t5100-mailinfo.sh +++ b/t/t5100-mailinfo.sh @@ -142,4 +142,10 @@ test_expect_success 'mailinfo unescapes with --mboxrd' ' test_cmp expect mboxrd/msg ' +test_expect_success 'mailinfo unescapes rfc2822 quoted-string' ' +mkdir quoted-pair && +git mailinfo /dev/null /dev/null <"$TEST_DIRECTORY"/t5100/quoted-pair.in >quoted-pair/info && +test_cmp "$TEST_DIRECTORY"/t5100/quoted-pair.expect quoted-pair/info +' + test_done diff --git a/t/t5100/quoted-pair.expect b/t/t5100/quoted-pair.expect new file mode 100644 index 000..cab1bce --- /dev/null +++ b/t/t5100/quoted-pair.expect @@ -0,0 +1,5 @@ +Author: Author "The Author" Name +Email: someb...@example.com +Subject: testing quoted-pair +Date: Sun, 25 May 2008 00:38:18 -0700 + diff --git a/t/t5100/quoted-pair.in b/t/t5100/quoted-pair.in new file mode 100644 index 000..e2e627a --- /dev/null +++ b/t/t5100/quoted-pair.in @@ -0,0 +1,9 @@ +From 1234567890123456789012345678901234567890 Mon Sep 17 00:00:00 2001 +From: "Author \"The Author\" Name" +Date: Sun, 25 May 2008 00:38:18 -0700 +Subject: [PATCH] testing quoted-pair + + + +--- +patch -- 2.10.0.86.g6ffa4f1.dirty
Re: [RFC/PATCH 3/3] mailinfo: handle in-body header continuations
Jonathan Tanwrites: >> handle_commit_msg(...) >> { >> if (mi->in_line_header->len) { >> /* we have read the beginning of one in-line header */ >> if (line->len && isspace(*line->buf)) > > This would mean that a message like the following: > > From: Me > -- 8< -- this scissors line will be treated as part of "From" > > would have its scissors line treated as a header. > > The main reason why I reordered the checks (in RFC/PATCH 1/3) is to > avoid this (treating a scissors line with an initial space immediately > following an in-body header as part of a header). > > (If this is not a concern then yes, I agree that the way you described > is simpler and better.) Ahh, OK. I do not think anybody sane would do the "From:" thing, but with the "does it look like 2822 header" check to decide if the first header-looking line should be queued, another failure mode may be: any-random-alpha-and-dash-string: -- >8 -- cut here -- >8 -- Subject: real subject The first line of the real message I personally do not think it matters that much, but if we wanted to protect us from it we could easily do if (mi->in_line_header->len) { /* we have read the beginning of one in-line header */ if (line->len && isspace(*line->buf) && !(mi->use_scissors && is_scissors_line(line))) { append to mi->in_line_header strbuf; return 0; } /* otherwise we know mi->in_line_header is now complete */ check_header(mi, mi->in_line_header, ...); strbuf_reset(>in_line_header); } ... instead, I think.
Re: [PATCH 01/11] Resumable clone: create service git-prime-clone
Kevin Wernwrites: > Create git-prime-clone, a program to be executed on the server that > returns the location and type of static resource to download before > performing the rest of a clone. > > Additionally, as this executable's location will be configurable (see: > upload-pack and receive-pack), add the program to > BINDIR_PROGRAMS_NEED_X, in addition to the usual builtin places. Add > git-prime-clone executable to gitignore, as well > > Signed-off-by: Kevin Wern > --- I wonder if we even need a separate service like this. Wouldn't a new protocol capability that is advertised from upload-pack sufficient to tell the "git clone" that it can and should consider priming from this static resource? > +static void prime_clone(void) > +{ > + if (!enabled) { > + fprintf(stderr, _("prime-clone not enabled\n")); > + } > + else if (url && filetype){ > + packet_write(1, "%s %s\n", filetype, url); > + } > + else if (url || filetype) { > + if (filetype) > + fprintf(stderr, _("prime-clone not properly " > + "configured: missing url\n")); > + else if (url) > + fprintf(stderr, _("prime-clone not properly " > + "configured: missing filetype\n")); > + } > + packet_flush(1); > +} Two minor comments: - For whom are you going to localize these strings? This program is running on the server side and we do not know the locale preferred by the end-user who is sitting on the other end of the connection, no? - Turn "}\n\s+else " into "} else ", please.
Re: [RFC/PATCH 3/3] mailinfo: handle in-body header continuations
On 09/16/2016 01:17 PM, Junio C Hamano wrote: In other words, wouldn't something like the illustration at the end of this message sufficient? If the body consists solely of in-body header without any patch or patchbreak, we may reach EOF with something in mi->in_line_header buffer and nothing in mi->log_message and without this function getting any chance to return 1, so a careful caller may want to flush in_line_header, but the overall result of the mailinfo subsystem in such a case would be an error ("you didn't have any patch or a message?"), so it may not matter too much. Noted. (This was one of my concerns - that the caller should, but did not, flush.) What am I missing? handle_commit_msg(...) { if (mi->in_line_header->len) { /* we have read the beginning of one in-line header */ if (line->len && isspace(*line->buf)) This would mean that a message like the following: From: Me-- 8< -- this scissors line will be treated as part of "From" would have its scissors line treated as a header. The main reason why I reordered the checks (in RFC/PATCH 1/3) is to avoid this (treating a scissors line with an initial space immediately following an in-body header as part of a header). (If this is not a concern then yes, I agree that the way you described is simpler and better.) append to mi->in_line_header strbuf; return 0; /* otherwise we know mi->in_line_header is now complete */ check_header(mi, mi->in_line_header, ...); strbuf_reset(>in_line_header); } if (mi->header_stage && (it is a blank line)) return 0; if (mi->use_inbody_headers && mi->header_stage && (the line looks like beginning of 2822 header)) { strbuf_addbuf(>in_line_header, line); return 0; } /* otherwise we are no longer looking at headers */ mi->header_stage = 0; /* normalize the log message to UTF-8. */ if (convert_to_utf8(mi, line, mi->charset.buf)) return 0; /* mi->input_error already set */ if (mi->use_scissors && is_scissors_line(line)) { int i; strbuf_setlen(>log_message, 0); mi->header_stage = 1; /* * We may have already read "secondary headers"; purge * them to give ourselves a clean restart. */ for (i = 0; header[i]; i++) { if (mi->s_hdr_data[i]) strbuf_release(mi->s_hdr_data[i]); mi->s_hdr_data[i] = NULL; } return 0; } if (patchbreak(line)) { if (mi->message_id) strbuf_addf(>log_message, "Message-Id: %s\n", mi->message_id); return 1; } strbuf_addbuf(>log_message, line); return 0; }
Re: [PATCH 00/11] Resumable clone
Kevin Wernwrites: > It's been a while (sent a very short patch in May), but I've > still been working on the resumable clone feature and checking up on > the mailing list for any updates. After submitting the prime-clone > service alone, I figured implementing the whole thing would be the best > way to understand the full scope of the problem (this is my first real > contribution here, and learning while working on such an involved > feature has not been easy). It may not have been easy but I hope it has been a fun journey for you ;-) > On the client side, the transport_prime_clone and > transport_download_primer APIs are built to be more robust (i.e. read > messages without dying due to protocol errors), so that git clone can > always try them without being dependent on the capability output of > git-upload-pack. transport_download_primer is dependent on the success > of transport_prime_clone, but transport_prime_clone is always run on an > initial clone. Part of achieving this robustness involves adding > *_gentle functions to pkt_line, so that prime_clone can fail silently > without dying. OK. > Right now, a manually resumable directory is left behind only if the > *client* is interrupted while a new junk mode, JUNK_LEAVE_RESUMABLE, > is set (right before the download). For an initial clone, if the > connection fails after automatic resuming, the client erases the > partial resources and falls through to a normal clone. However, once a > resumable directory is left behind by the program, it is NEVER > deleted/abandoned after it is continued with --resume. Sounds like you made a sensible design decision here. > - When running with ssh and a password, the credentials are > prompted for twice. I don't know if there is a way to > preserve credentials between executions. I couldn't find any > examples in git's source. We leave credentail reuse to keyring services like ssh-agent.
Re: [RFC/PATCH 3/3] mailinfo: handle in-body header continuations
Jonathan Tanwrites: > Instead of repeatedly calling "check_header" (as in this patch), one > alternate method to accomplish this would be to have a buffer of > potential header text in struct mailinfo to be flushed whenever a header > is known to end (for example, if we detect the start of a new header), > but this makes the logic more complicated - for example, the flushing > would not only invoke check_header but would also need to reconstruct > the original lines, possibly decode them into UTF-8, and store them in > log_message, and any failures would be noticed a few "lines" away from > the original failure point. Also, care would need to be taken to flush > the buffer at all appropriate places. I am not sure how much the UTF-8 decoding argument above matters. The current way handle_commit_msg() is structured (before any of your patches) is for it to take one raw line at a time and: - If we haven't seen a non-header line (i.e. at the beginning, or we were reading in-body headers), return without doing anything. - If we are told to honor in-body headers and if we haven't seen a non-header line, see if the line itself looks like a header and if so, handle it as an in-body header and return. If that line is not an in-body header, continue processing. - If the processing reaches at this point, we are done with the headers (i.e. mi->header_stage is set to 0). - Make sure the line is in utf8. - If it is a scissors line and we are told to honor scissors lines, ignore what we have read so far and go back to "we haven't seen a non-header line" state and return. - If it is a patch break, return and signal the caller we are done with the log message. - Otherwise accumulate the line as part of the log message. The bug we want to address is in the second step. We only look at the first line of folded in-body header line, because we are fed one line at a time. If we keep the location of UTF8 conversion, and buffered the in-body header in "struct mailinfo *mi" (like you seem to do in this patch), what we will queue there will be _before_ conversion. We'd call check_header() on it once we know one logical line of a header is accumulated, and check_header() would do the right conversion via decode_header() etc., so I do not see why you need to worry about the encoding issues at all. I wonder if the simplest would be to introduce another state in the state machine that is "we know we are processing in-body header, and we have read early part of an in-body header line that may not be complete". In other words, wouldn't something like the illustration at the end of this message sufficient? If the body consists solely of in-body header without any patch or patchbreak, we may reach EOF with something in mi->in_line_header buffer and nothing in mi->log_message and without this function getting any chance to return 1, so a careful caller may want to flush in_line_header, but the overall result of the mailinfo subsystem in such a case would be an error ("you didn't have any patch or a message?"), so it may not matter too much. What am I missing? handle_commit_msg(...) { if (mi->in_line_header->len) { /* we have read the beginning of one in-line header */ if (line->len && isspace(*line->buf)) append to mi->in_line_header strbuf; return 0; /* otherwise we know mi->in_line_header is now complete */ check_header(mi, mi->in_line_header, ...); strbuf_reset(>in_line_header); } if (mi->header_stage && (it is a blank line)) return 0; if (mi->use_inbody_headers && mi->header_stage && (the line looks like beginning of 2822 header)) { strbuf_addbuf(>in_line_header, line); return 0; } /* otherwise we are no longer looking at headers */ mi->header_stage = 0; /* normalize the log message to UTF-8. */ if (convert_to_utf8(mi, line, mi->charset.buf)) return 0; /* mi->input_error already set */ if (mi->use_scissors && is_scissors_line(line)) { int i; strbuf_setlen(>log_message, 0); mi->header_stage = 1; /* * We may have already read "secondary headers"; purge * them to give ourselves a clean restart. */ for (i = 0; header[i]; i++) { if (mi->s_hdr_data[i]) strbuf_release(mi->s_hdr_data[i]); mi->s_hdr_data[i] = NULL; } return 0; } if (patchbreak(line)) { if (mi->message_id) strbuf_addf(>log_message,
Re: [RFC/PATCH 2/3] mailinfo: correct malformed test example
Jonathan Tanwrites: > An existing sample message (0015) in the tests for mailinfo contains an > indented line immediately after an in-body header (without any > intervening blank line). This comes from d25e5159 ("git am/mailinfo: Don't look at in-body headers when rebasing", 2009-11-20), where we want to make sure that a "From: bogosity" that isn't meant to be an in-body header is not identified as such, even when it is immediately followed by a non-blank line. "From: bogosity" is for msg0015 but the same applies to the header-looking block for msg0008. Adding a blank line there will defeat the whole point of the test, which is to make sure we don't do anything funky when --no-inbody-headers is asked for, no? > diff --git a/t/t5100/info0008--no-inbody-headers > b/t/t5100/info0008--no-inbody-headers > new file mode 100644 > index 000..e8a2951 > --- /dev/null > +++ b/t/t5100/info0008--no-inbody-headers > @@ -0,0 +1,5 @@ > +Author: Junio C Hamano > +Email: ju...@kernel.org > +Subject: another patch > +Date: Fri, 9 Jun 2006 00:44:16 -0700 > + > diff --git a/t/t5100/msg0008--no-inbody-headers > b/t/t5100/msg0008--no-inbody-headers > new file mode 100644 > index 000..d6e950e > --- /dev/null > +++ b/t/t5100/msg0008--no-inbody-headers > @@ -0,0 +1,6 @@ > +From: A U Thor > +Subject: [PATCH] another patch > +>Here is an empty patch from A U Thor. > + > +Hey you forgot the patch! > + > diff --git a/t/t5100/msg0015--no-inbody-headers > b/t/t5100/msg0015--no-inbody-headers > index be5115b..44a6ce7 100644 > --- a/t/t5100/msg0015--no-inbody-headers > +++ b/t/t5100/msg0015--no-inbody-headers > @@ -1,3 +1,4 @@ > From: bogosity > + >- a list >- of stuff > diff --git a/t/t5100/patch0008--no-inbody-headers > b/t/t5100/patch0008--no-inbody-headers > new file mode 100644 > index 000..e69de29 > diff --git a/t/t5100/sample.mbox b/t/t5100/sample.mbox > index 8b2ae06..ba8b208 100644 > --- a/t/t5100/sample.mbox > +++ b/t/t5100/sample.mbox > @@ -656,6 +656,7 @@ Subject: check bogus body header (from) > Date: Fri, 9 Jun 2006 00:44:16 -0700 > > From: bogosity > + >- a list >- of stuff > ---
Re: [RFC/PATCH 1/3] mailinfo: refactor commit message processing
Jonathan Tanwrites: > Within the processing of the commit message, check for a scissors line > or a patchbreak line first (before checking for in-body headers) so that > a subsequent patch modifying the processing of in-body headers would not > cause a scissors line or patchbreak line to be misidentified. > > If a line could be both an in-body header and a scissors line (for > example, "From: -- >8 --"), this is considered a fatal error > (previously, it would be interpreted as an in-body header). The scissors line is designed to allow garbage other than scissors and perforation marks to be on the same line, i.e. /* * The mark must be at least 8 bytes long (e.g. "-- >8 --"). * Even though there can be arbitrary cruft on the same line * (e.g. "cut here"), in order to avoid misidentification, the * perforation must occupy more than a third of the visible * width of the line, and dashes and scissors must occupy more * than half of the perforation. */ Even though it is not likely for people to do so, it would probably be nicer if we can treat From: -- >8 -- cut -- >8 -- >8 -- here -- >8 -- as a scissors line instead of making it a fatal error, by treating that "From:" as just a random garbage. But this is a minor point. It is not worth to make it work like so if the resulting code will become messier. > The following enumeration shows that processing is the same except (as > described above) the in-body header + scissors line case. > > o in-body header (check_header OK) > o passes UTF-8 conversion > o [described above] is scissors line > o [not possible] is patchbreak line > o [not possible] is blank line > o is none of the above - processed as header > o fails UTF-8 conversion - processed as header > o not in-body header > o passes UTF-8 conversion > o is scissors line - processed as scissors > o is patchbreak line - processed as patchbreak > o is blank line - ignored if in header_stage > o is none of the above - log message > o fails UTF-8 conversion - input error > > As for the result left in "line" (after the invocation of > handle_commit_msg), it is unused (by its caller, handle_filter, and by > handle_filter's callers, handle_boundary and handle_body) unless this > line is a patchbreak line, in which case handle_patch is subsequently > called (in handle_filter) on "line". In this case, "line" must have > passed UTF-8 conversion both before and after this patch, so the result > is still the same overall. > > Signed-off-by: Jonathan Tan > --- > mailinfo.c | 145 > - > 1 file changed, 115 insertions(+), 30 deletions(-) > > diff --git a/mailinfo.c b/mailinfo.c > index e19abe3..23a56c2 100644 > --- a/mailinfo.c > +++ b/mailinfo.c > @@ -340,23 +340,56 @@ static struct strbuf *decode_b_segment(const struct > strbuf *b_seg) > return out; > } > > -static int convert_to_utf8(struct mailinfo *mi, > -struct strbuf *line, const char *charset) > +/* > + * Attempts to convert line into UTF-8, storing the result in line. > + * > + * This differs from convert_to_utf8 in that conversion non-success is not > + * considered an error case - mi->input_error is not set, and no error > message > + * is printed. > + * > + * If the conversion is unnecessary, returns 0 and stores NULL in old_buf (if > + * old_buf is not NULL). > + * > + * If the conversion is successful, returns 0 and stores the unconverted > string > + * in old_buf and old_len (if they are respectively not NULL). > + * > + * If the conversion is unsuccessful, returns -1. > + */ > +static int try_convert_to_utf8(const struct mailinfo *mi, struct strbuf > *line, > +const char *charset, char **old_buf, > +size_t *old_len) > { > - char *out; > + char *utf8; > > - if (!mi->metainfo_charset || !charset || !*charset) > + if (!mi->metainfo_charset || !charset || !*charset || > + same_encoding(mi->metainfo_charset, charset)) { > + if (old_buf) > + *old_buf = NULL; > return 0; > + } > > - if (same_encoding(mi->metainfo_charset, charset)) > + utf8 = reencode_string(line->buf, mi->metainfo_charset, charset); > + if (utf8) { > + char *temp = strbuf_detach(line, old_len); > + if (old_buf) > + *old_buf = temp; > + strbuf_attach(line, utf8, strlen(utf8), strlen(utf8)); > return 0; > - out = reencode_string(line->buf, mi->metainfo_charset, charset); > - if (!out) { > + } > + return -1; > +} > + > +/* > + * Converts line into UTF-8, setting mi->input_error to -1 upon failure. > + */ > +static int convert_to_utf8(struct mailinfo *mi, > +struct strbuf
Re: [RFC] extending pathspec support to submodules
On Thu, Sep 15, 2016 at 3:08 PM, Junio C Hamanowrote: > > * Your program that runs in the top-level superproject still needs >to be able to say "this pathspec from the top cannot possibly >match anything in the submodule, so let's not even bother >descending into it". > Yes, we would need to first check if the submodule is a prefix match to the pathspec. ie a submodule 'sub' would need to match the pathspec 'sub/somedir' or '*.txt' but not the pathspecs 'subdirectory' or 'otherdir' > > >So we may have to rethink what this option name should be. "You > > >are running in a repository that is used as a submodule in a > > >larger context, which has the submodule at this path" is what the > > >option tells the command; if any existing command already has > > >such an option, we should use it. If we are inventing one, > > >perhaps "--submodule-path" (I didn't check if there are existing > > >options that sound similar to it and mean completely different > > >things, in which case that name is not usable)? > > > > Would it make sense to add the '--submodule-path' to a more generic > > part of the code? It's not just ls-files/grep that have to solve exactly > > this > > problem. Up to now we just did not go for those commands, though. > > Yes I think so, since it should also handle starting from a submodule > with a pathspec to the superproject or other submodule. In case we > go with my above suggestion I would suggest a more generic name since > the option could also be passed to processes handling the superproject. > E.g. something like --module-prefix or --repository-prefix comes to my > mind, not checked though. Yeah we may want to come up with a more descriptive option name now which can be generally applied, especially if we are going to continue adding submodule support for other commands. -Brandon
Re: [RFC/PATCH 0/3] handle multiline in-body headers
Jonathan Tanwrites: > Thanks, Peff, for the explanation and the method to reproduce the issue. > > The issue seems to be in mailinfo.c - this patch set addresses that, and I > have > also included a test for "git am" in t/t4150-am.sh to show the effect of this > patch set on that command. Thanks, will take a look. > Jonathan Tan (3): > mailinfo: refactor commit message processing > mailinfo: correct malformed test example > mailinfo: handle in-body header continuations > > mailinfo.c | 165 > --- > mailinfo.h | 1 + > t/t4150-am.sh| 23 + > t/t5100-mailinfo.sh | 4 +- > t/t5100/info0008--no-inbody-headers | 5 ++ > t/t5100/info0018 | 5 ++ > t/t5100/msg0008--no-inbody-headers | 6 ++ > t/t5100/msg0015--no-inbody-headers | 1 + > t/t5100/msg0018 | 2 + > t/t5100/patch0008--no-inbody-headers | 0 > t/t5100/patch0018| 6 ++ > t/t5100/sample.mbox | 20 + > 12 files changed, 206 insertions(+), 32 deletions(-) > create mode 100644 t/t5100/info0008--no-inbody-headers > create mode 100644 t/t5100/info0018 > create mode 100644 t/t5100/msg0008--no-inbody-headers > create mode 100644 t/t5100/msg0018 > create mode 100644 t/t5100/patch0008--no-inbody-headers > create mode 100644 t/t5100/patch0018
Re: [wishlist?] make submodule commands robust to having non-submodule Subprojects
Heiko Voigtwrites: > On Thu, Sep 15, 2016 at 11:27:54AM -0700, Junio C Hamano wrote: > >> If the trend in Git community collectively these days is to make >> usage of submodules easier and smoother, I'd imagine that you would >> want to teach "git add" that was given a submodule to "git submodule >> add" instead by default, with an option "git add --no-gitmodules >> sub" to disable it, or something like that. >> >> > $ git submodule add --fixup-modules-file ./sub sub >> > Adding .gitmodule entry only for `sub` to use `git -C remote >> > show origin` as URL. >> >> I agree that a feature like this is needed regardless of what >> happens at "git add" time. > > How about just > >git submodule add When I said "a feature like this is needed", I didn't care about exact syntax. I am not sure how often people need the "fixup", what kind of causes there are that they need the "fixup", and what the distribution of vaious causes would be like. If the _ONLY_ kind of fixup necessary is "I meant to say 'git submodule add ./path path' but I said 'git add path' instead", then I think it makes sense to teach "submodule add" that the form "git submodule add " is a short-pand for "git submodule add ./ ". I am not sure if we want to _ignore_ a gitlink that is already in the index unconditionally, i.e. if it is a good idea to let the second one override the first one git submodule add $URL sub && git submodule add sub in this sequence, though. > ? I remember back in the days when I started with submodules thats the > way I imagined submodules would work: > > 1. clone the submodule into a directory > 2. git submodule add it > 3. git commit everything > > Because that how you basically work with files. So instead of adding > another option I would rather like to autodetect that: > > * its a relative path inside this repo that is passed to >'git submodule add' > * there is no .gitmodules entry > * and no .git/config > ==> create those from a remote in the submodule In other words, I agree with the general direction but I'd add another condition to the above three, i.e. * and there is no gitlink for that path in the index yet.
Re: [PATCH 3/2] batch check whether submodule needs pushing into one call
Heiko Voigtwrites: > On Fri, Sep 16, 2016 at 11:40:19AM +0200, Heiko Voigt wrote: >> > By the way, with the two new patches, 'pu' seems to start failing >> > some tests, e.g. 5533 5404 5405. >> >> Ah ok I did only test on master, will look into those. > > Ok I had a look into these and the reason t5533 fails is because on pu > --recurse-submodules is enabled by default and I missed the case when > overwriting a ref. In that case we get the sha1 from the remote side as > old. So we could catch that and fall back to all revisions there, but... > > ... tl;dr: The solution to use the old revisions from the remote side > was too simple and does not make matters better but actually worse for > some typical usecases. Its only in the last patch. You may not even have the old one in your copy of the remote repository if you haven't fetched from them and you are forcing your push. "rev-list --not " may fail in such a case, not producing the list of new commits. You'd need to exclude old ones you learned over the wire that you do not yet have locally. > The most exact solution would be to use all actual remote refs available > (not sure if we have them at this point in the process?) another > solution would be to still append the --remotes= option as a > fallback to reduce the revisions. I'd say --remotes= is the least problematic thing to do.
Re: [PATCH 3/2] batch check whether submodule needs pushing into one call
Heiko Voigtwrites: > +static void append_hash_to_argv(const unsigned char sha1[20], void *data) > { > - if (add_submodule_odb(path) || !lookup_commit_reference(sha1)) > + struct argv_array *argv = (struct argv_array *) data; > + argv_array_push(argv, sha1_to_hex(sha1)); > +} Hmph, why do I think I've seen this before in the previous patch? ... scans through this patch and finds that a similar one is removed ;-) OK. This makes sense. > +static void check_has_hash(const unsigned char sha1[20], void *data) > +{ > + int *has_hash = (int *) data; > + > + if (!lookup_commit_reference(sha1)) > + *has_hash = 0; > +} > + > +static int submodule_has_hashes(const char *path, struct sha1_array *hashes) > +{ > + int has_hash = 1; > + > + if (add_submodule_odb(path)) > + return 0; > + > + sha1_array_for_each_unique(hashes, check_has_hash, _hash); > + return has_hash; > +} > + > +static int submodule_needs_pushing(const char *path, struct sha1_array > *hashes) > +{ > + if (!submodule_has_hashes(path, hashes)) > return 0; I think you meant well, but this optimization is wrong. A mere presence of an object does not mean that the current tip can reach that object. Imagine you pushed commit A earlier to them at the tip, then pushed commit A~ to them at the tip, which is the current state of the remote of the submodule, and since them they may have GC'ed. They no longer have the commit A. For that matter, because you are doing this check by pretending as if all the submodule objects are in the object store of the current superproject you are working in, and saying "it exists there in the submodule repository" when the only thing you know is it exists in an object store of either the submodule repository, the superproject repository, or any of the other submodule repositories, you really cannot tell much from a mere presence of an object. Not just the remote of the submodule repository you are interested in, but the submodule repository you are interested in itself, may not have that object. Drop the previous two helper functions and this short-cut. > if (for_each_remote_ref_submodule(path, has_remote, NULL) > 0) { > struct child_process cp = CHILD_PROCESS_INIT; > - const char *argv[] = {"rev-list", NULL, "--not", "--remotes", > "-n", "1" , NULL}; > + > + argv_array_push(, "rev-list"); > + sha1_array_for_each_unique(hashes, append_hash_to_argv, > ); > + argv_array_pushl(, "--not", "--remotes", "-n", "1" , > NULL); > + > struct strbuf buf = STRBUF_INIT; > int needs_pushing = 0; > > - argv[1] = sha1_to_hex(sha1); > - cp.argv = argv; > prepare_submodule_repo_env(_array); > cp.git_cmd = 1; > cp.no_stdin = 1; > cp.out = -1; > cp.dir = path; > if (start_command()) > - die("Could not run 'git rev-list %s --not --remotes -n > 1' command in submodule %s", > - sha1_to_hex(sha1), path); > + die("Could not run 'git rev-list --not > --remotes -n 1' command in submodule %s", > + path); > if (strbuf_read(, cp.out, 41)) > needs_pushing = 1; > finish_command(); > @@ -601,21 +628,6 @@ static void find_unpushed_submodule_commits(struct > commit *commit, > diff_tree_combined_merge(commit, 1, ); > } Good. This is the optimization I alluded to in the review of the first one in the series.
Re: [PATCH 2/2] serialize collection of refs that contain submodule changes
Heiko Voigtwrites: > diff --git a/submodule.c b/submodule.c > index b04c066..a15e346 100644 > --- a/submodule.c > +++ b/submodule.c > @@ -627,24 +627,31 @@ static void free_submodules_sha1s(struct string_list > *submodules) > string_list_clear(submodules, 1); > } > > -int find_unpushed_submodules(unsigned char new_sha1[20], > +static void append_hash_to_argv(const unsigned char sha1[20], > + void *data) > +{ > + struct argv_array *argv = (struct argv_array *) data; > + argv_array_push(argv, sha1_to_hex(sha1)); > +} > + > +int find_unpushed_submodules(struct sha1_array *hashes, > const char *remotes_name, struct string_list *needs_pushing) > { > struct rev_info rev; > struct commit *commit; > - const char *argv[] = {NULL, NULL, "--not", "NULL", NULL}; > - int argc = ARRAY_SIZE(argv) - 1, i; > - char *sha1_copy; > + int i; > struct string_list submodules = STRING_LIST_INIT_DUP; > + struct argv_array argv = ARGV_ARRAY_INIT; > > - struct strbuf remotes_arg = STRBUF_INIT; > - > - strbuf_addf(_arg, "--remotes=%s", remotes_name); > init_revisions(, NULL); > - sha1_copy = xstrdup(sha1_to_hex(new_sha1)); > - argv[1] = sha1_copy; > - argv[3] = remotes_arg.buf; > - setup_revisions(argc, argv, , NULL); > + > + /* argv.argv[0] will be ignored by setup_revisions */ > + argv_array_push(, "find_unpushed_submodules"); > + sha1_array_for_each_unique(hashes, append_hash_to_argv, ); > + argv_array_push(, "--not"); > + argv_array_pushf(, "--remotes=%s", remotes_name); > + > + setup_revisions(argv.argc, argv.argv, , NULL); Yes, its about time to for us to lose that fixed-size argv[] and replace it with an argv-array ;-). > if (prepare_revision_walk()) > die("revision walk setup failed"); So this one used to get a single commit at the tip of what we pushed in the superproject and was asked "Look at the history we just pushed leading to the tip commit, and tell me if any of the ones new to the remote requires submodule commits the remote does not yet have". Now the caller collects all the tip commits and asks us once: "Here are the new tips we just pushed; in the history leading to them, is there a commit that the remote did not have that requires submodule history the remote does not yet have?". Makes sort-of sense. I speculated that you would be doing the same kind of optimization to feed all positive commits to rev-list at once in each submodule repository in the review of the previous one, but you didn't do it here. You did the same for superproject in this step. Perhaps 3 or 4 would do so in the submodule repository. One thing that makes me worried is how the ref cache layer interacts with this. I see you first call push_unpushed_submodules() when ON_DEMAND is set, which would result in pushes in submodule repositories, updating their remote tracking branches. At that point, before you make another call to find_unpushed_submodules(), is our cached ref layer knows that the remote tracking branches are now up to date (otherwise, we would incorrectly judge that these submodules need pushing based on stale information)? > diff --git a/transport.c b/transport.c > index 94d6dc3..76e1daf 100644 > --- a/transport.c > +++ b/transport.c > @@ -903,23 +903,29 @@ int transport_push(struct transport *transport, > > if ((flags & TRANSPORT_RECURSE_SUBMODULES_ON_DEMAND) && > !is_bare_repository()) { > struct ref *ref = remote_refs; > + struct sha1_array hashes = SHA1_ARRAY_INIT; > + > for (; ref; ref = ref->next) > - if (!is_null_oid(>new_oid) && > - !push_unpushed_submodules(ref->new_oid.hash, > - transport->remote->name)) > - die ("Failed to push all needed > submodules!"); > + if (!is_null_oid(>new_oid)) > + sha1_array_append(, > ref->new_oid.hash); > + > + if (!push_unpushed_submodules(, > transport->remote->name)) > + die ("Failed to push all needed submodules!"); Do we leak the contents of hashes here? > } > > if ((flags & (TRANSPORT_RECURSE_SUBMODULES_ON_DEMAND | > TRANSPORT_RECURSE_SUBMODULES_CHECK)) && > !is_bare_repository()) { > struct ref *ref = remote_refs; > struct string_list needs_pushing = STRING_LIST_INIT_DUP; > + struct sha1_array hashes = SHA1_ARRAY_INIT; > > for (; ref; ref = ref->next) > - if (!is_null_oid(>new_oid) && > - find_unpushed_submodules(ref->new_oid.hash, > -
Re: [PATCH v2 1/1] git-p4: Add --checkpoint-period option to sync/clone
On Fri, Sep 16, 2016 at 11:19 AM, Lars Schneiderwrote: > > > This looks interesting! I ran into the same issue and added a parameter to > the p4 commands to retry (patch not yet proposed to the mailing list): > https://github.com/autodesk-forks/git/commit/fcfc96a7814935ee6cefb9d69e44def30a90eabb I was unaware of the retry flag to the p4 command, that seems like a useful trick too. I think both approaches might pair nicely together (p4 optimistically retrying, but still falling back to the latest git checkpoint if we exhaust our N retry attempts). > Would it make sense to print the "git-p4 resume command" in case an error > happens and checkpoints are written? I was thinking something like this would be a good idea and would certainly aide in usability. Resuming a sync is fairly straight-forward (just re-execute the same command). Resuming a clone is a bit more problematic, today if a depot path argument is provided to the sync or clone command (and it is always required for clone), no attempt is made to examine the existing git branches and limit to only Perforce changes missing from git. There is a lingering TODO in the script where we check the presence of the depot path argument, with a suggestion that we should always make an attempt to continue building upon existing history when it is available. I think there might be a few edge cases around this behavior that I'd need to think through. But, if I'm able to address the TODO, then printing the command to resume the import should be pretty straight-forward. I'll continue working on that next week.
[RFC/PATCH 3/3] mailinfo: handle in-body header continuations
Mailinfo currently handles multi-line headers, but it does not handle multi-line in-body headers. Teach it to handle such headers, for example, for this input: Subject: a very long broken line Subject: another very long broken line interpret the in-body subject to be "another very long broken line" instead of "another very long". Instead of repeatedly calling "check_header" (as in this patch), one alternate method to accomplish this would be to have a buffer of potential header text in struct mailinfo to be flushed whenever a header is known to end (for example, if we detect the start of a new header), but this makes the logic more complicated - for example, the flushing would not only invoke check_header but would also need to reconstruct the original lines, possibly decode them into UTF-8, and store them in log_message, and any failures would be noticed a few "lines" away from the original failure point. Also, care would need to be taken to flush the buffer at all appropriate places. Another alternate would be to modify "read_one_header_line" to accept strings of lines instead of reading its own from a FILE pointer, but this would also require a buffer, with the same issues. Signed-off-by: Jonathan Tan--- mailinfo.c | 24 ++-- mailinfo.h | 1 + t/t4150-am.sh | 23 +++ t/t5100-mailinfo.sh | 4 ++-- t/t5100/info0018| 5 + t/t5100/msg0018 | 2 ++ t/t5100/patch0018 | 6 ++ t/t5100/sample.mbox | 19 +++ 8 files changed, 80 insertions(+), 4 deletions(-) create mode 100644 t/t5100/info0018 create mode 100644 t/t5100/msg0018 create mode 100644 t/t5100/patch0018 diff --git a/mailinfo.c b/mailinfo.c index 23a56c2..3bbdf74 100644 --- a/mailinfo.c +++ b/mailinfo.c @@ -729,8 +729,10 @@ static int handle_commit_msg(struct mailinfo *mi, struct strbuf *line) if (mi->header_stage) { char *buf = old_buf ? old_buf : line->buf; - if (buf[0] == 0 || (buf[0] == '\n' && buf[1] == 0)) + if (buf[0] == 0 || (buf[0] == '\n' && buf[1] == 0)) { + strbuf_reset(>last_inbody_header); goto handle_commit_msg_out; + } } if (mi->use_inbody_headers && mi->header_stage) { @@ -738,8 +740,24 @@ static int handle_commit_msg(struct mailinfo *mi, struct strbuf *line) size_t len = old_buf ? old_len : line->len; mi->header_stage = check_header_raw(mi, buf, len, mi->s_hdr_data, 0); - if (mi->header_stage) + if (mi->header_stage) { + strbuf_reset(>last_inbody_header); + strbuf_add(>last_inbody_header, buf, len); goto handle_commit_msg_out; + } + + if (mi->last_inbody_header.len && + (buf[0] == ' ' || buf[0] == '\t')) { + strbuf_strip_suffix(>last_inbody_header, "\n"); + strbuf_add(>last_inbody_header, buf, len); + mi->header_stage = check_header(mi, + >last_inbody_header, + mi->s_hdr_data, 1); + if (mi->header_stage) + goto handle_commit_msg_out; + } + + mi->header_stage = 0; } else /* Only trim the first (blank) line of the commit message * when ignoring in-body headers. @@ -1086,6 +1104,7 @@ void setup_mailinfo(struct mailinfo *mi) strbuf_init(>email, 0); strbuf_init(>charset, 0); strbuf_init(>log_message, 0); + strbuf_init(>last_inbody_header, 0); mi->header_stage = 1; mi->use_inbody_headers = 1; mi->content_top = mi->content; @@ -1099,6 +1118,7 @@ void clear_mailinfo(struct mailinfo *mi) strbuf_release(>name); strbuf_release(>email); strbuf_release(>charset); + strbuf_release(>last_inbody_header); free(mi->message_id); for (i = 0; mi->p_hdr_data[i]; i++) diff --git a/mailinfo.h b/mailinfo.h index 93776a7..ab2d0dd 100644 --- a/mailinfo.h +++ b/mailinfo.h @@ -27,6 +27,7 @@ struct mailinfo { int patch_lines; int filter_stage; /* still reading log or are we copying patch? */ int header_stage; /* still checking in-body headers? */ + struct strbuf last_inbody_header; struct strbuf **p_hdr_data; struct strbuf **s_hdr_data; diff --git a/t/t4150-am.sh b/t/t4150-am.sh index 9ce9424..89a5bac 100755 --- a/t/t4150-am.sh +++ b/t/t4150-am.sh @@ -977,4 +977,27 @@ test_expect_success 'am --patch-format=mboxrd handles mboxrd' ' test_cmp msg out ' +test_expect_success 'am works with multi-line in-body headers' ' +
[RFC/PATCH 2/3] mailinfo: correct malformed test example
An existing sample message (0015) in the tests for mailinfo contains an indented line immediately after an in-body header (without any intervening blank line). Correct this by adding the blank line, in preparation for a subsequent patch that will treat such indented lines as RFC 2822 continuation lines (instead of as part of the commit message). To ensure that non-indented lines immediately after an in-body header are still treated correctly (now and in the future), 0008 has been updated to test both the case when in-body headers are used and the case when they are not used. Signed-off-by: Jonathan Tan--- t/t5100/info0008--no-inbody-headers | 5 + t/t5100/msg0008--no-inbody-headers | 6 ++ t/t5100/msg0015--no-inbody-headers | 1 + t/t5100/patch0008--no-inbody-headers | 0 t/t5100/sample.mbox | 1 + 5 files changed, 13 insertions(+) create mode 100644 t/t5100/info0008--no-inbody-headers create mode 100644 t/t5100/msg0008--no-inbody-headers create mode 100644 t/t5100/patch0008--no-inbody-headers diff --git a/t/t5100/info0008--no-inbody-headers b/t/t5100/info0008--no-inbody-headers new file mode 100644 index 000..e8a2951 --- /dev/null +++ b/t/t5100/info0008--no-inbody-headers @@ -0,0 +1,5 @@ +Author: Junio C Hamano +Email: ju...@kernel.org +Subject: another patch +Date: Fri, 9 Jun 2006 00:44:16 -0700 + diff --git a/t/t5100/msg0008--no-inbody-headers b/t/t5100/msg0008--no-inbody-headers new file mode 100644 index 000..d6e950e --- /dev/null +++ b/t/t5100/msg0008--no-inbody-headers @@ -0,0 +1,6 @@ +From: A U Thor +Subject: [PATCH] another patch +>Here is an empty patch from A U Thor. + +Hey you forgot the patch! + diff --git a/t/t5100/msg0015--no-inbody-headers b/t/t5100/msg0015--no-inbody-headers index be5115b..44a6ce7 100644 --- a/t/t5100/msg0015--no-inbody-headers +++ b/t/t5100/msg0015--no-inbody-headers @@ -1,3 +1,4 @@ From: bogosity + - a list - of stuff diff --git a/t/t5100/patch0008--no-inbody-headers b/t/t5100/patch0008--no-inbody-headers new file mode 100644 index 000..e69de29 diff --git a/t/t5100/sample.mbox b/t/t5100/sample.mbox index 8b2ae06..ba8b208 100644 --- a/t/t5100/sample.mbox +++ b/t/t5100/sample.mbox @@ -656,6 +656,7 @@ Subject: check bogus body header (from) Date: Fri, 9 Jun 2006 00:44:16 -0700 From: bogosity + - a list - of stuff --- -- 2.10.0.rc2.20.g5b18e70
[RFC/PATCH 1/3] mailinfo: refactor commit message processing
Within the processing of the commit message, check for a scissors line or a patchbreak line first (before checking for in-body headers) so that a subsequent patch modifying the processing of in-body headers would not cause a scissors line or patchbreak line to be misidentified. If a line could be both an in-body header and a scissors line (for example, "From: -- >8 --"), this is considered a fatal error (previously, it would be interpreted as an in-body header). (It is not possible for a line to be both an in-body header and a patchbreak line, since both require different prefixes.) The following enumeration shows that processing is the same except (as described above) the in-body header + scissors line case. o in-body header (check_header OK) o passes UTF-8 conversion o [described above] is scissors line o [not possible] is patchbreak line o [not possible] is blank line o is none of the above - processed as header o fails UTF-8 conversion - processed as header o not in-body header o passes UTF-8 conversion o is scissors line - processed as scissors o is patchbreak line - processed as patchbreak o is blank line - ignored if in header_stage o is none of the above - log message o fails UTF-8 conversion - input error As for the result left in "line" (after the invocation of handle_commit_msg), it is unused (by its caller, handle_filter, and by handle_filter's callers, handle_boundary and handle_body) unless this line is a patchbreak line, in which case handle_patch is subsequently called (in handle_filter) on "line". In this case, "line" must have passed UTF-8 conversion both before and after this patch, so the result is still the same overall. Signed-off-by: Jonathan Tan--- mailinfo.c | 145 - 1 file changed, 115 insertions(+), 30 deletions(-) diff --git a/mailinfo.c b/mailinfo.c index e19abe3..23a56c2 100644 --- a/mailinfo.c +++ b/mailinfo.c @@ -340,23 +340,56 @@ static struct strbuf *decode_b_segment(const struct strbuf *b_seg) return out; } -static int convert_to_utf8(struct mailinfo *mi, - struct strbuf *line, const char *charset) +/* + * Attempts to convert line into UTF-8, storing the result in line. + * + * This differs from convert_to_utf8 in that conversion non-success is not + * considered an error case - mi->input_error is not set, and no error message + * is printed. + * + * If the conversion is unnecessary, returns 0 and stores NULL in old_buf (if + * old_buf is not NULL). + * + * If the conversion is successful, returns 0 and stores the unconverted string + * in old_buf and old_len (if they are respectively not NULL). + * + * If the conversion is unsuccessful, returns -1. + */ +static int try_convert_to_utf8(const struct mailinfo *mi, struct strbuf *line, + const char *charset, char **old_buf, + size_t *old_len) { - char *out; + char *utf8; - if (!mi->metainfo_charset || !charset || !*charset) + if (!mi->metainfo_charset || !charset || !*charset || + same_encoding(mi->metainfo_charset, charset)) { + if (old_buf) + *old_buf = NULL; return 0; + } - if (same_encoding(mi->metainfo_charset, charset)) + utf8 = reencode_string(line->buf, mi->metainfo_charset, charset); + if (utf8) { + char *temp = strbuf_detach(line, old_len); + if (old_buf) + *old_buf = temp; + strbuf_attach(line, utf8, strlen(utf8), strlen(utf8)); return 0; - out = reencode_string(line->buf, mi->metainfo_charset, charset); - if (!out) { + } + return -1; +} + +/* + * Converts line into UTF-8, setting mi->input_error to -1 upon failure. + */ +static int convert_to_utf8(struct mailinfo *mi, + struct strbuf *line, const char *charset) +{ + if (try_convert_to_utf8(mi, line, charset, NULL, NULL)) { mi->input_error = -1; return error("cannot convert from %s to %s", charset, mi->metainfo_charset); } - strbuf_attach(line, out, strlen(out), strlen(out)); return 0; } @@ -515,6 +548,13 @@ static int check_header(struct mailinfo *mi, return ret; } +static int check_header_raw(struct mailinfo *mi, + char *buf, size_t len, + struct strbuf *hdr_data[], int overwrite) { + const struct strbuf sb = {0, len, buf}; + return check_header(mi, , hdr_data, overwrite); +} + static void decode_transfer_encoding(struct mailinfo *mi, struct strbuf *line) { struct strbuf *ret; @@ -623,32 +663,48 @@ static int is_scissors_line(const struct strbuf *line) gap * 2 < perforation); } -static int
[RFC/PATCH 0/3] handle multiline in-body headers
Thanks, Peff, for the explanation and the method to reproduce the issue. The issue seems to be in mailinfo.c - this patch set addresses that, and I have also included a test for "git am" in t/t4150-am.sh to show the effect of this patch set on that command. Jonathan Tan (3): mailinfo: refactor commit message processing mailinfo: correct malformed test example mailinfo: handle in-body header continuations mailinfo.c | 165 --- mailinfo.h | 1 + t/t4150-am.sh| 23 + t/t5100-mailinfo.sh | 4 +- t/t5100/info0008--no-inbody-headers | 5 ++ t/t5100/info0018 | 5 ++ t/t5100/msg0008--no-inbody-headers | 6 ++ t/t5100/msg0015--no-inbody-headers | 1 + t/t5100/msg0018 | 2 + t/t5100/patch0008--no-inbody-headers | 0 t/t5100/patch0018| 6 ++ t/t5100/sample.mbox | 20 + 12 files changed, 206 insertions(+), 32 deletions(-) create mode 100644 t/t5100/info0008--no-inbody-headers create mode 100644 t/t5100/info0018 create mode 100644 t/t5100/msg0008--no-inbody-headers create mode 100644 t/t5100/msg0018 create mode 100644 t/t5100/patch0008--no-inbody-headers create mode 100644 t/t5100/patch0018 -- 2.10.0.rc2.20.g5b18e70
Re: [PATCH 1/2] serialize collection of changed submodules
Heiko Voigtwrites: > +static struct sha1_array *get_sha1s_from_list(struct string_list *submodules, > + const char *path) > +{ > + struct string_list_item *item; > + struct sha1_array *hashes; > + > + item = string_list_insert(submodules, path); > + if (item->util) > + return (struct sha1_array *) item->util; > + > + hashes = (struct sha1_array *) xmalloc(sizeof(struct sha1_array)); > + /* NEEDSWORK: should we add an initializer function for > + * sha1_array ? */ > + memset(hashes, 0, sizeof(struct sha1_array)); > + item->util = hashes; /* NEEDSWORK: should we have SHA1_ARRAY_INIT etc.? */ item->util = xcalloc(1, sizeof(struct sha1_array)); > static void collect_submodules_from_diff(struct diff_queue_struct *q, >struct diff_options *options, >void *data) > { > int i; > - struct string_list *needs_pushing = data; > + struct string_list *submodules = data; > > for (i = 0; i < q->nr; i++) { > struct diff_filepair *p = q->queue[i]; > + struct sha1_array *hashes; > if (!S_ISGITLINK(p->two->mode)) > continue; > - if (submodule_needs_pushing(p->two->path, p->two->oid.hash)) > - string_list_insert(needs_pushing, p->two->path); > + hashes = get_sha1s_from_list(submodules, p->two->path); > + sha1_array_append(hashes, p->two->oid.hash); > } > } So the idea at this step is still let each commit in the top-level history inspected for any submodule change, but the result is collected in a mapping (submodule -> [ list of submodule commits ]). As we do not expect too many "oops, the old commit was better, so let's revert and rebind the old one from the submodule" in the history of the top-level, appending and then running for-each-unique is an efficient way, instead of first checking if we already have it and then inserting new ones to maintain the uniqueness. Makes sense. > @@ -582,14 +601,41 @@ static void find_unpushed_submodule_commits(struct > commit *commit, > diff_tree_combined_merge(commit, 1, ); > } > > +struct collect_submodule_from_sha1s_data { > + char *submodule_path; > + struct string_list *needs_pushing; > +}; > + > +static void collect_submodules_from_sha1s(const unsigned char sha1[20], > + void *data) > +{ > + struct collect_submodule_from_sha1s_data *me = > + (struct collect_submodule_from_sha1s_data *) data; > + > + if (submodule_needs_pushing(me->submodule_path, sha1)) > + string_list_insert(me->needs_pushing, me->submodule_path); > +} This is called from sha1_array_for_each_unique() that iterates over the submodule commit object names for one submodule and then ends up calling submodule_needs_pushing() number of times, which smells less efficient than it could be. You can ask rev-list --not --remotes just once in the submodule repository. I imagine that is what you'll do in the next patch. An obvious but much less efficient way to optimize this part would be to see if me->needs_pushing already has me->submodule_path and skip the check for submodule_needs_pushing(), but if you drop the call by find_unpushed_submodule to sha1_array_for_each_unique() to walk new submodule commits one by one, that would become irrelevant. > +static void free_submodules_sha1s(struct string_list *submodules) > +{ > + int i; > + for (i = 0; i < submodules->nr; i++) { > + struct string_list_item *item = >items[i]; > + struct sha1_array *hashes = (struct sha1_array *) item->util; > + sha1_array_clear(hashes); > + } > + string_list_clear(submodules, 1); > +} > + > int find_unpushed_submodules(unsigned char new_sha1[20], > const char *remotes_name, struct string_list *needs_pushing) > { > struct rev_info rev; > struct commit *commit; > const char *argv[] = {NULL, NULL, "--not", "NULL", NULL}; > - int argc = ARRAY_SIZE(argv) - 1; > + int argc = ARRAY_SIZE(argv) - 1, i; > char *sha1_copy; > + struct string_list submodules = STRING_LIST_INIT_DUP; > > struct strbuf remotes_arg = STRBUF_INIT; > > @@ -603,12 +649,23 @@ int find_unpushed_submodules(unsigned char new_sha1[20], > die("revision walk setup failed"); > > while ((commit = get_revision()) != NULL) > - find_unpushed_submodule_commits(commit, needs_pushing); > + find_unpushed_submodule_commits(commit, ); > > reset_revision_walk(); > free(sha1_copy); > strbuf_release(_arg); > > + for (i = 0; i < submodules.nr; i++) { > + struct string_list_item *item = [i]; > + struct collect_submodule_from_sha1s_data data; > + data.submodule_path = item->string; > +
[PATCH] format-patch: Add --rfc for the common case of [RFC PATCH]
This provides a shorter and more convenient alias for --subject-prefix='RFC PATCH'. Add a test covering --rfc. Signed-off-by: Josh Triplett--- By far, the most common subject-prefix I've seen other than "PATCH" is "RFC PATCH" (or occasionally "PATCH RFC"). Seems worth optimizing for the common case, to avoid having to spell it out the long way as --subject-prefix='RFC PATCH'. builtin/log.c | 10 ++ t/t4014-format-patch.sh | 9 + 2 files changed, 19 insertions(+), 0 deletions(-) diff --git a/builtin/log.c b/builtin/log.c index 92dc34d..48d6a38 100644 --- a/builtin/log.c +++ b/builtin/log.c @@ -1112,6 +1112,13 @@ static int subject_prefix_callback(const struct option *opt, const char *arg, return 0; } +static int rfc_callback(const struct option *opt, const char *arg, int unset) +{ + subject_prefix = 1; + ((struct rev_info *)opt->value)->subject_prefix = xstrdup("RFC PATCH"); + return 0; +} + static int numbered_cmdline_opt = 0; static int numbered_callback(const struct option *opt, const char *arg, @@ -1419,6 +1426,9 @@ int cmd_format_patch(int argc, const char **argv, const char *prefix) N_("start numbering patches at instead of 1")), OPT_INTEGER('v', "reroll-count", _count, N_("mark the series as Nth re-roll")), + { OPTION_CALLBACK, 0, "rfc", , NULL, + N_("Use [RFC PATCH] instead of [PATCH]"), + PARSE_OPT_NOARG | PARSE_OPT_NONEG, rfc_callback }, { OPTION_CALLBACK, 0, "subject-prefix", , N_("prefix"), N_("Use [] instead of [PATCH]"), PARSE_OPT_NONEG, subject_prefix_callback }, diff --git a/t/t4014-format-patch.sh b/t/t4014-format-patch.sh index b0579dd..81b0498 100755 --- a/t/t4014-format-patch.sh +++ b/t/t4014-format-patch.sh @@ -1073,6 +1073,15 @@ test_expect_success 'empty subject prefix does not have extra space' ' test_cmp expect actual ' +cat >expect <<'EOF' +Subject: [RFC PATCH 1/1] header with . in it +EOF +test_expect_success '--rfc' ' + git format-patch -n -1 --stdout --rfc >patch && + grep ^Subject: patch >actual && + test_cmp expect actual +' + test_expect_success '--from=ident notices bogus ident' ' test_must_fail git format-patch -1 --stdout --from=foo >patch ' base-commit: 6ebdac1bab966b720d776aa43ca188fe378b1f4b -- git-series 0.8.10
Re: [PATCH v2 1/1] git-p4: Add --checkpoint-period option to sync/clone
> On 15 Sep 2016, at 23:17, Ori Rawlingswrote: > > Importing a long history from Perforce into git using the git-p4 tool > can be especially challenging. The `git p4 clone` operation is based > on an all-or-nothing transactionality guarantee. Under real-world > conditions like network unreliability or a busy Perforce server, > `git p4 clone` and `git p4 sync` operations can easily fail, forcing a > user to restart the import process from the beginning. The longer the > history being imported, the more likely a fault occurs during the > process. Long enough imports thus become statistically unlikely to ever > succeed. > > The underlying git fast-import protocol supports an explicit checkpoint > command. The idea here is to optionally allow the user to force an > explicit checkpoint every seconds. If the sync/clone operation fails > branches are left updated at the appropriate commit available during the > latest checkpoint. This allows a user to resume importing Perforce > history while only having to repeat at most approximately seconds > worth of import activity. This looks interesting! I ran into the same issue and added a parameter to the p4 commands to retry (patch not yet proposed to the mailing list): https://github.com/autodesk-forks/git/commit/fcfc96a7814935ee6cefb9d69e44def30a90eabb Would it make sense to print the "git-p4 resume command" in case an error happens and checkpoints are written? Cheers, Lars
Re: [wishlist?] make submodule commands robust to having non-submodule Subprojects
On Fri, Sep 16, 2016 at 7:11 AM, Heiko Voigtwrote: > How about just > >git submodule add > > ? I remember back in the days when I started with submodules thats the > way I imagined submodules would work: > > 1. clone the submodule into a directory > 2. git submodule add it > 3. git commit everything > > Because that how you basically work with files. So instead of adding > another option I would rather like to autodetect that: > > * its a relative path inside this repo that is passed to >'git submodule add' > * there is no .gitmodules entry > * and no .git/config > ==> create those from a remote in the submodule > > Corner cases: > > * If there is more than one remote we could tell the user to use an >option to specify which one to use. > * Barf in case there is no remote (not adding the submodule except -f >is used). > * If the gitlink is already there but no .gitmodules entry, 'git >submodule add' will just add the entry as if it was initially added. > > Instead of giving an error message that the submodule is already added > we could actually be nicer to the user and try to fix things for him > instead. > This makes sense to me. Possibly we could warn in this case, so that the user knows that something was "off" but I don't think we should be failing here... Regards, Jake > Cheers Heiko
Re: [wishlist?] make submodule commands robust to having non-submodule Subprojects
On Thu, Sep 15, 2016 at 11:27:54AM -0700, Junio C Hamano wrote: > Stefan Bellerwrites: > > So how about this fictional work flow: > > > > $ git init top > > $ cd top > > $ git commit --allow-empty -m 'initial in top' > > $ git init sub > > $ git -C sub commit --allow-empty -m 'initial in sub' > > $ git add sub > > You added a gitlink, but no corresponding entry in > > .gitmodules is found. This is fine for gits core functionality, but > > the submodule command gets confused by this unless you add 'sub' > > to your .gitmodules via `git submodule add --already-in-tree \ > > --reuse-submodules-origin-as-URL sub`. Alternatively you can make > > this > > message disappear by configuring advice.gitlinkPitfalls. > > I am not sure if I agree with that direction. > > If the trend in Git community collectively these days is to make > usage of submodules easier and smoother, I'd imagine that you would > want to teach "git add" that was given a submodule to "git submodule > add" instead by default, with an option "git add --no-gitmodules > sub" to disable it, or something like that. > > > $ git submodule add --fixup-modules-file ./sub sub > > Adding .gitmodule entry only for `sub` to use `git -C remote > > show origin` as URL. > > I agree that a feature like this is needed regardless of what > happens at "git add" time. How about just git submodule add ? I remember back in the days when I started with submodules thats the way I imagined submodules would work: 1. clone the submodule into a directory 2. git submodule add it 3. git commit everything Because that how you basically work with files. So instead of adding another option I would rather like to autodetect that: * its a relative path inside this repo that is passed to 'git submodule add' * there is no .gitmodules entry * and no .git/config ==> create those from a remote in the submodule Corner cases: * If there is more than one remote we could tell the user to use an option to specify which one to use. * Barf in case there is no remote (not adding the submodule except -f is used). * If the gitlink is already there but no .gitmodules entry, 'git submodule add' will just add the entry as if it was initially added. Instead of giving an error message that the submodule is already added we could actually be nicer to the user and try to fix things for him instead. Cheers Heiko
Re: [PATCH 3/2] batch check whether submodule needs pushing into one call
On Fri, Sep 16, 2016 at 11:40:19AM +0200, Heiko Voigt wrote: > > By the way, with the two new patches, 'pu' seems to start failing > > some tests, e.g. 5533 5404 5405. > > Ah ok I did only test on master, will look into those. Ok I had a look into these and the reason t5533 fails is because on pu --recurse-submodules is enabled by default and I missed the case when overwriting a ref. In that case we get the sha1 from the remote side as old. So we could catch that and fall back to all revisions there, but... ... tl;dr: The solution to use the old revisions from the remote side was too simple and does not make matters better but actually worse for some typical usecases. Its only in the last patch. ... that lead me to further thinking about the previous solution (using the locally cached remote refs) which might actually be a good default for the non-fastforward cases like creating new ref or overwriting a ref. My current patch would handle the --mirror case nicer, since it gets a lot of old revs to reduce the revisions to check. For the typical one branch push it would most likely be worse. Except when the user is updating (fast-forwarding) the remote ref we would scan all revs of a ref until the root because we do not get enough valid revs that already exist on the remote. The most exact solution would be to use all actual remote refs available (not sure if we have them at this point in the process?) another solution would be to still append the --remotes= option as a fallback to reduce the revisions. What do others think? Will leave this for a little bit further thinking. Its just the last patch ("use actual start hashes for submodule push check instead of local refs") that needs to go back to the drawing board. Cheers Heiko
Re: [PATCH 3/2] batch check whether submodule needs pushing into one call
On Thu, Sep 15, 2016 at 02:08:58PM -0700, Junio C Hamano wrote: > Heiko Voigtwrites: > > > if (for_each_remote_ref_submodule(path, has_remote, NULL) > 0) { > > struct child_process cp = CHILD_PROCESS_INIT; > > - const char *argv[] = {"rev-list", NULL, "--not", "--remotes", > > "-n", "1" , NULL}; > > + > > + argv_array_push(, "rev-list"); > > + sha1_array_for_each_unique(hashes, append_hash_to_argv, > > ); > > + argv_array_pushl(, "--not", "--remotes", "-n", "1" , > > NULL); > > + > > struct strbuf buf = STRBUF_INIT; > > int needs_pushing = 0; > > These two become decl-after-stmt; move your new lines a bit lower, > perhaps? Thanks, missed those. Will do. > > - argv[1] = sha1_to_hex(sha1); > > - cp.argv = argv; > > prepare_submodule_repo_env(_array); > > By the way, with the two new patches, 'pu' seems to start failing > some tests, e.g. 5533 5404 5405. Ah ok I did only test on master, will look into those. Cheers Heiko
Re: [RFC] extending pathspec support to submodules
Hi, On Thu, Sep 15, 2016 at 03:28:21PM -0700, Stefan Beller wrote: > On Thu, Sep 15, 2016 at 3:08 PM, Junio C Hamanowrote: > > Brandon Williams writes: > > > >> You're right that seems like the best course of action and it already falls > >> inline with what I did with a first patch to ls-files to support > >> submodules. > >> In that patch I did exactly as you suggest and pass in the prefix to the > >> submodule and make the child responsible for prepending the prefix to all > >> of > >> its output. This way we can simply pass through the whole pathspec (as > >> apposed > >> to my original idea of stripping the prefix off the pathspec prior to > >> passing > >> it to the child...which can get complicated with wild characters) to the > >> childprocess and when checking if a file matches the pathspec we can check > >> if > >> the prefix + file path matches. > > > > That's brilliant. A few observations. > > > > * With that change to tell the command that is spawned in a > >submodule directory where the submodule repository is in the > >context of the top-level superproject _and_ require it to take a > >pathspec as relative to the top-level superproject, you no longer > >worry about having to find where to cut the pathspec given at the > >top-level to adjust it for the submodule's context. That may > >simplify things. > > I wonder how this plays together with the prefix in the superproject, e.g. > > cd super/unrelated-path > # when invoking a git command the internal prefix is "unrelated-path/" > git ls-files ../submodule-* > # a submodule in submodule-A would be run in submodule-A > # with a superproject prefix of super/ ? but additionally we nned > to know we're > # not at the root of the superproject. Do we need to know that? The internal prefix is internal to each repository and can be treated as such. I would expect that the prefix is only prefixed when needed. E.g. when we display output to the user, match files, ... How about "../submodule-A" as the submodule prefix in the situation you describe? The wildcard would be resolved by the superproject since the directory is still in its domain. I would think of the submodule prefix as the path relative to the command that started everything. E.g. if we have a tree like this: *--subA / super *--subB--subsubB \ *--dirC where subA, subB and subsubB are submodules and dirC is just a directory inside super. We would get the following prefixes when issuing a command in dirC that has a pathspec for subsubB: subB: ../subB subsubB: ../subB/subsubB An interesting case is when we issue a command in subA: super: .. subB:../subB subsubB: ../subB/subsubB A rule for the prefix option could be: Always specified when crossing a repository boundary with the pathspec (including upwards). I have not completely thought this through though so just take this as some food for thought. Since I am not sure what Junio's rationale behind making the prefix relative to the toplevel superproject was, but I guess finding it could be a challenge in some situations. I.e. is the repository in home directory tracking all the dot-files really the superproject or was it that other one I found before? > >So we may have to rethink what this option name should be. "You > >are running in a repository that is used as a submodule in a > >larger context, which has the submodule at this path" is what the > >option tells the command; if any existing command already has > >such an option, we should use it. If we are inventing one, > >perhaps "--submodule-path" (I didn't check if there are existing > >options that sound similar to it and mean completely different > >things, in which case that name is not usable)? > > Would it make sense to add the '--submodule-path' to a more generic > part of the code? It's not just ls-files/grep that have to solve exactly this > problem. Up to now we just did not go for those commands, though. Yes I think so, since it should also handle starting from a submodule with a pathspec to the superproject or other submodule. In case we go with my above suggestion I would suggest a more generic name since the option could also be passed to processes handling the superproject. E.g. something like --module-prefix or --repository-prefix comes to my mind, not checked though. Cheers Heiko
[PATCH] Documentation/config: default for color.* is color.ui
Since 4c7f181 (make color.ui default to 'auto', 2013-06-10), the default for color.* when nothing is set is 'auto' and we still claimed that the default was 'false'. Be more precise by saying explicitly that the default is to follow color.ui, and recall that the default is 'auto' to avoid one indirection for the reader. Signed-off-by: Matthieu Moy--- Documentation/config.txt | 18 -- 1 file changed, 12 insertions(+), 6 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index 32f065c..66429fb 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -953,7 +953,8 @@ color.branch:: A boolean to enable/disable color in the output of linkgit:git-branch[1]. May be set to `always`, `false` (or `never`) or `auto` (or `true`), in which case colors are used - only when the output is to a terminal. Defaults to false. + only when the output is to a terminal. If unset, then the + value of `color.ui` is used (`auto` by default). color.branch.:: Use customized color for branch coloration. `` is one of @@ -968,7 +969,8 @@ color.diff:: linkgit:git-log[1], and linkgit:git-show[1] will use color for all patches. If it is set to `true` or `auto`, those commands will only use color when output is to the terminal. - Defaults to false. + If unset, then the value of `color.ui` is used (`auto` by + default). + This does not affect linkgit:git-format-patch[1] or the 'git-diff-{asterisk}' plumbing commands. Can be overridden on the @@ -991,7 +993,8 @@ color.decorate.:: color.grep:: When set to `always`, always highlight matches. When `false` (or `never`), never. When set to `true` or `auto`, use color only - when the output is written to the terminal. Defaults to `false`. + when the output is written to the terminal. If unset, then the + value of `color.ui` is used (`auto` by default). color.grep.:: Use customized color for grep colorization. `` specifies which @@ -1024,7 +1027,8 @@ color.interactive:: and displays (such as those used by "git-add --interactive" and "git-clean --interactive"). When false (or `never`), never. When set to `true` or `auto`, use colors only when the output is - to the terminal. Defaults to false. + to the terminal. If unset, then the value of `color.ui` is + used (`auto` by default). color.interactive.:: Use customized color for 'git add --interactive' and 'git clean @@ -1040,13 +1044,15 @@ color.showBranch:: A boolean to enable/disable color in the output of linkgit:git-show-branch[1]. May be set to `always`, `false` (or `never`) or `auto` (or `true`), in which case colors are used - only when the output is to a terminal. Defaults to false. + only when the output is to a terminal. If unset, then the + value of `color.ui` is used (`auto` by default). color.status:: A boolean to enable/disable color in the output of linkgit:git-status[1]. May be set to `always`, `false` (or `never`) or `auto` (or `true`), in which case colors are used - only when the output is to a terminal. Defaults to false. + only when the output is to a terminal. If unset, then the + value of `color.ui` is used (`auto` by default). color.status.:: Use customized color for status colorization. `` is -- 2.10.0.rc0.1.g07c9292
Re: Potentially misleading color.* defaults explanation in git-config(1)
Anatoly Borodinwrites: > Hi All! > > git-config(1) says: > >color.branch >A boolean to enable/disable color in the output of git-branch(1). >May be set to always, false (or never) or auto (or true), in which >case colors are used only when the output is to a terminal. So far, so good. >Defaults to false. The truth is: Defaults to following color.ui, which used to default to false but now defaults to auto. My bad, I forgot to update these parts of the docs when changing the default for color.ui (a while back already). Patch follows. > (2) git config color.branch false ; git branch Unrelated from the question, but you could write git -c color.branch=false git branch to set a configuration value just for one command. -- Matthieu Moy http://www-verimag.imag.fr/~moy/
Ich werde warten, von Ihnen zu hören.
-- Lieber Freund, Wie geht es Ihnen heute? Ich habe eine Investitionsmöglichkeit mit Ihnen zu teilen, die die Übertragung einer großen Geldsumme zum gegenseitigen Nutzen für beide von uns betreffen. Mein Name ist Andrew Hau Chung, ich in einem Finanzinstitut arbeiten hier in Hong Kong. Wenn Sie interessiert sind, kontaktieren Sie mich durch meine private E-Mail-Adresse: Ihre prompte Antwort wird geschätzt. Dein, Andrew Hau Chung
Re: Gitattributes file is not respected when switching between branches
Sorry for delay. ".gitattributes" indeed is not present in "master", but this is intentionally It is placed only in following 2 branches: feature-branch unix-feature-branch This is how flow looks on windows $ git --version git version 2.9.3.windows.1 vitalii.ishchenko@DESKTOP-9TC9UPB MINGW64 /c/work/repos/gitattributes (master) $ git ls-files --eol i/lfw/crlf attr/ testfile-crlf.txt vitalii.ishchenko@DESKTOP-9TC9UPB MINGW64 /c/work/repos/gitattributes (master) $ git checkout feature-branch Switched to branch 'feature-branch' Your branch is up-to-date with 'origin/feature-branch'. vitalii.ishchenko@DESKTOP-9TC9UPB MINGW64 /c/work/repos/gitattributes (feature-branch) $ git ls-files --eol i/lfw/lfattr/text eol=lf.gitattributes i/lfw/crlf attr/text eol=lftestfile-crlf.txt On Mon, Sep 12, 2016 at 10:42 PM, Torsten Bögershausenwrote: > On 12.09.16 21:35, Torsten Bögershausen wrote: >> On 12.09.16 14:55, Виталий Ищенко wrote: >>> Good day >>> >>> I faced following issue with gitattributes file (at least eol setting) >>> when was trying to force `lf` mode on windows. >>> >>> We have 2 branches: master & dev. With master set as HEAD in repository >>> >>> I've added `.gitattributes` with following content to `dev` branch >>> >>> ``` >>> * text eol=lf >>> ``` >>> >>> Now when you clone this repo on other machine and checkout dev branch, >>> eol setting is not respected. >>> As a workaround you can rm all files except .git folder and do hard reset. >>> >>> Issue is reproducible on windows & unix versions. Test repo can be >>> found on github >>> https://github.com/betalb/gitattributes-issue >>> >>> master branch - one file without gitattributes >>> feature-branch - .gitattributes added with eol=lf >>> unix-feature-branch - .gitattributes added with eol=crlf >>> >>> Thanks, >>> Vitalii >> Some more information may be needed, to help to debug. >> >> Which version of Git are you using ? >> What does >> >> git ls-files --eol >> >> say ? > Obs, All information was in the email. > > tb@xxx:/tmp/gitattributes-issue> git ls-files --eol > i/lfw/lfattr/ testfile-crlf.txt > tb@xxx:/tmp/gitattributes-issue> ls -al > total 8 > drwxr-xr-x 4 tbwheel 136 Sep 12 21:38 . > drwxrwxrwt 19 root wheel 646 Sep 12 21:38 .. > drwxr-xr-x 13 tbwheel 442 Sep 12 21:38 .git > -rw-r--r-- 1 tbwheel 60 Sep 12 21:38 testfile-crlf.txt > tb@xxx:/tmp/gitattributes-issue> > > Could it be that you didn't commit the file ".gitattributes" ? > This could help: > git add .gitattributes && git commit -m "Add .gitattributes" > > > > > > >