[bug] Git submodule command interprets switch as argument and switch
The issue is as follows:

R0b0t1@host:~/devel/project$ git submodule add https://github.com/user/project -f
Cloning into '/home/R0b0t1/devel/project/-f'...

My .gitignore's first line is "*", and then I explicitly allow things. Despite the presence of "project/" in the .gitignore, the submodule command says it is ignored. The "force" flag is interpreted both as a flag and as the destination directory. It is possible that the argument-parsing code for other commands exhibits this error.

R0b0t1
Re: Submodule regression in 2.14?
On Thu, Aug 17, 2017 at 7:13 PM, Junio C Hamano wrote:
> Stefan Beller writes:
>
>> Are you saying this might be a design mistake and
>> the .update ought to be respected by all the other
>> commands? For example
>> git reset --recurse-submodules
>> should ignore the .update=none?
>
> I have been under the impression that that has been the traditional
> desire of what .update ought to mean. I personally do not have a
> strong opinion---at least not yet.

In this context, note v2.14.0-rc1-34-g7463e2ec3 (bw/submodule-config-cleanup~7, "unpack-trees: don't respect submodule.update"), which goes in the opposite direction of your impression.

>> When designing these new recursive submodule functionality
>> outside the "submodule" command, I'd want submodules
>> to behave as much as possible like trees.
>
> I think that is sensible as long as the user does not explicitly say
> "this and that submodule behave differently" by giving configuration
> variables. Perhaps .update is one of those that should countermand
> the default behaviour of "--recurse-submodules"?

Maybe; I'll think about it. However, there is no such equivalent for trees (and AFAICT the question never came up) to treat a specific directory differently from the rest in worktree operations.

The problem with the issue in question is, however: git-pull is a combination of two other high-level commands (fetch/merge). The fetch component already had a recursive behavior, and the commit in question added a bit for the merge component, so the UX is hard to get right for both of them:

    git pull --recurse=fetch-only
    git pull --recurse=merge-respects-update-strategy

is what I'd want to avoid. So maybe we can just respect the update strategy before starting the local part.

Stefan
Re: [PATCH v4 2/3] builtin/branch: stop supporting the use of --set-upstream option
On Friday 18 August 2017 01:25 AM, Junio C Hamano wrote:
> Martin Ågren writes:
>> On 17 August 2017 at 04:54, Kaartic Sivaraam wrote:
>>> Helped-by: Martin Ågren, Junio C Hamano
>>> Signed-off-by: Kaartic Sivaraam
>>
>> I didn't expect a "Helped-by", all I did was to give some random
>> comments. :-) I'm not so sure about the comma-separation, that seems
>> to be a first in the project.
>
> I didn't either ;-) The line looks odd so I'll remove it while
> queuing. Thanks for noticing.

I should have been better with my wordings :) How about converting that line into two 'Suggested-by:' or 'Reviewed-by:' lines?

---
Kaartic
Re: [PATCH v4 2/3] builtin/branch: stop supporting the use of --set-upstream option
On Friday 18 August 2017 01:28 AM, Junio C Hamano wrote:
> Kaartic Sivaraam writes:
>> diff --git a/Documentation/git-branch.txt b/Documentation/git-branch.txt
>> index 81bd0a7b7..948d9c9ef 100644
>> --- a/Documentation/git-branch.txt
>> +++ b/Documentation/git-branch.txt
>> @@ -195,10 +195,10 @@ start-point is either a local or remote-tracking branch.
>>  	branch.autoSetupMerge configuration variable is true.
>>  
>>  --set-upstream::
>> -	If specified branch does not exist yet or if `--force` has been
>> -	given, acts exactly like `--track`. Otherwise sets up configuration
>> -	like `--track` would when creating the branch, except that where
>> -	branch points to is not changed.
>> +	As this option had confusing syntax it's no longer supported. Please use
>> +	--track or --set-upstream-to instead.
>> ++
>> +Note: This could possibly become an alias of --set-upstream-to in the future.
>
> I'll tweak `--track` and `--set-upstream-to` in the updated text and
> remove the 'Note:' thing that does not give any useful information to
> the end users while queuing (no need to resend).

I thought I explained the reason for adding the note in one of the previous mails. Here's the portion of that mail:

On Monday 14 August 2017 02:20 PM, Kaartic Sivaraam wrote:
>
> On Wednesday 09 August 2017 12:03 AM, Martin Ågren wrote:
>>
>> Maybe the final note could be removed? Someone who is looking up
>> --set-upstream because Git just "crashed" on them will only want to know
>> what they should do instead. Our thoughts about the future are perhaps
>> not that interesting.
>
> I thought it's better to document it to avoid people from getting surprised
> when the options *start working* again.

I hope that explains the reason.

---
Kaartic
Re: [PATCH/RFC] git-grep: correct exit code with --quiet and -L
Anthony Sottile writes:

> The handling of `status_only` no longer interferes with the handling of
> `unmatch_name_only`. `--quiet` no longer affects the exit code when using
> `-L`/`--files-without-match`.
>
> Signed-off-by: Anthony Sottile
> ---

Thanks, will queue.

> grep.c          | 2 +-
> t/t7810-grep.sh | 5 +++++
> 2 files changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/grep.c b/grep.c
> index 2efec0e..c9e7cc7 100644
> --- a/grep.c
> +++ b/grep.c
> @@ -1821,7 +1821,7 @@ static int grep_source_1(struct grep_opt *opt, struct grep_source *gs, int colle
>  		return 0;
>  
>  	if (opt->status_only)
> -		return 0;
> +		return opt->unmatch_name_only;
>  	if (opt->unmatch_name_only) {
>  		/* We did not see any hit, so we want to show this */
>  		show_name(opt, gs->name);
> diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
> index f106387..2a6679c 100755
> --- a/t/t7810-grep.sh
> +++ b/t/t7810-grep.sh
> @@ -374,6 +374,11 @@ test_expect_success 'grep -L -C' '
>  	test_cmp expected actual
>  '
>  
> +test_expect_success 'grep --files-without-match --quiet' '
> +	git grep --files-without-match --quiet nonexistent_string >actual &&
> +	test_cmp /dev/null actual
> +'
> +
>  cat >expected <<EOF
>  file:foo mmap bar_mmap
>  EOF
Re: [PATCH v3 0/4] Modernize read_graft_line implementation
Thanks, will queue.
Re: Submodule regression in 2.14?
Stefan Beller writes:

> Are you saying this might be a design mistake and
> the .update ought to be respected by all the other
> commands? For example
> git reset --recurse-submodules
> should ignore the .update=none?

I have been under the impression that that has been the traditional desire of what .update ought to mean. I personally do not have a strong opinion---at least not yet.

> When designing these new recursive submodule functionality
> outside the "submodule" command, I'd want submodules
> to behave as much as possible like trees.

I think that is sensible as long as the user does not explicitly say "this and that submodule behave differently" by giving configuration variables. Perhaps .update is one of those that should countermand the default behaviour of "--recurse-submodules"?
[PATCH v3 3/4] commit: allocate array using object_id size
struct commit_graft aggregates an array of object_id's, which have size >= GIT_MAX_RAWSZ bytes. This change prevents a memory allocation error when the size of object_id is larger than GIT_SHA1_RAWSZ.

Signed-off-by: Patryk Obara
---
 commit.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/commit.c b/commit.c
index 019e733..61528a5 100644
--- a/commit.c
+++ b/commit.c
@@ -149,7 +149,8 @@ struct commit_graft *read_graft_line(struct strbuf *line)
 	if ((len + 1) % entry_size)
 		goto bad_graft_data;
 	i = (len + 1) / entry_size - 1;
-	graft = xmalloc(st_add(sizeof(*graft), st_mult(GIT_SHA1_RAWSZ, i)));
+	graft = xmalloc(st_add(sizeof(*graft),
+			       st_mult(sizeof(struct object_id), i)));
 	graft->nr_parent = i;
 	if (get_oid_hex(buf, &graft->oid))
 		goto bad_graft_data;
--
2.9.5
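The sizing rule the patch above fixes can be sketched with a standalone flexible-array struct. The `model_*` names and sizes below are made up for illustration; they are not git's actual types:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Sketch (not git's code): a flexible array of object ids must be
 * sized with sizeof(struct model_oid), whose hash buffer is the
 * maximum raw size, not the SHA-1 raw size.
 */
#define MODEL_SHA1_RAWSZ 20
#define MODEL_MAX_RAWSZ  32	/* large enough for, e.g., a 256-bit hash */

struct model_oid { unsigned char hash[MODEL_MAX_RAWSZ]; };

struct model_graft {
	struct model_oid oid;
	int nr_parent;
	struct model_oid parent[];	/* flexible array member */
};

/* Correct allocation size: scale by the full object id size. */
static size_t graft_alloc_size(size_t nr_parent)
{
	return sizeof(struct model_graft) + sizeof(struct model_oid) * nr_parent;
}
```

Scaling by `MODEL_SHA1_RAWSZ` instead would under-allocate each parent slot by 12 bytes in this model, which is exactly the class of bug the patch prevents.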
[PATCH v3 4/4] commit: rewrite read_graft_line
Determine the number of object_id's to parse in a single graft line by counting separators (whitespace characters) instead of dividing by the length of the hash representation. This way the graft parsing code can support different sizes of hashes without any further code adaptations.

Signed-off-by: Patryk Obara
---
 commit.c | 28 +++++++++++++---------------
 1 file changed, 13 insertions(+), 15 deletions(-)

diff --git a/commit.c b/commit.c
index 61528a5..46ee2db 100644
--- a/commit.c
+++ b/commit.c
@@ -137,29 +137,27 @@ int register_commit_graft(struct commit_graft *graft, int ignore_dups)
 struct commit_graft *read_graft_line(struct strbuf *line)
 {
 	/* The format is just "Commit Parent1 Parent2 ...\n" */
-	int i, len;
-	char *buf = line->buf;
+	int i, n;
+	const char *tail = NULL;
 	struct commit_graft *graft = NULL;
-	const int entry_size = GIT_SHA1_HEXSZ + 1;
 
 	strbuf_rtrim(line);
 	if (line->buf[0] == '#' || line->len == 0)
 		return NULL;
-	len = line->len;
-	if ((len + 1) % entry_size)
-		goto bad_graft_data;
-	i = (len + 1) / entry_size - 1;
+	/* count number of blanks to determine size of array to allocate */
+	for (i = 0, n = 0; i < line->len; i++)
+		if (isspace(line->buf[i]))
+			n++;
 	graft = xmalloc(st_add(sizeof(*graft),
-			       st_mult(sizeof(struct object_id), i)));
-	graft->nr_parent = i;
-	if (get_oid_hex(buf, &graft->oid))
+			       st_mult(sizeof(struct object_id), n)));
+	graft->nr_parent = n;
+	if (parse_oid_hex(line->buf, &graft->oid, &tail))
 		goto bad_graft_data;
-	for (i = GIT_SHA1_HEXSZ; i < len; i += entry_size) {
-		if (buf[i] != ' ')
-			goto bad_graft_data;
-		if (get_sha1_hex(buf + i + 1, graft->parent[i/entry_size].hash))
+	for (i = 0; i < graft->nr_parent; i++)
+		if (!isspace(*tail++) || parse_oid_hex(tail, &graft->parent[i], &tail))
 			goto bad_graft_data;
-	}
+	if (tail[0] != '\0')
+		goto bad_graft_data;
 	return graft;
--
2.9.5
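The counting step the patch relies on can be isolated into a tiny helper. `count_parents()` is a hypothetical name for this sketch, not a git function; it assumes the line has already been right-trimmed, as `strbuf_rtrim()` does in the patch:

```c
#include <assert.h>
#include <ctype.h>

/*
 * Sketch of the counting idea: in a trimmed graft line
 * "Commit Parent1 Parent2 ...", the number of parents equals the
 * number of whitespace separators, regardless of the hash length.
 */
static int count_parents(const char *line)
{
	int n = 0;

	for (; *line; line++)
		if (isspace((unsigned char)*line))
			n++;
	return n;
}
```

Because nothing here divides by `GIT_SHA1_HEXSZ`, the same logic works unchanged for longer hashes.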
[PATCH v3 1/4] sha1_file: fix definition of null_sha1
The array is declared in cache.h as:

    extern const unsigned char null_sha1[GIT_MAX_RAWSZ];

The definition in sha1_file.c must match.

Signed-off-by: Patryk Obara
---
 sha1_file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sha1_file.c b/sha1_file.c
index b60ae15..f5b5bec 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -32,7 +32,7 @@
 #define SZ_FMT PRIuMAX
 static inline uintmax_t sz_fmt(size_t s) { return s; }
 
-const unsigned char null_sha1[20];
+const unsigned char null_sha1[GIT_MAX_RAWSZ];
 const struct object_id null_oid;
 const struct object_id empty_tree_oid = {
 	EMPTY_TREE_SHA1_BIN_LITERAL
--
2.9.5
[PATCH v3 2/4] commit: replace the raw buffer with strbuf in read_graft_line
This simplifies the function declaration and allows for use of strbuf_rtrim instead of modifying the buffer directly.

Signed-off-by: Patryk Obara
---
 builtin/blame.c |  2 +-
 commit.c        | 15 ++++++++-------
 commit.h        |  2 +-
 3 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index bda1a78..d4472e9 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -488,7 +488,7 @@ static int read_ancestry(const char *graft_file)
 		return -1;
 	while (!strbuf_getwholeline(&buf, fp, '\n')) {
 		/* The format is just "Commit Parent1 Parent2 ...\n" */
-		struct commit_graft *graft = read_graft_line(buf.buf, buf.len);
+		struct commit_graft *graft = read_graft_line(&buf);
 		if (graft)
 			register_commit_graft(graft, 0);
 	}
diff --git a/commit.c b/commit.c
index 8b28415..019e733 100644
--- a/commit.c
+++ b/commit.c
@@ -134,17 +134,18 @@ int register_commit_graft(struct commit_graft *graft, int ignore_dups)
 	return 0;
 }
 
-struct commit_graft *read_graft_line(char *buf, int len)
+struct commit_graft *read_graft_line(struct strbuf *line)
 {
 	/* The format is just "Commit Parent1 Parent2 ...\n" */
-	int i;
+	int i, len;
+	char *buf = line->buf;
 	struct commit_graft *graft = NULL;
 	const int entry_size = GIT_SHA1_HEXSZ + 1;
 
-	while (len && isspace(buf[len-1]))
-		buf[--len] = '\0';
-	if (buf[0] == '#' || buf[0] == '\0')
+	strbuf_rtrim(line);
+	if (line->buf[0] == '#' || line->len == 0)
 		return NULL;
+	len = line->len;
 	if ((len + 1) % entry_size)
 		goto bad_graft_data;
 	i = (len + 1) / entry_size - 1;
@@ -161,7 +162,7 @@ struct commit_graft *read_graft_line(char *buf, int len)
 	return graft;
 
 bad_graft_data:
-	error("bad graft data: %s", buf);
+	error("bad graft data: %s", line->buf);
 	free(graft);
 	return NULL;
 }
@@ -174,7 +175,7 @@ static int read_graft_file(const char *graft_file)
 		return -1;
 	while (!strbuf_getwholeline(&buf, fp, '\n')) {
 		/* The format is just "Commit Parent1 Parent2 ...\n" */
-		struct commit_graft *graft = read_graft_line(buf.buf, buf.len);
+		struct commit_graft *graft = read_graft_line(&buf);
 		if (!graft)
 			continue;
 		if (register_commit_graft(graft, 1))
diff --git a/commit.h b/commit.h
index 6d857f0..baecc0a 100644
--- a/commit.h
+++ b/commit.h
@@ -247,7 +247,7 @@ struct commit_graft {
 };
 typedef int (*each_commit_graft_fn)(const struct commit_graft *, void *);
 
-struct commit_graft *read_graft_line(char *buf, int len);
+struct commit_graft *read_graft_line(struct strbuf *line);
 int register_commit_graft(struct commit_graft *, int);
 struct commit_graft *lookup_commit_graft(const struct object_id *oid);
--
2.9.5
[PATCH v3 0/4] Modernize read_graft_line implementation
Changes since v2:
- commit implementing free_graft dropped (no longer needed)
- several more lines moved from the last commit to the commit replacing the raw buffer with strbuf
- fix for memory allocation separated from the change in hash parsing
- commit_graft struct uses FLEX_ARRAY again, meaning free_graft, memset, and oid_array were not needed after all

Patryk Obara (4):
  sha1_file: fix definition of null_sha1
  commit: replace the raw buffer with strbuf in read_graft_line
  commit: allocate array using object_id size
  commit: rewrite read_graft_line

 builtin/blame.c |  2 +-
 commit.c        | 38 +++++++++++++++++++-------------------
 commit.h        |  2 +-
 sha1_file.c     |  2 +-
 4 files changed, 22 insertions(+), 22 deletions(-)

--
2.9.5
base-commit: b3622a4ee94e4916cd05e6d96e41eeb36b941182
[PATCH/RFC] git-grep: correct exit code with --quiet and -L
The handling of `status_only` no longer interferes with the handling of `unmatch_name_only`. `--quiet` no longer affects the exit code when using `-L`/`--files-without-match`.

Signed-off-by: Anthony Sottile
---
 grep.c          | 2 +-
 t/t7810-grep.sh | 5 +++++
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/grep.c b/grep.c
index 2efec0e..c9e7cc7 100644
--- a/grep.c
+++ b/grep.c
@@ -1821,7 +1821,7 @@ static int grep_source_1(struct grep_opt *opt, struct grep_source *gs, int colle
 		return 0;
 
 	if (opt->status_only)
-		return 0;
+		return opt->unmatch_name_only;
 	if (opt->unmatch_name_only) {
 		/* We did not see any hit, so we want to show this */
 		show_name(opt, gs->name);
diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index f106387..2a6679c 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -374,6 +374,11 @@ test_expect_success 'grep -L -C' '
 	test_cmp expected actual
 '
 
+test_expect_success 'grep --files-without-match --quiet' '
+	git grep --files-without-match --quiet nonexistent_string >actual &&
+	test_cmp /dev/null actual
+'
+
 cat >expected <<EOF
 file:foo mmap bar_mmap
 EOF
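The effect of the one-line change can be modeled outside git. The function below is an illustrative stand-in (not git's code) for the "no match was found in this file" exit of `grep_source_1()`, where a nonzero return means a "hit" that drives the exit code under `--quiet`:

```c
#include <assert.h>

/*
 * Model of the fix: under --quiet (status_only), a file with no
 * matches now counts as a hit exactly when -L/--files-without-match
 * (unmatch_name_only) is in effect.
 */
static int no_match_seen(int status_only, int unmatch_name_only)
{
	if (status_only)
		return unmatch_name_only;	/* the fix; was "return 0" */
	if (unmatch_name_only) {
		/* git would print the file name here (show_name) */
		return 1;
	}
	return 0;
}
```

Before the fix, the first branch always returned 0, so `git grep -L -q` exited 1 even when unmatched files existed.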
Re: reftable [v7]: new ref storage format
On Tue, Aug 15, 2017 at 7:48 PM, Shawn Pearce wrote:
> 7th iteration of the reftable storage format.
>
> You can read a rendered version of this here:
> https://googlers.googlesource.com/sop/jgit/+/reftable/Documentation/technical/reftable.md

FYI, JGit has adopted this v7 draft as version 1 of the reftable format. The file reader/writer implementations will be in the next version of JGit. Transaction support in the $GIT_DIR layout still needs to be implemented.
Re: [PATCH v3 2/2] File commited with CRLF should roundtrip diff and apply
tbo...@web.de writes:

> @@ -1712,11 +1726,15 @@ static int parse_fragment(struct apply_state *state,
>  		if (!deleted && !added)
>  			leading++;
>  		trailing++;
> +		if (!state->apply_in_reverse)
> +			check_old_for_crlf(patch, line, len);
>  		if (!state->apply_in_reverse &&
>  		    state->ws_error_action == correct_ws_error)
>  			check_whitespace(state, line, len, patch->ws_rule);
>  		break;

This one is wrong. You are looking at " " (common context), and you should unconditionally call check_old for them.

>  	case '-':
> +		if (!state->apply_in_reverse)
> +			check_old_for_crlf(patch, line, len);

This is correct. There is "case '+':" below here that you did not touch. There should be

	case '+':
+		if (state->apply_in_reverse)
+			check_old_for_crlf(...);

there. Note that we call check_old() only when applying in reverse.

> @@ -2268,8 +2286,11 @@ static void show_stats(struct apply_state *state, struct patch *patch)
>  		add, pluses, del, minuses);
>  }
> 
> -static int read_old_data(struct stat *st, const char *path, struct strbuf *buf)
> +static int read_old_data(struct stat *st, struct patch *patch,
> +			 const char *path, struct strbuf *buf)

The order of arguments, having the patch structure earlier, is different from my version; what we see here looks much better to me.

>  {
> +	enum safe_crlf safe_crlf = patch->crlf_in_old ?
> +		SAFE_CRLF_KEEP_CRLF : SAFE_CRLF_RENORMALIZE;
>  	switch (st->st_mode & S_IFMT) {
>  	case S_IFLNK:
>  		if (strbuf_readlink(buf, path, st->st_size) < 0)
> @@ -2278,7 +2299,15 @@ static int read_old_data(struct stat *st, const char *path, struct strbuf *buf)
>  	case S_IFREG:
>  		if (strbuf_read_file(buf, path, st->st_size) != st->st_size)
>  			return error(_("unable to open or read %s"), path);
> -		convert_to_git(&the_index, path, buf->buf, buf->len, buf, 0);
> +		/*
> +		 * "git apply" without "--index/--cached" should never look
> +		 * at the index; the target file may not have been added to
> +		 * the index yet, and we may not even be in any Git repository.
> +		 * Pass NULL to convert_to_git() to stress this; the function
> +		 * should never look at the index when explicit crlf option
> +		 * is given.
> +		 */
> +		convert_to_git(NULL, path, buf->buf, buf->len, buf, safe_crlf);

This comment is somewhat strangely indented. I thought the opening "/*" aligns with the usual tab stop.

Thanks.
Re: [PATCH v3 2/2] File commited with CRLF should roundtrip diff and apply
tbo...@web.de writes:

> Changes since v2:
> - Manually integrated all code changes from Junio
>   (Thanks, I hope that I didn't miss something)

I suspect that the "apply -R makes '+' preimage" change is not here.

> - Having examples of "git diff" in the commit message confuses "git apply",
>   so all examples for git diff have a '*' at the beginning of the line
>   (V2 used '$' which is typically an example for a shell script)

Just FYI, we tend to just indent them further, just like any displayed material in the proposed log message.

> - The official version to apply the CRLF rules without having an index is
>   SAFE_CRLF_RENORMALIZE, which is already working today.

Ah, good find. I forgot about that thing you added some time ago.
Re: Submodule regression in 2.14?
On Thu, Aug 17, 2017 at 2:21 PM, Lars Schneider wrote:
>
>> Oh, wait.
>> $ git log --oneline v2.13.0..v2.14.1 -- builtin/pull.c
>> c9c63ee558 Merge branch 'sb/pull-rebase-submodule'
>> a6d7eb2c7a pull: optionally rebase submodules (remote submodule changes only)
>> could also be a culprit. Do you have pull.rebase set?
>
> I bisected the problem today and "a6d7eb2c7a pull: optionally rebase submodules
> (remote submodule changes only)" is indeed the culprit.
>
> The commit seems to break the following test case.
>
> diff --git a/t/t7400-submodule-basic.sh b/t/t7400-submodule-basic.sh
> index dcac364c5f..24f9729015 100755
> --- a/t/t7400-submodule-basic.sh
> +++ b/t/t7400-submodule-basic.sh
> @@ -1289,4 +1289,19 @@ test_expect_success 'init properly sets the config' '
>  	test_must_fail git -C multisuper_clone config --get submodule.sub1.active
>  '
>  
> +test_expect_success 'submodule update and git pull with disabled submodule' '
> +	test_when_finished "rm -rf multisuper_clone" &&
> +	pwd=$(pwd) &&
> +	git clone file://"$pwd"/multisuper multisuper_clone &&
> +	(
> +		cd multisuper_clone &&
> +		git config --local submodule.sub0.update none &&
> +		git submodule update --init --recursive &&
> +		git pull --recurse-submodules &&
> +		git submodule status | cut -c 1,43- >actual
> +	) &&
> +	ls &&
> +	test_cmp expect multisuper_clone/actual
> +'

Thanks for providing this test.

    cd trash directory.t7400-submodule-basic/multisuper_clone
    cat .git/config
    [submodule "sub0"]
        update = none
        active = true
        url = file:///.../t/trash directory.t7400-submodule-basic/sub1

The documentation says:

    submodule.<name>.update
        The default update procedure for a submodule. This variable is
        populated by git submodule init from the gitmodules(5) file. See
        description of update command in git-submodule(1).

The first sentence of .update is misleading IMHO, as these settings should strictly apply to the "submodule update" command. So "pull --recurse-submodules" ought to ignore it; instead, the pull can do whatever it wants, namely treat the submodule roughly like a tree and either merge or rebase inside the submodule as well. The user *asked* for a recursive pull, after all.

Are you saying this might be a design mistake and the .update ought to be respected by all the other commands? For example, should

    git reset --recurse-submodules

ignore the .update=none?

When designing these new recursive submodule functionalities outside the "submodule" command, I'd want submodules to behave as much as possible like trees.

Ideas?

Thanks,
Stefan

> +
>  test_done
>
> I am not familiar with the code. Does anyone see the problem
> right away?
>
> Thanks,
> Lars
Re: [PATCH] sub-process: print the cmd when a capability is unsupported
On Thu, 17 Aug 2017 23:34:33 +0200, Lars Schneider wrote:

>> On 17 Aug 2017, at 23:01, Junio C Hamano wrote:
>>
>> Christian Couder writes:
>>
>>> ... but I think we should then emphasize more in our test
>>> scripts (maybe by giving a good example) and perhaps also in the doc
>>> that the filters/sub-processes should really pay attention and not
>>> output any capability that is not supported by Git.
>>
>> Oh, absolutely. If you know there is such a breakage in our test
>> script, please do fix it.
>>
>> Thanks.
>
> Junio's reasoning [1] is spot on from my point of view.

Agreed.

> I intentionally did not add the negotiation to the test code to keep
> the test as simple as possible.

I think this is the correct approach - the test was testing Git's behavior, not the filter's behavior. (Although, if someone wanted to add a test for a misbehaving filter, that would be great, although such a test would have hardcoded output from the filter anyway.)

> However, I wrote this in the gitattributes docs [2]:
>
>     After the version negotiation Git sends a list of all capabilities that
>     it supports and a flush packet. Git expects to read a list of desired
>     capabilities, which must be a subset of the supported capabilities list,
>     and a flush packet as response:
>
> Maybe we should revive "Documentation/technical/api-sub-process.txt" [3]
> after all to explain these kind of things?

As for reviving that specific file, I saw you wrote a similar comment but I didn't reply to it - sorry. The commit in question is 7e2e1bb ("Documentation: migrate sub-process docs to header", 2017-07-26), and in that commit, everything in api-sub-process.txt was migrated. I think it's better for such documentation to be in a .h file, rather than a .txt file.

As for describing the Long Running Process Protocol in a Documentation/.../txt file, my plan was to create a new file specifically for that, as you can see in an e-mail I sent [4]. I'm OK with putting it in a different place, but it probably shouldn't be named "api", because those are for internal Git APIs.

[1] https://public-inbox.org/git/xmqq8tijpkrv@gitster.mtv.corp.google.com/
[2] https://github.com/git/git/blob/b3622a4ee94e4916cd05e6d96e41eeb36b941182/Documentation/gitattributes.txt#L408-L411
[3] https://public-inbox.org/git/20170807102136.30b23...@twelve2.svl.corp.google.com/
[4] https://public-inbox.org/git/eadce97b6a1e80345a2621e71ce187e9e6bc05bf.1501532294.git.jonathanta...@google.com/
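The "must be a subset" rule quoted from the gitattributes docs above can be illustrated with a small membership check. `capability_offered()` is a made-up helper for this sketch, not part of git's API; it tests one capability name against a space-separated list such as a filter might parse:

```c
#include <assert.h>
#include <string.h>

/*
 * Sketch: return 1 if `cap` appears as a whole word in the
 * space-separated `offered` list. A well-behaved long-running filter
 * would run every capability it wants to announce through a check
 * like this and drop anything Git did not offer.
 */
static int capability_offered(const char *cap, const char *offered)
{
	size_t n = strlen(cap);
	const char *p = offered;

	while ((p = strstr(p, cap)) != NULL) {
		int starts = (p == offered || p[-1] == ' ');
		int ends = (p[n] == '\0' || p[n] == ' ');
		if (starts && ends)
			return 1;
		p += n;	/* substring match, e.g. "lay" inside "delay"; keep looking */
	}
	return 0;
}
```

A filter that announces a capability failing this check is exactly the misbehavior the thread discusses Git diagnosing.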
[PATCH v3 2/2] File commited with CRLF should roundtrip diff and apply
From: Torsten Bögershausen

When a file had been committed with CRLF but now .gitattributes say "* text=auto" (or core.autocrlf is true), the following does not roundtrip; `git apply` fails:

    printf "Added line\r\n" >>file &&
    git diff >patch &&
    git checkout -- . &&
    git apply patch

Before applying the patch, the file from the working tree is converted into the index format (clean filter, CRLF conversion, ...). Here, when committed with CRLF, the line endings should not be converted. Note that `git apply --index` or `git apply --cached` doesn't call convert_to_git() because the source material is already in index format.

Analyze the patch to see if there is a) any context line with CRLF, or b) any line with CRLF that is to be removed. In that case the patch file `patch` has mixed line endings; for a) it looks like this (ignore the * at the beginning of the line):

    * diff --git a/one b/one
    * index 533790e..c30dea8 100644
    * --- a/one
    * +++ b/one
    * @@ -1 +1,2 @@
    *  a\r
    * +b\r

And for b) it looks like this:

    * diff --git a/one b/one
    * index 533790e..485540d 100644
    * --- a/one
    * +++ b/one
    * @@ -1 +1 @@
    * -a\r
    * +b\r

If `git apply` detects that the patch itself has CRLF (look at the line " a\r" or "-a\r" above), the new flag crlf_in_old is set in "struct patch" and two things will happen:

- read_old_data() will not convert CRLF into LF, by calling convert_to_git(..., SAFE_CRLF_KEEP_CRLF);
- the WS_CR_AT_EOL bit is set in the "white space rule"; CRLF are no longer treated as white space.

Thanks to Junio C Hamano, whose input became the base for the changes in t4124. One test case is split up into 3:
- detect the " a\r" line in the patch
- detect the "-a\r" line in the patch
- use LF in the repo and CRLF in the worktree

Reported-by: Anthony Sottile
Signed-off-by: Torsten Bögershausen
---
Changes since v2:
- Manually integrated all code changes from Junio
  (Thanks, I hope that I didn't miss something)
- Having examples of "git diff" in the commit message confuses "git apply",
  so all examples for git diff have a '*' at the beginning of the line
  (V2 used '$' which is typically an example for a shell script)
- The official version to apply the CRLF rules without having an index is
  SAFE_CRLF_RENORMALIZE, which is already working today.
- Now we have convert_to_git(NULL, ..., safe_crlf) with
  enum safe_crlf safe_crlf = patch->crlf_in_old ?
      SAFE_CRLF_KEEP_CRLF : SAFE_CRLF_RENORMALIZE;

 apply.c                  | 40 ++++++++++++++++++++++++++++++-------
 t/t4124-apply-ws-rule.sh | 33 ++++++++++++++++++++--------
 2 files changed, 62 insertions(+), 11 deletions(-)

diff --git a/apply.c b/apply.c
index f2d599141d..691f47c783 100644
--- a/apply.c
+++ b/apply.c
@@ -220,6 +220,7 @@ struct patch {
 	unsigned int recount:1;
 	unsigned int conflicted_threeway:1;
 	unsigned int direct_to_threeway:1;
+	unsigned int crlf_in_old:1;
 	struct fragment *fragments;
 	char *result;
 	size_t resultsize;
@@ -1662,6 +1663,19 @@ static void check_whitespace(struct apply_state *state,
 	record_ws_error(state, result, line + 1, len - 2, state->linenr);
 }
 
+/*
+ * Check if the patch has context lines with CRLF or
+ * the patch wants to remove lines with CRLF.
+ */
+static void check_old_for_crlf(struct patch *patch, const char *line, int len)
+{
+	if (len >= 2 && line[len-1] == '\n' && line[len-2] == '\r') {
+		patch->ws_rule |= WS_CR_AT_EOL;
+		patch->crlf_in_old = 1;
+	}
+}
+
+
 /*
  * Parse a unified diff. Note that this really needs to parse each
  * fragment separately, since the only way to know the difference
@@ -1712,11 +1726,15 @@ static int parse_fragment(struct apply_state *state,
 		if (!deleted && !added)
 			leading++;
 		trailing++;
+		if (!state->apply_in_reverse)
+			check_old_for_crlf(patch, line, len);
 		if (!state->apply_in_reverse &&
 		    state->ws_error_action == correct_ws_error)
 			check_whitespace(state, line, len, patch->ws_rule);
 		break;
 	case '-':
+		if (!state->apply_in_reverse)
+			check_old_for_crlf(patch, line, len);
 		if (state->apply_in_reverse &&
 		    state->ws_error_action != nowarn_ws_error)
 			check_whitespace(state, line, len, patch->ws_rule);
@@ -2268,8 +2286,11 @@ static void show_stats(struct apply_state *state, struct patch *patch)
 		add, pluses, del, minuses);
 }
 
-static int read_old_data(struct stat *st, const char *path, struct strbuf *buf)
+static int read_old_data(struct stat *st, struct patch *patch,
+			 const char *path, struct strbuf *buf)
 {
+	enum safe_crlf safe_crlf = patch->crlf_in_old ?
+		SAFE_CRLF_KEEP_CRLF : SAFE_CRLF_RENORMALIZE;
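The detection predicate at the heart of check_old_for_crlf() in the patch above can be exercised in isolation. This is a standalone sketch (the function name is kept for clarity, but it is recreated here, not git's code):

```c
#include <assert.h>

/*
 * A patch line, including its terminating newline, came from a CRLF
 * file exactly when its last two bytes are "\r\n". The len >= 2
 * guard keeps a bare "\n" line from reading out of bounds.
 */
static int ends_in_crlf(const char *line, int len)
{
	return len >= 2 && line[len - 1] == '\n' && line[len - 2] == '\r';
}
```

Only context (" a\r\n") and removal ("-a\r\n") lines are fed through this check, matching cases a) and b) in the commit message.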
[PATCH v3 1/2] convert: Add SAFE_CRLF_KEEP_CRLF
From: Torsten Bögershausen

When convert_to_git() is called, the caller may want CRLF to be kept as CRLF (and not converted into LF). This will be used in the next commit, when apply works with files that have CRLF and patches are applied onto these files.

Add the new value "SAFE_CRLF_KEEP_CRLF" to safe_crlf. Prepare convert_to_git() to be able to run the clean filter, skip the CRLF conversion, and run the ident filter.

Signed-off-by: Torsten Bögershausen
---
 convert.c | 10 ++++++----
 convert.h |  3 ++-
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/convert.c b/convert.c
index deaf0ba7b3..040123b4fe 100644
--- a/convert.c
+++ b/convert.c
@@ -1104,10 +1104,12 @@ int convert_to_git(const struct index_state *istate,
 		src = dst->buf;
 		len = dst->len;
 	}
-	ret |= crlf_to_git(istate, path, src, len, dst, ca.crlf_action, checksafe);
-	if (ret && dst) {
-		src = dst->buf;
-		len = dst->len;
+	if (checksafe != SAFE_CRLF_KEEP_CRLF) {
+		ret |= crlf_to_git(istate, path, src, len, dst, ca.crlf_action, checksafe);
+		if (ret && dst) {
+			src = dst->buf;
+			len = dst->len;
+		}
 	}
 	return ret | ident_to_git(path, src, len, dst, ca.ident);
 }
diff --git a/convert.h b/convert.h
index cecf59d1aa..cabd5ed6dd 100644
--- a/convert.h
+++ b/convert.h
@@ -10,7 +10,8 @@ enum safe_crlf {
 	SAFE_CRLF_FALSE = 0,
 	SAFE_CRLF_FAIL = 1,
 	SAFE_CRLF_WARN = 2,
-	SAFE_CRLF_RENORMALIZE = 3
+	SAFE_CRLF_RENORMALIZE = 3,
+	SAFE_CRLF_KEEP_CRLF = 4
 };
 
 extern enum safe_crlf safe_crlf;
--
2.14.1.145.gb3622a4ee9
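The behavior this patch enables - skipping the CRLF->LF step entirely when the keep flag is set - can be modeled with a stand-in that only reports how many bytes survive the (possibly skipped) conversion. `converted_len()` is an illustrative name, not git's API:

```c
#include <assert.h>

/*
 * Model: with keep_crlf set (cf. SAFE_CRLF_KEEP_CRLF) every byte is
 * kept; otherwise each "\r\n" pair is collapsed to "\n" by dropping
 * the CR, as CRLF normalization does.
 */
static int converted_len(const char *src, int keep_crlf)
{
	int n = 0;

	for (; *src; src++) {
		if (!keep_crlf && src[0] == '\r' && src[1] == '\n')
			continue;	/* CRLF -> LF: drop the CR */
		n++;
	}
	return n;
}
```

With two CRLF-terminated lines, keeping CRLF preserves all six bytes, while normalization drops the two CRs.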
Re: upgarding GIT
On Mon, Aug 7, 2017 at 4:20 PM, James Wells wrote:
>
> As you can see, my version of git is not supported with the current version of
> bitbucket. I will have to perform a two-stage upgrade anyway, as the version
> of STASH I am running cannot be upgraded directly to bitbucket 5.2 as well.
>
> Is there an easy way just to install a higher supported version of git, like 2.9,
> on the same server and then move all the repos and basically everything
> across? Can you install another TAR ball later version on top of another git,
> so it's like overwriting it?

Hey James,

Bitbucket Server developer here. Sorry for not responding sooner. I just wanted to check in with you and see if you'd gotten this all resolved. If not, feel free to message me directly; I'm happy to help you off the list. (Or on list, for that matter; I just figure the general Git list isn't interested in this issue.)

Best regards,
Bryan Turner
Re: [PATCH v2 4/4] commit: rewrite read_graft_line
Understood :) - yes, oid_array is not completely necessary here, and it gives the wrong impression about the usage of this struct. FLEX_ARRAY will be brought back in v3.

On Thu, Aug 17, 2017 at 11:20 PM, Junio C Hamano wrote:
> Patryk Obara writes:
>
>> The previous implementation of read_graft_line used calculations based
>> on GIT_SHA1_RAWSZ and GIT_SHA1_HEXSZ to determine the number of commit
>> ids in a single graft line. The new implementation does not depend on these
>> constants, so it adapts to any object_id buffer size.
>>
>> To make this possible, the FLEX_ARRAY of object_id in the struct was replaced
>> by an oid_array.
>
> There is a leap in logic between the two paragraphs. Your use of
> parse_oid_hex() is good. But I do not think moving the array body
> outside the commit_graft structure and forcing it to be separately
> allocated is necessary or beneficial. When we got a single line, we
> know how many fake parents a child described by that graft line has,
> and you can still use FLEX_ARRAY to avoid separate allocation
> and the need for separate freeing of it.

--
| ← Ceci n'est pas une pipe
Patryk Obara
Re: [PATCH 3/5] commit: replace the raw buffer with strbuf in read_graft_line
> In fact, the former is already how we represent the list of fake > parents in the commit_graft structure, so I think patch 5/5 in this > series does two unrelated things, one of which is bad (i.e. use of > parse_oid_hex() is good; turning the FLEX_ARRAY at the end into a > oid_array that requires a separate allocation of the array is bad). Agreed; I have already split patch 5 into two separate changes (one fixing a memory allocation issue, one parsing object_ids into a FLEX_ARRAY, without modifying the graft struct). As a result, patch 4 (free_graft) can be dropped. I will send these changes as v3. On Thu, Aug 17, 2017 at 11:17 PM, Junio C Hamano wrote: > Jeff King writes: > >> I'd expect most of the GIT_MAX constants to eventually go away in favor >> of "struct object_id", but that will still be using the same "big enough >> to hold any hash" size under the hood. > > Indeed. It is good to see major contributors are in agreement ;-) > I'd expect that an array of "struct object_id" would be how a fixed > number of object names would be represented, i.e. > > struct object_id thing[num_elements]; > > instead of an array of uchar that is MAX bytes long, i.e. > > unsigned char name[GIT_MAX_RAWSZ][num_elements]; > > In fact, the former is already how we represent the list of fake > parents in the commit_graft structure, so I think patch 5/5 in this > series does two unrelated things, one of which is bad (i.e. use of > parse_oid_hex() is good; turning the FLEX_ARRAY at the end into a > oid_array that requires a separate allocation of the array is bad). > >> Agreed. Most code should be dealing with the abstract concept of a hash > >> and shouldn't have to care about the size. I really like parse_oid_hex() >> for that reason (and I think parsing is the main place we've found that >> needs to care). > > Yes. -- | ← Ceci n'est pas une pipe Patryk Obara
Re: [RFC PATCH] Updated "imported object" design
Thanks for your comments. I'll reply to both your e-mails in this one e-mail. > This illustrates another place we need to resolve the > naming/vocabulary. We should at least be consistent to make it easier > to discuss/explain. We obviously went with "virtual" when building > GVFS but I'm OK with "lazy" as long as we're consistent. Some > examples of how the naming can clarify or confuse: > > 'Promise-enable your repo by setting the "extensions.lazyObject" flag' > > 'Enable your repo to lazily fetch objects by setting the > "extensions.lazyObject"' > > 'Virtualize your repo by setting the "extensions.virtualize" flag' > > We may want to carry the same name into the filename we use to mark > the (virtualized/lazy/promised/imported) objects. > > (This reminds me that there are only 2 hard problems in computer > science...) ;) Good point about the name. Maybe the 2nd one is the best? (Mainly because I would expect a "virtualized" repo to have virtual refs too.) But if there was a good way to refer to the "anti-projection" in a virtualized system (that is, the "real" thing or "object" behind the "virtual" thing or "image"), then maybe the "virtualized" language is the best. (And I would gladly change - I'm having a hard time coming up with a name for the "anti-projection" in the "lazy" language.) Also, I should probably standardize on "lazily fetch" instead of "lazily load". I didn't want to overlap with the existing fetching, but after some thought, it's probably better to do that. The explanation would thus be that you can either use the built-in Git fetcher (to be built, although I have an old version here [1]) or supply a custom fetcher. [1] https://github.com/jonathantanmy/git/commits/partialclone > I think this all works and would meet the requirements we've been > discussing. 
The big trade-off here vs what we first discussed with > promises is that we are generating the list of promises on the fly > when they are needed rather than downloading and maintaining a list > locally. > > My biggest concern with this model is the cost of opening and parsing > every imported object (loose and pack for local and alternates) to > build the oidset of promises. > > In fsck this probably won't be an issue as it already focuses on > correctness at the expense of speed. I'm more worried about when we > add the same/similar logic into check_connected. That impacts fetch, > clone, and receive_pack. > > I guess the only way we can know for sure is to do a perf test and > measure the impact. As for fetching from the main repo, the connectivity check does not need to be performed at all because all objects are "imported", so the performance of the connectivity check does not matter. Same for cloning. This is not true if you're fetching from another repo or if you're using receive-pack, but (1) I think these are not used as much in such a situation, and (2) if you do use them, the slowness only "kicks in" if you do not have the objects referred to (whether non-"imported" or "imported") and thus have to check the references in all "imported" objects. > I think this topic should continue to move forward so that we can > provide reasonable connectivity tests for fsck and check_connected in > the face of partial clones. I'm not sure the prototype implementation > of reading/parsing all imported objects to build the promised oidset is > the most performant model but we can continue to investigate the best > options. Agreed - I think the most important thing here is settling on the API (name of extension and the nature of the object mark). > Given all we need is an existence check for a given oid, This is true... 
> I wonder if it > would be faster overall to do a binary search through the list of > imported idx files + an existence test for an imported loose object. ...but what we're checking is the existence of a reference, not the existence of an object. For a concrete example, consider what happens if we both have an "imported" tree and a non-"imported" tree that references a blob that we do not have. When checking the non-"imported" tree for connectivity, we have to iterate through all "imported" trees to see if any can vouch for the existence of such a blob. We cannot merely binary-search the .idx file.
Re: [PATCH] files-backend: cheapen refname_available check when locking refs
Jeff King writes:

>> diff --git a/refs/files-backend.c b/refs/files-backend.c
>> index e9b95592b6..f2a420c611 100644
>> --- a/refs/files-backend.c
>> +++ b/refs/files-backend.c
>> @@ -631,11 +631,11 @@ static int lock_raw_ref(struct files_ref_store *refs,
>>
>>  	/*
>>  	 * If the ref did not exist and we are creating it,
>> -	 * make sure there is no existing ref that conflicts
>> -	 * with refname:
>> +	 * make sure there is no existing packed ref that
>> +	 * conflicts with refname:
>>  	 */
>>  	if (refs_verify_refname_available(
>> -			&refs->base, refname,
>> +			refs->packed_ref_store, refname,
>>  			extras, skip, err))
>>  		goto error_return;
>>  }
>
> This seems too easy to be true. :) But I think it matches what we were
> doing before 524a9fdb51 (so it's correct), and the performance numbers
> don't lie.

Thanks, all. The log message explained the change very well, even though I agree that the patch text does indeed look too easy to be true ;-). Will queue.
Re: Inconsistent exit code for `git grep --files-without-match` with `--quiet`
Anthony reported this issue to bug-g...@gnu.org. From what I can see, git-grep's behavior is better, so I changed GNU grep to behave like git-grep. Please see: https://debbugs.gnu.org/28105 I hope this saves you folks a little bit of work.
Re: [PATCH] sub-process: print the cmd when a capability is unsupported
> On 17 Aug 2017, at 23:01, Junio C Hamano wrote: > > Christian Couder writes: > >> ... but I think we should then emphasize more in our test >> scripts (maybe by giving a good example) and perhaps also in the doc >> that the filters/sub-processes should really pay attention and not >> output any capabilities that are not supported by Git. > > Oh, absolutely. If you know there is such a breakage in our test > script, please do fix it. > > Thanks. Junio's reasoning [1] is spot on from my point of view. I intentionally did not add the negotiation to the test code to keep the test as simple as possible. However, I wrote this in the gitattributes docs [2]: After the version negotiation Git sends a list of all capabilities that it supports and a flush packet. Git expects to read a list of desired capabilities, which must be a subset of the supported capabilities list, and a flush packet as response: Maybe we should revive "Documentation/technical/api-sub-process.txt" [3] after all to explain these kinds of things? - Lars [1] https://public-inbox.org/git/xmqq8tijpkrv@gitster.mtv.corp.google.com/ [2] https://github.com/git/git/blob/b3622a4ee94e4916cd05e6d96e41eeb36b941182/Documentation/gitattributes.txt#L408-L411 [3] https://public-inbox.org/git/20170807102136.30b23...@twelve2.svl.corp.google.com/
Re: [PATCH] check-ref-format: require a repository for --branch
Jeff King writes:

> On Mon, Jul 17, 2017 at 10:27:09AM -0700, Jonathan Nieder wrote:
>
>> > --- a/t/t1402-check-ref-format.sh
>> > +++ b/t/t1402-check-ref-format.sh
>> > @@ -161,6 +161,10 @@ test_expect_success 'check-ref-format --branch from subdir' '
>> >  	test "$refname" = "$sha1"
>> > '
>> >
>> > +test_expect_success 'check-ref-format --branch from non-repo' '
>> > +	test_must_fail nongit git check-ref-format --branch @{-1}
>> > +'
>> > +
>> >  valid_ref_normalized() {
>> >  	prereq=
>> >  	case $1 in
>>
>> I don't think it's right. Today I can do
>>
>> $ cd /tmp
>> $ git check-ref-format --branch master
>> master
>>
>> You might wonder why I'd ever do such a thing. But that's what "git
>> check-ref-format --branch" is for --- it is for taking a <branchname>
>> argument and turning it into a branch name. For example, if you have
>> a script with an $opt_branch variable that defaults to "master", it
>> may do
>>
>> resolved_branch=$(git check-ref-format --branch "$opt_branch")
>>
>> even though it is in a mode that is not going to have to use
>> $resolved_branch and it is not running from a repository.
>
> I'm not sure I buy that. What does "turning it into a branch name" even
> mean when you are not in a repository? Clearly @{-1} must fail. And
> everything else is just going to output the exact input you provided.

This "just going to output the exact input" is not entirely correct; there is just one use case for it. "git check-ref-format --branch a..b" would fail with a helpful error message, while the command run with "a.b" would tell you the name is safe. Use of 'check-ref-format --branch "@{-1}"' *IS* nonsense, whether it is inside or outside a repository; it is OK to fail it outside a repository and I would say it is even OK to fail it inside a repository. After all "check-ref-format" is about checking if the string is a sensible name to use. 
I think calling interpret_branch_name() in this codepath is a mistake; we should instead report whether the literal string "refs/heads/@{-1}" is a valid refname we could create.
Re: Submodule regression in 2.14?
> On 16 Aug 2017, at 20:51, Stefan Beller wrote: > > On Wed, Aug 16, 2017 at 11:35 AM, Lars Schneider > wrote: >> Hi, >> >> I think we discovered a regression in Git 2.14.1 today. >> It looks as if "git submodule update --init --recursive" removes >> the "skip submodules" config. >> >> Consider the following steps: >> >>    git clone https://server/repo.git >>    cd repo >>    git config --local submodule.some/other/repo.update none >>    git submodule update --init --recursive >>    git pull --recurse-submodules >> >> With Git 2.14 the last "git pull" will clone the "some/other/repo" >> submodule. This did not happen with Git 2.13. >> >> Bug or feature? I don't have any more time for Git today. I am happy to >> provide a proper test case tomorrow, though. > > $ git log --oneline v2.13.0..v2.14.1 -- git-submodule.sh > 532139940c add: warn when adding an embedded repository > (I am confident this is not the suspect, let's keep searching. > Not a lot happened in submodule land apparently) > > Looking through all commits v2.13..v2.14 doesn't have me > suspect any of them. > > Any chance the "did not happen with 2.13" was not > freshly cloned but tested on an existing repo? If so a hot > candidate for suspicion is a93dcb0a56 (Merge branch > 'bw/submodule-is-active', 2017-03-30), IMHO, just > gut feeling, though. > > Oh, wait. > $ git log --oneline v2.13.0..v2.14.1 -- builtin/pull.c > c9c63ee558 Merge branch 'sb/pull-rebase-submodule' > a6d7eb2c7a pull: optionally rebase submodules (remote submodule changes only) > could also be a culprit. Do you have pull.rebase set? I bisected the problem today and "a6d7eb2c7a pull: optionally rebase submodules (remote submodule changes only)" is indeed the culprit. The commit seems to break the following test case. 
diff --git a/t/t7400-submodule-basic.sh b/t/t7400-submodule-basic.sh
index dcac364c5f..24f9729015 100755
--- a/t/t7400-submodule-basic.sh
+++ b/t/t7400-submodule-basic.sh
@@ -1289,4 +1289,19 @@ test_expect_success 'init properly sets the config' '
 	test_must_fail git -C multisuper_clone config --get submodule.sub1.active
 '
 
+test_expect_success 'submodule update and git pull with disabled submodule' '
+	test_when_finished "rm -rf multisuper_clone" &&
+	pwd=$(pwd) &&
+	git clone file://"$pwd"/multisuper multisuper_clone &&
+	(
+		cd multisuper_clone &&
+		git config --local submodule.sub0.update none &&
+		git submodule update --init --recursive &&
+		git pull --recurse-submodules &&
+		git submodule status | cut -c 1,43- >actual
+	) &&
+	ls &&
+	test_cmp expect multisuper_clone/actual
+'
+
 test_done

I am not familiar with the code. Does anyone see the problem right away? Thanks, Lars
Re: [PATCH v2 4/4] commit: rewrite read_graft_line
Patryk Obara writes: > The previous implementation of read_graft_line used calculations based > on GIT_SHA1_RAWSZ and GIT_SHA1_HEXSZ to determine the number of commit > ids in a single graft line. New implementation does not depend on these > constants, so it adapts to any object_id buffer size. > > To make this possible, FLEX_ARRAY of object_id in struct was replaced > by an oid_array. There is a leap in logic between the two paragraphs. Your use of parse_oid_hex() is good. But I do not think moving the array body to outside the commit_graft structure and forcing it to be separately allocated is necessary or beneficial. When we get a single line, we know how many fake parents a child described by that graft line has, and you can still use FLEX_ARRAY to avoid separate allocation and the need for separate freeing of it.
Re: [PATCH 3/5] commit: replace the raw buffer with strbuf in read_graft_line
Jeff King writes: > I'd expect most of the GIT_MAX constants to eventually go away in favor > of "struct object_id", but that will still be using the same "big enough > to hold any hash" size under the hood. Indeed. It is good to see major contributors are in agreement ;-) I'd expect that an array of "struct object_id" would be how a fixed number of object names would be represented, i.e. struct object_id thing[num_elements]; instead of an array of uchar that is MAX bytes long, i.e. unsigned char name[GIT_MAX_RAWSZ][num_elements]; In fact, the former is already how we represent the list of fake parents in the commit_graft structure, so I think patch 5/5 in this series does two unrelated things, one of which is bad (i.e. use of parse_oid_hex() is good; turning the FLEX_ARRAY at the end into a oid_array that requires a separate allocation of the array is bad). > Agreed. Most code should be dealing with the abstract concept of a hash > and shouldn't have to care about the size. I really like parse_oid_hex() > for that reason (and I think parsing is the main place we've found that > needs to care). Yes.
Re: [PATCH] sub-process: print the cmd when a capability is unsupported
Christian Couder writes: > ... but I think we should then emphasize more in our test > scripts (maybe by giving a good example) and perhaps also in the doc > that the filters/sub-processes should really pay attention and not > output any capabilities that are not supported by Git. Oh, absolutely. If you know there is such a breakage in our test script, please do fix it. Thanks.
Re: [PATCH v2] update revisions doc for quoting in ':/' notation
ryenus writes:

> To make sure the `<text>` in `:/<text>` is seen as one search string,
> one should quote/escape `<text>` properly.
>
> Especially, the example given in the manual `:/fix nasty bug` does not
> work because of missing quotes when used in shell. A note about
> quoting/escaping is added along with a working example; however, the
> original example is left as-is to be consistent with other examples.
>
> Signed-off-by: ryenus
> ---
>  Documentation/revisions.txt | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/revisions.txt b/Documentation/revisions.txt
> index 61277469c..d2862d55d 100644
> --- a/Documentation/revisions.txt
> +++ b/Documentation/revisions.txt
> @@ -185,7 +185,9 @@ existing tag object.
>    e.g. ':/^foo'. The special sequence ':/!' is reserved for modifiers to what
>    is matched. ':/!-foo' performs a negative match, while ':/!!foo' matches a
>    literal '!' character, followed by 'foo'. Any other sequence beginning with
> -  ':/!' is reserved for now.
> +  ':/!' is reserved for now. And make sure to quote/escape for the text to be
> +  interpreted as the expected search string/pattern, e.g., for a commit whose
> +  message matches a literal '`$`', use `git show :/\\\$` or `git show ':/\$'`.

Hmph. This seems to miss the most important point Andreas raised, which is that the way to quote them is entirely up to the shell and other UI machinery the user is using. And I am not sure if those who are using CUI should be told about shell's quoting rules when they are learning about :/ syntax. There are tons of other ways that the user needs to pass a string with whitespace in it as a single argument to commands, many of which may not even be related to Git at all. I was actually expecting a much milder text, perhaps a literal copy of what Andreas gave you in his message <956ccc83-c291-4217-795c-fcef33fac...@gmail.com>. 
By the way, I do not mean to dictate what name and address you use to communicate with other people, but especially with something that is supposed to hopefully have some legal value down the line if somebody starts making SCO noises, it really would be nice to have a real person to associate things with recorded as the author of a change and the person who signed off the patch. It would be embarrassing later if there is no way to even look you up somehow. Thanks.
Re: [RFC PATCH] Updated "imported object" design
On 8/16/2017 5:35 PM, Jonathan Tan wrote: On Wed, 16 Aug 2017 13:32:23 -0700 Junio C Hamano wrote: Jonathan Tan writes: Also, let me know if there's a better way to send out these patches for review. Some of the code here has been reviewed before, for example. [1] https://public-inbox.org/git/cover.1502241234.git.jonathanta...@google.com/ [2] https://public-inbox.org/git/ffb734d277132802bcc25baa13e8ede3490af62a.1501532294.git.jonathanta...@google.com/ [3] https://public-inbox.org/git/20170807161031.7c4ea...@twelve2.svl.corp.google.com/ ... and some of the code exists only in the list archive, so we don't know which other topic if any we may want to eject tentatively if we wanted to give precedence to move this topic forward over others. I'll worry about it later but help from others is also appreciated. Thanks - I can help take a look when it is time to move the code in. I agree that having this depend on patches elsewhere in the list archive makes it more difficult to review. I know I like to see things in context to get a better picture. I think the issue here is whether we want to move this topic forward or not, that is, if this (special ".imported" objects) is the best way to solve (at least partially) the connectivity check part of tolerating missing objects. I hope that we can continue to talk about it. I think this topic should continue to move forward so that we can provide reasonable connectivity tests for fsck and check_connected in the face of partial clones. I'm not sure the prototype implementation of reading/parsing all imported objects to build the promised oidset is the most performant model but we can continue to investigate the best options. This collects names of the objects that are _directly_ referred to by imported objects. An imported pack may have a commit, whose top-level tree may or may not appear in the same pack, or the tree may exist locally but not in the same pack. Or the tree may not be locally available at all. 
In any of these four cases, the top-level tree is listed in the "promises" set. Same for trees and tags. I wonder if all of the calls to oidset_insert() in this function want to be guarded by "mark it as promised only when the referent is *not* locally available" to keep the promises set minimally populated. The only change needed to fsck in order to make it refrain from treating a missing but promised object as an error would be:

-	if (object is missing)
+	if (object is missing && object is not promised)
 		error("that object must be there but missing");

so there is no point in throwing something that we know we locally have into this oidset, right? On the other hand, the cost of such additional checks in this function may outweigh the savings of both memory pressure and look-up cost, so I do not know how the tradeoff would turn out. I also don't know how the tradeoff would turn out, so I leaned towards the slightly simpler solution of not doing the check. In the future, maybe a t/perf test can be done to decide between the two.

+static int is_promise(const struct object_id *oid)
+{
+	if (!promises_prepared) {
+		if (repository_format_lazy_object)
+			for_each_packed_object(add_promise, NULL,
+					       FOR_EACH_OBJECT_IMPORTED_ONLY);
+		promises_prepared = 1;
+	}
+	return oidset_contains(&promises, oid);
+}

Somehow I'm tempted to call this function "is_promised()" but that is a minor naming issue. Given all we need is an existence check for a given oid, I wonder if it would be faster overall to do a binary search through the list of imported idx files + an existence test for an imported loose object. Especially in the check_connected case, which isn't verifying every object, that should be a lot less IO than loading all the imported commits, trees and blobs and pre-computing an oidset of all possible objects. The lookup for each object would be slower than a simple call to oidset_contains but we avoid the up-front cost. 
With some caching of idx files and threading, I suspect this could be made pretty fast. I was trying to be consistent in using the name "promise" instead of "promised object/tag/commit/tree/blob" everywhere, but we can switch if need be (for example, if we don't want to limit the generic name "promise" to merely objects).

 static const char *describe_object(struct object *obj)
 {
 	static struct strbuf buf = STRBUF_INIT;
@@ -410,7 +472,7 @@ static void fsck_handle_reflog_oid(const char *refname, struct object_id *oid,
 			xstrfmt("%s@{%"PRItime"}", refname, timestamp));
 		obj->used = 1;
 		mark_object_reachable(obj);
-	} else {
+	} else if (!is_promise(oid)) {
Re: git fetch with refspec does not include tags?
On Thu, Aug 17, 2017 at 01:38:36PM -0700, Junio C Hamano wrote: > Kevin Daudt writes: > > > On Thu, Aug 17, 2017 at 12:38:58PM -0700, Junio C Hamano wrote: > >> Jeff King writes: > >> > >> > # no tags, we just populate FETCH_HEAD because of the bare URL > >> > git fetch ../parent > >> > > >> > # this does fetch tags, because we're storing the result according to > >> > # the configured refspec ("refs/heads/*:refs/remotes/origin/*"). > >> > git fetch origin > >> > >> The above two look good. > >> > >> > # this doesn't fetch tags, as the main command is "just" populating > >> > # FETCH_HEAD. But then our logic for "hey, we fetched the ref for > >> > # refs/remotes/origin/master, so let's update it on the side" kicks > >> > # in. And we end up updating FETCH_HEAD _and_ the tracking branch, but > >> > # not the tags. Weird. > >> > git fetch origin master > >> > >> Yes, it looks weird, but I suspect that it is probably more correct > >> not to fetch tags in this case. I wonder if it would be a solution > >> not to do the "on the side" thing---after all the user didn't tell > >> us to update refs/remotes/origin/master with this command line. > > > > Isn't that how git fetch used to behave, or am I misunderstanding what > > you mean? It used to be that git fetch would not > > update any remote tracking branches. > > > > From the 1.8.4 release notes: > > > >> "git fetch origin master" unlike "git fetch origin" or "git fetch" > >> did not update "refs/remotes/origin/master"; this was an early > >> design decision to keep the update of remote tracking branches > >> predictable, but in practice it turns out that people find it more > >> convenient to opportunistically update them whenever we have a > >> chance, and we have been updating them when we run "git push" which > >> already breaks the original "predictability" anyway. > > No, you are not misunderstanding anything. 
The "pretend that we > immediately turned around and fetched" done by "git push" was > already breaking the predictability, but the change in 1.8.4 made it > even worse. I am saying that going back to the old behaviour may be > one option to address the issue being discussed in this thread. > Ok. The reason I'm bringing this up is that we often had to tell users in the irc channel that git fetch did not update the remote tracking branches, which confused them, so reverting might reintroduce this confusion.
Re: git fetch with refspec does not include tags?
Kevin Daudt writes: > On Thu, Aug 17, 2017 at 12:38:58PM -0700, Junio C Hamano wrote: >> Jeff King writes: >> >> > # no tags, we just populate FETCH_HEAD because of the bare URL >> > git fetch ../parent >> > >> > # this does fetch tags, because we're storing the result according to >> > # the configured refspec ("refs/heads/*:refs/remotes/origin/*"). >> > git fetch origin >> >> The above two look good. >> >> > # this doesn't fetch tags, as the main command is "just" populating >> > # FETCH_HEAD. But then our logic for "hey, we fetched the ref for >> > # refs/remotes/origin/master, so let's update it on the side" kicks >> > # in. And we end up updating FETCH_HEAD _and_ the tracking branch, but >> > # not the tags. Weird. >> > git fetch origin master >> >> Yes, it looks weird, but I suspect that it is probably more correct >> not to fetch tags in this case. I wonder if it would be a solution >> not to do the "on the side" thing---after all the user didn't tell >> us to update refs/remotes/origin/master with this command line. > > Isn't that how git fetch used to behave, or am I misunderstanding what > you mean? It used to be that git fetch would not > update any remote tracking branches. > > From the 1.8.4 release notes: > >> "git fetch origin master" unlike "git fetch origin" or "git fetch" >> did not update "refs/remotes/origin/master"; this was an early >> design decision to keep the update of remote tracking branches >> predictable, but in practice it turns out that people find it more >> convenient to opportunistically update them whenever we have a >> chance, and we have been updating them when we run "git push" which >> already breaks the original "predictability" anyway. No, you are not misunderstanding anything. The "pretend that we immediately turned around and fetched" done by "git push" was already breaking the predictability, but the change in 1.8.4 made it even worse. 
I am saying that going back to the old behaviour may be one option to address the issue being discussed in this thread.
Re: [Patch size_t V3 00/19] use size_t
On Wed, Aug 16, 2017 at 10:16:12PM +0200, Martin Koegler wrote: > From: Martin Koegler > > This patchset is for next [24db08a6e8fed761d3bace7f2d5997806e20b9f7]. I applied it successfully, and ran the test suite on a 32 bit system. The laptop reported one breakage in t0040:

t0040-parse-options.sh
not ok 19 - OPT_MAGNITUDE() 3giga

Besides some t5561-http-backend.sh breakages (which are most probably not caused by this patch). The raspi had 2 deadlocks, like "git push --signed dst noop ff +noff" or trash directory.t5541-http-push-smart/gpghome --sign -u commit...@example.com Even this is most probably not caused by the patch (and the test is still running). Well done, and on which platforms did you test?
Re: [Patch size_t V3 01/19] delta: fix encoding size larger than an "uint" can hold
On Wed, Aug 16, 2017 at 10:16:13PM +0200, Martin Koegler wrote: > From: Martin Koegler > > The current delta code produces incorrect pack objects for files > 4GB. This may need a little bit more info (e.g. it does not fix anything on a 32 bit Linux). How about this: The current delta code produces incorrect pack objects for files > 4GB when "unsigned long" has 32 bits and size_t 64 bits.
Re: git fetch with refspec does not include tags?
On Thu, Aug 17, 2017 at 12:38:58PM -0700, Junio C Hamano wrote: > Jeff King writes: > > > # no tags, we just populate FETCH_HEAD because of the bare URL > > git fetch ../parent > > > > # this does fetch tags, because we're storing the result according to > > # the configured refspec ("refs/heads/*:refs/remotes/origin/*"). > > git fetch origin > > The above two look good. > > > # this doesn't fetch tags, as the main command is "just" populating > > # FETCH_HEAD. But then our logic for "hey, we fetched the ref for > > # refs/remotes/origin/master, so let's update it on the side" kicks > > # in. And we end up updating FETCH_HEAD _and_ the tracking branch, but > > # not the tags. Weird. > > git fetch origin master > > Yes, it looks weird, but I suspect that it is probably more correct > not to fetch tags in this case. I wonder if it would be a solution > not to do the "on the side" thing---after all the user didn't tell > us to update refs/remotes/origin/master with this command line. Isn't that how git fetch used to behave, or am I misunderstanding what you mean? It used to be that git fetch would not update any remote tracking branches. From the 1.8.4 release notes: > "git fetch origin master" unlike "git fetch origin" or "git fetch" > did not update "refs/remotes/origin/master"; this was an early > design decision to keep the update of remote tracking branches > predictable, but in practice it turns out that people find it more > convenient to opportunistically update them whenever we have a > chance, and we have been updating them when we run "git push" which > already breaks the original "predictability" anyway.
Re: [RFC PATCH] Updated "imported object" design
On 8/15/2017 8:32 PM, Jonathan Tan wrote: This patch is based on an updated version of my refactoring of pack-related functions [1]. This corresponds to patch 1 of my previous series [2]. I'm sending this patch out (before I update the rest) for 2 reasons: * to provide a demonstration of how the feature could be implemented, in the hope of restarting the discussion * to obtain comments about this patch to see if I'm heading in the right direction In an earlier e-mail [3], I suggested that loose objects can also be marked as ".imported" (formerly ".remote" - after looking at the code, I decided to use "imported" throughout, since "remote" can be easily confused as the opposite of "local", used to represent objects in the local store as opposed to an alternate store). However, I have only implemented the imported packed objects part - imported loose objects can be added later. It still remains to be discussed whether we should mark the imported objects or the non-imported objects as the source of promises, but I still think that we should mark the imported objects. In this way, the new properties (the provision of promises and the mark) coincide on the same object, and the same things (locally created objects, fetches from non-lazy-object-serving remotes) behave in the same way regardless of whether extensions.lazyObject is set (allowing, for example, a repo to be converted into a promise-enabled one solely through modifying the configuration). This illustrates another place we need to resolve the naming/vocabulary. We should at least be consistent to make it easier to discuss/explain. We obviously went with "virtual" when building GVFS but I'm OK with "lazy" as long as we're consistent. 
Some examples of how the naming can clarify or confuse: 'Promise-enable your repo by setting the "extensions.lazyObject" flag' 'Enable your repo to lazily fetch objects by setting the "extensions.lazyObject"' 'Virtualize your repo by setting the "extensions.virtualize" flag' We may want to carry the same name into the filename we use to mark the (virtualized/lazy/promised/imported) objects. (This reminds me that there are only 2 hard problems in computer science...) ;) It is true that converting a normal repo to virtualized/lazy/promise-enabled repo wouldn't have to munge with the object marking but since this only happens once, I wouldn't make that primary decision maker (and switching back would require munging the objects anyway). I think we can make either work (ie tagging local vs non-local/remote/imported/promised/lazy/virtual). I'm fine leaving "how" the objects are marked as an implementation detail - the design requires them to be marked, the person writing the code can figure out the fastest way to accomplish that. In the current patch/design, the access time to check for the existence of an "imported" file is going to be dwarfed by the cost of opening and parsing every object (loose and pack) to get its oid so that isn't really an issue. +/* + * Objects that are believed to be loadable by the lazy loader, because + * they are referred to by an imported object. If an object that we have + * refers to such an object even though we don't have that object, it is + * not an error. + */ +static struct oidset promises; +static int promises_prepared; + +static int add_promise(const struct object_id *oid, struct packed_git *pack, + uint32_t pos, void *data) +{ + struct object *obj = parse_object(oid); + if (!obj) + /* +* Error messages are given when packs are verified, so +* do not print any here. +*/ + return 0; + + /* +* If this is a tree, commit, or tag, the objects it refers +* to are promises. (Blobs refer to no objects.) 
+	 */
+	if (obj->type == OBJ_TREE) {
+		struct tree *tree = (struct tree *) obj;
+		struct tree_desc desc;
+		struct name_entry entry;
+		if (init_tree_desc_gently(&desc, tree->buffer, tree->size))
+			/*
+			 * Error messages are given when packs are
+			 * verified, so do not print any here.
+			 */
+			return 0;
+		while (tree_entry_gently(&desc, &entry))
+			oidset_insert(&promises, entry.oid);
+	} else if (obj->type == OBJ_COMMIT) {
+		struct commit *commit = (struct commit *) obj;
+		struct commit_list *parents = commit->parents;
+
+		oidset_insert(&promises, &commit->tree->object.oid);
+		for (; parents; parents = parents->next)
+			oidset_insert(&promises, &parents->item->object.oid);
+	} else if (obj->type == OBJ_TAG) {
+		struct tag *tag = (struct tag *) obj;
+		oidset_insert(&promises, &tag->tagged->oid);
+	}
+	return 0;
+}
+
+static int
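The control flow of add_promise is easier to see stripped of the C plumbing. Below is a toy Python model (the object layout and names are invented for illustration; this is not the patch's API) of how promises are harvested from an imported tree, commit, or tag:

```python
# Toy model of add_promise: everything an imported tree/commit/tag refers
# to becomes a "promise" -- an object the lazy loader is believed to be
# able to fetch later, so its absence locally is not an error.
promises = set()

def add_promise(obj):
    kind = obj["type"]
    if kind == "tree":
        # each tree entry names a blob or subtree by oid
        for oid in obj["entries"]:
            promises.add(oid)
    elif kind == "commit":
        promises.add(obj["tree"])        # the root tree
        for parent in obj["parents"]:    # and every parent commit
            promises.add(parent)
    elif kind == "tag":
        promises.add(obj["tagged"])      # the tagged object
    # blobs refer to no objects, so they contribute nothing

add_promise({"type": "commit", "tree": "t1", "parents": ["c0"]})
add_promise({"type": "tree", "entries": ["b1", "b2"]})
add_promise({"type": "blob"})
assert promises == {"t1", "c0", "b1", "b2"}
```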
Re: [PATCH v4 2/3] builtin/branch: stop supporting the use of --set-upstream option
Kaartic Sivaraam writes:

> diff --git a/Documentation/git-branch.txt b/Documentation/git-branch.txt
> index 81bd0a7b7..948d9c9ef 100644
> --- a/Documentation/git-branch.txt
> +++ b/Documentation/git-branch.txt
> @@ -195,10 +195,10 @@ start-point is either a local or remote-tracking branch.
> 	branch.autoSetupMerge configuration variable is true.
>
> --set-upstream::
> -	If specified branch does not exist yet or if `--force` has been
> -	given, acts exactly like `--track`. Otherwise sets up configuration
> -	like `--track` would when creating the branch, except that where
> -	branch points to is not changed.
> +	As this option had confusing syntax it's no longer supported. Please use
> +	--track or --set-upstream-to instead.
> ++
> +Note: This could possibly become an alias of --set-upstream-to in the future.

I'll tweak `--track` and `--set-upstream-to` in the updated text and remove the 'Note:' thing that does not give any useful information to the end users while queuing (no need to resend). Thanks.
Re: [PATCH v4 2/3] builtin/branch: stop supporting the use of --set-upstream option
Martin Ågren writes: > On 17 August 2017 at 04:54, Kaartic Sivaraam > wrote: >> Helped-by: Martin Ågren , Junio C Hamano >> >> Signed-off-by: Kaartic Sivaraam > > I didn't expect a "Helped-by", all I did was to give some random > comments. :-) I'm not so sure about the comma-separation, that seems to > be a first in the project. I didn't either ;-) The line looks odd so I'll remove it while queuing. Thanks for noticing.
ignoring extra bitmap file?
Hi everyone, I'm seeing the message remote: warning: ignoring extra bitmap file: ./objects/pack/pack-2943dc24pack and indeed, there is such a thing (two, actually):

171736188 Aug 17 08:20 pack-2943dc2477026f87b280ebcefa93fe28412688df.idx
12662268 Aug 17 08:24 pack-2943dc2477026f87b280ebcefa93fe28412688df.bitmap
12927989355 Aug 17 08:27 pack-2943dc2477026f87b280ebcefa93fe28412688df.pack
164857412 Aug 17 08:33 pack-8b4a42ca7aa2aca6f354292007910de1110117b2.idx
13164932 Aug 17 08:49 pack-8b4a42ca7aa2aca6f354292007910de1110117b2.bitmap
281872 Aug 17 09:40 pack-bddb40f984124ba8c2a4e5c55b0d1b2804fd5817.pack
13280 Aug 17 09:40 pack-bddb40f984124ba8c2a4e5c55b0d1b2804fd5817.idx
7904 Aug 17 15:51 pack-0f8b1478e17174c562d9a52cf577e0e050bdb7c5.idx
2373948 Aug 17 16:09 pack-23253e17510cacaae3bb38fb5429073b3bc59480.pack
6980 Aug 17 16:09 pack-23253e17510cacaae3bb38fb5429073b3bc59480.idx
144158 Aug 17 17:03 pack-0f8b1478e17174c562d9a52cf577e0e050bdb7c5.pack
12927996484 Aug 17 19:19 pack-8b4a42ca7aa2aca6f354292007910de1110117b2.pack
153332 Aug 17 20:17 pack-65ff13a10c29a6c1604017c50dc9a320044ee605.pack
14036 Aug 17 20:17 pack-65ff13a10c29a6c1604017c50dc9a320044ee605.idx

But it looks like something went wrong in that repack cycle (that pack-2943dc247702 is the full repo), and it won't get removed later in the next repack in the evening. Question: Can I safely remove the .bitmap file, and repack will then clean up the .pack and .idx files as well? (This is still that repo in bitbucket (latest 4.x) server with git 2.6.2, now with gc.auto=0.) - Andreas -- "Totally trivial. Famous last words." From: Linus Torvalds Date: Fri, 22 Jan 2010 07:29:21 -0800
Re: git fetch with refspec does not include tags?
Jeff King writes: > I think it's a bit more complex because "git pull" uses "git fetch" > under the hood. In fact, your "git fetch origin master" is exactly what > gets run when you do: > > git pull origin master > > That's maybe OK. But I think one-off pulls like: > > git pull https://example.com/repo.git master > > probably wouldn't want it. I'd have to give it some thought. I agree with both. If you have a named remote, you presumably are keeping copies of their branches as remote-tracking branches, and it may be fine to follow tags. An explicit URL used for a one-off should not grab anything but the named thing, I would think.
Re: git fetch with refspec does not include tags?
Jeff King writes: > # no tags, we just populate FETCH_HEAD because of the bare URL > git fetch ../parent > > # this does fetch tags, because we're storing the result according to > # the configured refspec ("refs/heads/*:refs/remotes/origin/*"). > git fetch origin The above two look good. > # this doesn't fetch tags, as the main command is "just" populating > # FETCH_HEAD. But then our logic for "hey, we fetched the ref for > # refs/remotes/origin/master, so let's update it on the side" kicks > # in. And we end up updating FETCH_HEAD _and_ the tracking branch, but > # not the tags. Weird. > git fetch origin master Yes, it looks weird, but I suspect that it is probably more correct not to fetch tags in this case. I wonder if it would be a solution not to do the "on the side" thing---after all the user didn't tell us to update refs/remotes/origin/master with this command line. Then not following tags will be in line with all of the reasoning above. > # and this one does fetch tags, because we have a real destination. > git fetch origin master:foo Yup, that is expected. > So what I'd say is: > > 1. Definitely these defaults are under-documented. I couldn't find > them anywhere in git-fetch(1). Yup. > 2. If we continue to follow the "are we storing any refs" rule for the > default, possibly it should expand to "did we store anything, > including opportunistic tracking-ref updates". That also is a workable way to make things consistent.
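The default Peff enumerates boils down to "follow tags only when the fetch stores refs according to a refspec; the opportunistic tracking-ref update does not count." A small Python sketch of that rule as I read the thread (my own paraphrase of the described behavior, not git source):

```python
def follows_tags(named_remote, cmdline_refspec, refspec_has_dst):
    """named_remote: fetching a configured remote (which has a refspec like
    refs/heads/*:refs/remotes/origin/*); cmdline_refspec: a refspec was
    given on the command line; refspec_has_dst: that refspec names a
    destination ref (e.g. master:foo)."""
    if not cmdline_refspec:
        # no command-line refspec: a named remote stores tracking refs,
        # so tags follow; a bare URL only writes FETCH_HEAD, so they don't
        return named_remote
    # with a command-line refspec, only an explicit destination counts;
    # the "on the side" tracking-ref update does not trigger tag following
    return refspec_has_dst

assert follows_tags(False, False, False) is False  # git fetch ../parent
assert follows_tags(True, False, False) is True    # git fetch origin
assert follows_tags(True, True, False) is False    # git fetch origin master
assert follows_tags(True, True, True) is True      # git fetch origin master:foo
```

Peff's point 2 would amount to changing the last branch to also return True when the opportunistic update fires.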
Re: git add -p breaks after split on change at the top of the file
Jeff King writes: > Of course this is a pretty ridiculous input in the first place. In > theory it _could_ be figured out, but overlapping hunks like this are > always going to cause problems (in this case, context from the second > hunk became a removal, and the second hunk no longer applies). To be honest, I do not think it could be figured out by "git apply", but "git add -p" _should_ be able to. I've already said this earlier in the discussion: https://public-inbox.org/git/7vab5cn7wr@alter.siamese.dyndns.org/ "add -p" knows how the hunk _before_ it gave the user to edit looked like, and it knows how the hunk after the user edited looks like, so it may have enough information to figure it out. But the "(e)dit" feature was done in a way to discard that information and forced an impossible "guess what the correct shape of the patch would be, out of this broken patch input" with --recount and --allow-overlap. If we want to make "add -p/(e)dit" work in the corner cases and make it trustworthy, "apply" is a wrong tree to bark up. A major part of the effort to fix would go to "add -p", not to "apply".
Re: git add -p breaks after split on change at the top of the file
Jeff King writes: > [+cc Junio, as this gets deep into git-apply innards] I wrote off --recount and --allow-overlap long ago as ugly workarounds that happen to work some of the time but cannot be trusted. IIRC, before the "(e)dit" thing was added to "add -p", we counted the line numbers correctly and merged the adjacent hunks before applying, and neither of these two kluges was necessary. These threads may give us a bit more background: https://public-inbox.org/git/7viqk1ndlk@alter.siamese.dyndns.org/ https://public-inbox.org/git/1304117373-592-1-git-send-email-gits...@pobox.com/ The original that introduced the "(e)dit" thing was in this thread: https://public-inbox.org/git/200805232221.45406.tr...@student.ethz.ch/ As you can see, I was very much against it, as it fundamentally cannot be sanely implemented (which I think is the same conclusion you reached at the end of the current thread). I think there should be a better failure mode, though.
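For context on --recount: the mechanical half of what it does, recomputing an @@ header's counts from an edited hunk body, is simple; a hedged Python sketch of just that step (illustrative only, not the git-apply implementation):

```python
def recount(hunk_body):
    """Recompute the old/new line counts of an @@ -a,old +b,new @@ header
    from the hunk body, the way a user-edited hunk must be re-counted:
    context and '-' lines exist in the old file, context and '+' in the new."""
    old = sum(1 for line in hunk_body if line[:1] in (" ", "-"))
    new = sum(1 for line in hunk_body if line[:1] in (" ", "+"))
    return old, new

edited = [" context", "-removed", "+added one", "+added two", " context"]
assert recount(edited) == (3, 4)
```

What cannot be recovered this way is how overlapping hunks should have been merged after an edit; that needs the pre-edit hunk, which only "add -p" still has, which is the point being made above.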
Re: [PATCH] add test for bug in git-mv with nested submodules
On Thu, Aug 17, 2017 at 3:34 AM, Heiko Voigtwrote: > When using git-mv with a submodule it will detect that and update the > paths for its configurations (.gitmodules, worktree and gitfile). This > does not work for nested submodules where a user renames the root > submodule. > > We discovered this fact when working on on-demand fetch for renamed > submodules. Lets add a test to document. > > Signed-off-by: Heiko Voigt > --- > t/t7001-mv.sh | 9 + > 1 file changed, 9 insertions(+) > > diff --git a/t/t7001-mv.sh b/t/t7001-mv.sh > index e365d1f..39f8aed 100755 > --- a/t/t7001-mv.sh > +++ b/t/t7001-mv.sh > @@ -491,4 +491,13 @@ test_expect_success 'moving a submodule in nested > directories' ' > test_cmp actual expect > ' > > +test_expect_failure 'moving nested submodules' ' > + git commit -am "cleanup commit" && > + git submodule add ./. sub_nested && If possible, I would avoid adding the repo itself as a submodule as it is unrealistic in the wild. While it may be ok for the test here, later down the road other tests making use of it it may become an issue with the URL of the submodule. > + git commit -m "add sub_nested" && > + git submodule update --init --recursive && > + git mv sub_nested sub_nested_moved && > + git status > +' > + > test_done > -- > 2.0.0.274.g6b2cd91 >
Re: reftable [v7]: new ref storage format
Michael Haggerty writes: > On Wed, Aug 16, 2017 at 11:05 PM, Junio C Hamano wrote: >> I found it slightly odd that we do not insist that update_indices >> that appear in a single reftable file are consecutive, yet we >> require that min_update_index of a reftable file must be one greater >> than the max_update_index of a previous one. That is not a new >> issue in v7, though. > > I think of `update_index` like a pseudo-time, and the > `min_update_index` and `max_update_index` to be stating that "this > reftable covers the time interval specified". So it's reasonable to > say that the reftable files, together, should cover all time. > > But it might be that there are values of `update_index` for which no > events survived within a reftable file that covers that time interval. > This can happen if reference update records have been compacted away > because later reference updates overwrote their effects, and either > > * reflogs were turned off for those updates, or > * the corresponding reflogs have been compacted into a separate file, or > * the corresponding reflog entries for those updates have been expired. Yeah, and I think it is reasonable that the specification does not dictate that indices within a single reftable must be consecutive. And if update_indices within a single reftable are allowed to be sparse, e.g. recording three transactions with indices 1, 3 and 5, it is not immediately obvious to me why the transactions that are recorded in the next reftable cannot be with indices 7, 8 and 10, leaving a gap between the max in the first table (i.e. 5) and the min in the second table (i.e. 7). That is what I found slightly odd.
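The invariant under discussion reads naturally as a check on a stack of tables; here is an illustrative Python sketch (my paraphrase of the rule, not reftable code): update_indices inside one file may be sparse, but the files' [min_update_index, max_update_index] intervals must chain with no gap.

```python
def valid_stack(tables):
    """tables: list of reftable files, each given as the update_indices
    it records. Indices inside a file may be sparse; across files, each
    interval must start exactly one past the previous file's max."""
    prev_max = None
    for t in tables:
        lo, hi = min(t), max(t)
        if prev_max is not None and lo != prev_max + 1:
            return False
        prev_max = hi
    return True

assert valid_stack([[1, 3, 5], [6, 8, 10]])      # sparse inside, no gap between
assert not valid_stack([[1, 3, 5], [7, 8, 10]])  # the 5 -> 7 gap is what's odd
```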
Re: [PATCH v4 2/3] builtin/branch: stop supporting the use of --set-upstream option
On 17 August 2017 at 04:54, Kaartic Sivaraam wrote: > Helped-by: Martin Ågren , Junio C Hamano > > Signed-off-by: Kaartic Sivaraam I didn't expect a "Helped-by", all I did was to give some random comments. :-) I'm not so sure about the comma-separation, that seems to be a first in the project. > * The option has not yet been removed from the synopsis of the > documentation and I think > we can't remove it from the 'Synopsis' portion of the documentation as > it doesn't make > sense (at least to me) to give a description of an option not listed > in the synopsis. The "git interpret-trailers --parse" thread nearby is adding some options without mentioning them in the synopsis [1], and those options can actually be useful, whereas "--set-upstream" only results in a fatal error. So I don't know. > Moreover, we have to state the reason for not supporting it in some > place. > > I guess the phrase 'no longer supported' is equally communicative. Let me > know if that was not > a right decision. I think it's ok. Of course, I know exactly what you want to say, and why, so I'm biased. :-) [1] https://public-inbox.org/git/20170815102334.qc4w7akl44bti...@sigill.intra.peff.net/
Re: [PATCH v4 5/8] interpret-trailers: add --parse convenience option
On 16 August 2017 at 10:20, Jeff King wrote: > On Tue, Aug 15, 2017 at 01:26:53PM +0200, Martin Ågren wrote: > >> > This command reads some patches or commit messages from either the >> > - arguments or the standard input if no <file> is specified. Then >> > -this command applies the arguments passed using the `--trailer` >> > -option, if any, to the commit message part of each input file. The >> > -result is emitted on the standard output. >> > + arguments or the standard input if no <file> is specified. If >> > +`--parse` is specified, the output consists of the parsed trailers. >> > + >> > +Otherwise, the this command applies the arguments passed using the >> > +`--trailer` option, if any, to the commit message part of each input >> > +file. The result is emitted on the standard output. >> >> "the this" > > Thanks. > >> I think I get why you use --parse above (and in the synopsis), although >> it kind of feels like it should be --only-input or perhaps "--only-input >> (or --parse)". > > I really wanted to point people to --parse as the go-to option for the > parsing mode for the sake of simplicity (in fact I initially considered > not even exposing them at all). And I hoped that if they jumped to the > definition of --parse, that would lead them to the other options. > > I dunno. I agree it does not read particularly well. Probably the > whole description section could be rewritten to cover the newly dual > nature of the command a bit better. But I didn't want to disrupt the > existing flow too much. Certainly. It's not like I can offer a "better" way. After thinking about this some more, I think this is a good example of "less is more". It's possibly less precise or complete in some sense, but it's probably much more useful and readable.
Re: [Patch size_t V3 17/19] Convert ref-filter to size_t
Junio C Hamano writes: > I did not see anything fishy in the remaining parts I did not > comment on. > > Thanks. What I meant was "remainder of this patch 17/19". I am not claiming that I read all other patches; I haven't, not yet anyway.
Re: [Patch size_t V3 17/19] Convert ref-filter to size_t
Martin Koegler writes:

> -static char *copy_subject(const char *buf, unsigned long len)
> +static char *copy_subject(const char *buf, size_t len)
> {
> 	char *r = xmemdupz(buf, len);
> 	int i;

This has the same "we are still iterating with 'int i' over an array that can be 'size_t len' long" issue as I pointed out in 19/19. The remedy is similar, i.e.

	static char *copy_subject(const char *buf, size_t len)
	{
		char *ret = xmemdupz(buf, len);
		char *cp;

		for (cp = ret; cp < ret + len; cp++)
			if (*cp == '\n')
				*cp = ' ';
		return ret;
	}

which should come either before the whole series or after the series settles, not inside this series. I do also wonder if we want to do more for other control characters (e.g. CR, HT, etc.) in this function, though. That is totally outside the scope of this series and should be done independently.

> @@ -898,7 +898,7 @@ static void grab_date(const char *buf, struct atom_value *v, const char *atomnam
> }
>
> /* See grab_values */
> -static void grab_person(const char *who, struct atom_value *val, int deref, struct object *obj, void *buf, unsigned long sz)
> +static void grab_person(const char *who, struct atom_value *val, int deref, struct object *obj, void *buf, size_t sz)
> {
> 	int i;
> 	int wholen = strlen(who);

Shouldn't this also become size_t, I wonder? I did not see anything fishy in the remaining parts I did not comment on. Thanks.
Re: [PATCH] files-backend: cheapen refname_available check when locking refs
On 08/17, Jeff King wrote:
> On Thu, Aug 17, 2017 at 05:12:50PM +0200, Michael Haggerty wrote:
>
> > I was testing this using the reporter's recipe (but fetching from a
> > local clone), and found the following surprising timing numbers:
> >
> > b05855b5bc (before the slowdown): 22.7 s
> > 524a9fdb51 (immediately after the slowdown): 13 minutes
> > 4e81f1ecf1 (after this fix): 14.5 s
> >
> > The fact that the fetch is now significantly *faster* than before the
> > slowdown seems not to have anything to do with the reference code.
>
> I bisected this (with some hackery, since the commits in the middle all
> take 13 minutes to run). The other speedup is indeed unrelated, and is
> due to Brandon's aacc5c1a81 (submodule: refactor logic to determine
> changed submodules, 2017-05-01).
>
> The commit message doesn't mention performance (it's mostly about code
> reduction). I think the speedup comes from using
> diff_tree_combined_merge() instead of manually diffing each commit
> against its parents. But I didn't do further timings to verify that (I'm
> reporting it here mostly as an interesting curiosity for submodule
> folks).

Haha always great to see an unintended improvement in performance! Yeah that commit was mostly about removing duplicate code but I'm glad that it ended up being a benefit to perf too.

> > diff --git a/refs/files-backend.c b/refs/files-backend.c
> > index e9b95592b6..f2a420c611 100644
> > --- a/refs/files-backend.c
> > +++ b/refs/files-backend.c
> > @@ -631,11 +631,11 @@ static int lock_raw_ref(struct files_ref_store *refs,
> >
> > 	/*
> > 	 * If the ref did not exist and we are creating it,
> > -	 * make sure there is no existing ref that conflicts
> > -	 * with refname:
> > +	 * make sure there is no existing packed ref that
> > +	 * conflicts with refname:
> > 	 */
> > 	if (refs_verify_refname_available(
> > -			&refs->base, refname,
> > +			refs->packed_ref_store, refname,
> > 			extras, skip, err))
> > 		goto error_return;
> > 	}
>
> This seems too easy to be true.
> :) But I think it matches what we were
> doing before 524a9fdb51 (so it's correct), and the performance numbers
> don't lie.
>
> -Peff

--
Brandon Williams
Re: [Patch size_t V3 18/19] Convert tree-walk to size_t
Martin Koegler writes:

> From: Martin Koegler
>
> Signed-off-by: Martin Koegler
> ---
> tree-walk.c | 17 +
> tree-walk.h | 4 ++--
> tree.h | 2 +-
> 3 files changed, 12 insertions(+), 11 deletions(-)

In this one, I did not find anything suspicious.

> diff --git a/tree-walk.c b/tree-walk.c
> index 7c9f9e3..a7d8b2a 100644
> --- a/tree-walk.c
> +++ b/tree-walk.c
> @@ -22,10 +22,11 @@ static const char *get_mode(const char *str, unsigned int *modep)
> 	return str;
> }
>
> -static int decode_tree_entry(struct tree_desc *desc, const char *buf, unsigned long size, struct strbuf *err)
> +static int decode_tree_entry(struct tree_desc *desc, const char *buf, size_t size, struct strbuf *err)
> {
> 	const char *path;
> -	unsigned int mode, len;
> +	unsigned int mode;
> +	size_t len;
>
> 	if (size < 23 || buf[size - 21]) {
> 		strbuf_addstr(err, _("too-short tree object"));

The original was especially bad around here, as it didn't even use ulong. The conversion makes sense. Thanks.
Re: [Patch size_t V3 19/19] Convert xdiff-interface to size_t
Martin Koegler writes:

> diff --git a/combine-diff.c b/combine-diff.c
> index acf39ec..ad5d177 100644
> --- a/combine-diff.c
> +++ b/combine-diff.c
> @@ -343,7 +343,7 @@ struct combine_diff_state {
> 	struct sline *lost_bucket;
> };
>
> -static void consume_line(void *state_, char *line, unsigned long len)
> +static void consume_line(void *state_, char *line, size_t len)
> {
> 	struct combine_diff_state *state = state_;
> 	if (5 < len && !memcmp("@@ -", line, 4)) {

This is a correct logical consequence of making consume_fn take size_t.

> diff --git a/diff.c b/diff.c
> index c12d062..f665f8d 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -406,7 +406,7 @@ static struct diff_tempfile {
> 	struct tempfile tempfile;
> } diff_temp[2];
>
> -typedef unsigned long (*sane_truncate_fn)(char *line, unsigned long len);
> +typedef size_t (*sane_truncate_fn)(char *line, size_t len);

It turns out that this type, the member of this type in ecb struct, and a conditional call to the function when it is not NULL, are all unused. If this were used, this conversion is correct ;-)

> @@ -4536,7 +4536,7 @@ struct patch_id_t {
> 	int patchlen;
> };
>
> -static int remove_space(char *line, int len)
> +static size_t remove_space(char *line, size_t len)
> {
> 	int i;
> 	char *dst = line;

This function may also want to be rewritten. The loop counter and array index "i" is used this way:

	for (i = 0; i < len; i++)
		if (!isspace((c = line[i])))
			*dst++ = c;
	return dst - line;

So you are still risking data loss (later parts of line[] may not be scanned due to int possibly being narrower than size_t). Turning "len" and the return type into size_t is absolutely the right thing to do, and it is tempting to turn "i" also into size_t, but we may just want to scan the thing with another pointer, i.e.
	size_t remove_space(char *line, size_t len)
	{
		char *src = line;
		char *dst = line;

		while (src < line + len) {
			char c = *src++;
			if (!isspace(c))
				*dst++ = c;
		}
		return dst - line;
	}

Changes to size_t from types other than "ulong" are worth looking at twice for a potential issue like this.

> diff --git a/xdiff-interface.c b/xdiff-interface.c
> index d82cd4a..f12285d 100644
> --- a/xdiff-interface.c
> +++ b/xdiff-interface.c
> @@ -26,7 +26,7 @@ static int parse_num(char **cp_p, int *num_p)
> 	return 0;
> }
>
> -int parse_hunk_header(char *line, int len,
> +int parse_hunk_header(char *line, size_t len,
> 	int *ob, int *on,
> 	int *nb, int *nn)
> {

This is a correct conversion for the purpose of this series, but the implementation of this function before (and after) this patch is already incorrect. It does not pay any attention to "len", and neither is the helper function it uses, parse_num(), told how many bytes remain on the line to be parsed, so it will happily go beyond "len". The two things we may want to fix are obviously outside the scope of this series, but they should be fixed (not necessarily by you) after this series once the codebase solidifies, or before this series as preliminary clean-up. Thanks.
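The pointer-scan version is easy to sanity-check in miniature; here is a byte-level Python rendition of the same compaction (illustrative only, not the C code), with the write cursor playing the role of dst:

```python
def remove_space(buf: bytearray) -> int:
    """Compact non-whitespace bytes to the front of buf in place and
    return the new length -- the role of 'dst - line' in the C sketch."""
    dst = 0
    for c in buf:
        if c not in b" \t\r\n\f\v":  # roughly what isspace() rejects
            buf[dst] = c
            dst += 1
    return dst

buf = bytearray(b"a b\tc \n")
n = remove_space(buf)
assert buf[:n] == b"abc"
```

Because the scan position and the write position are both plain cursors over the buffer, nothing here depends on an index type narrower than the buffer length, which is the point of the rewrite.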
Re: [RFC PATCH 2/2] submodule: simplify decision tree whether to or not to fetch
On 08/17, Stefan Beller wrote:
> On Thu, Aug 17, 2017 at 4:00 AM, Heiko Voigt wrote:
> > To make extending this logic later easier.
> >
> > Signed-off-by: Heiko Voigt
> > ---
> > I am quite sure I replicated the same logic but a few more eyes would be
> > appreciated.
>
> A code cleanup is appreciated!
>
> I thought Brandon had a series in flight doing a very similar cleanup here,
> but in master..pu there is nothing to be found.

Yeah there are 2 series in flight which will probably conflict here. bw/grep-recurse-submodules and bw/submodule-config-cleanup

> > Cheers Heiko
>
> The code looks good to me.
>
> Cheers!
> Stefan
>
> >
> > submodule.c | 55 +++
> > 1 file changed, 27 insertions(+), 28 deletions(-)
> >
> > diff --git a/submodule.c b/submodule.c
> > index 3ed78ac..a1011f4 100644
> > --- a/submodule.c
> > +++ b/submodule.c
> > @@ -1171,6 +1171,21 @@ int submodule_touches_in_range(struct object_id *excl_oid,
> > 	return ret;
> > }
> >
> > +static int get_fetch_recurse_config(const struct submodule *submodule, int command_line_option)
> > +{
> > +	if (command_line_option != RECURSE_SUBMODULES_DEFAULT)
> > +		return command_line_option;
> > +
> > +	if (submodule && submodule->fetch_recurse != RECURSE_SUBMODULES_NONE)
> > +		/* local config overrules everything except commandline */
> > +		return submodule->fetch_recurse;
> > +
> > +	if (gitmodules_is_unmerged)
> > +		return RECURSE_SUBMODULES_OFF;
> > +
> > +	return config_fetch_recurse_submodules;
> > +}
> > +
> > struct submodule_parallel_fetch {
> > 	int count;
> > 	struct argv_array args;
> > @@ -1203,37 +1218,21 @@ static int get_next_submodule(struct child_process *cp,
> > 	if (!submodule)
> > 		submodule = submodule_from_name(&null_oid, ce->name);
> >
> > -	default_argv = "yes";
> > -	if (spf->command_line_option == RECURSE_SUBMODULES_DEFAULT) {
> > -		if (submodule &&
> > -		    submodule->fetch_recurse !=
> > -		    RECURSE_SUBMODULES_NONE) {
> > -			if (submodule->fetch_recurse ==
> > -			    RECURSE_SUBMODULES_OFF)
> > -				continue;
> > -			if (submodule->fetch_recurse ==
> > -			    RECURSE_SUBMODULES_ON_DEMAND) {
> > -				if (!unsorted_string_list_lookup(&changed_submodule_names,
> > -								 submodule->name))
> > -					continue;
> > -				default_argv = "on-demand";
> > -			}
> > -		} else {
> > -			if ((config_fetch_recurse_submodules == RECURSE_SUBMODULES_OFF) ||
> > -			    gitmodules_is_unmerged)
> > -				continue;
> > -			if (config_fetch_recurse_submodules == RECURSE_SUBMODULES_ON_DEMAND) {
> > -				if (!unsorted_string_list_lookup(&changed_submodule_names,
> > -								 submodule->name))
> > -					continue;
> > -				default_argv = "on-demand";
> > -			}
> > -		}
> > -	} else if (spf->command_line_option == RECURSE_SUBMODULES_ON_DEMAND) {
> > -		if (!unsorted_string_list_lookup(&changed_submodule_names,
> > +	switch (get_fetch_recurse_config(submodule, spf->command_line_option))
> > +	{
> > +	default:
> > +	case RECURSE_SUBMODULES_DEFAULT:
> > +	case RECURSE_SUBMODULES_ON_DEMAND:
> > +		if (!submodule || !unsorted_string_list_lookup(&changed_submodule_names,
> > 							       submodule->name))
> > 			continue;
> > 		default_argv = "on-demand";
> > +		break;
> > +	case RECURSE_SUBMODULES_ON:
> > +		default_argv = "yes";
> > +		break;
> > +	case RECURSE_SUBMODULES_OFF:
> > +		continue;
> > 	}
> >
> > 	strbuf_addf(&submodule_path, "%s/%s", spf->work_tree, ce->name);
> > --
> > 2.0.0.274.g6b2cd91
> >

--
Brandon Williams
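The refactored decision reads well as a precedence table; here is a Python model of the extracted get_fetch_recurse_config logic (constants and names are stand-ins mirroring the C, not actual git code):

```python
# Stand-in values for the RECURSE_SUBMODULES_* constants in the patch.
DEFAULT, NONE, ON, OFF, ON_DEMAND = range(5)

def get_fetch_recurse_config(cmdline, submodule_cfg, gitmodules_unmerged, global_cfg):
    if cmdline != DEFAULT:
        return cmdline                 # command line overrules everything
    if submodule_cfg != NONE:
        return submodule_cfg           # then per-submodule local config
    if gitmodules_unmerged:
        return OFF                     # unmerged .gitmodules: play it safe
    return global_cfg                  # finally fetch.recurseSubmodules

# command line beats per-submodule config
assert get_fetch_recurse_config(ON, OFF, False, OFF) == ON
# per-submodule config beats the global setting
assert get_fetch_recurse_config(DEFAULT, ON_DEMAND, False, ON) == ON_DEMAND
# unmerged .gitmodules forces off when nothing more specific is set
assert get_fetch_recurse_config(DEFAULT, NONE, True, ON) == OFF
# otherwise fall back to the global config
assert get_fetch_recurse_config(DEFAULT, NONE, False, ON_DEMAND) == ON_DEMAND
```

The caller's switch then only has to map the resulting value to "yes", "on-demand", or skipping the submodule, which is what makes the cleanup easy to extend later.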
Re: [PATCH] t5526: fix some broken && chains
On Thu, Aug 17, 2017 at 3:36 AM, Heiko Voigt wrote:
> Signed-off-by: Heiko Voigt

Reviewed-by: Stefan Beller

Thanks,
Stefan

> ---
> t/t5526-fetch-submodules.sh | 8
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
> index ce788e9..22a7358 100755
> --- a/t/t5526-fetch-submodules.sh
> +++ b/t/t5526-fetch-submodules.sh
> @@ -193,7 +193,7 @@ test_expect_success "recurseSubmodules=true propagates into submodules" '
> 	add_upstream_commit &&
> 	(
> 		cd downstream &&
> -		git config fetch.recurseSubmodules true
> +		git config fetch.recurseSubmodules true &&
> 		git fetch >../actual.out 2>../actual.err
> 	) &&
> 	test_must_be_empty actual.out &&
> @@ -218,7 +218,7 @@ test_expect_success "--no-recurse-submodules overrides config setting" '
> 	add_upstream_commit &&
> 	(
> 		cd downstream &&
> -		git config fetch.recurseSubmodules true
> +		git config fetch.recurseSubmodules true &&
> 		git fetch --no-recurse-submodules >../actual.out 2>../actual.err
> 	) &&
> 	! test -s actual.out &&
> @@ -232,7 +232,7 @@ test_expect_success "Recursion doesn't happen when no new commits are fetched in
> 		cd submodule &&
> 		git config --unset fetch.recurseSubmodules
> 	) &&
> -	git config --unset fetch.recurseSubmodules
> +	git config --unset fetch.recurseSubmodules &&
> 	git fetch >../actual.out 2>../actual.err
> ) &&
> ! test -s actual.out &&
> @@ -312,7 +312,7 @@ test_expect_success "Recursion picks up all submodules when necessary" '
> 	) &&
> 	head1=$(git rev-parse --short HEAD^) &&
> 	git add subdir/deepsubmodule &&
> -	git commit -m "new deepsubmodule"
> +	git commit -m "new deepsubmodule" &&
> 	head2=$(git rev-parse --short HEAD) &&
> 	echo "Fetching submodule submodule" > ../expect.err.sub &&
> 	echo "From $pwd/submodule" >> ../expect.err.sub &&
> --
> 2.0.0.274.g6b2cd91
Re: [PATCH] diff: retire sane_truncate_fn
On Thu, Aug 17, 2017 at 10:27 AM, Junio C Hamano wrote:
> Long time ago, 23707811 ("diff: do not chomp hunk-header in the
> middle of a character", 2008-01-02) introduced sane_truncate_line()
> helper function to trim the "function header" line that is shown at
> the end of the hunk header line, in order to avoid chomping it in
> the middle of a single UTF-8 character. It also added a facility to
> define a custom callback function to make it possible to extend it
> to non UTF-8 encodings.
>
> During the following 8 1/2 years, nobody found need for this custom
> callback facility.
>
> A custom callback function is a wrong design to use here anyway---if
> your contents need support for non UTF-8 encoding, you shouldn't
> have to write a custom function and recompile Git to plumb it in. A
> better approach would be to extend sane_truncate_line() function and
> have a new member in emit_callback to conditionally trigger it.
>
> Signed-off-by: Junio C Hamano

This patch is Reviewed-by: Stefan Beller

However, while strolling around in code nearby, I do wonder if sane_truncate_line needs to make use of the return value of utf8_width. But that is not the case, as we're interested in the byte length, not the print length.

Thanks
Stefan
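For context, the problem sane_truncate_line guards against can be shown in a few lines of Python (illustrative only, not the C helper): cutting a byte buffer at an arbitrary offset may split a multi-byte UTF-8 character, so the cut has to back up to the last complete character boundary.

```python
def truncate_utf8(data: bytes, limit: int) -> bytes:
    """Cut data to at most limit bytes without splitting a UTF-8 character:
    drop trailing bytes until the remainder decodes cleanly."""
    cut = data[:limit]
    while cut:
        try:
            cut.decode("utf-8")
            return cut
        except UnicodeDecodeError:
            cut = cut[:-1]  # back up over the partial character
    return cut

s = "héllo".encode("utf-8")         # 'é' is two bytes: 0xc3 0xa9
assert truncate_utf8(s, 2) == b"h"  # don't keep the dangling 0xc3
assert truncate_utf8(s, 3) == "hé".encode("utf-8")
```

Note the check is about byte boundaries, not display columns, which matches the observation above that the byte length, not the print width, is what matters here.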
Re: Question about git gc on large text files
> On Aug 17, 2017, at 10:33 AM, Jeff King wrote: > > On Thu, Aug 17, 2017 at 10:28:00AM -0700, Kai Zhang wrote: > >> I have a git repository maintaining one large json file (along with >> several other small files). With commits for large json file, the >> repository become bigger and bigger, so I tried to run command "git gc >> --prune=now --aggressive" to reduce disk usage, then I found .git >> folder size did not change. I had wonderful experience with git gc >> against files around 10M to 20M, and I am wondering if there is any >> configuration need to tweak for large text files? >> >> Here I have more details: >> 1. Json file size: 1G > > Git won't try to delta-compress anything over 500MB by default. Try: > > git config core.bigfilethreshold 2G > git gc --aggressive > > -Peff It works! Thank you so much! Kai
Re: Question about git gc on large text files
On Thu, Aug 17, 2017 at 10:28:00AM -0700, Kai Zhang wrote: > I have a git repository maintaining one large json file (along with > several other small files). With commits for large json file, the > repository become bigger and bigger, so I tried to run command "git gc > --prune=now --aggressive" to reduce disk usage, then I found .git > folder size did not change. I had wonderful experience with git gc > against files around 10M to 20M, and I am wondering if there is any > configuration need to tweak for large text files? > > Here I have more details: > 1. Json file size: 1G Git won't try to delta-compress anything over 500MB by default. Try: git config core.bigfilethreshold 2G git gc --aggressive -Peff
Question about git gc on large text files
Hi I have a git repository maintaining one large json file (along with several other small files). With commits for large json file, the repository become bigger and bigger, so I tried to run command "git gc --prune=now --aggressive" to reduce disk usage, then I found .git folder size did not change. I had wonderful experience with git gc against files around 10M to 20M, and I am wondering if there is any configuration need to tweak for large text files? Here I have more details: 1. Json file size: 1G 2. .git folder size before gc: 2G, after gc: 2G 3. Pack folder content before gc: pack$ du -sh * 1.5Kpack-087e186e44b4f3dfdbb9479c457f3f4e6cef1c58.idx 146Mpack-087e186e44b4f3dfdbb9479c457f3f4e6cef1c58.pack 1.5Kpack-26c3bc5ee1aee6ab5355cda205a7a071bd464950.idx 146Mpack-26c3bc5ee1aee6ab5355cda205a7a071bd464950.pack 1.5Kpack-2ce66eb23ee42a9ac080fc31d1d476c2f27a8ee7.idx 146Mpack-2ce66eb23ee42a9ac080fc31d1d476c2f27a8ee7.pack 1.5Kpack-523304e1b27f476d1ec8c92356af38762880077c.idx 146Mpack-523304e1b27f476d1ec8c92356af38762880077c.pack 1.5Kpack-53a474a44e2bbc5406c6ac9e2deabdd77fd7c129.idx 146Mpack-53a474a44e2bbc5406c6ac9e2deabdd77fd7c129.pack 1.5Kpack-5b4cc71e4a675cdf72e837d1928af6ae2a7b8d44.idx 146Mpack-5b4cc71e4a675cdf72e837d1928af6ae2a7b8d44.pack 1.5Kpack-84d95c86aca367915fb26a227b32ea08f03a48d5.idx 146Mpack-84d95c86aca367915fb26a227b32ea08f03a48d5.pack 1.5Kpack-85f1f587416cbbe7613496a7efcd0cd01ad262c5.idx 146Mpack-85f1f587416cbbe7613496a7efcd0cd01ad262c5.pack 1.5Kpack-9522c7ced56ec8a379fa79ab8209cc6e8052a75c.idx 146Mpack-9522c7ced56ec8a379fa79ab8209cc6e8052a75c.pack 1.5Kpack-d7e517de6a72557cb68a7a406f1b526698b8f5d9.idx 146Mpack-d7e517de6a72557cb68a7a406f1b526698b8f5d9.pack 1.5Kpack-dd87e41420828d24edaf886ba2a11a49c76fbc4c.idx 146Mpack-dd87e41420828d24edaf886ba2a11a49c76fbc4c.pack 1.5Kpack-e0b47f03d74919a5ca6f8c40b04d49886e2a767a.idx 146Mpack-e0b47f03d74919a5ca6f8c40b04d49886e2a767a.pack 1.5Kpack-ee5deaeb15ae6d713d99059a7361fa51ae21a173.idx 
146Mpack-ee5deaeb15ae6d713d99059a7361fa51ae21a173.pack 1.5Kpack-f727952b8f8928a875e35f34549d3f9ebe5f0740.idx 146Mpack-f727952b8f8928a875e35f34549d3f9ebe5f0740.pack 4. Pack folder content after gc: du -sh * 6.0Kpack-2a8326dbe94de5c62c51b617397ef80729fde92f.idx 2.0Gpack-2a8326dbe94de5c62c51b617397ef80729fde92f.pack 5. Output for git verify-pack after gc: git verify-pack -v repo/.git/objects/pack/pack-2a8326dbe94de5c62c51b617397ef80729fde92f.idx z73c2ff9093fb1daf96364d49c8d2d025b1da6cf5 commit 245 147 12 eca81268167f362e69136df4f098e83bd49f7c3b commit 83 87 159 1 73c2ff9093fb1daf96364d49c8d2d025b1da6cf5 b9546502b6a77dc75bb25fc249606880285a1082 commit 80 89 246 1 73c2ff9093fb1daf96364d49c8d2d025b1da6cf5 dd500e7a39df13dd08eee1ab45d537a14764046c commit 82 90 335 1 73c2ff9093fb1daf96364d49c8d2d025b1da6cf5 018c0ebbbcd53fa1575b923cfc953a9bf0c6b9e6 commit 245 147 425 a44facd2fa2bc820c7ad0d9ce0a7cf5c9e0bb4ee commit 84 90 572 1 018c0ebbbcd53fa1575b923cfc953a9bf0c6b9e6 064d040a4baf19469b8d0ffbac8b7b61dc570ef6 commit 79 86 662 2 a44facd2fa2bc820c7ad0d9ce0a7cf5c9e0bb4ee 965b259ebee154290d59c3bd10b44465521deb17 commit 81 89 748 2 a44facd2fa2bc820c7ad0d9ce0a7cf5c9e0bb4ee 0bcee651e5575e87d8a2b0c6f4b9058f2b11629d commit 245 148 837 e9362deba68746c1667b9d238e0b0fde3d5a4ec9 commit 82 90 985 1 0bcee651e5575e87d8a2b0c6f4b9058f2b11629d 684c356682afdf5273e9dba3c8272d330174a464 commit 80 89 1075 2 e9362deba68746c1667b9d238e0b0fde3d5a4ec9 45cd004175c238d27f1af9d340c3fa5867f6c9e3 commit 80 88 1164 1 0bcee651e5575e87d8a2b0c6f4b9058f2b11629d e876b6dde0c407855cd8fa4f25cff13b44c78489 commit 245 148 1252 831a50b8338b2b211b35d29a4210179155022179 commit 84 90 1400 1 e876b6dde0c407855cd8fa4f25cff13b44c78489 6eaebe76669a4a16d74d53aec605dee3c0427527 commit 86 91 1490 1 e876b6dde0c407855cd8fa4f25cff13b44c78489 3cccb7d0a997a0dbc887cd47b8b8e9c46fabc66a commit 81 89 1581 2 6eaebe76669a4a16d74d53aec605dee3c0427527 136592f9618d17712a818f0f84bed9ffd5655c0a commit 80 87 1670 3 
3cccb7d0a997a0dbc887cd47b8b8e9c46fabc66a 9c7cf0528e49f16118afb514709fd392d917769c commit 245 150 1757 c564ffd73eb621349aa8d601acb035990c159103 commit 80 90 1907 1 9c7cf0528e49f16118afb514709fd392d917769c db804ffd8b7e3106b2a30c593866549eabbffda8 commit 81 89 1997 1 9c7cf0528e49f16118afb514709fd392d917769c 6c18d72d3308b3c0a5ab412fe05ce0b01acc5b8b commit 245 149 2086 f3f6d9b9cb798f818a34addc500972e28149c579 commit 82 89 2235 1 6c18d72d3308b3c0a5ab412fe05ce0b01acc5b8b c737d0af7fa8fca38c7c8478d6755fbfbf94e947 commit 80 89 2324 2 f3f6d9b9cb798f818a34addc500972e28149c579 12630d4e5f25a06e95636f773adbad42768587d7 commit 79 89 2413 3 c737d0af7fa8fca38c7c8478d6755fbfbf94e947 66386f99e1f3c6c1d4fa2a64d3171757ded84a35 commit 82 91 2502 2 f3f6d9b9cb798f818a34addc500972e28149c579 0f57e9c185de68585feaa46fc88c1031afb0dd73 commit 245 150 2593 f66dc5d75d72474ca84bd23bf40f90b37b6b2d10 commit 83 88 2743 1
[PATCH] diff: retire sane_truncate_fn
Long time ago, 23707811 ("diff: do not chomp hunk-header in the middle of a character", 2008-01-02) introduced sane_truncate_line() helper function to trim the "function header" line that is shown at the end of the hunk header line, in order to avoid chomping it in the middle of a single UTF-8 character. It also added a facility to define a custom callback function to make it possible to extend it to non UTF-8 encodings. During the following 8 1/2 years, nobody found need for this custom callback facility. A custom callback function is a wrong design to use here anyway---if your contents need support for non UTF-8 encoding, you shouldn't have to write a custom function and recompile Git to plumb it in. A better approach would be to extend sane_truncate_line() function and have a new member in emit_callback to conditionally trigger it. Signed-off-by: Junio C Hamano--- diff.c | 5 - 1 file changed, 5 deletions(-) diff --git a/diff.c b/diff.c index a628ac3a95..4bdaed8197 100644 --- a/diff.c +++ b/diff.c @@ -408,8 +408,6 @@ static struct diff_tempfile { struct tempfile tempfile; } diff_temp[2]; -typedef unsigned long (*sane_truncate_fn)(char *line, unsigned long len); - struct emit_callback { int color_diff; unsigned ws_rule; @@ -417,7 +415,6 @@ struct emit_callback { int blank_at_eof_in_postimage; int lno_in_preimage; int lno_in_postimage; - sane_truncate_fn truncate; const char **label_path; struct diff_words_data *diff_words; struct diff_options *opt; @@ -1246,8 +1243,6 @@ static unsigned long sane_truncate_line(struct emit_callback *ecb, char *line, u unsigned long allot; size_t l = len; - if (ecb->truncate) - return ecb->truncate(line, len); cp = line; allot = l; while (0 < l) {
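[Editorial aside: the boundary rule that sane_truncate_line() preserves — never cut a line in the middle of a multi-byte UTF-8 sequence — can be sketched in a few lines outside of C. This is an illustrative Python model, not Git's actual code; the function name and test strings are made up.]

```python
def sane_truncate(data: bytes, limit: int) -> bytes:
    """Truncate UTF-8 bytes to at most `limit` bytes without
    splitting a multi-byte character."""
    if len(data) <= limit:
        return data
    end = limit
    # UTF-8 continuation bytes have the form 0b10xxxxxx; if the byte
    # just past the cut is one, back up to the character's lead byte
    # and drop the whole (would-be split) character.
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]

line = "héllo wörld".encode("utf-8")
print(sane_truncate(line, 2))  # b'h' — backs off rather than split 'é'
print(sane_truncate(line, 3))  # b'h\xc3\xa9' — 'é' fits exactly
```

The byte-length-only view matches the observation above: utf8_width's print width never enters into it.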
Re: [RFC PATCH 2/2] submodule: simplify decision tree whether to or not to fetch
On Thu, Aug 17, 2017 at 4:00 AM, Heiko Voigt wrote: > To make extending this logic later easier. > > Signed-off-by: Heiko Voigt > --- > I am quite sure I replicated the same logic but a few more eyes would be > appreciated. A code cleanup is appreciated! I thought Brandon had a series in flight doing a very similar cleanup here, but in master..pu there is nothing to be found. > Cheers Heiko The code looks good to me. Cheers! Stefan > > submodule.c | 55 +++ > 1 file changed, 27 insertions(+), 28 deletions(-) > > diff --git a/submodule.c b/submodule.c > index 3ed78ac..a1011f4 100644 > --- a/submodule.c > +++ b/submodule.c > @@ -1171,6 +1171,21 @@ int submodule_touches_in_range(struct object_id > *excl_oid, > return ret; > } > > +static int get_fetch_recurse_config(const struct submodule *submodule, int > command_line_option) > +{ > + if (command_line_option != RECURSE_SUBMODULES_DEFAULT) > + return command_line_option; > + > + if (submodule && submodule->fetch_recurse != RECURSE_SUBMODULES_NONE) > + /* local config overrules everything except commandline */ > + return submodule->fetch_recurse; > + > + if (gitmodules_is_unmerged) > + return RECURSE_SUBMODULES_OFF; > + > + return config_fetch_recurse_submodules; > +} > + > struct submodule_parallel_fetch { > int count; > struct argv_array args; > @@ -1203,37 +1218,21 @@ static int get_next_submodule(struct child_process > *cp, > if (!submodule) > submodule = submodule_from_name(_oid, ce->name); > > - default_argv = "yes"; > - if (spf->command_line_option == RECURSE_SUBMODULES_DEFAULT) { > - if (submodule && > - submodule->fetch_recurse != > - RECURSE_SUBMODULES_NONE) { > - if (submodule->fetch_recurse == > - RECURSE_SUBMODULES_OFF) > - continue; > - if (submodule->fetch_recurse == > - RECURSE_SUBMODULES_ON_DEMAND) > { > - if > (!unsorted_string_list_lookup(_submodule_names, > - > submodule->name)) > - continue; > - default_argv = "on-demand"; > - } > - } else { > - if ((config_fetch_recurse_submodules == > 
RECURSE_SUBMODULES_OFF) || > - gitmodules_is_unmerged) > - continue; > - if (config_fetch_recurse_submodules == > RECURSE_SUBMODULES_ON_DEMAND) { > - if > (!unsorted_string_list_lookup(_submodule_names, > - > submodule->name)) > - continue; > - default_argv = "on-demand"; > - } > - } > - } else if (spf->command_line_option == > RECURSE_SUBMODULES_ON_DEMAND) { > - if > (!unsorted_string_list_lookup(_submodule_names, > + switch (get_fetch_recurse_config(submodule, > spf->command_line_option)) > + { > + default: > + case RECURSE_SUBMODULES_DEFAULT: > + case RECURSE_SUBMODULES_ON_DEMAND: > + if (!submodule || > !unsorted_string_list_lookup(_submodule_names, > submodule->name)) > continue; > default_argv = "on-demand"; > + break; > + case RECURSE_SUBMODULES_ON: > + default_argv = "yes"; > + break; > + case RECURSE_SUBMODULES_OFF: > + continue; > } > > strbuf_addf(_path, "%s/%s", spf->work_tree, > ce->name); > -- > 2.0.0.274.g6b2cd91 >
Re: [RFC PATCH 1/2] implement fetching of moved submodules
On Thu, Aug 17, 2017 at 3:53 AM, Heiko Voigt wrote: > We store the changed submodules paths to calculate which submodule needs > fetching. This does not work for moved submodules since their paths do > not stay the same in case of a moved submodules. In case of new > submodules we do not have a path in the current checkout, since they > just appeared in this fetch. > > It is more general to collect the submodule names for changes instead of > their paths to include the above cases. > > With the change described above we implement 'on-demand' fetching of > changes in moved submodules. This sounds as if this would also enable fetching new submodules eventually? > Note: This does only work when repositories have a .gitmodules file. In > other words: It breaks if we do not get a name for a repository. > IIRC, consensus was that this is a requirement to get nice submodule > handling these days? I think that should have been the consensus since ~1.7.8 (since the submodules git dir can live inside the superprojects /module/). A gitlink entry without corresponding .gitmodules entry is just a gitlink. If we happen to have a repository at that path of the gitlink, we can be nice and pretend like it is a functional submodule, but it really is not. It's just another repo inside the superproject that happen to live at the path of a gitlink. > Signed-off-by: Heiko Voigt > --- > > I updated the leftover code from my series implementing recursive fetch > for moved submodules[1] to the current master. > > This breaks t5531 and t5545 because they do not use a .gitmodules file. > > I also have some code leftover that does fallback on paths in case no > submodule names can be found. But I do not really like it. The question > here is how far do we support not using .gitmodules. Is it e.g. > reasonable to say: "For --recurse-submodules=on-demand you need a > .gitmodules file?" 
I would not intentionally break users here, but any new functionality can safely assume (a) we have a proper .gitmodules entry or (b) it is not a submodule, so do nothing/be extra careful. For example in recursive diff sort of makes sense to also handle non-submodule gitlinks, but fetch is harder to tell. (just last night I was rereading https://public-inbox.org/git/CAJo=hjvnapnaddcaawavu9c4rveqdos3ev9wtguhx4fd0v_...@mail.gmail.com/ which I think is a super cute application of gitlinks. If you happen to checkout such a tree, you don't want to fetch all of the fake submodules) > > [1] > https://public-inbox.org/git/f5baa2acc09531a16f4f693eebbe60706bb8ed1e.1361751905.git.hvo...@hvoigt.net/ Oha, that is from way back in the time. :) > submodule.c | 92 > + > t/t5526-fetch-submodules.sh | 35 + > 2 files changed, 86 insertions(+), 41 deletions(-) > > diff --git a/submodule.c b/submodule.c > index 27de65a..3ed78ac 100644 > --- a/submodule.c > +++ b/submodule.c > @@ -23,7 +23,7 @@ > static int config_fetch_recurse_submodules = RECURSE_SUBMODULES_ON_DEMAND; > static int config_update_recurse_submodules = RECURSE_SUBMODULES_OFF; > static int parallel_jobs = 1; > -static struct string_list changed_submodule_paths = STRING_LIST_INIT_DUP; > +static struct string_list changed_submodule_names = STRING_LIST_INIT_DUP; > static int initialized_fetch_ref_tips; > static struct oid_array ref_tips_before_fetch; > static struct oid_array ref_tips_after_fetch; > @@ -742,11 +742,11 @@ const struct submodule *submodule_from_ce(const struct > cache_entry *ce) > } > > static struct oid_array *submodule_commits(struct string_list *submodules, > - const char *path) > + const char *name) > { > struct string_list_item *item; > > - item = string_list_insert(submodules, path); > + item = string_list_insert(submodules, name); > if (item->util) > return (struct oid_array *) item->util; > > @@ -755,39 +755,34 @@ static struct oid_array *submodule_commits(struct > string_list *submodules, > return 
(struct oid_array *) item->util; > } > > +struct collect_changed_submodules_cb_data { > + struct string_list *changed; Here a comment would be helpful or a more concise variable name. (What is changed?) > + const struct object_id *commit_oid; > +}; > + > static void collect_changed_submodules_cb(struct diff_queue_struct *q, > struct diff_options *options, > void *data) > { > + struct collect_changed_submodules_cb_data *me = data; > + struct string_list *changed = me->changed; > + const struct object_id *commit_oid = me->commit_oid; > int i; > - struct string_list *changed = data; > > for (i = 0; i < q->nr; i++) { > struct diff_filepair *p = q->queue[i]; >
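[Editorial aside: the core idea of this patch — key the set of changed submodules by their .gitmodules *name* rather than by path, so a submodule that moved between commits is still recognized as one entity — can be sketched like this. Illustrative Python only; the names and oids below are made up, not from the patch.]

```python
# name -> list of new commit oids seen for that submodule
changed_submodule_names = {}

def record_change(name: str, oid: str) -> None:
    """Collect a changed commit under the submodule's configured name."""
    changed_submodule_names.setdefault(name, []).append(oid)

# The same logical submodule at two different paths in two superproject
# commits: keying by path would produce two unrelated entries, keying by
# name merges them, so on-demand fetch sees both commits to fetch.
record_change("libfoo", "abc123")   # commit where it lived at lib/foo
record_change("libfoo", "def456")   # commit after it moved to vendor/foo
print(changed_submodule_names)      # {'libfoo': ['abc123', 'def456']}
```

This also shows why a missing .gitmodules entry breaks the scheme: without a name there is no stable key to merge the per-path observations under.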
Re: [PATCH/FIXUP 6/2] apply: clarify read_old_data() is about no-index case
Junio C Hamano writes: > Torsten Bögershausen writes: > >> Unless we re-define the meaning of "NULL" into "don't do CRLF conversions, >> like SAFE_CRLF_KEEP_CRLF does. > > My preference is not to use NULL as any hint. Instead, the "flag" > parameter we already pass to convert_to_git(), just like the updated > read_old_data() uses SAFE_CRLF_KEEP_CRLF to tell it that it should > not disturb existing CRLF without looking at the istate, should be > used to tell convert_to_git() to do the opposite, but do so without > looking at the istate. > > Perhaps SAFE_CRLF_FALSE should be such a flag. Or perhaps we need > to invent another flag. I dunno. I grepped for SAFE_CRLF_FALSE and found only two callers that explicitly pass it down the callchain, both of which essentially says "if we are writing the object out, use core.safecrlf, but if we are merely hashing, do SAFE_CRLF_FALSE thing". I think their use case is quite similar to the codepath we are discussing---they have a data that come from the outside world, and they know the index entry that happens to be at the path has nothing to do with the data they are asking convert_to_git() to massage (i.e. it is _wrong_ if the contents of the blob that happens to be in the index at the path affected the outcome of the conversion). So I think the fix to convert_to_git() can just use SAFE_CRLF_FALSE as such an instruction that tells the function not do the "safe crlf" thing, which looks at the contents in the index and decide if the CRLFs in the contents being converted should be turned into LFs. And because the function is told not to look at the index, it should be made safe to pass istate=NULL. There does not seem to be a need to invent another flag. Thanks.
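[Editorial aside: a toy model of the two flag behaviors discussed in this thread. The enum member names mirror Git's SAFE_CRLF_* constants, but the function is a stand-in, not convert_to_git()'s real signature; in particular the real SAFE_CRLF_FALSE path may still convert based on attributes, it just skips the index-content safety check.]

```python
from enum import Enum

class SafeCrlf(Enum):
    FALSE = 0       # convert without consulting the index (no "safe crlf" check)
    KEEP_CRLF = 1   # leave existing CRLFs untouched (read_old_data's case)

def to_git(data: bytes, mode: SafeCrlf) -> bytes:
    """Toy convert_to_git(): normalize CRLF -> LF unless told to keep CRLF."""
    if mode is SafeCrlf.KEEP_CRLF:
        return data
    return data.replace(b"\r\n", b"\n")

print(to_git(b"a\r\nb\r\n", SafeCrlf.FALSE))      # b'a\nb\n'
print(to_git(b"a\r\nb\r\n", SafeCrlf.KEEP_CRLF))  # b'a\r\nb\r\n'
```

The point of the thread is exactly that neither branch above needs to look at an index state, which is why passing istate=NULL becomes safe.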
Re: [PATCH] files-backend: cheapen refname_available check when locking refs
On Thu, Aug 17, 2017 at 05:12:50PM +0200, Michael Haggerty wrote: > I was testing this using the reporter's recipe (but fetching from a > local clone), and found the following surprising timing numbers: > > b05855b5bc (before the slowdown): 22.7 s > 524a9fdb51 (immediately after the slowdown): 13 minutes > 4e81f1ecf1 (after this fix): 14.5 s > > The fact that the fetch is now significantly *faster* than before the > slowdown seems not to have anything to do with the reference code. I bisected this (with some hackery, since the commits in the middle all take 13 minutes to run). The other speedup is indeed unrelated, and is due to Brandon's aacc5c1a81 (submodule: refactor logic to determine changed submodules, 2017-05-01). The commit message doesn't mention performance (it's mostly about code reduction). I think the speedup comes from using diff_tree_combined_merge() instead of manually diffing each commit against its parents. But I didn't do further timings to verify that (I'm reporting it here mostly as an interesting curiosity for submodule folks). > diff --git a/refs/files-backend.c b/refs/files-backend.c > index e9b95592b6..f2a420c611 100644 > --- a/refs/files-backend.c > +++ b/refs/files-backend.c > @@ -631,11 +631,11 @@ static int lock_raw_ref(struct files_ref_store *refs, > > /* >* If the ref did not exist and we are creating it, > - * make sure there is no existing ref that conflicts > - * with refname: > + * make sure there is no existing packed ref that > + * conflicts with refname: >*/ > if (refs_verify_refname_available( > - >base, refname, > + refs->packed_ref_store, refname, > extras, skip, err)) > goto error_return; > } This seems too easy to be true. :) But I think it matches what we were doing before 524a9fdb51 (so it's correct), and the performance numbers don't lie. -Peff
[PATCH] files-backend: cheapen refname_available check when locking refs
When locking references in preparation for updating them, we need to check that none of the newly added references D/F conflict with existing references (e.g., we don't allow `refs/foo` to be added if `refs/foo/bar` already exists, or vice versa). Prior to 524a9fdb51 (refs_verify_refname_available(): use function in more places, 2017-04-16), conflicts with existing loose references were checked by looking directly in the filesystem, and then conflicts with existing packed references were checked by running `verify_refname_available_dir()` against the packed-refs cache. But that commit changed the final check to call `refs_verify_refname_available()` against the *whole* files ref-store, including both loose and packed references, with the following comment: > This means that those callsites now check for conflicts with all > references rather than just packed refs, but the performance cost > shouldn't be significant (and will be regained later). That comment turned out to be too sanguine. User s...@kazlauskas.me reported that fetches involving a very large number of references in neighboring directories were slowed down by that change. The problem is that when fetching, each reference is updated individually, within its own reference transaction. This is done because some reference updates might succeed even though others fail. But every time a reference update transaction is finished, `clear_loose_ref_cache()` is called. So when it is time to update the next reference, part of the loose ref cache has to be repopulated for the `refs_verify_refname_available()` call. If the references are all in neighboring directories, then the cost of repopulating the reference cache increases with the number of references, resulting in O(N²) effort. The comment above also claims that the performance cost "will be regained later". 
The idea was that once the packed-refs were finished being split out into a separate ref-store, we could limit the `refs_verify_refname_available()` call to the packed references again. That is what we do now. Signed-off-by: Michael Haggerty --- This patch applies on top of branch mh/packed-ref-store. It can also be obtained from my fork [1] as branch "faster-refname-available-check". I was testing this using the reporter's recipe (but fetching from a local clone), and found the following surprising timing numbers: b05855b5bc (before the slowdown): 22.7 s 524a9fdb51 (immediately after the slowdown): 13 minutes 4e81f1ecf1 (after this fix): 14.5 s The fact that the fetch is now significantly *faster* than before the slowdown seems not to have anything to do with the reference code. refs/files-backend.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/refs/files-backend.c b/refs/files-backend.c index e9b95592b6..f2a420c611 100644 --- a/refs/files-backend.c +++ b/refs/files-backend.c @@ -631,11 +631,11 @@ static int lock_raw_ref(struct files_ref_store *refs, /* * If the ref did not exist and we are creating it, -* make sure there is no existing ref that conflicts -* with refname: +* make sure there is no existing packed ref that +* conflicts with refname: */ if (refs_verify_refname_available( - &refs->base, refname, + refs->packed_ref_store, refname, extras, skip, err)) goto error_return; } @@ -938,7 +938,7 @@ static struct ref_lock *lock_ref_sha1_basic(struct files_ref_store *refs, * our refname. */ if (is_null_oid(&lock->old_oid) && - refs_verify_refname_available(&refs->base, refname, + refs_verify_refname_available(refs->packed_ref_store, refname, extras, skip, err)) { last_errno = ENOTDIR; goto error_return; -- 2.11.0
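[Editorial aside: the D/F ("directory/file") conflict rule this check enforces — `refs/foo` cannot coexist with `refs/foo/bar`, because loose refs are files in a directory tree — sketched as illustrative Python. The function and names are hypothetical, not Git's API.]

```python
def refname_conflicts(existing, refname):
    """Would adding `refname` D/F-conflict with any ref in `existing`?"""
    # Case 1: refname would be a file where an existing ref needs a directory
    # (adding refs/heads/foo while refs/heads/foo/bar exists).
    for ref in existing:
        if ref.startswith(refname + "/"):
            return True
    # Case 2: refname needs a directory where an existing ref is a file
    # (adding refs/heads/foo/bar while refs/heads/foo exists).
    parts = refname.split("/")
    for i in range(1, len(parts)):
        if "/".join(parts[:i]) in existing:
            return True
    return False

refs = {"refs/heads/foo/bar"}
print(refname_conflicts(refs, "refs/heads/foo"))  # True
print(refname_conflicts(refs, "refs/heads/baz"))  # False
```

The patch doesn't change this rule; it only narrows `existing` back to the packed refs when locking, so the loose-ref cache no longer has to be repopulated per transaction (the source of the O(N²) behavior described above).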
Re: git fetch with refspec does not include tags?
On Thu, Aug 17, 2017 at 11:29:48AM +, Carlsson, Magnus wrote: > > 2. If we continue to follow the "are we storing any refs" rule for the > > default, possibly it should expand to "did we store anything, > > including opportunistic tracking-ref updates". > > Personally I think that I would prefer to always fetch relevant tags. > If I don't want tags I can simply use the "--no-tags" I think it's a bit more complex because "git pull" uses "git fetch" under the hood. In fact, your "git fetch origin master" is exactly what gets run when you do: git pull origin master That's maybe OK. But I think one-off pulls like: git pull https://example.com/repo.git master probably wouldn't want it. I'd have to give it some thought. -Peff
Re: What's cooking in git.git (Aug 2017, #03; Mon, 14)
On Thursday 17 August 2017 12:58 AM, Junio C Hamano wrote: Nothing has. Neither thread seems to have any comment from anybody but you, and I took it as an indication that people do not think it is a good change. I do not find the s/branch/parameter/ too bad (although I would have said "arguments" instead). For the other one, I personally think split sentences and lines make output look worse, but this is obviously subjective (just like the patch itself). Thanks! Will keep this in mind in future. BTW, what about the patch that did get user attention? https://public-inbox.org/git/<0102015d7ae53b0a-a6505296-9257-4b0d-84d6-2152e17eb070-000...@eu-west-1.amazonses.com> --- Kaartic
Re: git fetch with refspec does not include tags?
Thanks for your quick answer! > 1. Definitely these defaults are under-documented. I couldn't find >them anywhere in git-fetch(1). Yes, some sort of explanation would be good, especially since one of the first sentences state that you always get relevant tags. > 2. If we continue to follow the "are we storing any refs" rule for the > default, possibly it should expand to "did we store anything, > including opportunistic tracking-ref updates". Personally I think that I would prefer to always fetch relevant tags. If I don't want tags I can simply use the "--no-tags" -- Magnus MAGNUS CARLSSON Staff Software Engineer ARRIS o: +46 13 36 75 92 e: magnus.carls...@arris.com w: www.arris.com From: Jeff King Sent: Thursday, August 17, 2017 11:28 To: Carlsson, Magnus Cc: git@vger.kernel.org Subject: Re: git fetch with refspec does not include tags? On Thu, Aug 17, 2017 at 09:02:52AM +, Carlsson, Magnus wrote: > In the git fetch documentation it states that by default you will > fetch all tags that point into the history to the branches fetched. > > "By default, any tag that points into the histories being fetched is > also fetched; the effect is to fetch tags that point at branches that > you are interested in. This default behavior can be changed by using > the --tags or --no-tags options or by configuring > remote..tagOpt. 
By using a refspec that fetches tags explicitly, > you can fetch tags that do not point into branches you are interested > in as well." > > But for me I get tags if I do "git fetch" or "git fetch origin" but if > I do "git fetch origin master" I don't get tags related to the master > branch. > > I understand that this might be due to me specifying a refspec and > then it will only get that exact refspec, but for me it's not that > clear from the documentation what I should expect. I read it as when I > fetch something all related tags will come along. I'll admit that our tag-autofollow behavior has often confused me. So I'll try to untangle what's happening at least if not the reasoning. :) I think the problem is not that you have a refspec, but that your refspec has no destination. Looking at the fetch code, we seem to turn on autotags only when the destination is a "real" ref and not just the default FETCH_HEAD. Which sort-of makes sense. If you're doing a one-off into FETCH_HEAD, you probably don't want to create tags, even if you have the objects they point to. But this is further complicated by the opportunistic tracking-ref updates. You can see some interesting behavior with a setup like this: git init parent git -C parent commit --allow-empty -m one && git -C parent tag -m foo mytag git init child cd child git remote add origin ../parent and then: # no tags, we just populate FETCH_HEAD because of the bare URL git fetch ../parent # this does fetch tags, because we're storing the result according to # the configured refspec ("refs/heads/*:refs/remotes/origin/*"). git fetch origin # this doesn't fetch tags, as the main command is "just" populating # FETCH_HEAD. But then our logic for "hey, we fetched the ref for # refs/remotes/origin/master, so let's update it on the side" kicks # in. And we end up updating FETCH_HEAD _and_ the tracking branch, but # not the tags. Weird. git fetch origin master # and this one does fetch tags, because we have a real destination. 
git fetch origin master:foo So what I'd say is: 1. Definitely these defaults are under-documented. I couldn't find them anywhere in git-fetch(1). 2. If we continue to follow the "are we storing any refs" rule for the default, possibly it should expand to "did we store anything, including opportunistic tracking-ref updates". -Peff
[RFC PATCH 2/2] submodule: simplify decision tree whether to or not to fetch
To make extending this logic later easier. Signed-off-by: Heiko Voigt --- I am quite sure I replicated the same logic but a few more eyes would be appreciated. Cheers Heiko submodule.c | 55 +++ 1 file changed, 27 insertions(+), 28 deletions(-) diff --git a/submodule.c b/submodule.c index 3ed78ac..a1011f4 100644 --- a/submodule.c +++ b/submodule.c @@ -1171,6 +1171,21 @@ int submodule_touches_in_range(struct object_id *excl_oid, return ret; } +static int get_fetch_recurse_config(const struct submodule *submodule, int command_line_option) +{ + if (command_line_option != RECURSE_SUBMODULES_DEFAULT) + return command_line_option; + + if (submodule && submodule->fetch_recurse != RECURSE_SUBMODULES_NONE) + /* local config overrules everything except commandline */ + return submodule->fetch_recurse; + + if (gitmodules_is_unmerged) + return RECURSE_SUBMODULES_OFF; + + return config_fetch_recurse_submodules; +} + struct submodule_parallel_fetch { int count; struct argv_array args; @@ -1203,37 +1218,21 @@ static int get_next_submodule(struct child_process *cp, if (!submodule) submodule = submodule_from_name(_oid, ce->name); - default_argv = "yes"; - if (spf->command_line_option == RECURSE_SUBMODULES_DEFAULT) { - if (submodule && - submodule->fetch_recurse != - RECURSE_SUBMODULES_NONE) { - if (submodule->fetch_recurse == - RECURSE_SUBMODULES_OFF) - continue; - if (submodule->fetch_recurse == - RECURSE_SUBMODULES_ON_DEMAND) { - if (!unsorted_string_list_lookup(&changed_submodule_names, - submodule->name)) - continue; - default_argv = "on-demand"; - } - } else { - if ((config_fetch_recurse_submodules == RECURSE_SUBMODULES_OFF) || - gitmodules_is_unmerged) - continue; - if (config_fetch_recurse_submodules == RECURSE_SUBMODULES_ON_DEMAND) { - if (!unsorted_string_list_lookup(&changed_submodule_names, - submodule->name)) - continue; - default_argv = "on-demand"; - } - } - } else if (spf->command_line_option == RECURSE_SUBMODULES_ON_DEMAND) { - if (!unsorted_string_list_lookup(&changed_submodule_names, + 
switch (get_fetch_recurse_config(submodule, spf->command_line_option)) + { + default: + case RECURSE_SUBMODULES_DEFAULT: + case RECURSE_SUBMODULES_ON_DEMAND: + if (!submodule || !unsorted_string_list_lookup(&changed_submodule_names, submodule->name)) continue; default_argv = "on-demand"; + break; + case RECURSE_SUBMODULES_ON: + default_argv = "yes"; + break; + case RECURSE_SUBMODULES_OFF: + continue; } strbuf_addf(&submodule_path, "%s/%s", spf->work_tree, ce->name); -- 2.0.0.274.g6b2cd91
Re: tsan: t5400: set_try_to_free_routine
On Tue, Aug 15, 2017 at 02:53:07PM +0200, Martin Ågren wrote:

> Using SANITIZE=thread made t5400-send-pack.sh hit the potential race
> below.
>
> This is set_try_to_free_routine in wrapper.c. The race relates to the
> reading of the "old" value. The caller doesn't care about the "old"
> value, so this should be harmless right now. But it seems that using
> this mechanism from multiple threads and restoring the earlier value
> will probably not work out every time. (Not necessarily because of the
> race in set_try_to_free_routine, but, e.g., because callers might not
> restore the function pointer in the reverse order of how they
> originally set it.)
>
> Properly "fixing" this for thread-safety would probably require some
> redesigning, which at the time might not be warranted. I'm just posting
> this for completeness.
>
> Martin
>
> WARNING: ThreadSanitizer: data race (pid=21382)
>   Read of size 8 at 0x00979970 by thread T1:
>     #0 set_try_to_free_routine wrapper.c:35 (git+0x006cde1c)
>     #1 prepare_trace_line trace.c:105 (git+0x006a3bf0)
>     #2 trace_strbuf_fl trace.c:185 (git+0x006a3bf0)
>     #3 packet_trace pkt-line.c:80 (git+0x005f9f43)
>     #4 packet_read pkt-line.c:309 (git+0x005fbe10)
>     #5 recv_sideband sideband.c:37 (git+0x00684c5e)
>     #6 sideband_demux send-pack.c:216 (git+0x0065a38c)
>     #7 run_thread run-command.c:933 (git+0x00655a93)
>     #8 (libtsan.so.0+0x000230d9)

I was curious why the trace code would care about the free routine in the first place. Digging in the mailing list, I didn't find a lot of discussion. But I think the problem is basically that the trace infrastructure wants to be thread-safe, but the default free-pack-memory callback isn't.

It's ironic that we fix the thread-unsafety of the free-pack-memory function by using the also-thread-unsafe set_try_to_free_routine. Further irony: the trace routines aren't thread-safe in the first place, as they do lazy initialization of key->fd using an "initialized" field.
In practice it probably means double-writing key->fd and leaking a descriptor (though there are no synchronizing operations there, so it's entirely possible a compiler could reorder the assignments to key->fd and key->initialized and a simultaneous reader could read a garbage key->fd value). We also call getenv(), which isn't thread-safe with other calls to getenv() or setenv().

I can think of a few possible directions:

1. Make set_try_to_free_routine() skip the write if it would be a noop. This is racy if threads are actually changing the value, but in practice they aren't (the first trace of any kind will set it to NULL, and it will remain there).

2. Make the free-packed routine thread-safe by taking a lock. It should hardly ever be called, so performance wouldn't matter. The big question is: _which_ lock. pack-objects, which uses threads already, has a version which does this. But it knows to take the big program-wide "I'm accessing unsafe parts of Git" lock that the rest of the program uses during its multi-threaded parts. There's no notion in the rest of Git for "now we're going into a multi-threaded part, so most calls will need to take a big global lock before doing anything interesting". For parts of Git that are explicitly multi-threaded (like the pack-objects delta search, or index-pack's delta resolution) that's not so bad. But the example above is just using a sideband demuxer. It would be unfortunate if the entire rest of send-pack had to start caring about taking that lock.

3. Is the free-pack-memory thing actually accomplishing much these days? It comes from 97bfeb34df (Release pack windows before reporting out of memory., 2006-12-24), and the primary issue is not actual allocated memory, but mmap'd packs clogging up the address space so that malloc can't find a suitable block. On 64-bit systems this is likely doing nothing. We have tons of address space. But even on 32-bit systems, the default core.packedGitLimit is only 256MiB (which was set around the same time). You can certainly come up with a corner case where freeing up that address space could matter. But I'd be surprised if this has actually helped much in practice over the years. And if you have a repo which is running so close to the address space limits of your system, the right answer is probably: upgrade to a 64-bit system. Even if the try-to-free thing helped in one run, it's likely that similar runs are not going to be so lucky, and even with it you're going to see sporadic out-of-memory failures.

-Peff
[RFC PATCH 1/2] implement fetching of moved submodules
We store the paths of changed submodules to calculate which submodules need fetching. This does not work for moved submodules, since their paths do not stay the same. In the case of new submodules we do not have a path in the current checkout at all, since they just appeared in this fetch. It is more general to collect the submodule names for changes instead of their paths, which covers both cases above. With this change we implement 'on-demand' fetching of changes in moved submodules.

Note: this only works when repositories have a .gitmodules file. In other words: it breaks if we do not get a name for a repository. IIRC, consensus was that this is a requirement for nice submodule handling these days?

Signed-off-by: Heiko Voigt
---
I updated the leftover code from my series implementing recursive fetch for moved submodules[1] to the current master. This breaks t5531 and t5545 because they do not use a .gitmodules file. I also have some leftover code that falls back to paths in case no submodule names can be found, but I do not really like it. The question here is how far we support not using .gitmodules. Is it e.g. reasonable to say: "For --recurse-submodules=on-demand you need a .gitmodules file"?
[1] https://public-inbox.org/git/f5baa2acc09531a16f4f693eebbe60706bb8ed1e.1361751905.git.hvo...@hvoigt.net/

 submodule.c                 | 92 +
 t/t5526-fetch-submodules.sh | 35 +
 2 files changed, 86 insertions(+), 41 deletions(-)

diff --git a/submodule.c b/submodule.c
index 27de65a..3ed78ac 100644
--- a/submodule.c
+++ b/submodule.c
@@ -23,7 +23,7 @@
 static int config_fetch_recurse_submodules = RECURSE_SUBMODULES_ON_DEMAND;
 static int config_update_recurse_submodules = RECURSE_SUBMODULES_OFF;
 static int parallel_jobs = 1;
-static struct string_list changed_submodule_paths = STRING_LIST_INIT_DUP;
+static struct string_list changed_submodule_names = STRING_LIST_INIT_DUP;
 static int initialized_fetch_ref_tips;
 static struct oid_array ref_tips_before_fetch;
 static struct oid_array ref_tips_after_fetch;
@@ -742,11 +742,11 @@ const struct submodule *submodule_from_ce(const struct cache_entry *ce)
 }
 
 static struct oid_array *submodule_commits(struct string_list *submodules,
-					   const char *path)
+					   const char *name)
 {
 	struct string_list_item *item;
 
-	item = string_list_insert(submodules, path);
+	item = string_list_insert(submodules, name);
 	if (item->util)
 		return (struct oid_array *) item->util;
 
@@ -755,39 +755,34 @@ static struct oid_array *submodule_commits(struct string_list *submodules,
 	return (struct oid_array *) item->util;
 }
 
+struct collect_changed_submodules_cb_data {
+	struct string_list *changed;
+	const struct object_id *commit_oid;
+};
+
 static void collect_changed_submodules_cb(struct diff_queue_struct *q,
 					  struct diff_options *options,
 					  void *data)
 {
+	struct collect_changed_submodules_cb_data *me = data;
+	struct string_list *changed = me->changed;
+	const struct object_id *commit_oid = me->commit_oid;
 	int i;
-	struct string_list *changed = data;
 
 	for (i = 0; i < q->nr; i++) {
 		struct diff_filepair *p = q->queue[i];
 		struct oid_array *commits;
+		const struct submodule *submodule;
+
 		if (!S_ISGITLINK(p->two->mode))
 			continue;
 
-		if (S_ISGITLINK(p->one->mode)) {
-			/*
-			 * NEEDSWORK: We should honor the name configured in
-			 * the .gitmodules file of the commit we are examining
-			 * here to be able to correctly follow submodules
-			 * being moved around.
-			 */
-			commits = submodule_commits(changed, p->two->path);
-			oid_array_append(commits, &p->two->oid);
-		} else {
-			/* Submodule is new or was moved here */
-			/*
-			 * NEEDSWORK: When the .git directories of submodules
-			 * live inside the superprojects .git directory some
-			 * day we should fetch new submodules directly into
-			 * that location too when config or options request
-			 * that so they can be checked out from there.
-			 */
+		submodule = submodule_from_path(commit_oid, p->two->path);
+		if (!submodule)
 			continue;
-		}
+
+		commits = submodule_commits(changed, submodule->name);
+
Re: [PATCH 1/9] Convert pack-objects to size_t
On Wed, Aug 16, 2017 at 10:22:39PM +0200, Martin Koegler wrote:

> On Mon, Aug 14, 2017 at 10:08:05AM -0700, Junio C Hamano wrote:
> > It may help reducing the maintenance if we introduced obj_size_t
> > that is defined to be size_t for now, so that we can later swap
> > it to ofs_t or some larger type when we know we do need to
> > support objects whose size cannot be expressed in size_t, but I
> > do not offhand know what the pros-and-cons with such an approach
> > would look like.
>
> Where should the use of obj_size_t end and the use of size_t start?
>
> We often determine an object size and then pass it to malloc.
> We would start with a larger datatype and then truncate for memory
> allocation, which uses size_t.

The truncation is done with xsize_t; the "old" sha1_file.c has something like this:

    idx_size = xsize_t(st.st_size);

I personally don't think that obj_size_t gives us so much. There are "objects" which are "on disk", and they may have a length off_t, and there are "objects" loaded into memory, and they have a length size_t. And everybody can check that we use the right type. Additionally I don't like it very much when object size in memory is counted in a 64 bit value (and this will happen if obj_size_t == off_t == 64 bit).

Anyhow, to answer your question: all calls to xmalloc() must be prepared like this:

    p = xmalloc(xsize_t(size_of_object));

That should do the trick.

>
> Regards,
> Martin
[PATCH] t5526: fix some broken && chains
Signed-off-by: Heiko Voigt
---
 t/t5526-fetch-submodules.sh | 8
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index ce788e9..22a7358 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -193,7 +193,7 @@ test_expect_success "recurseSubmodules=true propagates into submodules" '
 	add_upstream_commit &&
 	(
 		cd downstream &&
-		git config fetch.recurseSubmodules true
+		git config fetch.recurseSubmodules true &&
 		git fetch >../actual.out 2>../actual.err
 	) &&
 	test_must_be_empty actual.out &&
@@ -218,7 +218,7 @@ test_expect_success "--no-recurse-submodules overrides config setting" '
 	add_upstream_commit &&
 	(
 		cd downstream &&
-		git config fetch.recurseSubmodules true
+		git config fetch.recurseSubmodules true &&
 		git fetch --no-recurse-submodules >../actual.out 2>../actual.err
 	) &&
 	! test -s actual.out &&
@@ -232,7 +232,7 @@ test_expect_success "Recursion doesn't happen when no new commits are fetched in
 			cd submodule &&
 			git config --unset fetch.recurseSubmodules
 		) &&
-		git config --unset fetch.recurseSubmodules
+		git config --unset fetch.recurseSubmodules &&
 		git fetch >../actual.out 2>../actual.err
 	) &&
 	! test -s actual.out &&
@@ -312,7 +312,7 @@ test_expect_success "Recursion picks up all submodules when necessary" '
 	) &&
 	head1=$(git rev-parse --short HEAD^) &&
 	git add subdir/deepsubmodule &&
-	git commit -m "new deepsubmodule"
+	git commit -m "new deepsubmodule" &&
 	head2=$(git rev-parse --short HEAD) &&
 	echo "Fetching submodule submodule" > ../expect.err.sub &&
 	echo "From $pwd/submodule" >> ../expect.err.sub &&
--
2.0.0.274.g6b2cd91
[PATCH] add test for bug in git-mv with nested submodules
When using git-mv with a submodule it will detect that and update the paths for its configurations (.gitmodules, worktree and gitfile). This does not work for nested submodules where a user renames the root submodule. We discovered this fact when working on on-demand fetch for renamed submodules. Let's add a test to document it.

Signed-off-by: Heiko Voigt
---
 t/t7001-mv.sh | 9 +
 1 file changed, 9 insertions(+)

diff --git a/t/t7001-mv.sh b/t/t7001-mv.sh
index e365d1f..39f8aed 100755
--- a/t/t7001-mv.sh
+++ b/t/t7001-mv.sh
@@ -491,4 +491,13 @@ test_expect_success 'moving a submodule in nested directories' '
 	test_cmp actual expect
 '
 
+test_expect_failure 'moving nested submodules' '
+	git commit -am "cleanup commit" &&
+	git submodule add ./. sub_nested &&
+	git commit -m "add sub_nested" &&
+	git submodule update --init --recursive &&
+	git mv sub_nested sub_nested_moved &&
+	git status
+'
+
 test_done
--
2.0.0.274.g6b2cd91
Re: [PATCH] check-ref-format: require a repository for --branch
On Mon, Jul 17, 2017 at 11:18:43PM +0200, Marko Kungla wrote:

> I guess that should only be about that it should not hit a (BUG).
> In my case, in the example I gave, I scan through the directories to
> check repository status; one of the tasks makes use of check-ref-format.
> Since it may hit a directory which is not a git repository, it should not
> expose an error (BUG), right?

Right. The BUG should definitely be corrected. Between what Jonathan is suggesting and my patch, either would be fine for the case you described originally ("--branch @{-1}" would always fail in a non-repo).

-Peff
Re: [PATCH] check-ref-format: require a repository for --branch
On Mon, Jul 17, 2017 at 10:27:09AM -0700, Jonathan Nieder wrote:

> > --- a/t/t1402-check-ref-format.sh
> > +++ b/t/t1402-check-ref-format.sh
> > @@ -161,6 +161,10 @@ test_expect_success 'check-ref-format --branch from subdir' '
> >  	test "$refname" = "$sha1"
> >  '
> >  
> > +test_expect_success 'check-ref-format --branch from non-repo' '
> > +	test_must_fail nongit git check-ref-format --branch @{-1}
> > +'
> > +
> >  valid_ref_normalized() {
> >  	prereq=
> >  	case $1 in
>
> I don't think it's right. Today I can do
>
>   $ cd /tmp
>   $ git check-ref-format --branch master
>   master
>
> You might wonder why I'd ever do such a thing. But that's what "git
> check-ref-format --branch" is for --- it is for taking a <branchname>
> argument and turning it into a branch name. For example, if you have
> a script with an $opt_branch variable that defaults to "master", it
> may do
>
>   resolved_branch=$(git check-ref-format --branch "$opt_branch")
>
> even though it is in a mode that is not going to have to use
> $resolved_branch and it is not running from a repository.

I'm not sure I buy that. What does "turning it into a branch name" even mean when you are not in a repository? Clearly @{-1} must fail. And everything else is just going to output the exact input you provided.

So any script calling "check-ref-format --branch" outside of a repo would be better off not calling it at all. At best it does nothing, and at worst it's going to give a confusing error when $opt_branch is something like "@{-1}".

A more compelling argument along these lines is something like:

  Accepting --branch outside of a repo is pointless, but it's something
  we've historically accepted. To avoid breaking existing scripts (even
  if they are doing something pointless), we'll continue to allow it.

I'm not sure I buy _that_ line of reasoning either, but it at least makes sense to me (I just think it isn't worth the complexity of trying to audit the innards of interpret_branch_name()).

-Peff
Re: git fetch with refspec does not include tags?
On Thu, Aug 17, 2017 at 09:02:52AM +, Carlsson, Magnus wrote:

> In the git fetch documentation it states that by default you will
> fetch all tags that point into the history to the branches fetched.
>
> "By default, any tag that points into the histories being fetched is
> also fetched; the effect is to fetch tags that point at branches that
> you are interested in. This default behavior can be changed by using
> the --tags or --no-tags options or by configuring
> remote..tagOpt. By using a refspec that fetches tags explicitly,
> you can fetch tags that do not point into branches you are interested
> in as well."
>
> But for me I get tags if I do "git fetch" or "git fetch origin" but if
> I do "git fetch origin master" I don't get tags related to the master
> branch.
>
> I understand that this might be due to me specifying a refspec and
> then it will only get that exact refspec, but for me it's not that
> clear from the documentation what I should expect. I read it as when I
> fetch something all related tags will come along.

I'll admit that our tag-autofollow behavior has often confused me. So I'll try to untangle what's happening at least if not the reasoning. :)

I think the problem is not that you have a refspec, but that your refspec has no destination. Looking at the fetch code, we seem to turn on autotags only when the destination is a "real" ref and not just the default FETCH_HEAD. Which sort-of makes sense. If you're doing a one-off into FETCH_HEAD, you probably don't want to create tags, even if you have the objects they point to. But this is further complicated by the opportunistic tracking-ref updates.
You can see some interesting behavior with a setup like this:

  git init parent
  git -C parent commit --allow-empty -m one &&
  git -C parent tag -m foo mytag
  git init child
  cd child
  git remote add origin ../parent

and then:

  # no tags, we just populate FETCH_HEAD because of the bare URL
  git fetch ../parent

  # this does fetch tags, because we're storing the result according to
  # the configured refspec ("refs/heads/*:refs/remotes/origin/*").
  git fetch origin

  # this doesn't fetch tags, as the main command is "just" populating
  # FETCH_HEAD. But then our logic for "hey, we fetched the ref for
  # refs/remotes/origin/master, so let's update it on the side" kicks
  # in. And we end up updating FETCH_HEAD _and_ the tracking branch, but
  # not the tags. Weird.
  git fetch origin master

  # and this one does fetch tags, because we have a real destination.
  git fetch origin master:foo

So what I'd say is:

1. Definitely these defaults are under-documented. I couldn't find them anywhere in git-fetch(1).

2. If we continue to follow the "are we storing any refs" rule for the default, possibly it should expand to "did we store anything, including opportunistic tracking-ref updates".

-Peff
Re: git add -p breaks after split on change at the top of the file
On Thu, Aug 17, 2017 at 05:03:08AM -0400, Jeff King wrote:

> But that does the opposite of what we want: it makes this case work when
> --allow-overlap isn't specified. I think my first attempt is probably
> closer to the right direction (but we'd want to have it kick in only
> when --allow-overlap is specified; well formed patches shouldn't overlap
> but we'd want to barf loudly if they do).
>
> I'll stop here for now before digging any further. I'm not all that
> familiar with the apply code, so I have a feeling Junio's comments may
> stop me from going down an unproductive dead end. :)

OK, last email, I promise. :)

My first attempt which turns off match_beginning for the overlapping hunk really does feel wrong. It would blindly match a similar hunk later in the file. So if you did something like:

1. The first hunk deletes context line "x".

2. The second overlapping hunk claims to start at the beginning of the file (line 1), and then adds a line after "x".

3. Later in the file, we have another line "x".

Then the second hunk should be rejected, and not find the unrelated "x" from (3). But with my patch, I suspect it would be found (because we dropped the requirement to find it at the beginning).

Of course this is a pretty ridiculous input in the first place. In theory it _could_ be figured out, but overlapping hunks like this are always going to cause problems (in this case, context from the second hunk became a removal, and the second hunk no longer applies). So maybe it's not worth caring about.

But I think the most robust solution would involve checking the lines between try_lno and the start to make sure they were all added. I just don't know that we have an easy way of checking that. Perhaps if we recorded the cumulative number of lines added in previous hunks, and used that as our starting point (instead of "line 0").
I also suspect that match_end has a similar problem, but I couldn't trigger it in practice (perhaps it's impossible, or perhaps my brain is just not up to the challenge today). -Peff
Re: git add -p breaks after split on change at the top of the file
On Thu, Aug 17, 2017 at 04:41:09AM -0400, Jeff King wrote:

> diff --git a/apply.c b/apply.c
> index 41ee63e1db..606db58218 100644
> --- a/apply.c
> +++ b/apply.c
> @@ -2966,8 +2966,9 @@ static int apply_one_fragment(struct apply_state *state,
>  	 * In other words, a hunk that is (frag->oldpos <= 1) with or
>  	 * without leading context must match at the beginning.
>  	 */
> -	match_beginning = (!frag->oldpos ||
> -			   (frag->oldpos == 1 && !state->unidiff_zero));
> +	match_beginning = (nth_fragment == 1 &&
> +			   (!frag->oldpos ||
> +			    (frag->oldpos == 1 && !state->unidiff_zero)));
>
>  	/*
>  	 * A hunk without trailing lines must match at the end.
>
> But I'm not quite sure if that's right. The rest of the overlap code
> seems to mark patched lines with a flag. Meaning that instead of giving
> up and saying "well, this is the second line so we can't ever try
> matching the beginning", we should be redefining "beginning" in that
> case to allow 0 or more PATCHED lines starting from the beginning of the
> file.

Hmm, actually I was reading that part of the code backwards. We record the PATCHED flag _only_ when allow_overlap isn't set, not the other way around.

So I had been imagining we'd want something like this:

diff --git a/apply.c b/apply.c
index 41ee63e1db..4048751807 100644
--- a/apply.c
+++ b/apply.c
@@ -2489,8 +2489,11 @@ static int match_fragment(struct apply_state *state,
 		return 0;
 	}
 
-	if (match_beginning && try_lno)
-		return 0;
+	if (match_beginning) {
+		for (i = 0; i < try_lno; i++)
+			if (!(img->line[i].flag & LINE_PATCHED))
+				return 0;
+	}
 
 	/* Quick hash check */
 	for (i = 0; i < preimage_limit; i++)

But that does the opposite of what we want: it makes this case work when --allow-overlap isn't specified. I think my first attempt is probably closer to the right direction (but we'd want to have it kick in only when --allow-overlap is specified; well formed patches shouldn't overlap but we'd want to barf loudly if they do).

I'll stop here for now before digging any further.
I'm not all that familiar with the apply code, so I have a feeling Junio's comments may stop me from going down an unproductive dead end. :) -Peff
git fetch with refspec does not include tags?
Hi

In the git fetch documentation it states that by default you will fetch all tags that point into the history to the branches fetched.

"By default, any tag that points into the histories being fetched is also fetched; the effect is to fetch tags that point at branches that you are interested in. This default behavior can be changed by using the --tags or --no-tags options or by configuring remote..tagOpt. By using a refspec that fetches tags explicitly, you can fetch tags that do not point into branches you are interested in as well."

But for me I get tags if I do "git fetch" or "git fetch origin" but if I do "git fetch origin master" I don't get tags related to the master branch.

I understand that this might be due to me specifying a refspec and then it will only get that exact refspec, but for me it's not that clear from the documentation what I should expect. I read it as when I fetch something all related tags will come along.

Using: git version 2.13.0.rc1.15.gd2bbb7c

--
Magnus
Re: git add -p breaks after split on change at the top of the file
[+cc Junio, as this gets deep into git-apply innards]

On Wed, Aug 16, 2017 at 10:25:04PM +0200, Simon Ruderich wrote:

> $ git add -p
> diff --git a/file b/file
> index 587be6b..74a69a0 100644
> --- a/file
> +++ b/file
> @@ -1 +1,4 @@
> +a
> +b
>  x
> +c
> Stage this hunk [y,n,q,a,d,/,s,e,?]? <-- press s
> Split into 2 hunks.
> @@ -1 +1,3 @@
> +a
> +b
>  x
> Stage this hunk [y,n,q,a,d,/,j,J,g,e,?]? <-- press e
>
> Now delete the line "+a" in your editor and save.
>
> error: patch failed: file:1
> error: file: patch does not apply
>
> I expected git add -p to stage this change without error. It
> works fine without splitting the hunk (by deleting the first and
> last + line in the diff).

Interesting case. The problem isn't in add--interactive itself (I don't think), but in git-apply. This is the diff we end up feeding to "git apply --cached --check --recount --allow-overlap" to see if it applies:

  diff --git a/file b/file
  index 587be6b..74a69a0 100644
  --- a/file
  +++ b/file
  @@ -1 +1,3 @@
  +b
   x
  @@ -1 +3,2 @@
   x
  +c

The first hunk (that we edited) applies fine. But the second one does not. Its hunk header says that it starts at line "1", so we expect to find it at the beginning of the file. But of course it _isn't_ at the beginning of the file anymore, because the first hunk added a line before there.

So this diff is somewhat bogus; it has two hunks which start at the same spot. But I think that's exactly the sort of thing that "--allow-overlap" should handle. Doing this makes your case work for me:

diff --git a/apply.c b/apply.c
index 41ee63e1db..606db58218 100644
--- a/apply.c
+++ b/apply.c
@@ -2966,8 +2966,9 @@ static int apply_one_fragment(struct apply_state *state,
 	 * In other words, a hunk that is (frag->oldpos <= 1) with or
 	 * without leading context must match at the beginning.
 	 */
-	match_beginning = (!frag->oldpos ||
-			   (frag->oldpos == 1 && !state->unidiff_zero));
+	match_beginning = (nth_fragment == 1 &&
+			   (!frag->oldpos ||
+			    (frag->oldpos == 1 && !state->unidiff_zero)));
 
 	/*
 	 * A hunk without trailing lines must match at the end.

But I'm not quite sure if that's right. The rest of the overlap code seems to mark patched lines with a flag. Meaning that instead of giving up and saying "well, this is the second line so we can't ever try matching the beginning", we should be redefining "beginning" in that case to allow 0 or more PATCHED lines starting from the beginning of the file.

-Peff
Re: [PATCH/FIXUP 6/2] apply: clarify read_old_data() is about no-index case
On Thu, Aug 17, 2017 at 12:12:36AM -0700, Junio C Hamano wrote:

> Torsten Bögershausen writes:
>
> > Unless we re-define the meaning of "NULL" into "don't do CRLF conversions,
> > like SAFE_CRLF_KEEP_CRLF does.
>
> My preference is not to use NULL as any hint. Instead, the "flag"
> parameter we already pass to convert_to_git(), just like the updated
> read_old_data() uses SAFE_CRLF_KEEP_CRLF to tell it that it should
> not disturb existing CRLF without looking at the istate, should be
> used to tell convert_to_git() to do the opposite, but do so without
> looking at the istate.
>
> Perhaps SAFE_CRLF_FALSE should be such a flag. Or perhaps we need
> to invent another flag. I dunno.

OK, message taken, in short: I will come up with a new series in a couple of days - thanks for the input.
Re: reftable [v7]: new ref storage format
On Wed, Aug 16, 2017 at 11:05 PM, Junio C Hamano wrote:

> I found it slightly odd that we do not insist that update_indices
> that appear in a single reftable file are consecutive, yet we
> require that min_update_index of a reftable file must be one greater
> than the max_update_index of a previous one. That is not a new
> issue in v7, though.

I think of `update_index` like a pseudo-time, and the `min_update_index` and `max_update_index` to be stating that "this reftable covers the time interval specified". So it's reasonable to say that the reftable files, together, should cover all time. But it might be that there are values of `update_index` for which no events survived within a reftable file that covers that time interval. This can happen if reference update records have been compacted away because later reference updates overwrote their effects, and either

* reflogs were turned off for those updates, or

* the corresponding reflogs have been compacted into a separate file, or

* the corresponding reflog entries for those updates have been expired.

Michael
Re: [PATCH/FIXUP 6/2] apply: clarify read_old_data() is about no-index case
Torsten Bögershausen writes:

> Unless we re-define the meaning of "NULL" into "don't do CRLF conversions,
> like SAFE_CRLF_KEEP_CRLF does.

My preference is not to use NULL as any hint. Instead, the "flag" parameter we already pass to convert_to_git(), just like the updated read_old_data() uses SAFE_CRLF_KEEP_CRLF to tell it that it should not disturb existing CRLF without looking at the istate, should be used to tell convert_to_git() to do the opposite, but do so without looking at the istate.

Perhaps SAFE_CRLF_FALSE should be such a flag. Or perhaps we need to invent another flag. I dunno.
Re: [PATCH/FIXUP 6/2] apply: clarify read_old_data() is about no-index case
Torsten Bögershausen writes:

> I don't have time to look at this today or tomorrow,
> please give a hint if you are working further.

It is past my bedtime, and generally I prefer not to touch topics that I know other people are willing to look into, especially when I know those "other people" are well informed and capable.

Thanks.
Re: [PATCH] fix revisions doc about quoting for ':/' notation
On 17 August 2017 at 05:57, Junio C Hamano wrote:

> Andreas Heiduk writes:
>
>> On 16.08.2017 at 05:21, ryenus wrote:
>>> To make sure the `<text>` in `:/<text>` is seen as one search string,
>>> one should quote/escape `<text>` properly.
>>>
>>> Especially, the example given in the manual `:/fix nasty bug` does not
>>> work because of missing quotes. The examples are now corrected, and a
>>> note about quoting/escaping is added as well.
>>
>> Right now the documentation describes the syntax as git sees the
>> parameters. This is agnostic of the shell or other UI with their
>> different quoting rules. For example users of fish must quote
>> `rev@{2}`. A GUI might require no quoting at all. In that case `:/"fix
>> nasty bugs"` would be given to git verbatim and hence not find the revision.
>
> These are all good points that I didn't consider when responding.
> Makes sense for me, too.

I've just sent a v2 patch, which leaves the original example as-is, and adds an example inside the explanation.

Thanks!
[PATCH v2] update revisions doc for quoting in ':/' notation
To make sure the `<text>` in `:/<text>` is seen as one search string, one should quote/escape `<text>` properly. In particular, the example given in the manual, `:/fix nasty bug`, does not work when used in a shell because of the missing quotes. A note about quoting/escaping is added along with a working example; however, the original example is left as-is to be consistent with other examples. Signed-off-by: ryenus --- Documentation/revisions.txt | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/Documentation/revisions.txt b/Documentation/revisions.txt index 61277469c..d2862d55d 100644 --- a/Documentation/revisions.txt +++ b/Documentation/revisions.txt @@ -185,7 +185,9 @@ existing tag object. e.g. ':/^foo'. The special sequence ':/!' is reserved for modifiers to what is matched. ':/!-foo' performs a negative match, while ':/!!foo' matches a literal '!' character, followed by 'foo'. Any other sequence beginning with - ':/!' is reserved for now. + ':/!' is reserved for now. And make sure to quote/escape for the text to be + interpreted as the expected search string/pattern, e.g., for a commit whose + message matches a literal \'`$`', use `git show :/\\\$` or `git show ':/\$'`. ':', e.g. 'HEAD:README', ':README', 'master:./README':: A suffix ':' followed by a path names the blob or tree -- 2.14.1
Re: [PATCH v2 2/2] File committed with CRLF should roundtrip diff and apply
tbo...@web.de writes: > Analyze the patch to see if there is a) any context line with CRLF, > or b) any line with CRLF that is to be removed. > Thanks to Junio C Hamano, his input became the base for the changes in t4124. > One test case is split up into 3: > - Detect the " a\r" line in the patch > - Detect the "-a\r" line in the patch > - Use LF in repo and CRLF in the worktree. (*) > > * This one proves that convert_to_git(&the_index,...) still needs to pass > the istate, otherwise Git will crash. I do not understand why you think it proves anything like that. Forget about "in repo" when you think about "git apply" without "--index/--cached". There is *NO* role the index should play in that mode. "Otherwise Git will crash" is true, because convert_to_git() tries to dereference the istate it is given to see if there is CR in the blob that happens to be in the index at the path. But step back and *think*. It only proves that the convert_to_git() you have, and/or the way your updated read_old_data() calls it, is still broken after applying these two patches. The "blob that happens to be in the index at the path" does *NOT* have anything to do with the file in the working tree that you are patching. Such an index entry may not exist (and the code would see that there are 0 CRs and 0 LFs---so what?), or it may have a blob full of CRLF, or it may have a blob full of CR not LF, or any random thing that has NO relation to the file you are patching. Why should that random blob (which may not even exist---we are discussing "git apply" without "--index/--cached" here) affect the outcome? If it does not affect the outcome, why should convert_to_git() even look at it? It shouldn't be calling down to "has_cr_in_index(istate, path)" that is called from crlf_to_git() in the first place. The check for CR was done in the caller of convert_to_git(), i.e. 
read_old_data(), long before convert_to_git() is called, and the caller knows whether it is (or is not) dealing with data that needs CRLF conversion at that point, based on the contents of the file being patched (which, again, has no relation to the blob that may or may not happen to be in the index). Your updated caller already tells the convert_to_git() codepath when it needs to keep CRLF, and it refrains from peeking in the index with the SAFE_CRLF_KEEP_CRLF flag. The bug still in the code after applying these two patches is that the other choice, i.e. SAFE_CRLF_FALSE, that is passed from read_old_data() to convert_to_git() does *not* prevent the latter from peeking into the in-core index. And that is why "Otherwise Git will crash".
Re: [PATCH/FIXUP 6/2] apply: clarify read_old_data() is about no-index case
On Wed, Aug 16, 2017 at 11:34:45AM -0700, Junio C Hamano wrote: > With the previous fixes to CRLF handling in place, read_old_data() > knows what it wants convert_to_git() to do with respect to CRLF. In > fact, this codepath is about applying a patch to a file in the > filesystem, which may not exist in the index, or may exist but may > not match what is recorded in the index, or in the extreme case, we > may not even be in a Git repository. If convert_to_git() peeked at > the index while doing its work, it *would* be a bug. > > Pass NULL instead of &the_index to the function to clarify this and > to make sure we catch future bugs. Thanks for the work, and now our emails crossed. Calling convert_to_git(NULL,...) makes Git crash today, see t4124 in my latest version ("LF in repo, CRLF in working tree"), unless we re-define the meaning of "NULL" into "don't do CRLF conversions", like SAFE_CRLF_KEEP_CRLF does. The combination convert_to_git(NULL,...,SAFE_CRLF_KEEP_CRLF) is OK, but all others must supply an istate. At a very first glance the end result may look like this: - Take my changes in convert.[ch] - Take your changes/commit in apply.c (except the NULL, see above) - Take my changes in t4124. I don't have time to look at this today or tomorrow; please give a hint if you are working further.
[PATCH v2 1/2] convert: Add SAFE_CRLF_KEEP_CRLF
From: Torsten Bögershausen When convert_to_git() is called, the caller may want CRLF to be kept as CRLF (and not converted into LF). This will be used in the next commit, when apply works with files that have CRLF and patches are applied onto these files. Add the new value "SAFE_CRLF_KEEP_CRLF" to safe_crlf. Prepare convert_to_git() to be able to run the clean filter, skip the CRLF conversion, and run the ident filter. Signed-off-by: Torsten Bögershausen --- convert.c | 10 ++ convert.h | 3 ++- 2 files changed, 8 insertions(+), 5 deletions(-) diff --git a/convert.c b/convert.c index deaf0ba7b3..040123b4fe 100644 --- a/convert.c +++ b/convert.c @@ -1104,10 +1104,12 @@ int convert_to_git(const struct index_state *istate, src = dst->buf; len = dst->len; } - ret |= crlf_to_git(istate, path, src, len, dst, ca.crlf_action, checksafe); - if (ret && dst) { - src = dst->buf; - len = dst->len; + if (checksafe != SAFE_CRLF_KEEP_CRLF) { + ret |= crlf_to_git(istate, path, src, len, dst, ca.crlf_action, checksafe); + if (ret && dst) { + src = dst->buf; + len = dst->len; + } } return ret | ident_to_git(path, src, len, dst, ca.ident); } diff --git a/convert.h b/convert.h index cecf59d1aa..cabd5ed6dd 100644 --- a/convert.h +++ b/convert.h @@ -10,7 +10,8 @@ enum safe_crlf { SAFE_CRLF_FALSE = 0, SAFE_CRLF_FAIL = 1, SAFE_CRLF_WARN = 2, - SAFE_CRLF_RENORMALIZE = 3 + SAFE_CRLF_RENORMALIZE = 3, + SAFE_CRLF_KEEP_CRLF = 4 }; extern enum safe_crlf safe_crlf; -- 2.14.1.145.gb3622a4ee9
[PATCH v2 2/2] File committed with CRLF should roundtrip diff and apply
From: Torsten Bögershausen When a file has been committed with CRLF but .gitattributes now says "* text=auto" (or core.autocrlf is true), the following does not roundtrip; `git apply` fails: printf "Added line\r\n" >>file && git diff >patch && git checkout -- . && git apply patch Before applying the patch, the file from the working tree is converted into the index format (clean filter, CRLF conversion, ...). Here, when committed with CRLF, the line endings should not be converted. Note that `git apply --index` or `git apply --cached` doesn't call convert_to_git() because the source material is already in index format. Analyze the patch to see whether there is a) any context line with CRLF, or b) any line with CRLF that is to be removed. In this case the patch file `patch` has mixed line endings; for a) it looks like this (ignore the $ at the beginning of the line): $ diff --git a/one b/one $ index 533790e..c30dea8 100644 $ --- a/one $ +++ b/one $ @@ -1 +1,2 @@ $ a\r $ +b\r And for b) it looks like this: $ diff --git a/one b/one $ index 533790e..485540d 100644 $ --- a/one $ +++ b/one $ @@ -1 +1 @@ $ -a\r $ +b\r If `git apply` detects that the patch itself has CRLF (look at the line " a\r" or "-a\r" above), the new flag has_crlf is set in "struct patch" and two things will happen: - read_old_data() will not convert CRLF into LF, by calling convert_to_git(..., SAFE_CRLF_KEEP_CRLF); - The WS_CR_AT_EOL bit is set in the "white space rule", so CRLF is no longer treated as white space. Thanks to Junio C Hamano, his input became the base for the changes in t4124. One test case is split up into 3: - Detect the " a\r" line in the patch - Detect the "-a\r" line in the patch - Use LF in repo and CRLF in the worktree. (*) * This one proves that convert_to_git(&the_index,...) still needs to pass the istate, otherwise Git will crash. 
Reported-by: Anthony Sottile Signed-off-by: Torsten Bögershausen --- apply.c | 28 +++- t/t4124-apply-ws-rule.sh | 33 +++-- 2 files changed, 50 insertions(+), 11 deletions(-) diff --git a/apply.c b/apply.c index f2d599141d..bebb176099 100644 --- a/apply.c +++ b/apply.c @@ -220,6 +220,7 @@ struct patch { unsigned int recount:1; unsigned int conflicted_threeway:1; unsigned int direct_to_threeway:1; + unsigned int has_crlf:1; struct fragment *fragments; char *result; size_t resultsize; @@ -1662,6 +1663,17 @@ static void check_whitespace(struct apply_state *state, record_ws_error(state, result, line + 1, len - 2, state->linenr); } +/* Check if the patch has context lines with CRLF or + the patch wants to remove lines with CRLF */ +static void check_old_for_crlf(struct patch *patch, const char *line, int len) +{ + if (len >= 2 && line[len-1] == '\n' && line[len-2] == '\r') { + patch->ws_rule |= WS_CR_AT_EOL; + patch->has_crlf = 1; + } +} + + /* * Parse a unified diff. Note that this really needs to parse each * fragment separately, since the only way to know the difference @@ -1712,11 +1724,13 @@ static int parse_fragment(struct apply_state *state, if (!deleted && !added) leading++; trailing++; + check_old_for_crlf(patch, line, len); if (!state->apply_in_reverse && state->ws_error_action == correct_ws_error) check_whitespace(state, line, len, patch->ws_rule); break; case '-': + check_old_for_crlf(patch, line, len); if (state->apply_in_reverse && state->ws_error_action != nowarn_ws_error) check_whitespace(state, line, len, patch->ws_rule); @@ -2268,8 +2282,11 @@ static void show_stats(struct apply_state *state, struct patch *patch) add, pluses, del, minuses); } -static int read_old_data(struct stat *st, const char *path, struct strbuf *buf) +static int read_old_data(struct stat *st, struct patch *patch, +const char *path, struct strbuf *buf) { + enum safe_crlf safe_crlf = patch->has_crlf ? 
+ SAFE_CRLF_KEEP_CRLF : SAFE_CRLF_FALSE; switch (st->st_mode & S_IFMT) { case S_IFLNK: if (strbuf_readlink(buf, path, st->st_size) < 0) @@ -2278,7 +2295,7 @@ static int read_old_data(struct stat *st, const char *path, struct strbuf *buf) case S_IFREG: if (strbuf_read_file(buf, path, st->st_size) != st->st_size) return error(_("unable to open or read %s"), path); - convert_to_git(&the_index, path, buf->buf, buf->len, buf, 0); + convert_to_git(&the_index, path, buf->buf, buf->len, buf, safe_crlf);