Re: why doesn't "git reset" mention optional pathspec?
On Sat, Dec 8, 2018 at 6:37 PM Robert P. J. Day wrote:
>
> On Sat, 8 Dec 2018, Duy Nguyen wrote:
>
> > On Sat, Dec 8, 2018 at 6:32 PM Robert P. J. Day wrote:
> > >
> > > On Sat, 8 Dec 2018, Duy Nguyen wrote:
> > >
> > > > On Sat, Dec 8, 2018 at 5:08 PM Robert P. J. Day wrote:
> > > > >
> > > > > from "man git-reset":
> > > > >
> > > > > SYNOPSIS
> > > > >        git reset [-q] [<tree-ish>] [--] <paths>...
> > > > >        git reset (--patch | -p) [<tree-ish>] [--] [<paths>...]
> > > > >        git reset [--soft | --mixed [-N] | --hard | --merge | --keep] [-q] [<commit>]
> > > > >
> > > > > oddly, the third form says nothing about possible "<paths>", even
> > > > > though i'm pretty sure they're valid in that third case (at least
> > > > > for "--mixed"). thoughts? is that just an oversight in the man
> > > > > page?
> > > >
> > > > --mixed prints a deprecation warning. I don't think it's worth
> > > > making the synopsis more complicated for that. All other modes
> > > > reject pathspec.
> > >
> > > i just tested this, and i don't see a deprecation warning.
> >
> > Hmm.. maybe I misread the code. I just tried it
> >
> >   $ ./git reset --mixed HEAD foo
> >   warning: --mixed with paths is deprecated; use 'git reset -- <paths>' instead.
>
> weird ... i just tried this two ways, explicitly specifying
> "--mixed" and also without (which is the default mode, right?), and i
> got the deprecated message with the first but not the second. that
> seems ... odd.

Without --mixed, you're using the first form

    git reset [-q] [<tree-ish>] [--] <paths>...

which accepts pathspec. If it's not clear, of course patches are welcome.
--
Duy
Re: why doesn't "git reset" mention optional pathspec?
On Sat, 8 Dec 2018, Duy Nguyen wrote:

> On Sat, Dec 8, 2018 at 6:32 PM Robert P. J. Day wrote:
> >
> > On Sat, 8 Dec 2018, Duy Nguyen wrote:
> >
> > > On Sat, Dec 8, 2018 at 5:08 PM Robert P. J. Day wrote:
> > > >
> > > > from "man git-reset":
> > > >
> > > > SYNOPSIS
> > > >        git reset [-q] [<tree-ish>] [--] <paths>...
> > > >        git reset (--patch | -p) [<tree-ish>] [--] [<paths>...]
> > > >        git reset [--soft | --mixed [-N] | --hard | --merge | --keep] [-q] [<commit>]
> > > >
> > > > oddly, the third form says nothing about possible "<paths>", even
> > > > though i'm pretty sure they're valid in that third case (at least
> > > > for "--mixed"). thoughts? is that just an oversight in the man
> > > > page?
> > >
> > > --mixed prints a deprecation warning. I don't think it's worth
> > > making the synopsis more complicated for that. All other modes
> > > reject pathspec.
> >
> > i just tested this, and i don't see a deprecation warning.
>
> Hmm.. maybe I misread the code. I just tried it
>
>   $ ./git reset --mixed HEAD foo
>   warning: --mixed with paths is deprecated; use 'git reset -- <paths>' instead.

my test:

  Changes to be committed:
    (use "git reset HEAD <file>..." to unstage)

        modified:   README.asc
        modified:   Rakefile

  $ git reset -- README.asc
  Unstaged changes after reset:
  M       README.asc
  $ git reset --mixed -- Rakefile
  warning: --mixed with paths is deprecated; use 'git reset -- <paths>' instead.
  Unstaged changes after reset:
  M       README.asc
  M       Rakefile
  $

that definitely seems inconsistent.

rday

--
Robert P. J. Day                                 Ottawa, Ontario, CANADA
                        http://crashcourse.ca/dokuwiki
Twitter:                                       http://twitter.com/rpjday
LinkedIn:                               http://ca.linkedin.com/in/rpjday
Re: why doesn't "git reset" mention optional pathspec?
On Sat, 8 Dec 2018, Duy Nguyen wrote:

> On Sat, Dec 8, 2018 at 6:32 PM Robert P. J. Day wrote:
> >
> > On Sat, 8 Dec 2018, Duy Nguyen wrote:
> >
> > > On Sat, Dec 8, 2018 at 5:08 PM Robert P. J. Day wrote:
> > > >
> > > > from "man git-reset":
> > > >
> > > > SYNOPSIS
> > > >        git reset [-q] [<tree-ish>] [--] <paths>...
> > > >        git reset (--patch | -p) [<tree-ish>] [--] [<paths>...]
> > > >        git reset [--soft | --mixed [-N] | --hard | --merge | --keep] [-q] [<commit>]
> > > >
> > > > oddly, the third form says nothing about possible "<paths>", even
> > > > though i'm pretty sure they're valid in that third case (at least
> > > > for "--mixed"). thoughts? is that just an oversight in the man
> > > > page?
> > >
> > > --mixed prints a deprecation warning. I don't think it's worth
> > > making the synopsis more complicated for that. All other modes
> > > reject pathspec.
> >
> > i just tested this, and i don't see a deprecation warning.
>
> Hmm.. maybe I misread the code. I just tried it
>
>   $ ./git reset --mixed HEAD foo
>   warning: --mixed with paths is deprecated; use 'git reset -- <paths>' instead.

weird ... i just tried this two ways, explicitly specifying "--mixed"
and also without (which is the default mode, right?), and i got the
deprecated message with the first but not the second. that seems ... odd.

rday

--
Robert P. J. Day                                 Ottawa, Ontario, CANADA
                        http://crashcourse.ca/dokuwiki
Twitter:                                       http://twitter.com/rpjday
LinkedIn:                               http://ca.linkedin.com/in/rpjday
Re: why doesn't "git reset" mention optional pathspec?
On Sat, Dec 8, 2018 at 6:32 PM Robert P. J. Day wrote:
>
> On Sat, 8 Dec 2018, Duy Nguyen wrote:
>
> > On Sat, Dec 8, 2018 at 5:08 PM Robert P. J. Day wrote:
> > >
> > > from "man git-reset":
> > >
> > > SYNOPSIS
> > >        git reset [-q] [<tree-ish>] [--] <paths>...
> > >        git reset (--patch | -p) [<tree-ish>] [--] [<paths>...]
> > >        git reset [--soft | --mixed [-N] | --hard | --merge | --keep] [-q] [<commit>]
> > >
> > > oddly, the third form says nothing about possible "<paths>", even
> > > though i'm pretty sure they're valid in that third case (at least
> > > for "--mixed"). thoughts? is that just an oversight in the man
> > > page?
> >
> > --mixed prints a deprecation warning. I don't think it's worth
> > making the synopsis more complicated for that. All other modes
> > reject pathspec.
>
> i just tested this, and i don't see a deprecation warning.

Hmm.. maybe I misread the code. I just tried it

  $ ./git reset --mixed HEAD foo
  warning: --mixed with paths is deprecated; use 'git reset -- <paths>' instead.
--
Duy
Re: why doesn't "git reset" mention optional pathspec?
On Sat, 8 Dec 2018, Duy Nguyen wrote:

> On Sat, Dec 8, 2018 at 5:08 PM Robert P. J. Day wrote:
> >
> > from "man git-reset":
> >
> > SYNOPSIS
> >        git reset [-q] [<tree-ish>] [--] <paths>...
> >        git reset (--patch | -p) [<tree-ish>] [--] [<paths>...]
> >        git reset [--soft | --mixed [-N] | --hard | --merge | --keep] [-q] [<commit>]
> >
> > oddly, the third form says nothing about possible "<paths>", even
> > though i'm pretty sure they're valid in that third case (at least
> > for "--mixed"). thoughts? is that just an oversight in the man
> > page?
>
> --mixed prints a deprecation warning. I don't think it's worth
> making the synopsis more complicated for that. All other modes
> reject pathspec.

i just tested this, and i don't see a deprecation warning.

rday

--
Robert P. J. Day                                 Ottawa, Ontario, CANADA
                        http://crashcourse.ca/dokuwiki
Twitter:                                       http://twitter.com/rpjday
LinkedIn:                               http://ca.linkedin.com/in/rpjday
Re: why doesn't "git reset" mention optional pathspec?
On Sat, Dec 8, 2018 at 5:08 PM Robert P. J. Day wrote:
>
> from "man git-reset":
>
> SYNOPSIS
>        git reset [-q] [<tree-ish>] [--] <paths>...
>        git reset (--patch | -p) [<tree-ish>] [--] [<paths>...]
>        git reset [--soft | --mixed [-N] | --hard | --merge | --keep] [-q] [<commit>]
>
> oddly, the third form says nothing about possible "<paths>", even
> though i'm pretty sure they're valid in that third case (at least for
> "--mixed"). thoughts? is that just an oversight in the man page?

--mixed prints a deprecation warning. I don't think it's worth making
the synopsis more complicated for that. All other modes reject pathspec.
--
Duy
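[The behavior under discussion is easy to reproduce in a throwaway repository. This is a sketch, not from the thread; the file name "foo" and the temp-dir layout are made up for illustration.]

```shell
#!/bin/sh
# Show that "git reset <paths>" (first synopsis form) unstages quietly,
# while the explicit "git reset --mixed <paths>" form takes the
# deprecation code path. Warnings, if any, are captured in "err".
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q repo && cd repo
git config user.email you@example.com
git config user.name "A U Thor"
echo one >foo && git add foo && git commit -qm initial

echo two >>foo && git add foo      # stage a change
git reset -q HEAD -- foo           # first form: unstages, no warning

git add foo                        # stage it again
git reset --mixed HEAD -- foo 2>err || true
cat err                            # any deprecation warning lands here
```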
Re: [PATCH v3 1/1] git clone C:\cygwin\home\USER\repo' is working (again)
On Sat, Dec 8, 2018 at 9:11 AM  wrote:
> Changes since V2: latest patch still fixes original issue - thanks
> - Settled on a better name:
>   The common code is in compat/win32/path-utils.c/h
> [...]
> - The "DOS" moniker is still used for 2 reasons:
>   Windows inherited the "drive letter" concept from DOS,
>   and everybody (tm) familiar with the code and the path handling
>   in Git is used to that wording.
>   Even if there was a better name, it needed to be addressed
>   in a patch series different from this one.
>   Here I want to fix a reported regression.

i still disagree with this - but i understand if the naming argument
is out of scope for this thread

> And, before any cleanup is done, I should like to ask if anybody
> can build the code with VS and confirm that it works, please ?

sorry, but i am decidedly *not* interested in doing this. i use cygwin
specifically so that i can avoid VS. hopefully someone else will be
able to test.

cheers
Re: "git add -p" versus "git add -i", followed by "p"
On Sun, 2 Dec 2018, Duy Nguyen wrote:

> On Sun, Dec 2, 2018 at 6:05 PM Robert P. J. Day wrote:
> > > Patch update>> 2
> > >            staged     unstaged path
> > >   * 1:    unchanged        +1/-0 README.md
> > >   * 2:    unchanged        +1/-0 contrib/README
> > >     3:    unchanged        +1/-0 t/README
> > > Patch update>>
> > >
> > > Here I hit enter. Did you?
> >
> > perhaps i'm just not seeing it, but from "man git-add", it
> > doesn't seem obvious that you would first select the files to work
> > with, then hit a simple CR to get into actual patch mode.
>
> I think it's the same procedure as the "update" step, which
> describes this in more detail. I agree that the "patch" section does
> not make this obvious.

when i get a chance, i'll try to reword it.

rday

--
Robert P. J. Day                                 Ottawa, Ontario, CANADA
                        http://crashcourse.ca/dokuwiki
Twitter:                                       http://twitter.com/rpjday
LinkedIn:                               http://ca.linkedin.com/in/rpjday
Re: [PATCH] mailmap: update brandon williams's email address
On Sat, Dec 08 2018, Junio C Hamano wrote:

> Stefan Beller writes:
>
>> On Fri, Dec 7, 2018 at 1:40 PM Jonathan Nieder wrote:
>>>
>>> Brandon Williams wrote:
>>>
>>> > Signed-off-by: Brandon Williams
>>> > ---
>>> >  .mailmap | 1 +
>>> >  1 file changed, 1 insertion(+)
>>>
>>> I can confirm that this is indeed the same person.
>>
>> What would be more of interest is why we'd be interested in this patch,
>> as there is no commit/patch sent by Brandon with this email in git's
>> history.
>
> Once I "git am" the message that began this thread, there will be a
> commit under this new ident, so that would be somewhat a moot point.

"Get to the top of 'git shortlog -sn' with this one easy trick" :)

(The patch makes sense, good to see you back on-list Brandon)
Re: [PATCH] mailmap: update brandon williams's email address
On Sat, Dec 8, 2018 at 7:09 AM Junio C Hamano wrote:
> If this were "Jonathan asked Brandon if we want to record an address
> we can reach him in our .mailmap file and sent a patch to add one",

Not sure about Jonathan, but I did.

> then the story is different, and I tend to agree with you that such
> a patch is more or less pointless. That's not the purpose of the
> mailmap file.

Not directly, but when multiple commands use mailmap to show the
canonical mail addresses, then it kinda is. When I look back at an old
commit message, I may look up the author name and address, which is
mapped.

> Not until git-send-email learns to use that file to rewrite
> To/cc/etc to the "canonical" addresses, anyway ;-)

git-send-email does not have to when I copy/paste the address from
git-log anyway.

> I am not sure if there are people whose "canonical" address to be
> used as the author is not necessarily the best address they want to
> get their e-mails at, though. If we can be reasonably sure that the
> set of such people is empty, then people can take the above mention
> about send-email as a hint about a low-hanging fruit ;-)
>
> Thanks.
--
Duy
Re: Retrieving a file in git that was deleted and committed
On Fri, Dec 07, 2018 at 01:50:57PM -0800, biswaranjan panda wrote:

> Thanks Jeff and Bryan! However, I am curious that if there were a way
> to tell git blame to skip a commit (the one which added the file again
> and maybe the one which deleted it originally) while it walks back
> through history, then it should just get back the entire history,
> right?

Not easily. ;)

You can feed a set of revisions to git-blame with the "-S" option, but
I don't offhand know how it handles diffs (I think it would have to
still diff each commit against its parent, since history is non-linear,
and a list is inherently linear). You might want to experiment with
that.

Other than that, you can play with git-replace to produce a fake
history, as if the deletion never happened. But note that will affect
all commands, not just one particular blame. It might be a neat way to
play with blame, but I doubt I'd leave the replacement in place in the
long term.

-Peff
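[For the simpler underlying problem in the thread subject, recovering the deleted file itself needs no blame trickery. A minimal sketch; "file.txt" and the throwaway repository are invented for illustration.]

```shell
#!/bin/sh
# Recover a file that was deleted and committed: find the commit that
# deleted it, then check the file out from that commit's parent.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git config user.email a@example.com
git config user.name A
echo hello >file.txt && git add file.txt && git commit -qm add
git rm -q file.txt && git commit -qm delete

# the most recent commit touching the path is the deletion
del=$(git rev-list -1 HEAD -- file.txt)
git checkout -q "$del"^ -- file.txt
cat file.txt
```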
Re: Difficulty with parsing colorized diff output
On Fri, Dec 07, 2018 at 07:09:58PM -0500, George King wrote:

> My goal is to detect SGR color sequences, e.g. '\x1b[32m', that exist
> in the source text, and have my highlighter print escaped
> representations of those. For example, I have checked in files that
> are expected test outputs for tools that emit color codes, and diffs
> of those get very confusing.
>
> Figuring out which color codes are from the source text and which were
> added by git is proving very difficult. The obvious solution is to
> turn git diff coloring off, but as far as I can see this also turns
> off all coloring for logs, which is undesirable.

Another gotcha that I don't think you've hit yet: whitespace
highlighting can color arbitrary parts of the line. Try this, for
example:

  printf 'foo\n' >old
  printf '\t \tfoo\n' >new
  git diff --no-index old new

So I'm not sure what you want to do can ever be completely robust.

That said, I'm not opposed to making Git's color output more regular.
It seems like a reasonable cleanup on its own. (For the Git project
itself, we long ago realized that putting raw color codes into the
source is a big pain when working with diffs, and we instead use tools
like t/test-lib-functions.sh's test_decode_color).

> * Context lines do not begin with a reset code, but do end with a
>   reset code. It would be preferable in my opinion if they had both
>   (like every other line), or none at all.

Context lines do have both. It's just that the default color for
context lines is empty. ;)

But yes, I think it might be reasonable to recognize when an opening
color is empty, and turn the closing reset into a noop in that case (or
I guess you are also advocating the opposite: turn an empty color into
a noop \x1b[m or similar).

I think most of the coloring, including context lines, is coming from
diff.c:emit_diff_symbol_from_struct(). Instead of unconditionally
calling:

  context = diff_get_color_opt(o, DIFF_CONTEXT);
  reset = diff_get_color_opt(o, DIFF_RESET);

I suspect we could have a helper like this:

  static const char *diff_get_reset(const char *color)
  {
          if (!color || !*color)
                  return "";
          return diff_colors[DIFF_RESET];
  }

  ...
  context = diff_get_color_opt(o, DIFF_CONTEXT);
  reset = diff_get_reset(context);

> * Added lines have excess codes after the plus sign. The entire prefix
>   is `\x1b[32m+\x1b[m\x1b[32m`, translating to GREEN PLUS RESET GREEN.
>   Emitting codes after the plus sign makes the parsing more complex
>   and idiosyncratic.

Yeah, I've run into this one, too (when working on diff-highlight,
which similarly tries to pass-through Git's existing colors). It's
unfortunately not quite trivial, due to some interactions with
whitespace coloring. See this old patch:

  https://public-inbox.org/git/20101109220136.ga5...@sigill.intra.peff.net/

and the followup:

  https://public-inbox.org/git/20101118064050.ga12...@sigill.intra.peff.net/

> * Add a config feature to turn on log coloring while leaving diff
>   coloring off.

Yes, the fact that --pretty log coloring is tied to diffs is mostly a
historical artifact. I think it would be OK to introduce a color.log
config option that defaults to the value of color.diff if unset. That
would be backwards compatible, but allow you to set them independently
if you wanted to.

-Peff
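[The raw SGR codes Peff and George are discussing can be inspected directly. A small sketch, not from the thread; file names are invented, and it only asserts that the green added-line code appears somewhere in the output, since the exact byte layout is version-dependent.]

```shell
#!/bin/sh
# Capture git's colored diff output and look for raw SGR codes.
set -e
tmp=$(mktemp -d)
cd "$tmp"
printf 'foo\n' >old
printf 'foo\nbar\n' >new

# --no-index works outside a repository; diff exits 1 when files differ,
# so guard it under "set -e"
git -c color.diff=always diff --no-index old new >out || true

esc=$(printf '\033')
# added lines carry the green code \x1b[32m somewhere in their prefix
grep -c "${esc}\[32m" out
```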
Re: [PATCH 4/4] submodule deinit: unset core.worktree
Stefan Beller writes:

> This re-introduces 984cd77ddb (submodule deinit: unset core.worktree,
> 2018-06-18), which was reverted as part of f178c13fda (Revert "Merge
> branch 'sb/submodule-core-worktree'", 2018-09-07)
>
> The whole series was reverted as the offending commit e98317508c
> (submodule: ensure core.worktree is set after update, 2018-06-18)
> was relied on by other commits such as 984cd77ddb.
>
> Keep the offending commit reverted, but its functionality came back via
> 4d6d6ef1fc (Merge branch 'sb/submodule-update-in-c', 2018-09-17), such
> that we can reintroduce 984cd77ddb now.

None of the above three explains the most important thing directly, so
readers fail to grasp what the main theme of the three-patch series is
without looking at the commits that were reverted already. Is the theme
of the overall series to make sure core.worktree is set to point at the
working tree when the submodule's working tree is instantiated, and
unset when it is not?

2/4 was also explained (in the original) as wanting to unset it, and
did so when "move_head" gets called. This one does the unset when a
submodule is deinited. Are these the only two cases in which a
submodule loses its working tree? If so, the log message for this step
should declare victory by ending with something like

    ... as we covered the only other case in which a submodule loses
    its working tree in the earlier step (i.e. switching branches of
    the top-level project to move to a commit that did not have the
    submodule), this makes the code always maintain core.worktree
    correctly unset when there is no working tree for a submodule.

Thanks. I think I agree with what the series wants to do (if I read
what it wants to do correctly, that is).
> Signed-off-by: Stefan Beller
> ---
>  builtin/submodule--helper.c | 2 ++
>  t/lib-submodule-update.sh   | 2 +-
>  t/t7400-submodule-basic.sh  | 5 +++++
>  3 files changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
> index 31ac30cf2f..672b74db89 100644
> --- a/builtin/submodule--helper.c
> +++ b/builtin/submodule--helper.c
> @@ -1131,6 +1131,8 @@ static void deinit_submodule(const char *path, const char *prefix,
>  		if (!(flags & OPT_QUIET))
>  			printf(format, displaypath);
>
> +		submodule_unset_core_worktree(sub);
> +
>  		strbuf_release(&sb_rm);
>  	}
>
> diff --git a/t/lib-submodule-update.sh b/t/lib-submodule-update.sh
> index 51d449..5b56b23166 100755
> --- a/t/lib-submodule-update.sh
> +++ b/t/lib-submodule-update.sh
> @@ -235,7 +235,7 @@ reset_work_tree_to_interested () {
>  		then
>  			mkdir -p submodule_update/.git/modules/sub1/modules &&
>  			cp -r submodule_update_repo/.git/modules/sub1/modules/sub2 submodule_update/.git/modules/sub1/modules/sub2
> -			GIT_WORK_TREE=. git -C submodule_update/.git/modules/sub1/modules/sub2 config --unset core.worktree
> +			# core.worktree is unset for sub2 as it is not checked out
>  		fi &&
>  		# indicate we are interested in the submodule:
>  		git -C submodule_update config submodule.sub1.url "bogus" &&
>
> diff --git a/t/t7400-submodule-basic.sh b/t/t7400-submodule-basic.sh
> index 76a7cb0af7..aba2d4d6ee 100755
> --- a/t/t7400-submodule-basic.sh
> +++ b/t/t7400-submodule-basic.sh
> @@ -984,6 +984,11 @@ test_expect_success 'submodule deinit should remove the whole submodule section
>  	rmdir init
>  '
>
> +test_expect_success 'submodule deinit should unset core.worktree' '
> +	test_path_is_file .git/modules/example/config &&
> +	test_must_fail git config -f .git/modules/example/config core.worktree
> +'
> +
>  test_expect_success 'submodule deinit from subdirectory' '
>  	git submodule update --init &&
>  	git config submodule.example.foo bar &&
Re: [PATCH 3/4] submodule--helper: fix BUG message in ensure_core_worktree
Stefan Beller writes:

> Shortly after f178c13fda (Revert "Merge branch
> 'sb/submodule-core-worktree'", 2018-09-07), we had another series
> that implemented partially the same, ensuring that core.worktree was
> set in a checked out submodule, namely 74d4731da1 (submodule--helper:
> replace connect-gitdir-workingtree by ensure-core-worktree, 2018-08-13)
>
> As the series 4d6d6ef1fc (Merge branch 'sb/submodule-update-in-c',
> 2018-09-17) has different goals than the reverted series 7e25437d35
> (Merge branch 'sb/submodule-core-worktree', 2018-07-18), I'd wanted to
> replay the series on top of it to reach the goal of having `core.worktree`
> correctly set when the submodule's worktree is present, and unset when
> the worktree is not present.
>
> The replay resulted in a strange merge conflict highlighting that
> the BUG message was not changed in 74d4731da1 (submodule--helper:
> replace connect-gitdir-workingtree by ensure-core-worktree, 2018-08-13).
>
> Fix the error message.
>
> Signed-off-by: Stefan Beller
> ---

Unlike the step 2/4 I commented on, this does explain what it wants to
do and why, at least when looked at from sideways. Is the above saying
the same as the following two-liner?

    An earlier mistake while rebasing to produce 74d4731da1 failed to
    update this BUG message. Fix this.

>  builtin/submodule--helper.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
> index d38113a31a..31ac30cf2f 100644
> --- a/builtin/submodule--helper.c
> +++ b/builtin/submodule--helper.c
> @@ -2045,7 +2045,7 @@ static int ensure_core_worktree(int argc, const char **argv, const char *prefix)
>  	struct repository subrepo;
>
>  	if (argc != 2)
> -		BUG("submodule--helper connect-gitdir-workingtree <path>");
> +		BUG("submodule--helper ensure-core-worktree <path>");
>
>  	path = argv[1];
Re: [PATCH] mailmap: update brandon williams's email address
On Fri, Dec 7, 2018 at 10:08 PM Junio C Hamano wrote:
>
> Stefan Beller writes:
>
> > On Fri, Dec 7, 2018 at 1:40 PM Jonathan Nieder wrote:
> >>
> >> Brandon Williams wrote:
> >>
> >> > Signed-off-by: Brandon Williams
> >> > ---
> >> >  .mailmap | 1 +
> >> >  1 file changed, 1 insertion(+)
> >>
> >> I can confirm that this is indeed the same person.
> >
> > What would be more of interest is why we'd be interested in this patch
> > as there is no commit/patch sent by Brandon with this email in gits
> > history.
>
> Once I "git am" the message that began this thread, there will be a
> commit under this new ident, so that would be somewhat a moot point.
>
> If this were "Jonathan asked Brandon if we want to record an address
> we can reach him in our .mailmap file and sent a patch to add one",
> then the story is different, and I tend to agree with you that such
> a patch is more or less pointless. That's not the purpose of the
> mailmap file.

Turns out this is exactly the reason :) I've had a couple of people
reach out to me asking me to do this because CCing my old email bounces
and they've wanted my input/comments on something related to work I've
done. If that's not the intended purpose then please ignore this patch.

> Not until git-send-email learns to use that file to rewrite
> To/cc/etc to the "canonical" addresses, anyway ;-)
>
> I am not sure if there are people whose "canonical" address to be
> used as the author is not necessarily the best address they want to
> get their e-mails at, though. If we can be reasonably sure that the
> set of such people is empty, then people can take the above mention
> about send-email as a hint about a low-hanging fruit ;-)
>
> Thanks.
Re: [PATCH 2/4] submodule: unset core.worktree if no working tree is present
Stefan Beller writes:

> This reintroduces 4fa4f90ccd (submodule: unset core.worktree if no
> working tree is present, 2018-06-12), which was reverted as part of
> f178c13fda (Revert "Merge branch 'sb/submodule-core-worktree'",
> 2018-09-07).
>
> 4fa4f90ccd was reverted as its followup commit was faulty, but without
> the accompanying change of the followup, we'd have an incomplete
> workflow of setting `core.worktree` again, when it is needed such as
> checking out a revision that contains a submodule.
>
> So re-introduce that commit as-is, focusing on fixing up the followup

I was hoping to hear (given what 0/4 claimed) a clearer explanation of
what this change wants to achieve, but that is lacking.

No need to grumble about earlier work that turned out to be
inappropriate for the codebase back then. Repeatedly saying "this is
needed" without further explaining why it is so, or anything like that,
would not help readers. Just pretend that the earlier commits and their
reversion never happened, and further pretend that we are doing the
best thing that should happen to our codebase. How would we explain
this change, the problem it tries to solve, and what the solution looks
like in the larger picture?

    When removing the working tree of a submodule (e.g. we may be
    switching back to an earlier commit in the superproject that did
    not have the submodule in question yet), we failed to unset
    core.worktree of the submodule's repository. That caused this and
    that issue, exhibited by a few new tests this commit adds. Make
    sure that core.worktree gets unset so that a leftover setting
    won't cause these issues.

or something like that? I am just guessing by looking at the old
commit's text, as the above two paragraphs and one sentence do not say
much.
> diff --git a/submodule.c b/submodule.c
> index 6415cc5580..d393e947e6 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -1561,6 +1561,18 @@ int bad_to_remove_submodule(const char *path, unsigned flags)
>  	return ret;
>  }
>
> +void submodule_unset_core_worktree(const struct submodule *sub)
> +{
> +	char *config_path = xstrfmt("%s/modules/%s/config",
> +				    get_git_common_dir(), sub->name);
> +
> +	if (git_config_set_in_file_gently(config_path, "core.worktree", NULL))
> +		warning(_("Could not unset core.worktree setting in submodule '%s'"),
> +			sub->path);
> +
> +	free(config_path);
> +}
> +
>  static const char *get_super_prefix_or_empty(void)
>  {
>  	const char *s = get_super_prefix();
> @@ -1726,6 +1738,8 @@ int submodule_move_head(const char *path,
>
>  			if (is_empty_dir(path))
>  				rmdir_or_warn(path);
> +
> +			submodule_unset_core_worktree(sub);
>  		}
>  	}
>  out:
> diff --git a/submodule.h b/submodule.h
> index a680214c01..9e18e9b807 100644
> --- a/submodule.h
> +++ b/submodule.h
> @@ -131,6 +131,8 @@ int submodule_move_head(const char *path,
>  			const char *new_head,
>  			unsigned flags);
>
> +void submodule_unset_core_worktree(const struct submodule *sub);
> +
>  /*
>   * Prepare the "env_array" parameter of a "struct child_process" for executing
>   * a submodule by clearing any repo-specific environment variables, but
> diff --git a/t/lib-submodule-update.sh b/t/lib-submodule-update.sh
> index 016391723c..51d449 100755
> --- a/t/lib-submodule-update.sh
> +++ b/t/lib-submodule-update.sh
> @@ -709,7 +709,8 @@ test_submodule_recursing_with_args_common() {
>  			git branch -t remove_sub1 origin/remove_sub1 &&
>  			$command remove_sub1 &&
>  			test_superproject_content origin/remove_sub1 &&
> -			! test -e sub1
> +			! test -e sub1 &&
> +			test_must_fail git config -f .git/modules/sub1/config core.worktree
>  		)
>  	'
>  	# ... absorbing a .git directory along the way.
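[The C helper in the patch boils down to a "gentle" unset of one key in the submodule's private config file. Scripted, the equivalent looks roughly like this; the config path and value are made up for illustration, not taken from the patch.]

```shell
#!/bin/sh
# Unset core.worktree in a standalone config file, tolerating the case
# where the key is already absent (mirroring the _gently helper, which
# warns instead of dying).
set -e
tmp=$(mktemp -d)
cfg="$tmp/config"

git config -f "$cfg" core.worktree ../../../sub1   # simulate a stale setting
git config -f "$cfg" --unset core.worktree         # key present: removed

# a second unset finds nothing to remove and exits non-zero (status 5);
# only warn, as the helper does
if ! git config -f "$cfg" --unset core.worktree
then
	echo "warning: core.worktree was already unset" >&2
fi
```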
Re: [RFC PATCH 2/3] Documentation/git-rev-list: s///
Matthew DeVore writes:

> $ git -C pc1 fetch-pack --stdin "file://$(pwd)/srv.bare"
>
> Where observed.oids contains all the blobs that were missing. It tells
> the remote that it already has the "refs/heads/master" commit, which
> means it is excluded. Before, this worked fine, since it didn't mean
> the blobs were excluded, only the commit itself.

So 'master' has some blob in its tree which is missing from the
repository, in this test? Which means that such a repository is
"corrupt" and does not pass the connectivity check by fsck.

I am of two minds. If we claim that we have everything that is needed
to complete the commit sitting at the tip of 'master', I think it is
correct for the other side not to send a blob that is in 'master' (or
its ancestors), so your "fix" may (at least technically) be more
correct than the status quo.

On the other hand, if possession of commit 'master' did not defeat an
explicit request for a blob in it, that would actually be a good
thing---it would be a very straight-forward way to recover from such a
form of repository corruption. Fetching isolated objects without
walking is also something that would help backfill a lazily created
clone, and I even vaguely recall an effort to allow objects explicitly
requested to be always fetched regardless of the connectivity, if I am
not mistaken (JTan?)

Anyway, thanks for a thoughtful response.
Re: [PATCH] mailmap: update brandon williams's email address
Stefan Beller writes:

> On Fri, Dec 7, 2018 at 1:40 PM Jonathan Nieder wrote:
>>
>> Brandon Williams wrote:
>>
>> > Signed-off-by: Brandon Williams
>> > ---
>> >  .mailmap | 1 +
>> >  1 file changed, 1 insertion(+)
>>
>> I can confirm that this is indeed the same person.
>
> What would be more of interest is why we'd be interested in this patch,
> as there is no commit/patch sent by Brandon with this email in git's
> history.

Once I "git am" the message that began this thread, there will be a
commit under this new ident, so that would be somewhat a moot point.

If this were "Jonathan asked Brandon if we want to record an address
we can reach him in our .mailmap file and sent a patch to add one",
then the story is different, and I tend to agree with you that such a
patch is more or less pointless. That's not the purpose of the mailmap
file.

Not until git-send-email learns to use that file to rewrite To/cc/etc
to the "canonical" addresses, anyway ;-)

I am not sure if there are people whose "canonical" address to be used
as the author is not necessarily the best address they want to get
their e-mails at, though. If we can be reasonably sure that the set of
such people is empty, then people can take the above mention about
send-email as a hint about a low-hanging fruit ;-)

Thanks.
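[The mapping behavior under discussion can be checked with git check-mailmap. A sketch; the names, addresses, and temp repo below are all invented for illustration.]

```shell
#!/bin/sh
# Show how a .mailmap entry rewrites an old address to the canonical one.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q

# "Proper Name <proper@email>" followed by the old commit address
cat >.mailmap <<\EOF
Jane Dev <jane@new.example.com> <jane@old.example.com>
EOF

# check-mailmap maps the old ident to the canonical one
git check-mailmap "Jane Dev <jane@old.example.com>"
# prints: Jane Dev <jane@new.example.com>
```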
Re: RFE: git-patch-id should handle patches without leading "diff"
Jonathan Nieder writes:

>> So it seems most sensible to me if this is going to be supported that
>> we go a bit beyond the call of duty and fake up the start of it,
>> namely:
>>
>>   --- a/arch/x86/kernel/process.c
>>   +++ b/arch/x86/kernel/process.c
>>
>> To be:
>>
>>   diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
>>   --- a/arch/x86/kernel/process.c
>>   +++ b/arch/x86/kernel/process.c
>
> Right. We may want to handle diff.mnemonicPrefix as well.

I definitely think under the --stable option, we should pretend as if
the canonical a/ vs b/ prefixes were given with the "diff --git"
header, just like we try to reverse the effect of diff.orderFile, etc.
I am unsure what the right behaviour under --unstable is, though.
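[For context, this is what git patch-id consumes today: a diff (or whole patch email) on stdin, printing the patch ID and a commit ID. A sketch with an invented throwaway repository.]

```shell
#!/bin/sh
# Compute a stable patch ID from a raw "git diff" on stdin.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git config user.email a@example.com
git config user.name A
echo one >file && git add file && git commit -qm one
echo two >>file

# output is "<patch-id> <commit-id>"; for a bare diff with no commit
# header, the commit id field carries no real commit
git diff | git patch-id --stable >out
cat out
```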
Re: [PATCH 0/4]
Stefan Beller writes:

> A couple days before the 2.19 release we had a bug report about
> broken submodules[1] and reverted[2] the commits leading up to them.
>
> The behavior of said bug fixed itself by taking a different
> approach[3], specifically by a weaker enforcement of having
> `core.worktree` set in a submodule [4].
>
> The revert [2] was overly broad as we neared the release, such that we
> wanted to rather keep the known buggy behavior of always having
> `core.worktree` set, rather than figuring out how to fix the new bug
> of having 'git submodule update' not working in old style repository
> setups.
>
> This series re-introduces those reverted patches, with no changes in
> code, but with drastically changed commit messages, as those focus on
> why it is safe to re-introduce them instead of explaining the desire
> for the change.

The above was a bit too cryptic for me to grok, so let me try
rephrasing to see if I got them all correctly.

 - The three-patch series leading to 984cd77ddb was meant to fix some
   bug, but the series itself was buggy and caused problems; we got
   rid of it.

 - The problem 984cd77ddb wanted to fix was fixed differently, without
   reintroducing the problem the three-patch series introduced. That
   fix has already been with us since 4d6d6ef1fc.

 - Now these three changes that were problematic in the past are
   resent without any update (other than that it has one preparatory
   patch to add tests).

Is that what is going on? Obviously I am not getting "the other"
benefit we wanted to gain out of these three patches (because the
above description fails to explain what that is), other than fixing
the issue that was fixed by 4d6d6ef1fc. Sorry for being puzzled...
> [1] https://public-inbox.org/git/2659750.rG6xLiZASK@twilight > [2] f178c13fda (Revert "Merge branch 'sb/submodule-core-worktree'", > 2018-09-07) > [3] 4d6d6ef1fc (Merge branch 'sb/submodule-update-in-c', 2018-09-17) > [4] 74d4731da1 (submodule--helper: replace connect-gitdir-workingtree by > ensure-core-worktree, 2018-08-13) > > Stefan Beller (4): > submodule update: add regression test with old style setups > submodule: unset core.worktree if no working tree is present > submodule--helper: fix BUG message in ensure_core_worktree > submodule deinit: unset core.worktree > > builtin/submodule--helper.c| 4 +++- > submodule.c| 14 ++ > submodule.h| 2 ++ > t/lib-submodule-update.sh | 5 +++-- > t/t7400-submodule-basic.sh | 5 + > t/t7412-submodule-absorbgitdirs.sh | 7 ++- > 6 files changed, 33 insertions(+), 4 deletions(-)
Re: [PATCH] commit: abort before commit-msg if empty message
Jonathan Tan writes: > When a user runs "git commit" without specifying a message, an editor > appears with advice: > > Please enter the commit message for your changes. Lines starting > with '#' will be ignored, and an empty message aborts the commit. > > However, if the user supplies an empty message and has a commit-msg hook > which updates the message to be non-empty, the commit proceeds to occur, > despite what the advice states. When "--no-edit" is given, and when commit-msg fills that blank, the command should go ahead and record the commit, I think. An automation where commit-msg is used to produce whatever appropriate message for the automation is entirely a reasonable thing to arrange. Of course, you can move the logic to produce an appropriate message for the automation from commit-msg to the script that drives the "git commit" and use the output of that logic as the value for the "-m" option to achieve the same, so in that sense, there is an escape hatch even if you suddenly start to forbid such a working set-up, but it nevertheless is unnecessary busywork for those with such a set-up to adjust to this change. I actually think in this particular case, the commit-msg hook that adds Change-ID to an empty message is BUGGY. If the hook looked at the message contents and refrained from making an otherwise empty message non-empty, there would be no need for any change here. In any case, you'll have plenty of time to make your case after the rc freeze. I am not so sympathetic to a patch that makes us bend backwards to support such a buggy hook, to be honest.
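Junio's suggestion above — a hook that refrains from turning an otherwise empty message into a non-empty one — could be sketched roughly like this (a hypothetical hook, not Gerrit's actual one; the function name and trailer value are made up):

```shell
# Hypothetical commit-msg hook sketch: append a trailer only when the
# message is non-empty after stripping '#' comment lines and whitespace,
# so an intentionally empty message still aborts the commit.
add_trailer_if_nonempty () {
	msg_file=$1
	# Treat '#' comment lines and pure whitespace as "empty",
	# mirroring the default message cleanup.
	body=$(grep -v '^#' "$msg_file" | tr -d '[:space:]')
	if test -n "$body"
	then
		# Placeholder trailer value, not a real Change-Id.
		printf '\nChange-Id: I0123456789abcdef0123456789abcdef01234567\n' >>"$msg_file"
	fi
}
```

A hook built this way leaves "git commit" free to abort on an empty message while still decorating real ones.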
Re: [wishlist] git submodule update --reset-hard
On Fri, 07 Dec 2018, Yaroslav Halchenko wrote: > On Fri, 07 Dec 2018, Stefan Beller wrote: > > > the initial "git submodule update --reset-hard" is pretty much a > > > crude workaround for some of those cases, so I would just go earlier in > > > the history, and redo some things, whenever I could just drop or revert > > > some selected set of commits. > > That makes sense. > > Do you want to give the implementation a try for the --reset-hard switch? > ok, will do, thanks for the blessing ;-) The patch is attached (please advise if should be done differently) and also submitted as PR https://github.com/git/git/pull/563 I guess it would need more tests. Took me some time to figure out why I was getting fatal: bad value for update parameter after all my changes to the git-submodule.sh script after looking at an example commit 42b491786260eb17d97ea9fb1c4b70075bca9523 which introduced --merge to the update ;-) -- Yaroslav O. Halchenko Center for Open Neuroscience http://centerforopenneuroscience.org Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755 Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419 WWW: http://www.linkedin.com/in/yarik From 170296dc661b4bc3d942917ce27288df52ff650d Mon Sep 17 00:00:00 2001 From: Yaroslav Halchenko Date: Fri, 7 Dec 2018 21:28:29 -0500 Subject: [PATCH] submodule: Add --reset-hard option for git submodule update This patch adds a --reset-hard option for the update command to hard reset submodule(s) to the gitlink for the corresponding submodule in the superproject. This feature is desired e.g. to be able to discard recent changes in the entire hierarchy of the submodules after running git reset --hard PREVIOUS_STATE in the superproject which leaves submodules in their original state, and git reset --hard --recurse-submodules PREVIOUS_STATE would result in the submodules being checked into detached HEADs. 
As in the original git reset --hard no checks or any kind of safe-guards against jumping into some state which was never on the current branch is done. must_die_on_failure is not set to yes to mimic behavior of an update --checkout strategy, which would leave user with a non-clean state immediately apparent via git status so an informed decision/actions could be done manually. Signed-off-by: Yaroslav Halchenko --- Documentation/git-submodule.txt | 12 +++- Documentation/gitmodules.txt| 4 ++-- builtin/submodule--helper.c | 3 ++- git-submodule.sh| 10 +- submodule.c | 4 submodule.h | 1 + t/t7406-submodule-update.sh | 17 - 7 files changed, 45 insertions(+), 6 deletions(-) diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt index ba3c4df550..f90a42d265 100644 --- a/Documentation/git-submodule.txt +++ b/Documentation/git-submodule.txt @@ -124,7 +124,7 @@ If you really want to remove a submodule from the repository and commit that use linkgit:git-rm[1] instead. See linkgit:gitsubmodules[7] for removal options. -update [--init] [--remote] [-N|--no-fetch] [--[no-]recommend-shallow] [-f|--force] [--checkout|--rebase|--merge] [--reference <repository>] [--depth <depth>] [--recursive] [--jobs <n>] [--] [<path>...]:: +update [--init] [--remote] [-N|--no-fetch] [--[no-]recommend-shallow] [-f|--force] [--checkout|--rebase|--merge|--reset-hard] [--reference <repository>] [--depth <depth>] [--recursive] [--jobs <n>] [--] [<path>...]:: + -- Update the registered submodules to match what the superproject @@ -358,6 +358,16 @@ the submodule itself. If the key `submodule.$name.update` is set to `rebase`, this option is implicit. +--reset-hard:: + This option is only valid for the update command. + Hard reset current state to the commit recorded in the superproject. +If this option is given, the submodule's HEAD will not get detached +if it was not detached before. Note that, like with a regular +git reset --hard no safe-guards are in place to prevent jumping +to a commit which was never part of the current branch. 
+ If the key `submodule.$name.update` is set to `reset-hard`, this + option is implicit. + --init:: This option is only valid for the update command. Initialize all submodules for which "git submodule init" has not been diff --git a/Documentation/gitmodules.txt b/Documentation/gitmodules.txt index 312b6f9259..e085dbc01f 100644 --- a/Documentation/gitmodules.txt +++ b/Documentation/gitmodules.txt @@ -43,8 +43,8 @@ submodule.<name>.update:: command in the superproject. This is only used by `git submodule init` to initialize the configuration variable of the same name. Allowed values here are 'checkout', 'rebase', - 'merge' or 'none'. See description of 'update' command in - linkgit:git-submodule[1] for their meaning. Note that the + 'merge', 'reset-hard' or 'none'. See description of 'update' command + in linkgit:git-submodule[1] for their meaning. Note that the '!command' form
Re: bug: git fetch reports too many unreachable loose objects
On Fri, Dec 7, 2018 at 6:14 PM Josh Wolfe wrote: > > git version 2.19.1 > steps to reproduce: > > # start in a brand new repo > git init > > # create lots of unreachable loose objects > for i in {1..1}; do git commit-tree -m "$(head -c 12 /dev/urandom > | base64)" "$(git mktree <&-)" <&-; done > # this prints a lot of output and takes a minute or so to run > > # trigger git gc to run in the background > git fetch > # Auto packing the repository in background for optimum performance. > # See "git help gc" for manual housekeeping. > > # trigger it again > git fetch > # Auto packing the repository in background for optimum performance. > # See "git help gc" for manual housekeeping. > # error: The last gc run reported the following. Please correct the root cause > # and remove .git/gc.log. > # Automatic cleanup will not be performed until the file is removed. > # > # warning: There are too many unreachable loose objects; run 'git > prune' to remove them. > > # to manually fix this, run git prune: > git prune > > # note that `git gc` does not fix the problem, and appears to do > nothing in this situation: > git gc > > > According to the `git fetch` output, the `git help gc` docs, and the > `git help prune` docs, I don't think I should ever have to run `git > prune` manually, so this behavior seems like a bug to me. Please > correct me if this is expected behavior. Known bug, there are a variety of other ways to trigger it too. See the threads here for more info: https://public-inbox.org/git/87inc89j38@evledraar.gmail.com/ https://public-inbox.org/git/20180716172717.237373-1-jonathanta...@google.com/ There are probably other threads as well.
Re: [wishlist] git submodule update --reset-hard
On Fri, 07 Dec 2018, Stefan Beller wrote: > > the initial "git submodule update --reset-hard" is pretty much a > > crude workaround for some of those cases, so I would just go earlier in > > the history, and redo some things, whenever I could just drop or revert > > some selected set of commits. > That makes sense. > Do you want to give the implementation a try for the --reset-hard switch? ok, will do, thanks for the blessing ;-) -- Yaroslav O. Halchenko Center for Open Neuroscience http://centerforopenneuroscience.org Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755 Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419 WWW: http://www.linkedin.com/in/yarik
Re: [PATCH v2 1/3] git clone C:\cygwin\home\USER\repo' is working (again)
On Fri, Dec 7, 2018 at 11:04 AM wrote: > The solution is to implement has_dos_drive_prefix(), skip_dos_drive_prefix() > is_dir_sep(), offset_1st_component() and convert_slashes() for cygwin > in the same way as it is done in 'Git for Windows' in compat/mingw.[ch] > > Instead of duplicating the code, it is extracted into compat/mingw-cygwin.[ch] > Some need for refactoring and cleanup came up in the review, they are addressed > in a separate commit. i have applied the 3 patches against current master, and my original test passes, so looks good to me. however like Johannes i take issue with the naming. as he said "mingw-cygwin" really isn't appropriate. ideally it would be "windows.h", but as that is conspicuously in use, something like these: - pc-windows - pc-win - win i disagree with him on using "win32" - that doesn't really make sense, as obviously you can compile 64-bit Git for Windows. if you wanted to go that route you would want to use something like: - windows-api - win-api - winapi further - i disagree with the "DOS" moniker being used at all. DOS is a defunct operating system that i don't think Git has *ever* supported, so it doesn't make sense to be referring to it this way. again, a more appropriate name would be something like "win_drive_prefix".
Re: [PATCH] docs/gitweb.conf: config variable typo
On Fri, Dec 7, 2018 at 7:45 PM Jonathan Nieder wrote: > > > Thanks for fixing it. May we forge your sign-off? Yes please, guess I didn't read far enough down that document. My apologies. Consider the previous patch email: Signed-off-by: FeRD (Frank Dana)
Re: [PATCH] docs/gitweb.conf: config variable typo
Hi, Frank Dana wrote: > The documentation for the feature 'snapshot' claimed > "This feature can be configured on a per-repository basis via > repository's `gitweb.blame` configuration variable" > > Fixed to specify `gitweb.snapshot` as the variable name. > --- > Documentation/gitweb.conf.txt | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) Thanks for fixing it. May we forge your sign-off? See Documentation/SubmittingPatches section "Certify your work" for what this means. Sincerely, Jonathan
Re: [PATCH] commit: abort before commit-msg if empty message
Hi, Jonathan Tan wrote: > (The implementation in this commit reads the commit message twice even > if there is no commit-msg hook. I think that this is fine, since this is > for interactive use - an alternative would be to plumb information about > the absence of the hook from run_hook_ve() upwards, which seems more > complicated.) Sounds like a good followup for an interested person to do later. Can you include a NEEDSWORK comment describing this? > Signed-off-by: Jonathan Tan > --- > This was noticed with the commit-msg hook that comes with Gerrit, which > basically just calls git interpret-trailers to add a Change-Id trailer. Thanks for fixing it. This kind of context tends to be very useful when looking back at a commit later, so I'd be happy to see it in the commit message too. [...] > --- a/builtin/commit.c > +++ b/builtin/commit.c > @@ -652,6 +652,21 @@ static void adjust_comment_line_char(const struct strbuf > *sb) > comment_line_char = *p; > } > > +static void read_and_clean_commit_message(struct strbuf *sb) > +{ > + if (strbuf_read_file(sb, git_path_commit_editmsg(), 0) < 0) { > + int saved_errno = errno; > + rollback_index_files(); > + die(_("could not read commit message: %s"), > strerror(saved_errno)); Long line. More importantly, I wonder if this can use die_errno. rollback_index_files calls delete_tempfile, which can clobber errno, so that would require restoring errno either here or in that function: diff --git i/lockfile.h w/lockfile.h index 35403ccc0d..d592f384e7 100644 --- i/lockfile.h +++ w/lockfile.h @@ -298,10 +298,14 @@ static inline int commit_lock_file_to(struct lock_file *lk, const char *path) * remove the lockfile. It is a NOOP to call `rollback_lock_file()` * for a `lock_file` object that has already been committed or rolled * back. + * + * Saves and restores errno for more convenient use during error handling. 
*/ static inline void rollback_lock_file(struct lock_file *lk) { + int save_errno = errno; delete_tempfile(&lk->tempfile); + errno = save_errno; } #endif /* LOCKFILE_H */ It doesn't make the code more obvious so what you have is probably better. Also, on second glance, this is just moved code. Still hopefully some of the above is amusing (and rewrapping would be fine to do during the move). [...] > @@ -970,6 +985,22 @@ static int prepare_to_commit(const char *index_file, > const char *prefix, > argv_array_clear(); > } > > + if (use_editor && !no_verify) { > + /* > + * Abort the commit if the user supplied an empty commit > + * message in the editor. (Because the commit-msg hook is to be > + * run, we need to check this now, since that hook may change > + * the commit message.) > + */ > + read_and_clean_commit_message(&sb); > + if (message_is_empty(&sb, cleanup_mode) && > !allow_empty_message) { > + rollback_index_files(); > + fprintf(stderr, _("Aborting commit due to empty commit > message.\n")); > + exit(1); What about the template_untouched case? Should the two call sites share code? [...] > + } > + strbuf_release(&sb); > + } > + > if (!no_verify && > run_commit_hook(use_editor, index_file, "commit-msg", > git_path_commit_editmsg(), NULL)) { > return 0; This means we have a little duplication of code: first we check whether to abort here in prepare_to_commit, and then again after the hook is run in cmd_commit. Is there some other code structure that would make things more clear? cmd_commit also seems to be rather long --- is there some logical way to split it up that would make the code clearer (as a preparatory or followup patch)? In cmd_commit, we spend a while (in number of lines, not actual running time) to determine parents before deciding whether the user has chosen to abort by writing an empty message. Should we perform that check sooner, closer to the prepare_to_commit call? 
> @@ -1608,17 +1639,7 @@ int cmd_commit(int argc, const char **argv, const char > *prefix) > > /* Finally, get the commit message */ > strbuf_reset(&sb); > - if (strbuf_read_file(&sb, git_path_commit_editmsg(), 0) < 0) { > - int saved_errno = errno; > - rollback_index_files(); > - die(_("could not read commit message: %s"), > strerror(saved_errno)); > - } > - > - if (verbose || /* Truncate the message just before the diff, if any. */ > - cleanup_mode == COMMIT_MSG_CLEANUP_SCISSORS) > - strbuf_setlen(&sb, wt_status_locate_end(sb.buf, sb.len)); > - if (cleanup_mode != COMMIT_MSG_CLEANUP_NONE) > - strbuf_stripspace(&sb, cleanup_mode == COMMIT_MSG_CLEANUP_ALL); > + read_and_clean_commit_message(&sb); > > if (message_is_empty(&sb, cleanup_mode) &&
Re: [RFC PATCH 2/3] Documentation/git-rev-list: s///
On Sun, Oct 21, 2018 at 7:21 PM Junio C Hamano wrote: > > Matthew DeVore writes: > > >> It is more like "this is a set operation across commits. We also > >> show objects that are reachable from the commits in the resulting > >> set and are not reachable from the commits in the set that were > >> excluded when --objects option is given". > >> > > That would be correct though it wouldn't tell that you can use > > "--objects ^foo-tree bar-tree." > > Yeah, but quite honestly, I consider that it is working by accident, > not by design, in the current code (iow, you are looking at a > behaviour of whatever the code happens to do). "rev-list" is pretty > much set operation across commits, and anything that deals with a > non commit-ish given from the command line is an afterthought at > best, and happenstance in reality. > > I do not mean to say that the code must stay that way, though. I tried fixing the issue of "--objects ^commitobj treeobj" not properly excluding objects reachable from commitobj, but this ended up causing t5616-partial-clone.sh to fail. In the test labeled "manual prefetch of missing objects", we create a clone of srv.bare without blobs called "pc1", then push some new commits to srv.bare (via a separate "local" repo), and try to fetch missing blobs with this command: $ git -C pc1 fetch-pack --stdin "file://$(pwd)/srv.bare"
Re: enhancement: support for author.email and author.name in "git config"
William Hubbs wrote: > On Thu, Dec 06, 2018 at 11:20:07PM +0100, Johannes Schindelin wrote: >> There *is* a way to get what you want that is super easy and will >> definitely work: if you sit down and do it ;-) >> >> Please let us know if you need any additional information before you can >> start. > > Which branch should I work off of in the repo? "master". Also, please make sure the documentation (e.g. in Documentation/config/user.txt) describes when a user would want to set this option. See also: - Documentation/SubmittingPatches - the DISCUSSION section of "git help format-patch" - [1] - https://www.kernel.org/pub/software/scm/git/docs/user-manual.html#hacking-git Thanks, Jonathan [1] https://github.com/gitgitgadget/gitgitgadget#a-bot-to-serve-as-glue-between-github-pull-requests-and-the-git-mailing-list
Re: RFE: git-patch-id should handle patches without leading "diff"
Ævar Arnfjörð Bjarmason wrote: > On Fri, Dec 07 2018, Jonathan Nieder wrote: >> The patch-id appears to only care about the diff text, so it should be >> able to handle this. So if we have a better heuristic for where the >> diff starts, it would be good to use it. > > No, the patch-id doesn't just care about the diff, it cares about the > context before the diff too. Sorry, I did a bad job of communicating. When I said "diff text", I was including context. [...] > Observe that the diff --git line matters, we hash it: > > $ git diff-tree -p HEAD~.. | git patch-id > 5870d115b7e2a9a936ab8fdc254932234413c710 > > $ git diff-tree --src-prefix=a/ --dst-prefix=b/ -p HEAD~.. | git patch-id > --stable > 5870d115b7e2a9a936ab8fdc254932234413c710 > > $ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | git patch-id > --stable > 4cd136f2b98760150f700ac6a5b126389d6d05a7 > Oh, hm. That's unfortunate. [...] > So it seems most sensible to me if this is going to be supported that we > go a bit beyond the call of duty and fake up the start of it, namely: > > --- a/arch/x86/kernel/process.c > +++ b/arch/x86/kernel/process.c > > To be: > > diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c > --- a/arch/x86/kernel/process.c > +++ b/arch/x86/kernel/process.c Right. We may want to handle diff.mnemonicPrefix as well. Jonathan
Re: RFE: git-patch-id should handle patches without leading "diff"
On Fri, Dec 07 2018, Jonathan Nieder wrote: > Hi, > > Konstantin Ryabitsev wrote: > >> Every now and again I come across a patch sent to LKML without a leading >> "diff a/foo b/foo" -- usually produced by quilt. E.g.: >> >> https://lore.kernel.org/lkml/20181125185004.151077...@linutronix.de/ >> >> I am guessing quilt does not bother including the leading "diff a/foo >> b/foo" because it's redundant with the next two lines, however this >> remains a valid patch recognized by git-am. >> >> If you pipe that patch via git-patch-id, it produces nothing, but if I >> put in the leading "diff", like so: >> >> diff a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c >> >> then it properly returns "fb3ae17451bc619e3d7f0dd647dfba2b9ce8992e". > > Interesting. As Ævar mentioned, the relevant code is > > /* Ignore commit comments */ > if (!patchlen && !starts_with(line, "diff ")) > continue; > > which is trying to handle a case where a line that is special to the > parser appears before the diff begins. > > The patch-id appears to only care about the diff text, so it should be > able to handle this. So if we have a better heuristic for where the > diff starts, it would be good to use it. No, the patch-id doesn't just care about the diff, it cares about the context before the diff too. See this patch: $ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. diff --git x/refs/files-backend.c y/refs/files-backend.c index 9183875dad..dd8abe9185 100644 --- x/refs/files-backend.c +++ y/refs/files-backend.c @@ -180,7 +180,8 @@ static void files_reflog_path(struct files_ref_store *refs, break; case REF_TYPE_OTHER_PSEUDOREF: case REF_TYPE_MAIN_PSEUDOREF: - return files_reflog_path_other_worktrees(refs, sb, refname); + files_reflog_path_other_worktrees(refs, sb, refname); + break; case REF_TYPE_NORMAL: strbuf_addf(sb, "%s/logs/%s", refs->gitcommondir, refname); break; Observe that the diff --git line matters, we hash it: $ git diff-tree -p HEAD~.. 
| git patch-id 5870d115b7e2a9a936ab8fdc254932234413c710 $ git diff-tree --src-prefix=a/ --dst-prefix=b/ -p HEAD~.. | git patch-id --stable 5870d115b7e2a9a936ab8fdc254932234413c710 $ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | git patch-id --stable 4cd136f2b98760150f700ac6a5b126389d6d05a7 The thing it doesn't care about is the "index" between the "diff" and patch: $ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | grep -v ^index | git patch-id --stable 4cd136f2b98760150f700ac6a5b126389d6d05a7 We also care about the +++ and --- lines: $ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | grep -v ^index | perl -pe 's/^(\+\+\+|---).*/$1/g' | git patch-id 56985c2c38cce6079de2690082e1770a8e81214c Then we normalize the @@ line, e.g.: $ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | grep -v ^index | git patch-id 4cd136f2b98760150f700ac6a5b126389d6d05a7 $ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | grep -v ^index | perl -pe 's/\d+/123/g' | git patch-id 4cd136f2b98760150f700ac6a5b126389d6d05a7 There's other caveats (see the code, e.g. "strip space") but to a first approximation a patch id is a hash of something that looks like this: diff --git x/refs/files-backend.c y/refs/files-backend.c --- x/refs/files-backend.c +++ y/refs/files-backend.c @@ -123,123 +123,123 @@ static void files_reflog_path(struct files_ref_store *refs, break; case REF_TYPE_OTHER_PSEUDOREF: case REF_TYPE_MAIN_PSEUDOREF: - return files_reflog_path_other_worktrees(refs, sb, refname); + files_reflog_path_other_worktrees(refs, sb, refname); + break; case REF_TYPE_NORMAL: strbuf_addf(sb, "%s/logs/%s", refs->gitcommondir, refname); break; Which means that accepting a patch like this as input would actually give you a different patch-id than if it had the proper header. 
So it seems most sensible to me if this is going to be supported that we go a bit beyond the call of duty and fake up the start of it, namely: --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c To be: diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c It'll make the state machine a bit more complex, but IMO it would suck more if
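The "fake up the header" idea could be prototyped as a small filter (a sketch only, assuming quilt-style patches carry matching `--- a/…` and `+++ b/…` lines; the function name is made up):

```shell
# Hypothetical preprocessing sketch: synthesize a "diff --git" line from
# quilt-style "---"/"+++" headers, so git patch-id sees input equivalent
# to a git-generated patch with the canonical a/ and b/ prefixes.
fake_diff_git_header () {
	awk '
		/^--- / && prev !~ /^diff / {
			getline nextline          # the matching "+++ ..." line
			split(nextline, f)        # f[2] is the new path
			printf "diff --git %s %s\n", $2, f[2]
			print                     # re-emit the "---" line
			print nextline
			prev = nextline
			next
		}
		{ print; prev = $0 }
	'
}
```

One could then pipe a quilt patch through this before `git patch-id`; real patch-id parsing has more caveats (timestamps after the paths, renames), so this is only an approximation of the idea.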
Re: [PATCH] mailmap: update brandon williams's email address
Hi, Stefan Beller wrote: > What would be more of interest is why we'd be interested in this patch > as there is no commit/patch sent by Brandon with this email in gits history. I think there's an implicit assumption in this question that isn't spelled out. Do I understand correctly that you're saying the main purpose of .mailmap is to figure out whether two commits are by the same author? My own uses of .mailmap primarily have a different purpose: to find out the preferred contact address for the author of a given commit. Thanks, Jonathan
Re: enhancement: support for author.email and author.name in "git config"
Hi Johannes, On Thu, Dec 06, 2018 at 11:20:07PM +0100, Johannes Schindelin wrote: > Hi William, > >[...] > > There *is* a way to get what you want that is super easy and will > definitely work: if you sit down and do it ;-) > > Please let us know if you need any additional information before you can > start. Which branch should I work off of in the repo? William
Re: [PATCH v2 3/3] Refactor mingw_cygwin_offset_1st_component()
Hi Torsten, On Fri, 7 Dec 2018, tbo...@web.de wrote: > diff --git a/compat/mingw-cygwin.c b/compat/mingw-cygwin.c > index 5552c3ac20..c379a72775 100644 > --- a/compat/mingw-cygwin.c > +++ b/compat/mingw-cygwin.c > @@ -10,10 +10,8 @@ size_t mingw_cygwin_skip_dos_drive_prefix(char **path) > size_t mingw_cygwin_offset_1st_component(const char *path) > { > char *pos = (char *)path; > - > - /* unc paths */ This comment is still useful (and now even more correct), and should stay. > - if (!skip_dos_drive_prefix(&pos) && > - is_dir_sep(pos[0]) && is_dir_sep(pos[1])) { > + if (is_dir_sep(pos[0]) && is_dir_sep(pos[1])) { > + /* unc path */ > /* skip server name */ > pos = strpbrk(pos + 2, "\\/"); > if (!pos) > @@ -22,7 +20,8 @@ size_t mingw_cygwin_offset_1st_component(const char *path) > do { > pos++; > } while (*pos && !is_dir_sep(*pos)); > + } else { > + skip_dos_drive_prefix(&pos); > } > - Why remove this empty line? It structures the code quite nicely. The rest looks correct to me, Johannes > return pos + is_dir_sep(*pos) - path; > } > -- > 2.19.0.271.gfe8321ec05 > >
Re: [PATCH] mailmap: update brandon williams's email address
On Fri, Dec 7, 2018 at 1:40 PM Jonathan Nieder wrote: > > Brandon Williams wrote: > > > Signed-off-by: Brandon Williams > > --- > > .mailmap | 1 + > > 1 file changed, 1 insertion(+) > > I can confirm that this is indeed the same person. What would be more of interest is why we'd be interested in this patch as there is no commit/patch sent by Brandon with this email in gits history. Is that so you get cc'd on your private address and can follow things you worked on without being subscribed to the mailing list? (I'd be interested to see the use case in the commit message;) Thanks, Stefan
Re: RFE: git-patch-id should handle patches without leading "diff"
Hi, Konstantin Ryabitsev wrote: > Every now and again I come across a patch sent to LKML without a leading > "diff a/foo b/foo" -- usually produced by quilt. E.g.: > > https://lore.kernel.org/lkml/20181125185004.151077...@linutronix.de/ > > I am guessing quilt does not bother including the leading "diff a/foo > b/foo" because it's redundant with the next two lines, however this > remains a valid patch recognized by git-am. > > If you pipe that patch via git-patch-id, it produces nothing, but if I > put in the leading "diff", like so: > > diff a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c > > then it properly returns "fb3ae17451bc619e3d7f0dd647dfba2b9ce8992e". Interesting. As Ævar mentioned, the relevant code is /* Ignore commit comments */ if (!patchlen && !starts_with(line, "diff ")) continue; which is trying to handle a case where a line that is special to the parser appears before the diff begins. The patch-id appears to only care about the diff text, so it should be able to handle this. So if we have a better heuristic for where the diff starts, it would be good to use it. "git apply" uses apply.c::find_header, which is more permissive. Maybe it would be possible to unify these somehow. (I haven't looked closely enough to tell how painful that would be.) Thanks and hope that helps, Jonathan
Re: [wishlist] git submodule update --reset-hard
On Thu, Dec 6, 2018 at 5:23 PM Yaroslav Halchenko wrote: > > There was a proposal to "re-attach HEAD" in the submodule, i.e. > > if the branch branch points at the same commit, we don't need > > a detached HEAD, but could go with the branch instead. > > if I got the idea right, if we are talking about any branch, it > would also be non-deterministic since who knows what left over branch(es) > point to that commit. Not sure if I would have used that ;) I would think we'd rather want to have it deterministic, i.e. something like 1) record branch name of the submodule 2) update submodules HEAD to the superprojects gitlink 3) if recorded branch (1) matches the sha1 of detached HEAD, have HEAD point to the branch instead. You notice a small inefficiency here as we write HEAD twice, so it could be reworded as: 1) compare superprojects gitlink with the submodules branch 2a) if equal, set submodules HEAD to branch 2b) if unequal set HEAD to gitlink value, resulting in detached HEAD Note that this idea of reattaching reflects the idea (a) below. > > a) "stay on submodule branch (i.e. HEAD still points at $branch), and > > reset --hard" such that the submodule has a clean index and at that $branch > > or > > b) "stay on submodule branch (i.e. HEAD still points at $branch), but > > $branch is > >set to the gitlink from the superproject, and then a reset --hard > >will have the worktree set to it as well. > NB "gitlink" -- just now discovered the thing for me. Thought it would be > called a subproject echoing what git diff/log -p shows for submodule > commits. The terminology is messy: The internal representation in Git's object model is a "gitlink" entry in a tree object. Once we have a .gitmodules entry, we call it submodule. The term 'subproject' is a historic artifact and will likely not be changed in the diff output (or format-patch), because these diffs can be applied using git-am for example. 
That makes the diff output effectively a transport protocol, and changing protocols is hard if you have no versioning in them. More in https://git-scm.com/docs/gitsubmodules (a rather recent new write of a man page, going into concepts). > > > right -- I meant the local changes and indeed reset --recurse-submodules > > > indeed seems to recurse nicely. Then the undesired effect remaining only > > > the detached HEAD > > > For that we may want to revive discussions in > > https://public-inbox.org/git/20170501180058.8063-5-sbel...@google.com/ > > well, isn't that one requires a branch to be specified in .gitmodules? Ah good point. > > git reset --hard --recursive=hard,keep-branch PREVIOUSPOINT > > 'keep-branch' (given aforementioned keeping the specified in .gitmodules > branch) might be confusing. Also what if a submodule already in a > detached HEAD? IMHO --recursive=hard, and just saying that it would do > "reset --hard", is imho sufficient. (that is why I like pure > --reset hard since it doesn't care and neither does anything to the > branch) For that we might want to first do the git submodule update --reset-hard which runs reset --hard inside the submodule, no matter which branch the submodule is on (if any) and resets to the given superproject sha1. See git-submodule.sh in git.git[1] in cmd_update. We'd need to add a command line flag (`--reset-hard` would be the obvious choice?) which would set the `update` variable, which then is evaluated to what needs to be done in the submodule, which in that case would be the hard reset. https://github.com/git/git/blob/master/git-submodule.sh#L606 Once that is done we'd want to add a test case, presumably in t/t7406-submodule-update.sh > > > I would have asked for > > > >git revert --recursive ... > > >git rebase --recursive [-i] ... > > > > which I also frequently desire (could elaborate on the use cases etc). > > > These would be nice to have. 
It would be nice if you'd elaborate on the > > use cases for future reference in the mailing list archive. :-) > > ok, will try to do so ;-) In summary: they are just a logical extension > of git support for submodules, for anyone actively working with > submodules to keep the entire tree in sync. Then quite often the need for > reverting a specific commit (which also has changes reflected in > submodules) arises. The same with rebase, especially to trim away some > no longer desired changes reflected in submodules. > > the initial "git submodule update --reset-hard" is pretty much a > crude workaround for some of those cases, so I would just go earlier in > the history, and redo some things, whenever I could just drop or revert > some selected set of commits. That makes sense.
Re: [PATCH v2 1/3] git clone 'C:\cygwin\home\USER\repo' is working (again)
Hi Torsten, On Fri, 7 Dec 2018, tbo...@web.de wrote: > compat/mingw-cygwin.c | 28 > compat/mingw-cygwin.h | 20 Please use compat/win32/path.c (or .../path-utils.c) instead. This is not so much about MINGW or Cygwin or MSys or MSYS2 or Visual C++, but about Windows. Thanks, Johannes
Re: Retrieving a file in git that was deleted and committed
On Thu, Dec 6, 2018 at 11:26 PM Jeff King wrote: >> >> On Thu, Dec 06, 2018 at 11:07:00PM -0800, biswaranjan panda wrote: >> > >> Thanks! Strangely git log --follow does work. >> >> I suspect it would work even without --follow. When you limit a log >> traversal with a pathspec, like: >> >> git log foo >> >> that is not about following some continuous stream of content, but >> rather just applying that pathspec to the diff of each commit, and >> pruning ones where it did not change. So even if there are gaps where >> the file did not exist, we continue to apply the pathspec to the older >> commits. > Ah, of course. Thanks for the clarification, Jeff. And my > apologies to > Biswaranjan Panda for the incorrect information. Thanks Jeff and Bryan! However, I am curious: if there were a way to tell git blame to skip a commit (the one which added the file again, and maybe the one which deleted it originally) while it walks back through history, then it should just get back the entire history, right? On Thu, Dec 6, 2018 at 11:37 PM Bryan Turner wrote: > > On Thu, Dec 6, 2018 at 11:26 PM Jeff King wrote: > > > > On Thu, Dec 06, 2018 at 11:07:00PM -0800, biswaranjan panda wrote: > > > > > Thanks! Strangely git log --follow does work. > > > > I suspect it would work even without --follow. When you limit a log > > traversal with a pathspec, like: > > > > git log foo > > > > that is not about following some continuous stream of content, but > > rather just applying that pathspec to the diff of each commit, and > > pruning ones where it did not change. So even if there are gaps where > > the file did not exist, we continue to apply the pathspec to the older > > commits. > > Ah, of course. Thanks for the clarification, Jeff. And my apologies to > Biswaranjan Panda for the incorrect information. > > > > > Tools like git-blame will _not_ work, though, as they really are trying > > to track the content as they walk back through history. 
And Once all of > > the content seems to appear from nowhere in your new commit, that seems > > like a dead end. > > > > In theory there could be some machine-readable annotation in the commit > > object (or in a note created after the fact) to say "even though 'foo' > > is a new file here, it came from $commit:foo". And then git-blame could > > keep following the content there. But such a feature does not yet exist. > > > > -Peff -- Thanks, -Biswa
Re: [PATCH] mailmap: update brandon williams's email address
Brandon Williams wrote: > Signed-off-by: Brandon Williams > --- > .mailmap | 1 + > 1 file changed, 1 insertion(+) I can confirm that this is indeed the same person. Reviewed-by: Jonathan Nieder Welcome back!
Re: RFE: git-patch-id should handle patches without leading "diff"
On Fri, Dec 07 2018, Konstantin Ryabitsev wrote: > Hi, all: > > Every now and again I come across a patch sent to LKML without a leading > "diff a/foo b/foo" -- usually produced by quilt. E.g.: > > https://lore.kernel.org/lkml/20181125185004.151077...@linutronix.de/ > > I am guessing quilt does not bother including the leading "diff a/foo > b/foo" because it's redundant with the next two lines, however this > remains a valid patch recognized by git-am. > > If you pipe that patch via git-patch-id, it produces nothing, but if I > put in the leading "diff", like so: > > diff a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c > > then it properly returns "fb3ae17451bc619e3d7f0dd647dfba2b9ce8992e". > > Can we please teach git-patch-id to work without the leading diff a/foo > b/foo, same as git-am? > > Best, > -K The state machine is sensitive to there being a "diff" line, then "index" etc. diff --git a/builtin/patch-id.c b/builtin/patch-id.c index 970d0d30b4..b99e4455fd 100644 --- a/builtin/patch-id.c +++ b/builtin/patch-id.c @@ -97,7 +97,9 @@ static int get_one_patchid(struct object_id *next_oid, struct object_id *result, } /* Ignore commit comments */ - if (!patchlen && !starts_with(line, "diff ")) + if (!patchlen && starts_with(line, "--- a/")) + ; + else if (!patchlen && !starts_with(line, "diff ")) continue; /* Parsing diff header? */ This would make it produce a patch-id for that input; however, note that I've used "--- a/" there -- with just "--- " (which is legit) we'd get confused and start earlier, before the diffstat. So if you're interested in having this, I leave it to you to run with this & write tests for it, but more convincingly run it on the git & LKML archives and see that the output is the same (or just extra in cases where we now find patches) with --stable etc.
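The header test in that sketch can be illustrated outside of git. The helper below is not the actual patch-id code (the names `starts_with()` and `is_patch_start()` are stand-ins for illustration); it just shows the condition being proposed: accept either a "diff " line or a quilt-style "--- a/" line as the start of a patch, while rejecting the bare "--- " separator that would otherwise match the diffstat.

```c
#include <assert.h>
#include <string.h>

/* Mirrors git's starts_with() helper (illustrative reimplementation). */
static int starts_with(const char *str, const char *prefix)
{
	return strncmp(str, prefix, strlen(prefix)) == 0;
}

/*
 * A quilt-produced patch has no "diff a/... b/..." line, so also accept
 * "--- a/" as the start of a patch.  Plain "--- " is deliberately NOT
 * accepted: as noted above, it would also match earlier separator lines
 * and start the patch-id computation before the real diff begins.
 */
static int is_patch_start(const char *line)
{
	return starts_with(line, "diff ") || starts_with(line, "--- a/");
}
```

With this predicate, the quilt patch from the linked LKML message would be recognized even without a leading "diff" line.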
Re: [PATCH] git-rebase.txt: update note about directory rename detection and am
On Fri, Dec 7, 2018 at 9:51 AM Johannes Sixt wrote: > > From: Elijah Newren > > In commit 6aba117d5cf7 ("am: avoid directory rename detection when > calling recursive merge machinery", 2018-08-29), the git-rebase manpage > probably should have also been updated to note the stronger > incompatibility between git-am and directory rename detection. Update > it now. > > Signed-off-by: Elijah Newren > Signed-off-by: Johannes Sixt > --- > Documentation/git-rebase.txt | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt > index 41631df6e4..7bea21e8e3 100644 > --- a/Documentation/git-rebase.txt > +++ b/Documentation/git-rebase.txt > @@ -569,8 +569,9 @@ it to keep commits that started empty. > Directory rename detection > ~~ > > -The merge and interactive backends work fine with > -directory rename detection. The am backend sometimes does not. > +Directory rename heuristics are enabled in the merge and interactive > +backends. Due to the lack of accurate tree information, directory > +rename detection is disabled in the am backend. > > include::merge-strategies.txt[] I was intending to send this out the past couple days, was just kinda busy. Thanks for handling it for me.
Re: [PATCH on sb/more-repo-in-api] revision: use commit graph in get_reference()
On 12/6/2018 6:36 PM, Jonathan Tan wrote: AFAICT oid_object_info doesn't take advantage of the commit graph, but just looks up the object header, which is still less than completely parsing it. Then lookup_commit is overly strict, as it may return NULL when there is a type mismatch (I don't think a mismatch could happen here, as both rely on just the object store, and not the commit graph.), so this would be just defensive programming for the sake of it. I dunno. struct commit *c; if (oid_object_info(revs->repo, oid, NULL) == OBJ_COMMIT && (c = lookup_commit(revs->repo, oid)) && !repo_parse_commit(revs->repo, c)) object = (struct object *) c; else object = parse_object(revs->repo, oid); I like this way better - I'll do it in the next version. If we do _not_ have a commit-graph, or if the commit-graph does not have that commit, this will have the same performance problem, right? Should we instead create a direct dependence on the commit-graph, and try to parse the oid from the graph directly? If it succeeds, then we learn that the object is a commit, in addition to all of the parsing work. This means we could avoid oid_object_info() loading data if we succeed. We would fall back to parse_object() if it fails. I was thinking this should be a simple API call to parse_commit_in_graph(), but that requires a struct commit filled with an oid, which is not the best idea if we don't actually know it is a commit yet. The approach I recommend would then be more detailed: 1. Modify find_commit_in_graph() to take a struct object_id instead of a struct commit. This helps find the integer position in the graph. That position can be used in fill_commit_in_graph() to load the commit contents. Keep find_commit_in_graph() static, as it should not be a public function. 2. 
Create a public function with prototype struct commit *try_parse_commit_from_graph(struct repository *r, struct object_id *oid) that returns a commit struct fully parsed if and only if the repository has that oid. It can call find_commit_in_graph(), then lookup_commit() and fill_commit_in_graph() to create the commit and parse the data. 3. In place of the snippet above, do: struct commit *c; if ((c = try_parse_commit_from_graph(revs->repo, oid))) object = (struct object *)c; else object = parse_object(revs->repo, oid); A similar pattern _could_ be used in parse_object(), but I don't recommend doing this pattern unless we have a reasonable suspicion that we are going to parse commits more often than other objects. (It adds an O(log(# commits)) binary search to each object.) A final thought: consider making this "try the commit graph first, but fall back to parse_object()" a library function with a name like struct object *parse_probably_commit(struct repository *r, struct object_id *oid) so other paths that are parsing a lot of commits (but also maybe tags) could use the logic. Thanks! -Stolee
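The shape of that "fast path first, general lookup on a miss" proposal can be sketched with a toy example. Everything here is invented for illustration (a sorted integer array stands in for the commit-graph's binary-searchable commit list, and `store_lookup()` stands in for the expensive `parse_object()` path); it only demonstrates the control flow, not git's actual data structures.

```c
#include <assert.h>
#include <stddef.h>

static const int graph_ids[] = { 3, 8, 15, 42 };  /* sorted "commit-graph" */

/* O(log n) binary search, like find_commit_in_graph() over the graph file. */
static int graph_lookup(int id)
{
	size_t lo = 0, hi = sizeof(graph_ids) / sizeof(graph_ids[0]);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		if (graph_ids[mid] == id)
			return 1;
		if (graph_ids[mid] < id)
			lo = mid + 1;
		else
			hi = mid;
	}
	return 0;
}

static int slow_lookups;          /* counts falls back to the "object store" */

/* Expensive general-purpose path, standing in for parse_object(). */
static int store_lookup(int id)
{
	slow_lookups++;
	return id >= 0;               /* pretend every non-negative id exists */
}

/* The parse_probably_commit() shape: graph first, full lookup on a miss. */
static int lookup(int id)
{
	if (graph_lookup(id))
		return 1;                 /* fast path: no object store access */
	return store_lookup(id);      /* non-commits and graph misses */
}
```

The cost trade-off mentioned above is visible here: every call pays the O(log n) search, which is only a win if most lookups are commits that the graph already knows about.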
Re: [PATCH v2 2/3] commit-graph: fix buffer read-overflow
On 12/6/2018 3:20 PM, Josh Steadmon wrote: + +# usage: corrupt_and_zero_graph_then_verify <pos> <data> <zero_pos> <string> +# Manipulates the commit-graph file at <pos> by inserting the data, +# then zeros the file starting at <zero_pos>. Finally, runs +# 'git commit-graph verify' and places the output in the file 'err'. Tests 'err' +# for the given string. +corrupt_and_zero_graph_then_verify() { This method is very similar to 'corrupt_graph_and_verify()', the only difference being the zero_pos, which zeroes the graph. Could it instead be a modification of corrupt_graph_and_verify() where $4 is interpreted as zero_pos, and if it is blank we don't do the truncation? +test_expect_success 'detect truncated graph' ' + corrupt_and_zero_graph_then_verify $GRAPH_BYTE_CHUNK_COUNT "\xff" \ + $GRAPH_CHUNK_LOOKUP_OFFSET "chunk lookup table entry missing" +' + Thanks for this! I think it's valuable to keep explicit tests around that were discovered from your fuzz tests. Specifically, I can repeat the test when I get around to the next file format. Thanks, -Stolee
Re: Get "responsible" .gitignore file / rule
On Fri, Dec 7, 2018 at 1:45 PM Eric Sunshine wrote: > > On Fri, Dec 7, 2018 at 7:36 AM Victor Toni wrote: > > I'm wondering if there is any way to show which rules (ideally with > > the .gitignore file they are coming from) are causing a specific file > > to get ignored so I could easily fix the .gitignore file? > > Perhaps the "git check-ignore" command would help. Thanks for the tip! Works like a charm (had to use the --verbose option though; without it, it does not give much feedback).
Re: Get "responsible" .gitignore file / rule
On Fri, Dec 7, 2018 at 7:36 AM Victor Toni wrote: > I'm wondering if there is any way to show which rules (ideally with > the .gitignore file they are coming from) are causing a specific file > to get ignored so I could easily fix the .gitignore file? Perhaps the "git check-ignore" command would help.
Re: [PATCH v2 2/3] commit-graph: fix buffer read-overflow
On Thu, Dec 06, 2018 at 12:20:54PM -0800, Josh Steadmon wrote: > diff --git a/commit-graph.c b/commit-graph.c > index 07dd410f3c..224a5f161e 100644 > --- a/commit-graph.c > +++ b/commit-graph.c > @@ -165,10 +165,20 @@ struct commit_graph *parse_commit_graph(void > *graph_map, int fd, > last_chunk_offset = 8; > chunk_lookup = data + 8; > for (i = 0; i < graph->num_chunks; i++) { > - uint32_t chunk_id = get_be32(chunk_lookup + 0); > - uint64_t chunk_offset = get_be64(chunk_lookup + 4); > + uint32_t chunk_id; > + uint64_t chunk_offset; > int chunk_repeated = 0; > > + if (chunk_lookup + GRAPH_CHUNKLOOKUP_WIDTH > > + data + graph_size) { > + error(_("chunk lookup table entry missing; graph file > may be incomplete")); > + free(graph); > + return NULL; > + } Is it possible to overflow the addition here? E.g., if I'm on a 32-bit system and the truncated chunk appears right at the 4GB limit, in which case we wrap back around? I guess that's pretty implausible, since it would mean that the mmap is bumping up against the end of the address space. I didn't check, but I wouldn't be surprised if sane operating systems avoid allocating those addresses. But I think you could write this as: if (data + graph_size - chunk_lookup < GRAPH_CHUNKLOOKUP_WIDTH) to avoid overflow (we know that "data + graph_size" is sane because that's our mmap, and chunk_lookup is somewhere between "data" and "data + graph_size", so the result is between 0 and graph_size). I dunno. I think I've convinced myself it's a non-issue here, but it may be good to get in the habit of writing these sorts of offset checks in an overflow-proof order. -Peff
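The overflow-proof ordering suggested above can be demonstrated in isolation. In this sketch (invented names; a plain static array stands in for the mmap'd graph file), the check is written as a subtraction whose left-hand side is guaranteed to lie in [0, size], so it cannot wrap even when the cursor sits near the top of the address space:

```c
#include <assert.h>
#include <stddef.h>

static unsigned char map[100];    /* stands in for the mmap'd graph file */

/*
 * "Is there room for a full table entry of 'width' bytes at 'offset'?"
 * Written as  data + size - cursor < width  rather than
 * cursor + width > data + size: the subtraction form evaluates to a
 * value in [0, size] (given data <= cursor <= data + size, which holds
 * for the chunk-lookup loop), so no intermediate result can overflow.
 */
static int chunk_entry_missing(size_t offset, size_t width)
{
	const unsigned char *data = map;
	const unsigned char *cursor = map + offset;

	return (size_t)(data + sizeof(map) - cursor) < width;
}
```

With the pointer-addition form, `cursor + width` could in principle wrap past the end of the address space on a 32-bit system; the subtraction form sidesteps the question entirely.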
Re: [PATCH on sb/more-repo-in-api] revision: use commit graph in get_reference()
On Thu, Dec 06, 2018 at 03:54:46PM -0800, Jonathan Tan wrote: > This makes sense - I thought I shouldn't mention the commit graph in the > code since it seems like a layering violation, but I felt the need to > mention commit graph in a comment, so maybe the need to mention commit > graph in the code is there too. Subsequently, maybe the lookup-for-type > could be replaced by a lookup-in-commit-graph (maybe by using > parse_commit_in_graph() directly), which should be at least slightly > faster. That makes more sense to me. If we don't have a commit graph at all, it's a quick noop. If we do, we might binary search in the list of commits for a non-commit. But that's strictly faster than finding the object's type (which involves a binary search of a larger list, followed by actually accessing the type info). > > In general, it would be nice if we had a more incremental API > > for accessing objects: open, get metadata, then read the data. That > > would make these kinds of optimizations "free". > > Would this be assuming that to read the data, you would (1) first need to > read the metadata, and (2) there would be no redundancy in reading the > two? It seems to me that for loose objects, you would want to perform > all your reads at once, since any read requires opening the file, and > for commit graphs, you just want to read what you want, since the > metadata and the data are in separate places. By metadata here, I don't mean the commit-graph data, but just the object type and size. So I'm imagining an interface more like: - object_open() locates the object, and stores either the pack file/offset or a descriptor to a loose path in an opaque handle struct - object_size() and object_type() on that handle would do what you expect. For loose objects, these would parse the header (the equivalent of unpack_sha1_header()). For packed ones, they'd use the object header in the pack (and chase down the delta bits as needed). 
- object_contents() would return the full content - object_read() could sequentially read a subset of the file (this could replace the streaming interface we currently have) We have most of the low-level bits for this already, if you poke into what object_info_extended() is doing. We just don't have them packaged in an interface which can persist across multiple calls. With an interface like that, parse_object()'s large-blob check could be something like the patch below. But your case here is a bit more interesting. If we have a commit graph, then we can avoid opening (or even finding!) the on-disk object at all. So I actually think it makes sense to just check the commit-graph first, as discussed above. --- diff --git a/object.c b/object.c index e54160550c..afce58c0bc 100644 --- a/object.c +++ b/object.c @@ -254,23 +254,31 @@ struct object *parse_object(struct repository *r, const struct object_id *oid) const struct object_id *repl = lookup_replace_object(r, oid); void *buffer; struct object *obj; + struct object_handle oh; obj = lookup_object(r, oid->hash); if (obj && obj->parsed) return obj; - if ((obj && obj->type == OBJ_BLOB && has_object_file(oid)) || - (!obj && has_object_file(oid) && -oid_object_info(r, oid, NULL) == OBJ_BLOB)) { - if (check_object_signature(repl, NULL, 0, NULL) < 0) { + if (object_open(&oh, oid) < 0) + return NULL; /* missing object */ + + if (object_type(&oh) == OBJ_BLOB) { + /* this will call object_read() on 4k chunks */ + if (check_object_signature_stream(&oh, oid)) { error(_("sha1 mismatch %s"), oid_to_hex(oid)); return NULL; } + object_close(&oh); /* we don't care about contents */ parse_blob_buffer(lookup_blob(r, oid), NULL, 0); return lookup_object(r, oid->hash); } - buffer = read_object_file(oid, &type, &size); + type = object_type(&oh); + size = object_size(&oh); + buffer = object_contents(&oh); + object_close(&oh); + if (buffer) { if (check_object_signature(repl, buffer, size, type_name(type)) < 0) { free(buffer);
Re: Retrieving a file in git that was deleted and committed
On Thu, Dec 6, 2018 at 11:26 PM Jeff King wrote: > > On Thu, Dec 06, 2018 at 11:07:00PM -0800, biswaranjan panda wrote: > > > Thanks! Strangely git log --follow does work. > > I suspect it would work even without --follow. When you limit a log > traversal with a pathspec, like: > > git log foo > > that is not about following some continuous stream of content, but > rather just applying that pathspec to the diff of each commit, and > pruning ones where it did not change. So even if there are gaps where > the file did not exist, we continue to apply the pathspec to the older > commits. Ah, of course. Thanks for the clarification, Jeff. And my apologies to Biswaranjan Panda for the incorrect information. > > Tools like git-blame will _not_ work, though, as they really are trying > to track the content as they walk back through history. And Once all of > the content seems to appear from nowhere in your new commit, that seems > like a dead end. > > In theory there could be some machine-readable annotation in the commit > object (or in a note created after the fact) to say "even though 'foo' > is a new file here, it came from $commit:foo". And then git-blame could > keep following the content there. But such a feature does not yet exist. > > -Peff
Re: Retrieving a file in git that was deleted and committed
On Thu, Dec 06, 2018 at 11:07:00PM -0800, biswaranjan panda wrote: > Thanks! Strangely git log --follow does work. I suspect it would work even without --follow. When you limit a log traversal with a pathspec, like: git log foo that is not about following some continuous stream of content, but rather just applying that pathspec to the diff of each commit, and pruning ones where it did not change. So even if there are gaps where the file did not exist, we continue to apply the pathspec to the older commits. Tools like git-blame will _not_ work, though, as they really are trying to track the content as they walk back through history. And once all of the content seems to appear from nowhere in your new commit, that seems like a dead end. In theory there could be some machine-readable annotation in the commit object (or in a note created after the fact) to say "even though 'foo' is a new file here, it came from $commit:foo". And then git-blame could keep following the content there. But such a feature does not yet exist. -Peff
Re: Retrieving a file in git that was deleted and committed
Thanks! Strangely git log --follow does work. On Thu, Dec 6, 2018 at 10:55 PM Bryan Turner wrote: > > On Thu, Dec 6, 2018 at 10:49 PM biswaranjan panda > wrote: > > > > I have the following scenario: > > > > On a branch A, I deleted a file foo.txt and committed the change. Then > > I did a bunch of other changes. > > Now I want to undelete foo.txt. > > > > One way is to checkout a separate branch B where the file is present. > > Then checkout A. Then do > > git checkout B -- path_to_file > > It doesn't change anything, but note that you don't need to checkout B > first, to restore the file. If you know a commit SHA where the file is > present, "git checkout SHA -- path_to_file" will pull back the file as > it existed at that commit. > > > > > While this does gets the file back, the file shows up as a new file to > > be committed. Once I commit it, git blame doesn't show the old history > > for the file. > > > > I would appreciate if anyone knows how to preserve git blame history > > It's not possible, as far as I'm aware. While the new file has the > same name as the old file, to Git they are two unrelated entries that > happen to reside at the same path. Even things like "git log --follow" > won't consider the file to be related to its previous history. > > Bryan -- Thanks, -Biswa
Re: Retrieving a file in git that was deleted and committed
On Thu, Dec 6, 2018 at 10:49 PM biswaranjan panda wrote: > > I have the following scenario: > > On a branch A, I deleted a file foo.txt and committed the change. Then > I did a bunch of other changes. > Now I want to undelete foo.txt. > > One way is to checkout a separate branch B where the file is present. > Then checkout A. Then do > git checkout B -- path_to_file It doesn't change anything, but note that you don't need to checkout B first, to restore the file. If you know a commit SHA where the file is present, "git checkout SHA -- path_to_file" will pull back the file as it existed at that commit. > > While this does gets the file back, the file shows up as a new file to > be committed. Once I commit it, git blame doesn't show the old history > for the file. > > I would appreciate if anyone knows how to preserve git blame history It's not possible, as far as I'm aware. While the new file has the same name as the old file, to Git they are two unrelated entries that happen to reside at the same path. Even things like "git log --follow" won't consider the file to be related to its previous history. Bryan
Re: [wishlist] git submodule update --reset-hard
Hi Stefan, Thanks for the dialogue! Replies are embedded below. On Thu, 06 Dec 2018, Stefan Beller wrote: > ... > > > What if the branch differs from the sha1 recorded in the superproject? > > git reset --hard itself is an operation which should be done with some > > level of competence in doing "the right thing" by calling it. You > > can hop branches even in current (without any submodules in question) > > repository with it and cause as much chaos as you desire. > Right. > git reset --hard would reset the branch (as well as the working tree) to the > given sha1, which is confusing as submodules get involved. > The Right Thing as of now is the sha1 as found in the > superprojects gitlink. But as that can be different from any branch > in the submodule, we'd rather detach the HEAD to make it > deterministic. yeap, makes total sense to be the thing to do by default ;-) > There was a proposal to "re-attach HEAD" in the submodule, i.e. > if the branch points at the same commit, we don't need > a detached HEAD, but could go with the branch instead. if I got the idea right, if we are talking about any branch, it would also be non-deterministic since who knows what left-over branch(es) point to that commit. Not sure if I would have used that ;) > > If desired though, a number of prevention mechanisms could be in place (but > > would require option(s) to overcome) to allow submodule to be reset > > --hard'ed > > only when some conditions are met (e.g. only to the commit which is among parent > > commits path of the current branch). This way wild hops would be prevented, > > although you might still end up in some feature branch. But since "reset > > --hard" itself doesn't have any safe-guards, I do not really think they > > should > > be implemented here either. > So are you looking for > a) "stay on submodule branch (i.e. HEAD still points at $branch), and > reset --hard" such that the submodule has a clean index and is at that $branch > or > b) "stay on submodule branch (i.e. 
HEAD still points at $branch), but $branch > is > set to the gitlink from the superproject, and then a reset --hard > will have the worktree set to it as well. yes! NB "gitlink" -- just now discovered the thing for me. Thought it would be called a subproject echoing what git diff/log -p shows for submodule commits. > (a) is what the referenced submodule.repoLike option implements. sounds like it indeed, thanks for spelling it out > I'd understand the desire for (b) as well, as it is a "real" hard reset on > the superproject level, without detaching branches. yeap > > > git reset --hard --recurse-submodules HEAD > > it does indeed do some trick(s), but not all seem to be the ones I desire: > > 1. Seems to migrate submodule's .git directories into the top level > > .git/modules > Ah yes, that happens too. This will help once you want to git-rm > a submodule and checkout states before and after. yeap ;-) > > > undesirable in the sense of still having local changes (that is what > > > the above reset with `--recurse` would fix) or changed the branch > > > state? (i.e. is detached but was on a branch before?) > > right -- I meant the local changes and indeed reset --recurse-submodules > > indeed seems to recurse nicely. Then the undesired effect remaining only > > the detached HEAD > For that we may want to revive discussions in > https://public-inbox.org/git/20170501180058.8063-5-sbel...@google.com/ well, doesn't that one require a branch to be specified in .gitmodules? > > > > git submodule update --recursive > > > > I would end up in the detached HEADs within submodules. > > > > What I want is to retain the current branch they are at (or may be possible > > > > "were in"? reflog records might have that information) > > > So something like > > > git submodule foreach --recursive git reset --hard > > > ? 
> > not quite -- this would just kill all local changes within each submodule, > > not > > to reset it to the desired state, which wouldn't be specified in such > > invocation, and is only known to the repo containing it > With this answer it sounds like you'd want (b) from above. yeap > > > You may be interested in > > > https://public-inbox.org/git/20180927221603.148025-1-sbel...@google.com/ > > > which introduces a switch `submodule.repoLike [ = true]`, which > > > when set would not detach HEAD in submodules. > > Thanks! looks interesting -- was there more discussion/activity beyond > > those 5 > > posts in the thread? >
Re: [RFC PATCH 3/3] test-lib: add the '--stress' option to run a test repeatedly under load
On Fri, Dec 07, 2018 at 12:10:05AM +0100, SZEDER Gábor wrote: > On Wed, Dec 05, 2018 at 04:56:21PM -0500, Jeff King wrote: > > Could we just kill them all? > > > > I guess it's a little tricky, because $! is going to give us the pid of > > each subshell. We actually want to kill the test sub-process. That takes > > a few contortions, but the output is nice (every sub-job immediately > > says "ABORTED ...", and the final process does not exit until the whole > > tree is done): > > It's trickier than that: > > - Nowadays our test lib translates SIGINT to exit, but not TERM (or > HUP, in case a user decides to close the terminal window), which > means that if the test runs any daemons in the background, then > those won't be killed when the stress test is aborted. > > This is easy enough to address, and I've been meaning to do this > in an other patch series anyway. Yeah, trapping on more signals seems reasonable (or just propagating INT instead of TERM). I'm more worried about this one: > - With the (implied) '--verbose-log' the process killed in the > background subshell's SIGTERM handler is not the actual test > process, because '--verbose-log' runs the test again in a piped > subshell to save its output: Hmm, yeah. The patch I sent earlier already propagates through the subshell, so this is really just another layer there. I.e., would it be reasonable when we are using --verbose-log (or --tee) for the parent process to propagate signals down to child it spawned? That would fix this case, and it would make things more predictable in general (e.g., a user who manually runs kill). It does feel like a lot of boilerplate to be propagating these signals manually. I think the "right" Unix-y solution here is process groups: each sub-job should get its own group, and then the top-level runner script can kill the whole group. 
We may run into portability issues, though (setsid is in util-linux, but I don't know if there's an equivalent elsewhere; a small perl or C helper could do it, but I have no idea what that would mean on Windows). > (GIT_TEST_TEE_STARTED=done ${TEST_SHELL_PATH} "$0" "$@" 2>&1; >echo $? >"$TEST_RESULTS_BASE.exit") | tee -a > "$GIT_TEST_TEE_OUTPUT_FILE" > > That 'kill' kills the "outer" shell process. > And though I get "ABORTED..." immediately and I get back my > prompt right away, the commands involved in the above construct > are still running in the background: > > $ ./t3903-stash.sh --stress=1 > ^CABORTED 0.0 > $ ps a |grep t3903 > 1280 pts/17 S 0:00 /bin/sh ./t3903-stash.sh --stress=1 > 1281 pts/17 S 0:00 tee -a > <...>/test-results/t3903-stash.stress-0.out > 1282 pts/17 S 0:00 /bin/sh ./t3903-stash.sh --stress=1 > 4173 pts/17 S+ 0:00 grep t3903 > > I see this behavior with all shells I tried. > I haven't yet started to think it through why this happens :) I think you get ABORTED because we are just watching for the parent-process to exit (and reporting its status). The fact that that the subshell (and the tee) are still running is not important. That all makes sense to me. > Not implying '--verbose-log' but redirecting the test script's > output, like you did in your 'stress' script, seems to work in > dash, ksh, and ks93, but not in Bash v4.3 or later, where, for > whatever reason, the subshell get SIGINT before the SIGTERM would > arrive. > While we could easily handle SIGINT in the subshell with the same > signal handler as SIGTERM, it bothers me that something > fundamental is wrong here. Yeah, I don't understand the SIGINT behavior you're seeing with bash. I can't replicate it with bash 4.4.23 (from Debian unstable). -Peff
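The process-group idea can be sketched in C (this is a hypothetical standalone helper, not test-lib code): put the job into its own group right after fork, then signal the negative pid to reach every process in the group at once, subshells and tee included. The parent also calls setpgid() on the child so there is no race between the kill and the child setting up its own group.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Spawn a long-running "test job" in its own process group, then abort
 * the whole group with SIGTERM.  Returns the signal that terminated the
 * group leader, or -1 on error.
 */
static int run_and_abort_job(void)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {               /* child: the "test job" */
		setpgid(0, 0);            /* new process group, leader = self */
		for (;;)
			pause();              /* simulate a long-running test */
	}
	setpgid(pid, pid);            /* parent side of the same call; whoever
	                                 runs first wins, so no race with kill */
	kill(-pid, SIGTERM);          /* negative pid: signal the whole group */
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

A shell-side equivalent would be `setsid` plus `kill -- -$pgid`, with the portability caveats mentioned above (util-linux on Linux; unclear on Windows).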
Re: [RFC PATCH 3/3] test-lib: add the '--stress' option to run a test repeatedly under load
On Thu, Dec 06, 2018 at 11:56:01PM +0100, SZEDER Gábor wrote: > > +test_expect_success 'roll those dice' ' > > + case "$(openssl rand -base64 1)" in > > + z*) > > + return 1 > > + esac > > +' > > Wasteful :) > > test $(($$ % 42)) -ne 0 Oh, indeed, that is much more clever. :) -Peff
Re: [PATCHv2 0/9] Resending sb/submodule-recursive-fetch-gets-the-tip
On 2018.11.28 16:27, Stefan Beller wrote: > This is a resend of sb/submodule-recursive-fetch-gets-the-tip, > with all feedback addressed. As it took some time, I'll send it > without range-diff, but would ask for full review. > > I plan on resending after the next release as this got delayed quite a bit, > which is why I also rebased it to master. > > Thanks, > Stefan I am not very familiar with most of the submodule code, but for what it's worth, this entire series looks good to me. I'll note that most of the commits caused some style complaints, but I'll leave it up to your judgement as to whether they're valid or not. Reviewed-by: Josh Steadmon
Re: [WIP RFC 4/5] upload-pack: refactor writing of "packfile" line
Jonathan Tan writes:

>> Jonathan Tan writes:
>>
>> > @@ -126,6 +129,12 @@ static int read_pack_objects_stdout(int outfd, struct output_state *os)
>> >  	}
>> >  	os->used += readsz;
>> >
>> > +	if (!os->packfile_started) {
>> > +		os->packfile_started = 1;
>> > +		if (use_protocol_v2)
>> > +			packet_write_fmt(1, "packfile\n");
>>
>> If we fix this function so that the only byte in the buffer is held
>> back without being emitted when os->used == 1, as I alluded to, this
>> may have to be done a bit later, as with such a change it is no longer
>> guaranteed that send_client_data() will be called after this point.
>
> I'm not sure what you mean about there being no guarantee that
> send_client_data() is called - in create_pack_file(), there is an
> "if (output_state.used > 0)" line (previously "if (0 <= buffered)") that
> outputs anything remaining.

I was referring to this part of the review on the previous step, which
you may not yet have read.

    OK, this corresponds to the "*cp++ = buffered" in the original just
    before xread().

    > +		os->used = 1;
    > +	} else {
    > +		send_client_data(1, os->buffer, os->used);
    > +		os->used = 0;

    I am not sure if the code is correct when os->used happens to be 1
    (shouldn't we hold the byte, skip the call to send_client_data(),
    and go back to poll() to expect more data?), but this is a faithful
    code movement and rewrite of the original.

The point of this logic is to make sure we always hold back some bytes
and do not feed *all* the bytes to the other side by calling
send_client_data() until we made sure the upstream of what we are
relaying (pack-objects?) successfully exited.  But it looks to me that
the "else" clause above ends up flushing everything when os->used is 1,
which goes against the whole purpose of the code.

And the "fix" I was alluding to was to update that "else" clause to make
it a no-op that keeps os->used non-zero, which would not call
send_client_data().

When that fix happens, the part early in the function that this patch
added ("now we know we will call send_client_data(), so let's say 'here
comes packdata' unless we have already said that") is making the
decision too early.  Depending on the value of os->used when we enter
the code and the number of bytes xread() reads from the upstream, we
might not call send_client_data() yet (namely, when we have no buffered
data and we happened to get only one byte).

> ... it might be
> better if the server can send sideband throughout the whole response -
> perhaps that should be investigated first.

Yup.  It just looked quite crazy, and it is even more crazy to buffer
keepalives ;-)
Re: Any way to make git-log to enumerate commits?
Konstantin Khomoutov writes:

>> I do not see why the "name each rev relative to HEAD" formatting
>> option cannot produce HEAD^2~2 etc.
>> ...
> My reading was that the OP explicitly wanted to just glance at a single
> integer number and use it right away in a subsequent rebase command.
>
> I mean, use one's own short memory instead of copying and pasting.

As everybody pointed out, a single integer number is fundamentally
unworkable with the distributed nature of Git, which inherently produces
histories with forks and merges.

Besides, there is no way to feed "git log" with "a single integer
number", without which "making git-log enumerate" would not be all that
useful ("git show HEAD~4" works but "git show 4" does not).

All of these name-rev based suggestions were about using an anchoring
point with a memorable name plus a short path to reach from there, which
I think is the closest thing to "a single integer number" that is still
usable for the exact purpose.  "HEAD~48^2" on an integration branch
would be "the tip of the side branch that was merged some 48 commits
ago", for example.
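For someone who really wants to type a bare number, a tiny wrapper gets
most of the way there. This is only a sketch (the `int_to_rev` and `gshow`
names are made up), and it only counts first-parent steps back from HEAD:

```shell
# Map a bare integer N to the revision HEAD~N; pass anything else
# through untouched, so normal revision syntax keeps working.
int_to_rev () {
	case "$1" in
	''|*[!0-9]*) printf '%s\n' "$1" ;;     # not a bare integer
	*)           printf 'HEAD~%s\n' "$1" ;;
	esac
}

# "git show 4" behavior: gshow 4 == git show HEAD~4
gshow () {
	rev=$(int_to_rev "$1")
	shift
	git show "$rev" "$@"
}
```

Of course, the numbers shift with every new commit, which is exactly why
"HEAD~4" is the primitive and the integer is only sugar on top of it.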
Re: [PATCH on sb/more-repo-in-api] revision: use commit graph in get_reference()
Also CC-ing Stolee since I mention multi-pack indices at the end. > This seems like a reasonable thing to do, but I have sort of a > meta-comment. In several places we've started doing this kind of "if > it's this type of object, do X, otherwise do Y" optimization (e.g., > handling large blobs for streaming). > > And in the many cases we end up doubling the effort to do object > lookups: here we do one lookup to get the type, and then if it's not a > commit (or if we don't have a commit graph) we end up parsing it anyway. > > I wonder if we could do better. In this instance, it might make sense > to first see if we actually have a commit graph available (it might not > have this object, of course, but at least we'd expect it to have most > commits). This makes sense - I thought I shouldn't mention the commit graph in the code since it seems like a layering violation, but I felt the need to mention commit graph in a comment, so maybe the need to mention commit graph in the code is there too. Subsequently, maybe the lookup-for-type could be replaced by a lookup-in-commit-graph (maybe by using parse_commit_in_graph() directly), which should be at least slightly faster. > In general, it would be nice if we had a more incremental API > for accessing objects: open, get metadata, then read the data. That > would make these kinds of optimizations "free". Would this be assuming that to read the data, you would (1) first need to read the metadata, and (2) there would be no redundancy in reading the two? It seems to me that for loose objects, you would want to perform all your reads at once, since any read requires opening the file, and for commit graphs, you just want to read what you want, since the metadata and the data are in separate places. > I don't have numbers for how much the extra lookups cost. The lookups > are probably dwarfed by parse_object() in general, so even if we save > only a few full object loads, it may be a win. 
It just seems a shame > that we may be making the "slow" paths (when our type-specific check > doesn't match) even slower. I agree. I think it will always remain a tradeoff when we have multiple data sources of objects (loose, packed, commit graph - and we can't unify them all, since they each have their uses). Unless the multi-pack index can reference commit graphs as well...then it could be our first point of reference without introducing any inefficiencies...
Re: [PATCH on sb/more-repo-in-api] revision: use commit graph in get_reference()
> > This is on sb/more-repo-in-api because I'm using the repo_parse_commit()
> > function.
>
> This is a mere nicety, not strictly required.
> Before we had parse_commit(struct commit *) which would accomplish the
> same, (and we'd still have that afterwards as a #define falling back onto
> the_repository). As the function get_reference() is not the_repository safe
> as it contains a call to is_promisor_object() that is repository
> agnostic, I think
> it would be fair game to not depend on that series. I am not
> complaining, though.

Good point - I'll base the next version on master (and add a TODO
explaining which functions are not yet converted).

> AFAICT oid_object_info doesn't take advantage of the commit graph,
> but just looks up the object header, which is still less than completely
> parsing it. Then lookup_commit is overly strict, as it may return
> NULL as when there still is a type mismatch (I don't think a mismatch
> could happen here, as both rely on just the object store, and not the
> commit graph.), so this would be just defensive programming for
> the sake of it. I dunno.
>
>     struct commit *c;
>
>     if (oid_object_info(revs->repo, oid, NULL) == OBJ_COMMIT &&
>         (c = lookup_commit(revs->repo, oid)) &&
>         !repo_parse_commit(revs->repo, c))
>             object = (struct object *) c;
>     else
>             object = parse_object(revs->repo, oid);

I like this way better - I'll do it in the next version.
Re: [WIP RFC 4/5] upload-pack: refactor writing of "packfile" line
> Jonathan Tan writes:
>
> > @@ -126,6 +129,12 @@ static int read_pack_objects_stdout(int outfd, struct output_state *os)
> >  	}
> >  	os->used += readsz;
> >
> > +	if (!os->packfile_started) {
> > +		os->packfile_started = 1;
> > +		if (use_protocol_v2)
> > +			packet_write_fmt(1, "packfile\n");
>
> If we fix this function so that the only byte in the buffer is held
> back without being emitted when os->used == 1 as I alluded to, this may
> have to be done a bit later, as with such a change, it is no longer
> guaranteed that send_client_data() will be called after this point.

I'm not sure what you mean about there being no guarantee that
send_client_data() is called - in create_pack_file(), there is an
"if (output_state.used > 0)" line (previously "if (0 <= buffered)") that
outputs anything remaining.

> Isn't progress output that goes to the channel #2 pretty much
> independent from the payload stream that carries the pkt-line
> command like "packfile" plus the raw pack stream?  It somehow
> feels like an oxymoron to _buffer_ progress indicator, as it
> defeats the whole point of progress report to buffer it.

Yes, it is - I don't fully like this part of the design. I alluded to a
similar issue (keepalive) in the toplevel email [1] and said that it
might be better if the server can send sideband throughout the whole
response - perhaps that should be investigated first. If we had sideband
throughout the whole response, we wouldn't need to buffer the progress
indicator.

[1] https://public-inbox.org/git/cover.1543879256.git.jonathanta...@google.com/
Re: [WIP RFC 2/5] Documentation: add Packfile URIs design doc
> > +This feature allows servers to serve part of their packfile response as > > URIs. > > +This allows server designs that improve scalability in bandwidth and CPU > > usage > > +(for example, by serving some data through a CDN), and (in the future) > > provides > > +some measure of resumability to clients. > > Without reading the remainder, this makes readers anticipate a few > good things ;-) > > - "part of", so pre-generated constant material can be given from >CDN and then followed-up by "filling the gaps" small packfile, >perhaps? Yes :-) > - The "part of" transmission may not bring the repository up to >date wrt to the "want" objects; would this feature involve "you >asked history up to these commits, but with this pack-uri, you'll >be getting history up to these (somewhat stale) commits"? It could be, but not necessarily. In my current WIP implementation, for example, pack URIs don't give you any commits at all (and thus, no history) - only blobs. Quite a few people first think of the "stale clone then top-up" case, though - I wonder if it would be a good idea to give the blob example in this paragraph in order to put people in the right frame of mind. > > +If the client replies with the following arguments: > > + > > + * packfile-uris > > + * thin-pack > > + * ofs-delta > > "with the following" meaning "with all of the following", or "with > any of the following"? Is there a reason why the server side must > require that the client understands and is willing to accept a > thin-pack when wanting to use packfile-uris? The same question for > the ofs-delta. "All of the following", but from your later comments, we probably don't need this section anyway. > My recommendation is to drop the mention of "thin" and "ofs" from > the above list, and also from the following paragraph. 
The "it MAY > send" will serve as a practical escape clause to allow a server/CDN > implementation that *ALWAYS* prepares pregenerated material that can > only be digested by clients that supports thin and ofs. Such a server > can send packfile-URIs only when all of the three are given by the > client and be compliant. And such an update to the proposed document > would allow a more diskful server to prepare both thin and non-thin > pregenerated packs and choose which one to give to the client depending > on the capability. That is true - we can just let the server decide. I'll update the patch accordingly, and state that the URIs should point to packs with features like thin-pack and ofs-delta only if the client has declared that it supports them. > > +Clients then should understand that the returned packfile could be > > incomplete, > > +and that it needs to download all the given URIs before the fetch or clone > > is > > +complete. Each URI should point to a Git packfile (which may be a thin > > pack and > > +which may contain offset deltas). > > weaken or remove the (parenthetical comment) in the last sentence, > and replace the beginning of the section with something like > > If the client replies with 'packfile-uris', when the server > sends the packfile, it MAY send a `packfile-uris` section... > > You may steal what I wrote in the above response to help the > server-side folks to decide how to actually implement the "it MAY > send a packfile-uris" part in the document. OK, will do. > OK, this comes back to what I alluded to at the beginning. We could > respond to a full-clone request by feeding a series of packfile-uris > and some ref information, perhaps like this: > > * Grab this packfile and update your remote-tracking refs > and tags to these values; you'd be as if you cloned the > project when it was at v1.0. 
> > * When you are done with the above, grab this packfile and > update your remote-tracking refs and tags to these values; > you'd be as if you cloned the project when it was at v2.0. > > * When you are done with the above, grab this packfile and > update your remote-tracking refs and tags to these values; > you'd be as if you cloned the project when it was at v3.0. > > ... > > * When you are done with the above, here is the remaining > packdata to bring you fully up to date with your original > "want"s. > > and before fully reading the proposal, I anticipated that it was > what you were going to describe. The major difference is "up to the > packdata given to you so far, you'd be as if you fetched these" ref > information, which would allow you to be interrupted and then simply > resume, without having to remember the set of packfile-uris yet to > be processed across a fetch/clone failure. If you sucessfully fetch > packfile for ..v1.0, you can update the remote-tracking refs to > match as if you fetched back when that was the most recent state of > the project, and then if you failed while transferring packfile for > v1.0..v2.0,
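The resumability Junio describes boils down to recording a checkpoint after
each packfile so that an interrupted run can be restarted from where it
left off. A toy sketch of that pattern (the helper names are made up;
`process_step` stands in for "fetch the URI, index the pack, update the
refs"):

```shell
# Work through a plan of steps in order, remembering which ones are
# done, so a re-run after an interruption resumes instead of starting
# over from scratch.
apply_in_order () {
	plan=$1 done_list=$2
	while read -r step
	do
		# completed in an earlier run: skip (this is the "resume")
		grep -qxF "$step" "$done_list" 2>/dev/null && continue
		process_step "$step" || return 1
		printf '%s\n' "$step" >>"$done_list"
	done <"$plan"
}
```

For packfile-uris, each step would carry a URI plus the ref values that
become valid once that pack is indexed, exactly as in the bullet list
above.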
Re: [RFC PATCH 3/3] test-lib: add the '--stress' option to run a test repeatedly under load
On Wed, Dec 05, 2018 at 04:56:21PM -0500, Jeff King wrote:

> Could we just kill them all?
>
> I guess it's a little tricky, because $! is going to give us the pid of
> each subshell. We actually want to kill the test sub-process. That takes
> a few contortions, but the output is nice (every sub-job immediately
> says "ABORTED ...", and the final process does not exit until the whole
> tree is done):

It's trickier than that:

  - Nowadays our test lib translates SIGINT to exit, but not TERM (or
    HUP, in case a user decides to close the terminal window), which
    means that if the test runs any daemons in the background, then
    those won't be killed when the stress test is aborted.

    This is easy enough to address, and I've been meaning to do this in
    an other patch series anyway.

  - With the (implied) '--verbose-log' the process killed in the
    background subshell's SIGTERM handler is not the actual test
    process, because '--verbose-log' runs the test again in a piped
    subshell to save its output:

      (GIT_TEST_TEE_STARTED=done ${TEST_SHELL_PATH} "$0" "$@" 2>&1;
       echo $? >"$TEST_RESULTS_BASE.exit") | tee -a "$GIT_TEST_TEE_OUTPUT_FILE"

    That 'kill' kills the "outer" shell process.

    And though I get "ABORTED..." immediately and I get back my prompt
    right away, the commands involved in the above construct are still
    running in the background:

      $ ./t3903-stash.sh --stress=1
      ^CABORTED 0.0
      $ ps a |grep t3903
       1280 pts/17   S      0:00 /bin/sh ./t3903-stash.sh --stress=1
       1281 pts/17   S      0:00 tee -a <...>/test-results/t3903-stash.stress-0.out
       1282 pts/17   S      0:00 /bin/sh ./t3903-stash.sh --stress=1
       4173 pts/17   S+     0:00 grep t3903

    I see this behavior with all shells I tried.

    I haven't yet started to think it through why this happens :)

Not implying '--verbose-log' but redirecting the test script's output,
like you did in your 'stress' script, seems to work in dash, ksh, and
ksh93, but not in Bash v4.3 or later, where, for whatever reason, the
subshell gets SIGINT before the SIGTERM would arrive.

While we could easily handle SIGINT in the subshell with the same signal
handler as SIGTERM, it bothers me that something fundamental is wrong
here.

Furthermore, then we should perhaps forbid '--stress' in combination
with '--verbose-log' or '--tee'.

> diff --git a/t/test-lib.sh b/t/test-lib.sh
> index 9b7f687396..357dead3ff 100644
> --- a/t/test-lib.sh
> +++ b/t/test-lib.sh
> @@ -98,8 +98,9 @@ done,*)
>  	mkdir -p "$(dirname "$TEST_RESULTS_BASE")"
>  	stressfail="$TEST_RESULTS_BASE.stress-failed"
>  	rm -f "$stressfail"
> -	trap 'echo aborted >"$stressfail"' TERM INT HUP
> +	trap 'echo aborted >"$stressfail"; kill $job_pids 2>/dev/null; wait' TERM INT HUP
>
> +	job_pids=
>  	job_nr=0
>  	while test $job_nr -lt "$job_count"
>  	do
> @@ -108,10 +109,13 @@ done,*)
>  		GIT_TEST_STRESS_JOB_NR=$job_nr
>  		export GIT_TEST_STRESS_STARTED GIT_TEST_STRESS_JOB_NR
>
> +		trap 'kill $test_pid 2>/dev/null' TERM
> +
>  		cnt=0
>  		while ! test -e "$stressfail"
>  		do
> -			if $TEST_SHELL_PATH "$0" "$@" >/dev/null 2>&1
> +			$TEST_SHELL_PATH "$0" "$@" >/dev/null 2>&1 &
> +			test_pid=$!
> +			if wait
>  			then
>  				printf >&2 "OK %2d.%d\n" $GIT_TEST_STRESS_JOB_NR $cnt
>  			elif test -f "$stressfail" &&
> @@ -124,16 +128,11 @@ done,*)
>  			fi
>  			cnt=$(($cnt + 1))
>  		done
> -		) &
> +		) & job_pids="$job_pids $!"
>  		job_nr=$(($job_nr + 1))
>  	done
>
> -	job_nr=0
> -	while test $job_nr -lt "$job_count"
> -	do
> -		wait
> -		job_nr=$(($job_nr + 1))
> -	done
> +	wait
>
>  	if test -f "$stressfail" && test "$(cat "$stressfail")" != "aborted"
>  	then
Re: [BUG REPORT] Git does not correctly replay bisect log
On Thu, Dec 6, 2018 at 6:30 PM Lukáš Krejčí wrote:
>
> I am talking about `git bisect replay`. The shell script, as far as I
> can see, only updates the references (ref/bisect/*) and never checks if
> the revisions marked as 'good' are ancestors of the 'bad' one.
> Therefore, the $GIT_DIR/BISECT_ANCESTORS_OK file is never created.

Indeed `git bisect replay` first only updates the references
(ref/bisect/*) according to all the "git bisect {good,bad}" instructions
it finds in the log it is passed. After doing that though, before
exiting, it calls `bisect_auto_next` which calls `bisect_next` which
calls `git bisect--helper --next-all` which checks the merge bases.

I think it is a bug. `git bisect replay` is right to only update the
references (ref/bisect/*) and not to compute and checkout the best
commit to test at each step except at the end, but it should probably
just create the $GIT_DIR/BISECT_ANCESTORS_OK file if more than one
bisection step has been performed (because the merge bases are checked
as part of the first bisection step only).

> The first time the ancestors are checked is in the helper
> (`git bisect--helper --next-all`) that has only limited information
> from refs/bisect*.

`git bisect--helper --next-all` knows how to get refs/bisect*
information, otherwise it couldn't decide which is the next best commit
to test.

Thanks for your help in debugging this,
Christian.
Re: git, monorepos, and access control
Johannes, Thanks for your feedback. I'm not looking closely at submodules, as it's my understanding that VFSForGit does not support them. A VFS would be a killer feature for us. If VFSForGit were to support submodules, we'd look at them. They would provide access control in a way that's clearly nonabusive. I hear you on the drawbacks. AMD today looks more like the 100 independent repos you describe, except we don't have automation at the delivery arcs. Integrations are manual and thus not particularly frequent. I'm probably biased toward favoring a monorepo, which I've seen applied at a former employer, versus continuous delivery. That's due to lack of personal familiarity with CD -- not any real objections. Thanks, John On 12/06/2018 03:08 PM, Johannes Schindelin wrote: > Hi, > > On Wed, 5 Dec 2018, Jeff King wrote: > >> The model that fits more naturally with how Git is implemented would be >> to use submodules. There you leak the hash of the commit from the >> private submodule, but that's probably obscure enough (and if you're >> really worried, you can add a random nonce to the commit messages in the >> submodule to make their hashes unguessable). > I hear myself frequently saying: "Friends don't let friends use > submodules". It's almost like: "Some people think their problem is solved > by using submodules. Only now they have two problems." > > There are big reasons, after all, why some companies go for monorepos: it > is not for lack of trying to go with submodules, it is the problems that > were incurred by trying to treat entire repositories the same as single > files (or even trees): they are just too different. > > In a previous life, I also tried to go for submodules, was burned, and had > to restart the whole thing. We ended up with something that might work in > this instance, too, although our use case was not need-to-know type of > encapsulation. What we went for was straight up modularization. 
> > What I mean is that we split the project up into over 100 individual > projects that are now all maintained in individual repositories, and they > are connected completely outside of Git, via a dependency management > system (in this case, Maven, although that is probably too Java-centric > for AMD's needs). > > I just wanted to throw that out here: if you can split up your project > into individual projects, it might make sense not to maintain them as > submodules but instead as individual repositories whose artifacts are > uploaded into a central, versioned artifact store (Maven, NuGet, etc). And > those artifacts would then be retrieved by the projects that need them. > > I figure that that scheme might work for you better than submodules: I > could imagine that you need to make the build artifacts available even to > people who are not permitted to look at the corresponding source code, > anyway. > > Ciao, > Johannes
Re: [RFC PATCH 3/3] test-lib: add the '--stress' option to run a test repeatedly under load
On Wed, Dec 05, 2018 at 04:36:26PM -0500, Jeff King wrote:

> The signal interrupts the first wait.

Ah, of course. I'm ashamed to say that this is not the first time I
forget about that...

> > Bash 4.3 or later are strange: I get back the shell prompt immediately
> > after ctrl-C as well, so it doesn't appear to be waiting for all
> > remaining jobs to finish either, but! I don't get any of the progress
> > output from those jobs to mess up my next command.
>
> Interesting. My bash 4.4 seems to behave the same as dash. It almost
> sounds like the SIGINT is getting passed to the subshells for you.
> Probably not really worth digging into, though.

The subshell does indeed get SIGINT. I don't know why that happens, to
my understanding that should not happen.

> In case anybody is playing with this and needed to simulate a stress
> failure, here's what I used:
>
> diff --git a/t/t-basic.sh b/t/t-basic.sh
> index b6566003dd..a370cd9977 100755
> --- a/t/t-basic.sh
> +++ b/t/t-basic.sh
> @@ -1171,4 +1171,11 @@ test_expect_success 'very long name in the index handled sanely' '
>  	test $len = 4098
>  '
>
> +test_expect_success 'roll those dice' '
> +	case "$(openssl rand -base64 1)" in
> +	z*)
> +		return 1
> +	esac
> +'

Wasteful :)

  test $(($$ % 42)) -ne 0
Re: [WIP RFC 1/5] Documentation: order protocol v2 sections
> > The git command line expects Git servers to follow a specific order of > > "Command line"? It sounds like you are talking about the order of > command line arguments and options, but apparently that is not what > you are doing. Is it "The git over-the-wire protocol"? I meant to say the current Git implementation, as opposed to what is written in the specification. I'll replace it with "The current C Git implementation". > Earlier, we said that shallow-info is not given when packfile is not > there. That is captured in the updated EBNF above. We don't have a > corresponding removal of a bullet point for wanted-refs section below > but probably that is because the original did not have corresponding > bullet point to begin with. That's because the corresponding bullet point had other information. Quoted in full below: > * This section is only included if the client has requested a > ref using a 'want-ref' line and if a packfile section is also > included in the response. I could reword it to "If a packfile section is included in the response, this section is only included if the client has requested a ref using a 'want-ref' line", but I don't think that is significantly clearer.
Re: enhancement: support for author.email and author.name in "git config"
Hi William, On Thu, 6 Dec 2018, William Hubbs wrote: > On Thu, Dec 06, 2018 at 07:38:08PM +0100, Martin Ågren wrote: > > Hi William, > > > > [...] > > > > This idea was floated a couple of months ago [1]. Junio seemed to find > > the request sensible and outlined a design. No patches materialized, as > > far as I know, but that could be because Eric suggested a tool called > > direnv. Maybe that would work for you. > > Yes, this design would solve the issue. There *is* a way to get what you want that is super easy and will definitely work: if you sit down and do it ;-) Please let us know if you need any additional information before you can start. Ciao, Johannes
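Until something like author.name/author.email exists as configuration, the
same per-directory effect is available today through Git's standard
environment variables, which is what the direnv suggestion amounts to. A
sketch (the name and address below are placeholders):

```shell
# Per-directory author identity via environment variables git already
# honors; direnv would source a file like this automatically on cd
# into the project.
cat >.envrc <<'EOF'
export GIT_AUTHOR_NAME="A U Thor"
export GIT_AUTHOR_EMAIL="author@example.com"
EOF

# Without direnv, the same thing works with a plain:
#   . ./.envrc
```

Unlike a config key, these only take effect in shells that source the
file, which is both the appeal and the limitation of the direnv approach.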
Re: git, monorepos, and access control
On Thu, Dec 6, 2018 at 12:09 PM Johannes Schindelin wrote:
>
> Hi,
>
> On Wed, 5 Dec 2018, Jeff King wrote:
>
> > The model that fits more naturally with how Git is implemented would be
> > to use submodules. There you leak the hash of the commit from the
> > private submodule, but that's probably obscure enough (and if you're
> > really worried, you can add a random nonce to the commit messages in the
> > submodule to make their hashes unguessable).
>
> I hear myself frequently saying: "Friends don't let friends use
> submodules". It's almost like: "Some people think their problem is solved
> by using submodules. Only now they have two problems."

Blaming tools for their lack of evolution/development is not necessarily
the right approach.

I recall having patches rejected on this very mailing list that fixed
obvious but minor good things like whitespaces and coding style, because
it *might* produce merge conflicts. Would that situation warrant me to
blame the shortcomings of the merge algorithm, or could you imagine a
better way out? (No need to answer, it's purely to demonstrate that
blaming tooling is not always the right approach; only sometimes it may
be.)

> There are big reasons, after all, why some companies go for monorepos: it
> is not for lack of trying to go with submodules, it is the problems that
> were incurred by trying to treat entire repositories the same as single
> files (or even trees): they are just too different.

We could change that in more places. One example you might think of is
the output of git-status that displays changed files. And in case of
submodules it would just show "submodule changes", which we already
differentiate into "dirty tree" and "different sha1 at HEAD". Instead we
could have the output of all changed files recursively in the
superproject's git-status output.

Another example is the diff machinery, which already knows some basics
such as embedding submodule logs or actual diffs.
> In a previous life, I also tried to go for submodules, was burned, and had > to restart the whole thing. We ended up with something that might work in > this instance, too, although our use case was not need-to-know type of > encapsulation. What we went for was straight up modularization. So this is a "Fix the data instead of the tool", which seems to be a local optimization (i.e. you only have to do it once, such that it is cheaper to do than fixing the tool for that workgroup) ... and because everyone does that the tool never gets fixed. > What I mean is that we split the project up into over 100 individual > projects that are now all maintained in individual repositories, and they > are connected completely outside of Git, via a dependency management > system (in this case, Maven, although that is probably too Java-centric > for AMD's needs). Once you have the dependency management system in place, you will encounter the rare case of still wanting to change things across repository boundaries at the same time. Submodules offer that, which is why Android wants to migrate off of the repo tool, and there it seems natural to go for submodules. > I just wanted to throw that out here: if you can split up your project > into individual projects, it might make sense not to maintain them as > submodules but instead as individual repositories whose artifacts are > uploaded into a central, versioned artifact store (Maven, NuGet, etc). And > those artifacts would then be retrieved by the projects that need them. This is cool and industry standard. But once you happen to run in a bug that involves 2 new artifacts (but each of the new artifacts work fine on their own), then you'd wish for something like "git-bisect but across repositories". Submodules (in theory) allow for fine grained bisection across these repository boundaries, I would think. 
> I figure that that scheme might work for you better than submodules: I > could imagine that you need to make the build artifacts available even to > people who are not permitted to look at the corresponding source code, > anyway. This is a sensible suggestion, as they probably don't want to ramp up development on submodules. :-)
Re: [RFC] cherry-pick notes to find out cherry-picks from the origin
Hello, Jeff.

So, this is what I currently have. It still does the same thing but is a
lot more generic in terms of both interface and implementation.

* All core logics are implemented as core helpers / features.

* Trailer parsing and reverse-mapping in trailer_rev_xrefs_*().

* Note refs which start with xref- (cross-reference) are recognized by
  notes core. When notes are added, a dedicated combine_notes function
  is used to remove duplicates and cull unreachable commits. When xref-
  notes are formatted for printing, it automatically follows and prints
  nested xrefs.

* note-cherry-picks is replaced with reverse-trailer-xrefs which can use
  other trailers, note refs and tags. The --xref-cherry-picks option
  makes it use the cherry-pick presets.

Please note that the patch is still a bit rough. I'm polishing and
documenting. Please let me know what you think.

Thanks.

---
 Makefile                        |    1
 builtin.h                       |    1
 builtin/reverse-trailer-xrefs.c |  148
 git.c                           |    1
 notes.c                         |  245 +++-
 notes.h                         |   10 +
 object.c                        |    4
 object.h                        |    6
 trailer.c                       |  102
 trailer.h                       |   26
 10 files changed, 540 insertions(+), 4 deletions(-)

diff --git a/Makefile b/Makefile
index 1a44c811a..3c23ecf9d 100644
--- a/Makefile
+++ b/Makefile
@@ -1086,6 +1086,7 @@ BUILTIN_OBJS += builtin/multi-pack-index.o
 BUILTIN_OBJS += builtin/mv.o
 BUILTIN_OBJS += builtin/name-rev.o
 BUILTIN_OBJS += builtin/notes.o
+BUILTIN_OBJS += builtin/reverse-trailer-xrefs.o
 BUILTIN_OBJS += builtin/pack-objects.o
 BUILTIN_OBJS += builtin/pack-redundant.o
 BUILTIN_OBJS += builtin/pack-refs.o
diff --git a/builtin.h b/builtin.h
index 6538932e9..51089e258 100644
--- a/builtin.h
+++ b/builtin.h
@@ -195,6 +195,7 @@ extern int cmd_multi_pack_index(int argc, const char **argv, const char *prefix);
 extern int cmd_mv(int argc, const char **argv, const char *prefix);
 extern int cmd_name_rev(int argc, const char **argv, const char *prefix);
 extern int cmd_notes(int argc, const char **argv, const char *prefix);
+extern int cmd_reverse_trailer_xrefs(int argc, const char **argv, const char *prefix);
 extern int cmd_pack_objects(int argc, const char **argv, const char *prefix);
 extern int cmd_pack_redundant(int argc, const char **argv, const char *prefix);
 extern int cmd_patch_id(int argc, const char **argv, const char *prefix);
diff --git a/builtin/reverse-trailer-xrefs.c b/builtin/reverse-trailer-xrefs.c
new file mode 100644
index 0..b2879be6c
--- /dev/null
+++ b/builtin/reverse-trailer-xrefs.c
@@ -0,0 +1,148 @@
+#include "builtin.h"
+#include "cache.h"
+#include "strbuf.h"
+#include "repository.h"
+#include "config.h"
+#include "commit.h"
+#include "blob.h"
+#include "notes.h"
+#include "notes-utils.h"
+#include "trailer.h"
+#include "revision.h"
+#include "list-objects.h"
+#include "object-store.h"
+#include "parse-options.h"
+
+static const char * const reverse_trailer_xrefs_usage[] = {
+	N_("git reverse_trailer_xrefs [] [...]"),
+	NULL
+};
+
+static const char cherry_picked_prefix[] = "(cherry picked from commit ";
+static int verbose;
+
+static void clear_trailer_xref_note(struct commit *commit, void *data)
+{
+	struct notes_tree *tree = data;
+	int status;
+
+	status = remove_note(tree, commit->object.oid.hash);
+
+	if (verbose) {
+		if (status)
+			fprintf(stderr, "Object %s has no note\n",
+				oid_to_hex(&commit->object.oid));
+		else
+			fprintf(stderr, "Removing note for object %s\n",
+				oid_to_hex(&commit->object.oid));
+	}
+}
+
+static void record_trailer_xrefs(struct commit *commit, void *data)
+{
+	trailer_rev_xrefs_record(data, commit);
+}
+
+static int note_trailer_xrefs(struct notes_tree *tree,
+			      struct commit *from_commit,
+			      struct object_array *to_objs,
+			      const char *tag)
+{
+	char from_hex[GIT_MAX_HEXSZ + 1];
+	struct strbuf note = STRBUF_INIT;
+	struct object_id note_oid;
+	int i, ret;
+
+	oid_to_hex_r(from_hex, &from_commit->object.oid);
+
+	for (i = 0; i < to_objs->nr; i++) {
+		const char *hex = to_objs->objects[i].name;
+
+		if (tag)
+			strbuf_addf(&note, "%s: %s\n", tag, hex);
+		else
+			strbuf_addf(&note, "%s\n", tag);
+		if (verbose)
+			fprintf(stderr, "Adding note %s -> %s\n", from_hex, hex);
+	}
+
+	ret = write_object_file(note.buf, note.len, blob_type, &note_oid);
+	strbuf_release(&note);
+	if (ret)
+		return ret;
+
+	ret = add_note(tree,
Re: [PATCHv2 0/9] Resending sb/submodule-recursive-fetch-gets-the-tip
On Tue, Dec 4, 2018 at 7:10 PM Junio C Hamano wrote: > > Stefan Beller writes: > > > This is a resend of sb/submodule-recursive-fetch-gets-the-tip, > > with all feedback addressed. As it took some time, I'll send it > > without range-diff, but would ask for full review. > > Is that a "resend" or reroll/update (or whatever word that does not > imply "just sending the same thing again")? As you noticed, it is an actual update. I started to use resend as DScho seems very unhappy about the word reroll, claiming we'd be the only software community that uses the term reroll for an iteration of a change. I see how resend could sound like retransmission without change. > child_process_init(cp); > - cp->dir = strbuf_detach(&submodule_path, NULL); > - prepare_submodule_repo_env(&cp->env_array); > + cp->dir = xstrdup(repo->worktree); > + prepare_submodule_repo_env(&cp->env_array); > > Hmph, I offhand do not see there would be any difference if you > assigned to cp->dir before or after preparing the repo env, but is > there a reason these two must be done in this updated order that I > am missing? Very similar changes appear multiple times in this > range-diff. Jonathan Tan asked for it to be "diff friendly". This -of course- is range-diff unfriendly. > [...] you seem to be OK with a lot of the changes; I did not find an actionable suggestion. Thanks for still queuing topics during -rc time, Stefan
Re: [wishlist] git submodule update --reset-hard
On Thu, Dec 6, 2018 at 1:25 PM Yaroslav Halchenko wrote: > > > On Thu, 06 Dec 2018, Stefan Beller wrote: > > > On Thu, Dec 6, 2018 at 10:02 AM Yaroslav Halchenko > > wrote: > > > > Dear Git Gurus, > > > > I wondered what would be your take on my wishlist request to add > > > --reset-hard option, which would be very similar to regular "update" which > > > checks out necessary commit, but I want it to remain in the branch. > > > What if the branch differs from the sha1 recorded in the superproject? > > git reset --hard itself is an operation which should be done with some > level of competence in doing "the right thing" by calling it. You > can hop branches even in current (without any submodules in question) > repository with it and cause as much chaos as you desire. Right. git reset --hard would reset the branch (as well as the working tree) to the given sha1, which is confusing once submodules get involved. The Right Thing as of now is the sha1 as found in the superproject's gitlink. But as that can be different from any branch in the submodule, we'd rather detach the HEAD to make it deterministic. There was a proposal to "re-attach HEAD" in the submodule, i.e. if the branch points at the same commit, we don't need a detached HEAD, but could go with the branch instead. > If desired though, a number of prevention mechanisms could be in place (but > would require option(s) to overcome) to allow submodule to be reset --hard'ed > only when some conditions met (e.g. only to the commit which is among parent > commits path of the current branch). This way wild hops would be prevented, > although you might still end up in some feature branch. But since "reset > --hard" itself doesn't have any safe-guards, I do not really think they should > be implemented here either. So are you looking for a) "stay on submodule branch (i.e. HEAD still points at $branch), and reset --hard" such that the submodule has a clean index and at that $branch or b) "stay on submodule branch (i.e.
HEAD still points at $branch), but $branch is set to the gitlink from the superproject, and then a reset --hard will have the worktree set to it as well. (a) is what the referenced submodule.repoLike option implements. I'd understand the desire for (b) as well, as it is a "real" hard reset on the superproject level, without detaching branches. > > git reset --hard --recurse-submodules HEAD > it does indeed some trick(s) but not all seems to be the ones I desire: > > 1. Seems to migrate submodule's .git directories into the top level > .git/modules Ah yes, that happens too. This will help once you want to git-rm a submodule and checkout states before and after. > > undesirable in the sense of still having local changes (that is what > > the above reset with `--recurse` would fix) or changed the branch > > state? (i.e. is detached but was on a branch before?) > > right -- I meant the local changes and indeed reset --recurse-submodules > indeed seems to recurse nicely. Then the undesired effect remaining only > the detached HEAD For that we may want to revive discussions in https://public-inbox.org/git/20170501180058.8063-5-sbel...@google.com/ > > > git submodule update --recursive > > > > I would end up in the detached HEADs within submodules. > > > > What I want is to retain current branch they are at (or may be possible > > > "were in"? reflog records might have that information) > > > So something like > > > git submodule foreach --recursive git reset --hard > > > ? > > not quite -- this would just kill all local changes within each submodule, > not > to reset it to the desired state, which wouldn't be specified in such > invocation, and is only known to the repo containing it With this answer it sounds like you'd want (b) from above. > > You may be interested in > > https://public-inbox.org/git/20180927221603.148025-1-sbel...@google.com/ > > which introduces a switch `submodule.repoLike [ = true]`, which > > when set would not detach HEAD in submodules. 
> > Thanks! looks interesting -- was there more discussion/activity beyond those 5 > posts in the thread? Unfortunately there was not. > This feature might indeed come handy but if I got it right, it is somewhat > complementary to just having submodule update --reset-hard . E.g. submodules > might be in different branches (if I am not tracking based on branch names), > so > I would not want a recursive checkout with -b|-B. But we would indeed benefit > from such functionality, since this difficulty of managing branches of > submodules I think would be alleviated with it! (e.g. in one
Re: [wishlist] git submodule update --reset-hard
On Thu, 06 Dec 2018, Stefan Beller wrote: > On Thu, Dec 6, 2018 at 10:02 AM Yaroslav Halchenko > wrote: > > Dear Git Gurus, > > I wondered what would be your take on my wishlist request to add > > --reset-hard option, which would be very similar to regular "update" which > > checks out necessary commit, but I want it to remain in the branch. > What if the branch differs from the sha1 recorded in the superproject? git reset --hard itself is an operation which should be done with some level of competence in doing "the right thing" by calling it. You can hop branches even in current (without any submodules in question) repository with it and cause as much chaos as you desire. If desired though, a number of prevention mechanisms could be in place (but would require option(s) to overcome) to allow submodule to be reset --hard'ed only when some conditions met (e.g. only to the commit which is among parent commits path of the current branch). This way wild hops would be prevented, although you might still end up in some feature branch. But since "reset --hard" itself doesn't have any safe-guards, I do not really think they should be implemented here either. > > Rationale: In DataLad we heavily rely on submodules, and we have established > > easy ways to do some manipulations across full hierarchies of them. E.g. a > > single command could introduce a good number of commits across deep > > hierarchy > > of submodules, e.g. while committing changes within deep submodule, while > > also > > doing all necessary commits in the repositories leading to that submodule so > > the entire tree of them stays in a "clean" state. The difficulty comes when > > there is a need to just "forget" some changes. The obvious way is to e.g. > >git reset --hard PREVIOUS_STATE > git reset --hard --recurse-submodules HEAD > would do the trick it does indeed some trick(s) but not all seems to be the ones I desire: 1. 
Seems to migrate submodule's .git directories into the top level .git/modules $> git reset --hard --recurse-submodules HEAD^^^ Migrating git directory of 'famface' from '/tmp/gobbini/famface/.git' to '/tmp/gobbini/.git/modules/famface' Migrating git directory of 'famface/data' from '/tmp/gobbini/famface/data/.git' to '/tmp/gobbini/.git/modules/famface/modules/data' Migrating git directory of 'famface/data/scripts/mridefacer' from '/tmp/gobbini/famface/data/scripts/mridefacer/.git' to '/tmp/gobbini/.git/modules/famface/modules/data/modules/scripts/mridefacer' HEAD is now at 9b4296d [DATALAD] aggregated meta data we might eventually adopt this default already for years model (git annex seems to be ok, in that it then replaces .git symlink file with the actual symlink .git -> ../../.git/modules/... So things seems to keep working for annex) 2. It still does the detached HEAD for me $> git submodule status --recursive 2569ab436501a832d35afbbe9cc20ffeb6077eb1 famface (2569ab4) f1e8c9b8b025c311424283b9711efc6bc906ba2b famface/data (BIDS-v1.0.1) 49b0fe42696724c2a8492f999736056e51b77358 famface/data/scripts/mridefacer (49b0fe4) > > in the top level repository. But that leaves all the submodules now in > > the undesired state. If I do > undesirable in the sense of still having local changes (that is what > the above reset with `--recurse` would fix) or changed the branch > state? (i.e. is detached but was on a branch before?) right -- I meant the local changes and indeed reset --recurse-submodules indeed seems to recurse nicely. Then the undesired effect remaining only the detached HEAD > > git submodule update --recursive > > I would end up in the detached HEADs within submodules. > > What I want is to retain current branch they are at (or may be possible > > "were in"? reflog records might have that information) > So something like > git submodule foreach --recursive git reset --hard > ? 
not quite -- this would just kill all local changes within each submodule, not to reset it to the desired state, which wouldn't be specified in such invocation, and is only known to the repo containing it > You may be interested in > https://public-inbox.org/git/20180927221603.148025-1-sbel...@google.com/ > which introduces a switch `submodule.repoLike [ = true]`, which > when set would not detach HEAD in submodules. Thanks! looks interesting -- was there more discussion/activity beyond those 5 posts in the thread? https://public-inbox.org/git/87h8i9ift4@evledraar.gmail.com/#r This feature might indeed come handy but if I got it right, it is somewhat complementary to just having submodule update --reset-hard . E.g. submodules might be in different branches (if I am not tracking based on branch names), so I would not want a recursive checkout with -b|-B. But we would indeed benefit from such functionality, since this difficulty of managing
Re: git, monorepos, and access control
Hi, On Wed, 5 Dec 2018, Jeff King wrote: > The model that fits more naturally with how Git is implemented would be > to use submodules. There you leak the hash of the commit from the > private submodule, but that's probably obscure enough (and if you're > really worried, you can add a random nonce to the commit messages in the > submodule to make their hashes unguessable). I hear myself frequently saying: "Friends don't let friends use submodules". It's almost like: "Some people think their problem is solved by using submodules. Only now they have two problems." There are big reasons, after all, why some companies go for monorepos: it is not for lack of trying to go with submodules, it is the problems that were incurred by trying to treat entire repositories the same as single files (or even trees): they are just too different. In a previous life, I also tried to go for submodules, was burned, and had to restart the whole thing. We ended up with something that might work in this instance, too, although our use case was not need-to-know type of encapsulation. What we went for was straight up modularization. What I mean is that we split the project up into over 100 individual projects that are now all maintained in individual repositories, and they are connected completely outside of Git, via a dependency management system (in this case, Maven, although that is probably too Java-centric for AMD's needs). I just wanted to throw that out here: if you can split up your project into individual projects, it might make sense not to maintain them as submodules but instead as individual repositories whose artifacts are uploaded into a central, versioned artifact store (Maven, NuGet, etc). And those artifacts would then be retrieved by the projects that need them. 
I figure that that scheme might work for you better than submodules: I could imagine that you need to make the build artifacts available even to people who are not permitted to look at the corresponding source code, anyway. Ciao, Johannes
Re: enhancement: support for author.email and author.name in "git config"
On Thu, Dec 06, 2018 at 07:38:08PM +0100, Martin Ågren wrote: > Hi William, > > [...] > > This idea was floated a couple of months ago [1]. Junio seemed to find > the request sensible and outlined a design. No patches materialized, as > far as I know, but that could be because Eric suggested a tool called > direnv. Maybe that would work for you. Yes, this design would solve the issue. I'll take a look at direnv, but it would be nice to have native git support for this. :-) Thanks, William > [1] > https://public-inbox.org/git/0f66ad7a-2289-2cce-6533-a27e19945...@rasmusvillemoes.dk/ > > Martin
Re: BUG: CR marker ^M doesn't show up in '-' lines of diffs when the ending of the removed line is CR+LF
On 06.12.18 at 01:58, Junio C Hamano wrote: > Frank Schäfer writes: > >> Just to be sure that I'm not missing anything here: >> What's your definition of "LF in repository, CRLF in working tree" in >> terms of config parameters ? > :::Documentation/config/core.txt::: > > core.autocrlf:: > Setting this variable to "true" is the same as setting > the `text` attribute to "auto" on all files and core.eol to "crlf". > Set to true if you want to have `CRLF` line endings in your > working directory and the repository has LF line endings. > This variable can be set to 'input', > in which case no output conversion is performed. Argh... crap. I was missing that I have to set the "text" attribute manually... That's why core.eol=crlf didn't make a difference in my tests. :/ Let me thoroughly re-validate all cases. I will likely not find the time for that before Monday, but I think that break could be helpful. ;) Regards, Frank
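To spell out the equivalence the quoted documentation describes, the two setups below should behave the same for the "LF in repository, CRLF in working tree" case (a sketch; the key point, and the pitfall above, is that core.eol only applies to paths that have the `text` attribute set):

```shell
# Variant 1: the single switch.
git config core.autocrlf true

# Variant 2: the explicit equivalent -- without the 'text' attribute
# in .gitattributes, core.eol=crlf alone does nothing for a path.
git config core.autocrlf false
git config core.eol crlf
printf '* text=auto\n' > .gitattributes
```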
Re: [RFC PATCH] Introduce "precious" file concept
On Tue, Nov 27, 2018 at 1:56 PM Jacob Keller wrote: > Personally, I would rather err on the side which requires the least > interaction from users to avoid silently clobbering an ignored file. > > Either Duy's solution with a sort of "untracked" reflog, or the > garbage/trashable notion. The "untracked reflog" is partially functional now [1] if you want to have a look. I'm not going to post the series until post-2.20, but if you do look, I suggest the first patch that lays out the design and a plumbing command to manage it. Basically you'll do git backup-log --id=worktree log then pick up the version you like with "git backup-log cat" and do whatever you want with it. High level UI is not there and will be a topic of discussion. The precious/trashable/garbage notion can be used to suppress making backups. [1] https://gitlab.com/pclouds/git/commits/backup-log -- Duy
Re: enhancement: support for author.email and author.name in "git config"
Hi William, On Thu, 6 Dec 2018 at 19:18, William Hubbs wrote: > We are in a situation where we would like to use author information that is > different from committer information when we commit to certain > repositories. [...] > [...] I would like to propose the addition of author.email and > author.name settings to the git-config system. > > Additionally you could add committer.name and committer.email, but the > only reason I bring the committer variations up is consistency since I > see you also have GIT_COMMITTER_NAME and GIT_COMMITTER_EMAIL environment > variables. This idea was floated a couple of months ago [1]. Junio seemed to find the request sensible and outlined a design. No patches materialized, as far as I know, but that could be because Eric suggested a tool called direnv. Maybe that would work for you. [1] https://public-inbox.org/git/0f66ad7a-2289-2cce-6533-a27e19945...@rasmusvillemoes.dk/ Martin
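For what it's worth, the direnv workaround Eric suggested can be sketched concretely: direnv loads an .envrc on entering the directory, and Git's long-standing environment variables override the identity. The proposed author.name/author.email config keys do not exist yet; the names below are examples.

```shell
# .envrc at the repository root, picked up automatically by direnv:
export GIT_AUTHOR_NAME="Jane Hacker"
export GIT_AUTHOR_EMAIL="jane@example.org"
# GIT_COMMITTER_NAME / GIT_COMMITTER_EMAIL can be set the same way
# if the committer identity should differ as well.
```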
Re: [wishlist] git submodule update --reset-hard
On Thu, Dec 6, 2018 at 10:02 AM Yaroslav Halchenko wrote: > > Dear Git Gurus, > > I wondered what would be your take on my wishlist request to add > --reset-hard option, which would be very similar to regular "update" which > checks out necessary commit, but I want it to remain in the branch. What if the branch differs from the sha1 recorded in the superproject? > Rationale: In DataLad we heavily rely on submodules, and we have established > easy ways to do some manipulations across full hierarchies of them. E.g. a > single command could introduce a good number of commits across deep hierarchy > of submodules, e.g. while committing changes within deep submodule, while also > doing all necessary commits in the repositories leading to that submodule so > the entire tree of them stays in a "clean" state. The difficulty comes when > there is a need to just "forget" some changes. The obvious way is to e.g. > >git reset --hard PREVIOUS_STATE git reset --hard --recurse-submodules HEAD would do the trick > in the top level repository. But that leaves all the submodules now in > the undesired state. If I do undesirable in the sense of still having local changes (that is what the above reset with `--recurse` would fix) or changed the branch state? (i.e. is detached but was on a branch before?) > git submodule update --recursive > > I would end up in the detached HEADs within submodules. > > What I want is to retain current branch they are at (or may be possible > "were in"? reflog records might have that information) So something like git submodule foreach --recursive git reset --hard ? You may be interested in https://public-inbox.org/git/20180927221603.148025-1-sbel...@google.com/ which introduces a switch `submodule.repoLike [ = true]`, which when set would not detach HEAD in submodules. Can you say more about the first question above: Would you typically have situations where the submodule branch is out of sync with the superproject and how do you deal with that? 
Adding another mode to `git submodule update` sounds reasonable to me, too. Stefan
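The two interpretations discussed in this thread map onto existing commands roughly as follows (a sketch; neither is the wished-for "reset to the recorded gitlink but stay on the branch"):

```shell
# (a) keep whatever branch each submodule is on, discard local changes
#     (does NOT move submodules to the superproject's recorded commits):
git submodule foreach --recursive 'git reset --hard'

# (b) reset every submodule worktree to the gitlink recorded in the
#     superproject -- but this detaches HEAD inside the submodules:
git reset --hard --recurse-submodules HEAD
```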
Re: A case where diff.colorMoved=plain is more sensible than diff.colorMoved=zebra & others
On Thu, Dec 6, 2018 at 6:58 AM Phillip Wood wrote: > > So is there some "must be at least two consecutive lines" condition for > > not-plain, or is something else going on here? > > To be considered a block has to have 20 alphanumeric characters - see > commit f0b8fb6e59 ("diff: define block by number of alphanumeric chars", > 2017-08-15). This stops things like random '}' lines being marked as > moved on their own. This is spot on. All but the "plain" mode use the concept of "blocks" of code (there is even one mode called "blocks", which adds to the confusion). > It might be better to use some kind of frequency > information (a bit like python's difflib junk parameter) instead so that > (fairly) unique short lines also get marked properly. Yes, that is what I was initially thinking about. However to have good information, you'd need to index a whole lot (the whole repository, i.e. all text blobs in existence?) to get an accurate picture of frequency information, which I'd prefer to call entropy as I come from a background familiar with https://en.wikipedia.org/wiki/Information_theory. I am not sure where 'frequency information' comes from -- it sounds like the same concept. Of course it is too expensive to run an operation O(repository size) just for this diff, so maybe we could get away with some smaller corpus to build up this information on what is sufficient for coloring. When only looking at the given diff, I would imagine that each line would not carry a whole lot of information as its characters occur rather frequently compared to the rest of the diff. Best, Stefan
Re: [BUG REPORT] Git does not correctly replay bisect log
On Thu, 2018-12-06 at 17:31 +0100, Christian Couder wrote: > > When Git replays the bisect log, it only updates refs/bisect/bad, > > refs/bisect/good-*, refs/bisect/skip-* and reconstructs the log in > > .git/BISECT_LOG. After that check_good_are_ancestors_of_bad() verifies > > that all good commits are ancestors of the bad commit, and if not, the > > message "Bisecting: a merge base must be tested" is printed and the > > branch is switched to the merge base of the bad and all the good > > commits. > > I am not sure if you are talking about running `git bisect replay` or > sourcing the log in the above. I am talking about `git bisect replay`. The shell script, as far as I can see, only updates the references (refs/bisect/*) and never checks if the revisions marked as 'good' are ancestors of the 'bad' one. Therefore, the $GIT_DIR/BISECT_ANCESTORS_OK file is never created. The first time the ancestors are checked is in the helper (`git bisect--helper --next-all`) that has only limited information from refs/bisect*.
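To make the scenario concrete, the save-and-replay cycle under discussion uses these standard commands. BISECT_ANCESTORS_OK lives under .git/ and is not part of the log, which is why the merge-base check can fire again on replay:

```shell
git bisect log > bisect.log    # record the good/bad marks of this session
git bisect reset               # abandon it; .git/BISECT_ANCESTORS_OK goes away
git bisect replay bisect.log   # rebuild refs/bisect/* from the log; the
                               # "a merge base must be tested" prompt may
                               # reappear because the ancestor check reruns
```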
Re: [BUG REPORT] Git does not correctly replay bisect log
Hi, On Thu, Dec 6, 2018 at 3:43 PM Lukáš Krejčí wrote: > > Hello again, > > after looking into this today, I'm not sure if this can be considered a > bug - it's just that I expected Git to check out the exact commit to > test that was there before resetting the bisect. That made me uncertain > whether Git restored the correct state. > > When I looked at what Git actually does, it became clear that the > behavior is not incorrect but perhaps a bit surprising. Yeah, I agree. I suspect, but I am not sure, that the difference of behavior is because in one case we only check merge bases once at the beginning (maybe because the BISECT_ANCESTORS_OK file always exists) while in the other case we check them more than once during the bisection. I haven't had time to look closely at this, but I would like to. > When Git replays the bisect log, it only updates refs/bisect/bad, > refs/bisect/good-*, refs/bisect/skip-* and reconstructs the log in > .git/BISECT_LOG. After that check_good_are_ancestors_of_bad() verifies > that all good commits are ancestors of the bad commit, and if not, the > message "Bisecting: a merge base must be tested" is printed and the > branch is switched to the merge base of the bad and all the good > commits. I am not sure if you are talking about running `git bisect replay` or sourcing the log in the above. > Basically, some state is lost because Git "forgot" the first good > commit from the log already was an ancestor of the first bad one. The BISECT_ANCESTORS_OK file should be there to avoid forgetting that we already checked the merge bases. 
> In other words, it's as if I just started the bisect with the following > commands and just pasted the whole bisect log to .git/BISECT_LOG: > > $ git bisect start > $ git bisect good 94710cac0ef4ee177a63b5227664b38c95bbf703 > $ git bisect good 958f338e96f874a0d29442396d6adf9c1e17aa2d > $ git bisect bad 1b0d274523df5ef1caedc834da055ff721e4d4f0 > Bisecting: a merge base must be tested > [1e4b044d22517cae7047c99038abb23243ca] Linux 4.18-rc4 Yeah, when we start a new bisection the BISECT_ANCESTORS_OK file should be erased if it exists, while it shouldn't be erased when we are already in the middle of an existing bisection. [...] > And indeed, marking the merge base as good switches to the correct > commit after the bisect. Marking it as bad will fail, so at least you > can't make a mistake after replaying the bisect log: > $ git bisect bad > The merge base 1e4b044d22517cae7047c99038abb23243ca is bad. > This means the bug has been fixed between > 1e4b044d22517cae7047c99038abb23243ca and > [94710cac0ef4ee177a63b5227664b38c95bbf703 > 958f338e96f874a0d29442396d6adf9c1e17aa2d]. Yeah, I think this works as expected. > Once again, I'm sorry for the noise. I guess it wasn't clear from the > man page that something like this could happen and that made me think > that this was a bug. No reason to be sorry, there might still be a bug related to the BISECT_ANCESTORS_OK file or something. I hope I can take a look at this more closely soon. Thanks for your report and your work on this, Christian.
Re: gitweb: local configuration not found
Hello! > Yeah, it does look indirect. Despite what you said, it also would > support users giving an absolute path via GITWEB_CONFIG. > > With "use File::Spec", perhaps something like this? Yes, this looks right. Martin
Re: A case where diff.colorMoved=plain is more sensible than diff.colorMoved=zebra & others
Hi Ævar On 06/12/2018 13:54, Ævar Arnfjörð Bjarmason wrote: Let's ignore how bad this patch is for git.git, and just focus on how diff.colorMoved treats it: diff --git a/builtin/add.c b/builtin/add.c index f65c172299..d1155322ef 100644 --- a/builtin/add.c +++ b/builtin/add.c @@ -6,5 +6,3 @@ #include "cache.h" -#include "config.h" #include "builtin.h" -#include "lockfile.h" #include "dir.h" diff --git a/builtin/am.c b/builtin/am.c index 8f27f3375b..eded15aa8a 100644 --- a/builtin/am.c +++ b/builtin/am.c @@ -6,3 +6,2 @@ #include "cache.h" -#include "config.h" #include "builtin.h" diff --git a/builtin/blame.c b/builtin/blame.c index 06a7163ffe..44a754f190 100644 --- a/builtin/blame.c +++ b/builtin/blame.c @@ -8,3 +8,2 @@ #include "cache.h" -#include "config.h" #include "color.h" diff --git a/cache.h b/cache.h index ca36b44ee0..ea8d60b94a 100644 --- a/cache.h +++ b/cache.h @@ -4,2 +4,4 @@ #include "git-compat-util.h" +#include "config.h" +#include "new.h" #include "strbuf.h" This is a common thing that's useful to have highlighted, e.g. we move includes of config.h to some common file, so I want to see all the deleted config.h lines as moved into the cache.h line, and then the "lockfile.h" I removed while I was at it shown as a plain removal, and the new "new.h" as a plain addition. Exactly that is what you get with diff.colorMoved=plain, but the default of diff.colorMoved=zebra gets confused by this and highlights no moves at all, same for "blocks" and "dimmed-zebra". So at first I thought this had something to do with the many->one detection, but it seems to be simpler, we just don't detect a one-line move with anything but plain, e.g.
this works as expected in all modes and detects the many->one: diff --git a/builtin/add.c b/builtin/add.c index f65c172299..f4fda75890 100644 --- a/builtin/add.c +++ b/builtin/add.c @@ -5,4 +5,2 @@ */ -#include "cache.h" -#include "config.h" #include "builtin.h" diff --git a/builtin/branch.c b/builtin/branch.c index 0c55f7f065..52e39924d3 100644 --- a/builtin/branch.c +++ b/builtin/branch.c @@ -7,4 +7,2 @@ -#include "cache.h" -#include "config.h" #include "color.h" diff --git a/cache.h b/cache.h index ca36b44ee0..d4146dbf8a 100644 --- a/cache.h +++ b/cache.h @@ -3,2 +3,4 @@ +#include "cache.h" +#include "config.h" #include "git-compat-util.h" So is there some "must be at least two consecutive lines" condition for not-plain, or is something else going on here? To be considered a block has to have 20 alphanumeric characters - see commit f0b8fb6e59 ("diff: define block by number of alphanumeric chars", 2017-08-15). This stops things like random '}' lines being marked as moved on their own. It might be better to use some kind of frequency information (a bit like python's difflib junk parameter) instead so that (fairly) unique short lines also get marked properly. Best Wishes Phillip
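The threshold Phillip describes can be sketched outside of Git (illustrative shell, not the actual C implementation in diff.c): a moved-block candidate is only colored when its text contains at least 20 alphanumeric characters.

```shell
# Succeeds when the candidate block has >= 20 alphanumeric characters,
# mirroring the f0b8fb6e59 heuristic; a lone '}' never qualifies, while
# a pair of #include lines easily does.
block_qualifies () {
	alnum=$(printf '%s' "$1" | tr -cd '[:alnum:]' | wc -c)
	test "$alnum" -ge 20
}

block_qualifies '}' || echo 'too short: not marked as moved'
block_qualifies '#include "config.h"
#include "cache.h"' && echo 'long enough: marked as moved'
```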
Re: [BUG REPORT] Git does not correctly replay bisect log
Hello again, after looking into this today, I'm not sure if this can be considered a bug - it's just that I expected Git to check out the exact commit to test that was there before resetting the bisect. That made me uncertain whether Git restored the correct state. When I looked at what Git actually does, it became clear that the behavior is not incorrect but perhaps a bit surprising. When Git replays the bisect log, it only updates refs/bisect/bad, refs/bisect/good-*, refs/bisect/skip-* and reconstructs the log in .git/BISECT_LOG. After that check_good_are_ancestors_of_bad() verifies that all good commits are ancestors of the bad commit, and if not, the message "Bisecting: a merge base must be tested" is printed and the branch is switched to the merge base of the bad and all the good commits. Basically, some state is lost because Git "forgot" the first good commit from the log already was an ancestor of the first bad one. In other words, it's as if I just started the bisect with the following commands and just pasted the whole bisect log to .git/BISECT_LOG: $ git bisect start $ git bisect good 94710cac0ef4ee177a63b5227664b38c95bbf703 $ git bisect good 958f338e96f874a0d29442396d6adf9c1e17aa2d $ git bisect bad 1b0d274523df5ef1caedc834da055ff721e4d4f0 Bisecting: a merge base must be tested [1e4b044d22517cae7047c99038abb23243ca] Linux 4.18-rc4 (here's the full bisect log again for reference) git bisect start # bad: [5b394b2ddf0347bef56e50c69a58773c94343ff3] Linux 4.19-rc1 git bisect bad 5b394b2ddf0347bef56e50c69a58773c94343ff3 # good: [94710cac0ef4ee177a63b5227664b38c95bbf703] Linux 4.18 git bisect good 94710cac0ef4ee177a63b5227664b38c95bbf703 # bad: [54dbe75bbf1e189982516de179147208e90b5e45] Merge tag 'drm-next-2018-08-15' of git://anongit.freedesktop.org/drm/drm git bisect bad 54dbe75bbf1e189982516de179147208e90b5e45 # bad: [0a957467c5fd46142bc9c52758ffc552d4c5e2f7] x86: i8259: Add missing include file git bisect bad 0a957467c5fd46142bc9c52758ffc552d4c5e2f7 # good: 
[958f338e96f874a0d29442396d6adf9c1e17aa2d] Merge branch 'l1tf-final' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip git bisect good 958f338e96f874a0d29442396d6adf9c1e17aa2d # bad: [2c20443ec221dcb76484b30933593e8ecd836bbd] Merge tag 'acpi-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm git bisect bad 2c20443ec221dcb76484b30933593e8ecd836bbd # bad: [c2fc71c9b74c1e87336a27dba1a5edc69d2690f1] Merge tag 'mtd/for-4.19' of git://git.infradead.org/linux-mtd git bisect bad c2fc71c9b74c1e87336a27dba1a5edc69d2690f1 # bad: [b86d865cb1cae1e61527ea0b8977078bbf694328] blkcg: Make blkg_root_lookup() work for queues in bypass mode git bisect bad b86d865cb1cae1e61527ea0b8977078bbf694328 # bad: [1b0d274523df5ef1caedc834da055ff721e4d4f0] nvmet: don't use uuid_le type git bisect bad 1b0d274523df5ef1caedc834da055ff721e4d4f0 And indeed, marking the merge base as good switches to the correct commit after the bisect. Marking it as bad will fail, so at least you can't make a mistake after replaying the bisect log: $ git bisect bad The merge base 1e4b044d22517cae7047c99038abb23243ca is bad. This means the bug has been fixed between 1e4b044d22517cae7047c99038abb23243ca and [94710cac0ef4ee177a63b5227664b38c95bbf703 958f338e96f874a0d29442396d6adf9c1e17aa2d]. Once again, I'm sorry for the noise. I guess it wasn't clear from the man page that something like this could happen and that made me think that this was a bug.
Re: [PATCH 2/2] commit-graph: fix buffer read-overflow
On 12/5/2018 5:32 PM, Josh Steadmon wrote: + if (chunk_lookup + GRAPH_CHUNKLOOKUP_WIDTH > data + graph_size) { + error(_("chunk lookup table entry missing; graph file may be incomplete")); + free(graph); + return NULL; + } Something I forgot earlier: there are several tests in t5318-commit-graph.sh that use 'git commit-graph verify' to ensure we hit these error conditions on a corrupted commit-graph file. Could you try adding a test there that looks for this error message? Thanks, -Stolee
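A t5318 case could be backed by a manual reproduction along these lines (a sketch; the exact message depends on where the truncation lands, but `git commit-graph verify` should error out cleanly rather than read past the end of the buffer):

```shell
git commit-graph write --reachable
graph=.git/objects/info/commit-graph
size=$(wc -c < "$graph")
# chop a few bytes off the tail so the file is incomplete
head -c $((size - 5)) "$graph" > "$graph.tmp" && mv "$graph.tmp" "$graph"
git commit-graph verify    # must report corruption, not crash
```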
Re: [PATCH v2] l10n: update German translation
Hi,

thanks for your great work! Just two remarks:

    #: midx.c:407
    -#, fuzzy, c-format
    +#, c-format
     msgid "failed to add packfile '%s'"
    -msgstr "Fehler beim Lesen der Reihenfolgedatei '%s'."
    +msgstr "Fehler beim Hinzufügen von Packdatei'%s'."

A space is missing: "Fehler beim Hinzufügen von Packdatei '%s'."

    #: run-command.c:1229
    -#, fuzzy, c-format
    +#, c-format
     msgid "cannot create async thread: %s"
    -msgstr "kann Thread nicht erzeugen: %s"
    +msgstr "Kann Thread für async nicht erzeugen: %s"

I think we should use "Konnte" here.

Best regards,
Phillip
Re: Any way to make git-log to enumerate commits?
On Thu, Dec 06, 2018 at 09:31:36AM +0900, Junio C Hamano wrote:

> >> It would be great if git-log has a formatting option to insert an
> >> index of the current commit since HEAD.
> >>
> >> It would allow after quitting the git-log to immediately fire up
> >> "git rebase -i HEAD~index" instead of "git rebase -i
> >> go-copy-paste-this-long-number-id".
> >
> > This may have little sense in a general case as the history
> > maintained by Git is a graph, not a single line. Hence your
> > prospective approach would only work for cases like `git log` called
> > with the "--first-parent" command-line option.
>
> I do not see why the "name each rev relative to HEAD" formatting
> option cannot produce HEAD^2~2 etc.
>
> It would be similar to "git log | git name-rev --stdin" but I do not
> offhand recall if we had a way to tell name-rev to use only HEAD as
> the anchoring point.

My reading was that the OP explicitly wanted to just glance at a single
integer number and use it right away in a subsequent rebase command. I
mean, use one's own short memory instead of copying and pasting.

The way I decided to format the reference in my sketch script (using
HEAD~) is just a byproduct of the fact that I was aware both of the
"gitrevisions" manual page and of the fact that `git name-rev` exists
(though I regretfully was not aware it's able to process the output
stream of `git log`). Hence, while getting fancy names for revisions
would be technically correct, a plain integer is less error-prone to
retype from memory ;-)
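A rough approximation of what the OP asks for is already possible with
standard tools (only meaningful for a linear, --first-parent view, as
noted above; the throwaway repo below is just for illustration):

```shell
# Demo: number commits from HEAD so that the entry numbered N
# corresponds to "git rebase -i HEAD~N" (N = 1 is HEAD itself).
cd "$(mktemp -d)"
git init -q
git config user.email you@example.com && git config user.name you
for m in first second third; do git commit -q --allow-empty -m "$m"; done

git log --oneline --first-parent | cat -n
# The commit on line N is included by "git rebase -i HEAD~N".
```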
Re: git, monorepos, and access control
On Thu, Dec 06, 2018 at 10:17:24AM +0100, Ævar Arnfjörð Bjarmason wrote:

> > The other major user of that feature I can think of is LFS. There Git
> > ends up diffing the LFS pointers, not the big files. Which arguably is
> > the wrong thing (you'd prefer to see the actual file contents diffed),
> > but I think nobody cares in practice because large files generally don't
> > have readable diffs anyway.
>
> I don't use this either, but I can imagine people who use binary files
> via clean/smudge would be well served by dumping out textual metadata of
> the file for diffing instead of showing nothing.
>
> E.g. for a video file I might imagine having lines like:
>
>     duration-seconds: 123
>     camera-model: Shiny Thingamabob
>
> Then when you check in a new file your "git diff" will show (using
> normal diff view) that:
>
>     - duration-seconds: 123
>     + duration-seconds: 321
>       camera-model: Shiny Thingamabob

I think that's orthogonal to clean/smudge, though. Neither the in-repo
nor on-disk formats are going to show that kind of output. For that
you'd want a separate textconv filter (and fun fact: showing exif data
was actually the original use case for which I wrote textconv).

If you are using something like LFS, using textconv on top is a little
trickier, because we'd always feed the filter the LFS pointer file, not
the actual data contents. Doing the "reversal" that Junio suggested
would fix that. Or with the code as it is, you can simply define your
filter to convert the LFS pointer data into the real content. I don't
really use LFS, but it looks like:

    [diff "mp4"]
    textconv = git lfs smudge | extract-metadata

would probably work.

-Peff
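For completeness, the full wiring Peff sketches would look something
like the following (an untested sketch; `extract-metadata` stands in for
whatever tool reads the textual metadata out of the real file):

```
# .gitattributes: route *.mp4 through a custom diff driver
*.mp4 diff=mp4

# repository config: turn the LFS pointer back into real content first,
# then reduce that content to diffable text
[diff "mp4"]
	textconv = git lfs smudge | extract-metadata
```

With both pieces in place, "git diff" and "git log -p" would show the
metadata diff instead of the pointer-file diff.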
Re: git, monorepos, and access control
On Thu, Dec 06 2018, Jeff King wrote:

> On Thu, Dec 06, 2018 at 10:08:57AM +0900, Junio C Hamano wrote:
>
>> Jeff King writes:
>>
>> > In my opinion this feature is so contrary to Git's general assumptions
>> > that it's likely to create a ton of information leaks of the supposedly
>> > protected data.
>> > ...
>>
>> Yup, with s/implemented/designed/, I agree all you said here
>> (snipped).
>
> Heh, yeah, I actually scratched my head over what word to use. I think
> Git _could_ be written in a way that is both compatible with existing
> repositories (i.e., is still recognizably Git) and is careful about
> object access control. But either way, what we have now is not close to
> that.
>
>> > Sorry I don't have a more positive response. What you want to do is
>> > perfectly reasonable, but I just think it's a mismatch with how Git
>> > works (and because of the security impact, one missed corner case
>> > renders the whole thing useless).
>>
>> Yup, again.
>>
>> Storing source files encrypted and decrypting with smudge filter
>> upon checkout (and those without the access won't get keys and will
>> likely to use sparse checkout to exclude these priviledged sources)
>> is probably the only workaround that does not involve submodules.
>> Viewing "diff" and "log -p" would still be a challenge, which
>> probably could use the same filter as smudge for textconv.
>
> I suspect there are going to be some funny corner cases there. I use:
>
>     [diff "gpg"]
>     textconv = gpg -qd --no-tty
>
> which works pretty well, but it's for files which are _never_ decrypted
> by Git. So they're encrypted in the working tree too, and I don't use
> clean/smudge filters.
>
> If the files are already decrypted in the working tree, then running
> them through gpg again would be the wrong thing. I guess for a diff
> against the working tree, we would always do a "clean" operation to
> produce the encrypted text, and then decrypt the result using textconv.
> Which would work, but is rather slow.
>
>> I wonder (and this is the primary reason why I am responding to you)
>> if it is common enough wish to use the same filter for smudge and
>> textconv? So far, our stance (which can be judged from the way the
>> clean/smudge filters are named) has been that the in-repo
>> representation is the canonical, and the representation used in the
>> checkout is ephemeral, and that is why we run "diff", "grep",
>> etc. over the in-repo representation, but the "encrypted in repo,
>> decrypted in checkout" abuse would be helped by an option to do the
>> reverse---find changes and look substrings in the representation
>> used in the checkout. I am not sure if there are other use cases
>> that is helped by such an option.
>
> Hmm. Yeah, I agree with your line of reasoning here. I'm not sure how
> common it is. This is the first I can recall it. And personally, I have
> never really used clean/smudge filters myself, beyond some toy
> experiments.
>
> The other major user of that feature I can think of is LFS. There Git
> ends up diffing the LFS pointers, not the big files. Which arguably is
> the wrong thing (you'd prefer to see the actual file contents diffed),
> but I think nobody cares in practice because large files generally don't
> have readable diffs anyway.

I don't use this either, but I can imagine people who use binary files
via clean/smudge would be well served by dumping out textual metadata of
the file for diffing instead of showing nothing.

E.g. for a video file I might imagine having lines like:

    duration-seconds: 123
    camera-model: Shiny Thingamabob

Then when you check in a new file your "git diff" will show (using
normal diff view) that:

    - duration-seconds: 123
    + duration-seconds: 321
      camera-model: Shiny Thingamabob

etc.
Re: git, monorepos, and access control
On Wed, Dec 05, 2018 at 11:42:09PM +0000, Coiner, John wrote:

> > For instance, Git is very eager to try to find delta-compression
> > opportunities between objects, even if they don't have any
> > relationship within the tree structure. So imagine I want to know the
> > contents of tree X. I push up a tree Y similar to X, then fetch it
> > back, falsely claiming to have X but not Y. If the server generates a
> > delta, that may reveal information about X (which you can then
> > iterate to send Y', and so on, treating the server as an oracle until
> > you've guessed the content of X).
>
> Another good point. I wouldn't have thought of either of these attacks.
> You're scaring me (appropriately) about the risks of adding security to
> a previously-unsecured interface. Let me push on the smudge/clean
> approach and maybe that will bear fruit.

If you do look into that approach, check out how git-lfs works. In fact,
you might even be able to build around lfs itself. It's already putting
placeholder objects into the repository, and then faulting them in from
external storage. All you would need to do is lock down access to that
external storage, which is typically accessed via http.

(That all assumes you're OK with sharing the actual filenames with
everybody, and just restricting access to the blob contents. There's no
way to clean/smudge a whole subtree. For that you'd have to use
submodules.)

-Peff
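For context, the "placeholder object" that git-lfs commits into the
repository is just a small text blob in the LFS pointer format; the hash
and size below are made up for illustration:

```
version https://git-lfs.github.com/spec/v1
oid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393
size 12345
```

Only this pointer is stored in Git's object database; the real content
lives in the external LFS store, which is the piece access control would
need to wrap.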
Re: git, monorepos, and access control
On Thu, Dec 06, 2018 at 10:08:57AM +0900, Junio C Hamano wrote:

> Jeff King writes:
>
> > In my opinion this feature is so contrary to Git's general assumptions
> > that it's likely to create a ton of information leaks of the supposedly
> > protected data.
> > ...
>
> Yup, with s/implemented/designed/, I agree all you said here
> (snipped).

Heh, yeah, I actually scratched my head over what word to use. I think
Git _could_ be written in a way that is both compatible with existing
repositories (i.e., is still recognizably Git) and is careful about
object access control. But either way, what we have now is not close to
that.

> > Sorry I don't have a more positive response. What you want to do is
> > perfectly reasonable, but I just think it's a mismatch with how Git
> > works (and because of the security impact, one missed corner case
> > renders the whole thing useless).
>
> Yup, again.
>
> Storing source files encrypted and decrypting with smudge filter
> upon checkout (and those without the access won't get keys and will
> likely to use sparse checkout to exclude these priviledged sources)
> is probably the only workaround that does not involve submodules.
> Viewing "diff" and "log -p" would still be a challenge, which
> probably could use the same filter as smudge for textconv.

I suspect there are going to be some funny corner cases there. I use:

    [diff "gpg"]
    textconv = gpg -qd --no-tty

which works pretty well, but it's for files which are _never_ decrypted
by Git. So they're encrypted in the working tree too, and I don't use
clean/smudge filters.

If the files are already decrypted in the working tree, then running
them through gpg again would be the wrong thing. I guess for a diff
against the working tree, we would always do a "clean" operation to
produce the encrypted text, and then decrypt the result using textconv.
Which would work, but is rather slow.

> I wonder (and this is the primary reason why I am responding to you)
> if it is common enough wish to use the same filter for smudge and
> textconv? So far, our stance (which can be judged from the way the
> clean/smudge filters are named) has been that the in-repo
> representation is the canonical, and the representation used in the
> checkout is ephemeral, and that is why we run "diff", "grep",
> etc. over the in-repo representation, but the "encrypted in repo,
> decrypted in checkout" abuse would be helped by an option to do the
> reverse---find changes and look substrings in the representation
> used in the checkout. I am not sure if there are other use cases
> that is helped by such an option.

Hmm. Yeah, I agree with your line of reasoning here. I'm not sure how
common it is. This is the first I can recall it. And personally, I have
never really used clean/smudge filters myself, beyond some toy
experiments.

The other major user of that feature I can think of is LFS. There Git
ends up diffing the LFS pointers, not the big files. Which arguably is
the wrong thing (you'd prefer to see the actual file contents diffed),
but I think nobody cares in practice because large files generally don't
have readable diffs anyway.

-Peff
Re: [RFC PATCH 3/3] test-lib: add the '--stress' option to run a test repeatedly under load
Jeff King writes:

> Each "wait" will try to collect all processes, but may be interrupted
> by a signal. So the correct number is actually "1 plus the number of
> times the user hits ^C".

Yeah, and that is not bounded. It is OK not to catch multiple ^C's that
race with what we do, so having an extra wait in the clean-up procedure
after receiving a signal, like you suggested, would be both good enough
and the cleanest solution, I think.

Thanks.
Re: [WIP RFC 4/5] upload-pack: refactor writing of "packfile" line
Jonathan Tan writes:

> @@ -126,6 +129,12 @@ static int read_pack_objects_stdout(int outfd, struct output_state *os)
> 	}
> 	os->used += readsz;
>
> +	if (!os->packfile_started) {
> +		os->packfile_started = 1;
> +		if (use_protocol_v2)
> +			packet_write_fmt(1, "packfile\n");

If we fix this function so that the only byte in the buffer is held
back without being emitted when os->used == 1, as I alluded to, this may
have to be done a bit later, as with such a change it is no longer
guaranteed that send_client_data() will be called after this point.

> +	}
> +
> 	if (os->used > 1) {
> 		send_client_data(1, os->buffer, os->used - 1);
> 		os->buffer[0] = os->buffer[os->used - 1];

> +static void flush_progress_buf(struct strbuf *progress_buf)
> +{
> +	if (!progress_buf->len)
> +		return;
> +	send_client_data(2, progress_buf->buf, progress_buf->len);
> +	strbuf_reset(progress_buf);
> +}

Interesting.

> static void create_pack_file(const struct object_array *have_obj,
> -			     const struct object_array *want_obj)
> +			     const struct object_array *want_obj,
> +			     int use_protocol_v2)
> {
> 	struct child_process pack_objects = CHILD_PROCESS_INIT;
> 	struct output_state output_state = {0};
> @@ -260,9 +278,13 @@ static void create_pack_file(const struct object_array *have_obj,
> 	 */
> 	sz = xread(pack_objects.err, progress,
> 		   sizeof(progress));
> -	if (0 < sz)
> -		send_client_data(2, progress, sz);
> -	else if (sz == 0) {
> +	if (0 < sz) {
> +		if (output_state.packfile_started)
> +			send_client_data(2, progress, sz);
> +		else
> +			strbuf_add(&output_state.progress_buf,
> +				   progress, sz);

Isn't the progress output that goes to channel #2 pretty much
independent from the payload stream that carries the pkt-line command
like "packfile" plus the raw pack stream? It somehow feels like an
oxymoron to _buffer_ a progress indicator, as it defeats the whole point
of progress reporting to buffer it.
Re: [RFC PATCH 3/3] test-lib: add the '--stress' option to run a test repeatedly under load
On Thu, Dec 06, 2018 at 09:22:23AM +0900, Junio C Hamano wrote:

> > So the right number of waits is either "1" or "2". Looping means we
> > do too many (which is mostly a harmless noop) or too few (in the off
> > chance that you have only a single job ;) ). So it works out in
> > practice.
>
> Well, if you time your ^C perfectly, no number of waits is right, I am
> afraid. You spawn N processes and while looping N times waiting for
> them, you can ^C out of wait before these N processes all die, no?

Each "wait" will try to collect all processes, but may be interrupted by
a signal. So the correct number is actually "1 plus the number of times
the user hits ^C". I had assumed the user was just hitting it once,
though putting the wait into the trap means we do that "1 plus" thing
anyway.

I could also see an argument that subsequent ^C's should exit
immediately, but I think we are getting well into the realm of
over-engineering.

-Peff
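The "extra wait in the trap" scheme being discussed can be sketched like
this (a minimal illustration, not the actual test-lib code; the sleeps
stand in for the stress jobs):

```shell
# One foreground "wait" to collect the background jobs, plus one extra
# "wait" inside the INT trap so children are still reaped when ^C
# interrupts the first one.
cleanup() {
	kill "$pid1" "$pid2" 2>/dev/null
	wait            # the extra wait: reap everything before exiting
}
trap 'cleanup; exit 130' INT

sleep 1 & pid1=$!
sleep 1 & pid2=$!
wait                    # may be cut short by SIGINT
echo "all jobs finished"
```

Run without interruption, the foreground "wait" collects both jobs and
the trap never fires; on ^C, the trap's "wait" picks up the stragglers.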
Re: [PATCH 1/2] commit-graph, fuzz: Add fuzzer for commit-graph
Josh Steadmon writes:

> diff --git a/fuzz-commit-graph.c b/fuzz-commit-graph.c
> new file mode 100644
> index 00..420851d0d2
> --- /dev/null
> +++ b/fuzz-commit-graph.c
> @@ -0,0 +1,18 @@
> +#include "object-store.h"
> +#include "commit-graph.h"
> +
> +struct commit_graph *parse_commit_graph(void *graph_map, int fd,
> +					size_t graph_size);
> +
> +int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size);
> +
> +int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
> +{
> +	struct commit_graph *g;
> +
> +	g = parse_commit_graph((void *) data, -1, size);
> +	if (g)
> +		free(g);

As it is perfectly OK to free(NULL), please lose the "if (g)" and a
level of indentation; otherwise, "make coccicheck" would complain.

Thanks.

> +	return 0;
> +}
Re: [PATCH 1/2] commit-graph, fuzz: Add fuzzer for commit-graph
Junio C Hamano writes:

> > +	if (graph_size < GRAPH_MIN_SIZE)
> > +		return NULL;
> > +
>
> The load_commit_graph_one() grabbed graph_map out of xmmap() so it is
> guaranteed to be non-NULL, but we need to check graph_map != NULL when
> we're calling this directly from the fuzz tests, exactly in the same
> spirit that we check graph_size above. Besides, these are to make sure
> future callers won't misuse the API.

Insert "Please check graph_map != NULL here, too." before the above
paragraph.

> >	data = (const unsigned char *)graph_map;
>
> And the reset of the function is the same as the original modulo
> jumping to the cleanup_fail label has been replaced with returning
> NULL.
>
> Looks good.

Of course, s/reset/rest/ is what I meant.
Re: [PATCH 1/2] commit-graph, fuzz: Add fuzzer for commit-graph
Josh Steadmon writes:

> @@ -108,27 +106,61 @@ struct commit_graph *load_commit_graph_one(const char *graph_file)
> 		die(_("graph file %s is too small"), graph_file);
> 	}
> 	graph_map = xmmap(NULL, graph_size, PROT_READ, MAP_PRIVATE, fd, 0);
> +	ret = parse_commit_graph(graph_map, fd, graph_size);

OK, assuming that the new helper returns NULL from all places in the
original that would have jumped to the cleanup-fail label, this would do
the same thing (it may not be the right thing to exit, but that is OK
for the purpose of this change).

> +	if (ret == NULL) {
> +		munmap(graph_map, graph_size);
> +		close(fd);
> +		exit(1);
> +	}
> +
> +	return ret;
> +}
> +
> +/*
> + * This function is intended to be used only from load_commit_graph_one() or in
> + * fuzz tests.
> + */

Hmph, is that necessary to say? "Right now, it has only these two
callers" is sometimes handy for those without good static analysis
tools, like "git grep" ;-), but I do not think of a reason why a new
caller we'll add in the future, who happens to have the data of the
graph file in-core, should not be allowed to call this function.

> +struct commit_graph *parse_commit_graph(void *graph_map, int fd,
> +					size_t graph_size)
> +{
> +	const unsigned char *data, *chunk_lookup;
> +	uint32_t i;
> +	struct commit_graph *graph;
> +	uint64_t last_chunk_offset;
> +	uint32_t last_chunk_id;
> +	uint32_t graph_signature;
> +	unsigned char graph_version, hash_version;
> +
> +	/*
> +	 * This should already be checked in load_commit_graph_one, but we still
> +	 * need a check here for when we're calling parse_commit_graph directly
> +	 * from fuzz tests. We can omit the error message in that case.
> +	 */

In the same spirit, I am dubious of the longer-term value of this
comment. As an explanation of why this conversion is correct in the
proposed log message for this change, it perfectly makes sense, though.

> +	if (graph_size < GRAPH_MIN_SIZE)
> +		return NULL;
> +

The load_commit_graph_one() grabbed graph_map out of xmmap() so it is
guaranteed to be non-NULL, but we need to check graph_map != NULL when
we're calling this directly from the fuzz tests, exactly in the same
spirit that we check graph_size above. Besides, these are to make sure
future callers won't misuse the API.

> 	data = (const unsigned char *)graph_map;

And the reset of the function is the same as the original modulo jumping
to the cleanup_fail label has been replaced with returning NULL.

Looks good.

> ...
> -
> -cleanup_fail:
> -	munmap(graph_map, graph_size);
> -	close(fd);
> -	exit(1);
> }
>
> static void prepare_commit_graph_one(struct repository *r, const char *obj_dir)
> diff --git a/fuzz-commit-graph.c b/fuzz-commit-graph.c
> new file mode 100644
> index 00..420851d0d2
> --- /dev/null
> +++ b/fuzz-commit-graph.c
> @@ -0,0 +1,18 @@
> +#include "object-store.h"
> +#include "commit-graph.h"
> +
> +struct commit_graph *parse_commit_graph(void *graph_map, int fd,
> +					size_t graph_size);
> +
> +int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size);
> +
> +int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
> +{
> +	struct commit_graph *g;
> +
> +	g = parse_commit_graph((void *) data, -1, size);
> +	if (g)
> +		free(g);
> +
> +	return 0;
> +}
Re: [PATCH v3] list-objects.c: don't segfault for missing cmdline objects
Matthew DeVore writes:

> When a command is invoked with both --exclude-promisor-objects,
> --objects-edge-aggressive, and a missing object on the command line,
> the rev_info.cmdline array could get a NULL pointer for the value of
> an 'item' field. Prevent dereferencing of a NULL pointer in that
> situation.
>
> Properly handle --ignore-missing. If it is not passed, die when an
> object is missing. Otherwise, just silently ignore it.
>
> Signed-off-by: Matthew DeVore

Thanks for an update. Will replace.

> ---
>  revision.c               |  2 ++
>  t/t0410-partial-clone.sh | 16 ++--
>  2 files changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/revision.c b/revision.c
> index 13e0519c02..293303b67d 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -1729,6 +1729,8 @@ int handle_revision_arg(const char *arg_, struct rev_info *revs, int flags, unsi
> 	if (!cant_be_filename)
> 		verify_non_filename(revs->prefix, arg);
> 	object = get_reference(revs, arg, &oid, flags ^ local_flags);
> +	if (!object)
> +		return revs->ignore_missing ? 0 : -1;
> 	add_rev_cmdline(revs, object, arg_, REV_CMD_REV, flags ^ local_flags);
> 	add_pending_object_with_path(revs, object, arg, oc.mode, oc.path);
> 	free(oc.path);
> diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
> index ba3887f178..169f7f10a7 100755
> --- a/t/t0410-partial-clone.sh
> +++ b/t/t0410-partial-clone.sh
> @@ -349,7 +349,7 @@ test_expect_success 'rev-list stops traversal at promisor commit, tree, and blob
> 	grep $(git -C repo rev-parse bar) out # sanity check that some walking was done
> '
>
> -test_expect_success 'rev-list accepts missing and promised objects on command line' '
> +test_expect_success 'rev-list dies for missing objects on cmd line' '
> 	rm -rf repo &&
> 	test_create_repo repo &&
> 	test_commit -C repo foo &&
> @@ -366,7 +366,19 @@ test_expect_success 'rev-list accepts missing and promised objects on command li
>
> 	git -C repo config core.repositoryformatversion 1 &&
> 	git -C repo config extensions.partialclone "arbitrary string" &&
> -	git -C repo rev-list --exclude-promisor-objects --objects "$COMMIT" "$TREE" "$BLOB"
> +
> +	for OBJ in "$COMMIT" "$TREE" "$BLOB"; do
> +		test_must_fail git -C repo rev-list --objects \
> +			--exclude-promisor-objects "$OBJ" &&
> +		test_must_fail git -C repo rev-list --objects-edge-aggressive \
> +			--exclude-promisor-objects "$OBJ" &&
> +
> +		# Do not die or crash when --ignore-missing is passed.
> +		git -C repo rev-list --ignore-missing --objects \
> +			--exclude-promisor-objects "$OBJ" &&
> +		git -C repo rev-list --ignore-missing --objects-edge-aggressive \
> +			--exclude-promisor-objects "$OBJ"
> +	done
> '
>
> test_expect_success 'gc repacks promisor objects separately from non-promisor objects' '