Re: index-pack outside of repository?
Jeff Kingwrites: > Ah, I only checked that it did not do anything terrible like write into > ".git". But it looks like it still looks at the git_dir value as part of > the collision check. > > Here's a patch, on top of the rest of the series. Thanks for a quick turnaround, as always. > -- >8 -- > Subject: [PATCH] index-pack: skip collision check when not in repository > > You can run "git index-pack path/to/foo.pack" outside of a > repository to generate an index file, or just to verify the > contents. There's no point in doing a collision check, since > we obviously do not have any objects to collide with. > > The current code will blindly look in .git/objects based on > the result of setup_git_env(). That effectively gives us the > right answer (since we won't find any objects), but it's a > waste of time, and it conflicts with our desire to > eventually get rid of the "fallback to .git" behavior of > setup_git_env(). > > Signed-off-by: Jeff King > --- > I didn't do a test, as this doesn't really have any user-visible > behavior change. > > I guess technically if you had a non-repo with > ".git/objects/12/3456..." in it we would erroneously read it, but that's > kind of bizarre. The interesting test is that when merged with > jk/no-looking-at-dotgit-outside-repo-final, the test in t5300 doesn't > die. Yes. > > builtin/index-pack.c | 10 ++ > 1 file changed, 6 insertions(+), 4 deletions(-) > > diff --git a/builtin/index-pack.c b/builtin/index-pack.c > index d450a6ada2..f4b87c6c9f 100644 > --- a/builtin/index-pack.c > +++ b/builtin/index-pack.c > @@ -787,13 +787,15 @@ static void sha1_object(const void *data, struct > object_entry *obj_entry, > const unsigned char *sha1) > { > void *new_data = NULL; > - int collision_test_needed; > + int collision_test_needed = 0; > > assert(data || obj_entry); > > - read_lock(); > - collision_test_needed = has_sha1_file_with_flags(sha1, HAS_SHA1_QUICK); > - read_unlock(); > + if (startup_info->have_repository) { > + read_lock(); > + collision_test_needed = has_sha1_file_with_flags(sha1, > HAS_SHA1_QUICK); > + read_unlock(); > + } > > if (collision_test_needed && !data) { > read_lock();
Re: index-pack outside of repository?
On Fri, Dec 16, 2016 at 10:54:13AM -0800, Junio C Hamano wrote: > I am tempted to suggest an intermediate step that comes before > b1ef400eec ("setup_git_env: avoid blind fall-back to ".git"", > 2016-10-20), which is the attached, and publish that as part of an > official release. That way, we'll see what is broken without > hurting people too much (unless they or their scripts care about > extra message given to the standard error stream). I suspect that > released Git has a slightly larger user base than what is cooked on > 'next'. > > environment.c | 5 - > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/environment.c b/environment.c > index 0935ec696e..88f857331e 100644 > --- a/environment.c > +++ b/environment.c > @@ -167,8 +167,11 @@ static void setup_git_env(void) > const char *replace_ref_base; > > git_dir = getenv(GIT_DIR_ENVIRONMENT); > - if (!git_dir) > + if (!git_dir) { > + if (!startup_info->have_repository) > + warning("BUG: please report this at > git@vger.kernel.org"); > git_dir = DEFAULT_GIT_DIR_ENVIRONMENT; > + } Yes, I think this is a nice way to ease into the change. I wish I had thought of it when doing the original series, and we could have shipped it in v2.11. :) -Peff
Re: index-pack outside of repository?
On Fri, Dec 16, 2016 at 10:01:49AM -0800, Junio C Hamano wrote: > >> I think 2/3 is a good change to ensure we get a reasonable error for > >> "index-pack --stdin", and 3/3 is a very good cleanup. Both of them > >> of course are enabled by 1/3. > >> > >> We still fail "nongit git index-pack tmp.pack" with a BUG: though. > > > > Wait. > > > > This only happens with a stalled-and-to-be-discarded topic on 'pu'. > > Please don't waste time digging it (yet). > > Don't wait ;-). My mistake. We can see t5300 broken with this > change and b1ef400eec ("setup_git_env: avoid blind fall-back to > ".git"", 2016-10-20) without anything else. We still need to > address it. Ah, I only checked that it did not do anything terrible like write into ".git". But it looks like it still looks at the git_dir value as part of the collision check. Here's a patch, on top of the rest of the series. -- >8 -- Subject: [PATCH] index-pack: skip collision check when not in repository You can run "git index-pack path/to/foo.pack" outside of a repository to generate an index file, or just to verify the contents. There's no point in doing a collision check, since we obviously do not have any objects to collide with. The current code will blindly look in .git/objects based on the result of setup_git_env(). That effectively gives us the right answer (since we won't find any objects), but it's a waste of time, and it conflicts with our desire to eventually get rid of the "fallback to .git" behavior of setup_git_env(). Signed-off-by: Jeff King--- I didn't do a test, as this doesn't really have any user-visible behavior change. I guess technically if you had a non-repo with ".git/objects/12/3456..." in it we would erroneously read it, but that's kind of bizarre. The interesting test is that when merged with jk/no-looking-at-dotgit-outside-repo-final, the test in t5300 doesn't die. builtin/index-pack.c | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/builtin/index-pack.c b/builtin/index-pack.c index d450a6ada2..f4b87c6c9f 100644 --- a/builtin/index-pack.c +++ b/builtin/index-pack.c @@ -787,13 +787,15 @@ static void sha1_object(const void *data, struct object_entry *obj_entry, const unsigned char *sha1) { void *new_data = NULL; - int collision_test_needed; + int collision_test_needed = 0; assert(data || obj_entry); - read_lock(); - collision_test_needed = has_sha1_file_with_flags(sha1, HAS_SHA1_QUICK); - read_unlock(); + if (startup_info->have_repository) { + read_lock(); + collision_test_needed = has_sha1_file_with_flags(sha1, HAS_SHA1_QUICK); + read_unlock(); + } if (collision_test_needed && !data) { read_lock(); -- 2.11.0.348.g960a0b554
Re: index-pack outside of repository?
Junio C Hamanowrites: > Jeff King writes: > ... >> I'm actually wondering if the way it calls die() in 'next' is a pretty >> reasonable way for things to work in general. It happens when we lazily >> try to ask for the repository directory. So we don't have to replicate >> logic to say "are we going to need a repo"; at the moment we need it, we >> notice we don't have it and die. The only problem is that it says "BUG" >> and not "this operation must be run in a git repository". > > Isn't what we currently have is a good way to discover which > codepaths we missed to add a check to issue the latter error? I am tempted to suggest an intermediate step that comes before b1ef400eec ("setup_git_env: avoid blind fall-back to ".git"", 2016-10-20), which is the attached, and publish that as part of an official release. That way, we'll see what is broken without hurting people too much (unless they or their scripts care about extra message given to the standard error stream). I suspect that released Git has a slightly larger user base than what is cooked on 'next'. environment.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/environment.c b/environment.c index 0935ec696e..88f857331e 100644 --- a/environment.c +++ b/environment.c @@ -167,8 +167,11 @@ static void setup_git_env(void) const char *replace_ref_base; git_dir = getenv(GIT_DIR_ENVIRONMENT); - if (!git_dir) + if (!git_dir) { + if (!startup_info->have_repository) + warning("BUG: please report this at git@vger.kernel.org"); git_dir = DEFAULT_GIT_DIR_ENVIRONMENT; + } gitfile = read_gitfile(git_dir); git_dir = xstrdup(gitfile ? gitfile : git_dir); if (get_common_dir(, git_dir))
Re: index-pack outside of repository?
Junio C Hamanowrites: > Junio C Hamano writes: > >> Jeff King writes: >> >>> On Thu, Dec 15, 2016 at 08:37:28PM -0500, Jeff King wrote: >>> But if this case really is just "if (from_stdin)" that's quite easy, too. >>> >>> So here is that patch (with some associated refactoring and cleanups). >>> This is conceptually independent of >>> jk/no-looking-at-dotgit-outside-repo-final, >>> though it should be fine to merge with that topic. The BUG will actually >>> pass the new test, because it calls die, too. I wonder if we should die >>> with a unique error code on BUGs, and catch them in test_must_fail >>> similar to the way we catch signal death. >>> >>> [1/3]: t5000: extract nongit function to test-lib-functions.sh >>> [2/3]: index-pack: complain when --stdin is used outside of a repo >>> [3/3]: t: use nongit() function where applicable >> >> I think 2/3 is a good change to ensure we get a reasonable error for >> "index-pack --stdin", and 3/3 is a very good cleanup. Both of them >> of course are enabled by 1/3. >> >> We still fail "nongit git index-pack tmp.pack" with a BUG: though. > > Wait. > > This only happens with a stalled-and-to-be-discarded topic on 'pu'. > Please don't waste time digging it (yet). Don't wait ;-). My mistake. We can see t5300 broken with this change and b1ef400eec ("setup_git_env: avoid blind fall-back to ".git"", 2016-10-20) without anything else. We still need to address it.
Re: index-pack outside of repository?
Jeff Kingwrites: > On Thu, Dec 15, 2016 at 08:37:28PM -0500, Jeff King wrote: > >> But if this case really is just "if (from_stdin)" that's quite easy, >> too. > > So here is that patch (with some associated refactoring and cleanups). > This is conceptually independent of > jk/no-looking-at-dotgit-outside-repo-final, > though it should be fine to merge with that topic. The BUG will actually > pass the new test, because it calls die, too. I wonder if we should die > with a unique error code on BUGs, and catch them in test_must_fail > similar to the way we catch signal death. > > [1/3]: t5000: extract nongit function to test-lib-functions.sh > [2/3]: index-pack: complain when --stdin is used outside of a repo > [3/3]: t: use nongit() function where applicable I think 2/3 is a good change to ensure we get a reasonable error for "index-pack --stdin", and 3/3 is a very good cleanup. Both of them of course are enabled by 1/3. We still fail "nongit git index-pack tmp.pack" with a BUG: though.
Re: index-pack outside of repository?
Junio C Hamanowrites: > Jeff King writes: > >> On Thu, Dec 15, 2016 at 08:37:28PM -0500, Jeff King wrote: >> >>> But if this case really is just "if (from_stdin)" that's quite easy, >>> too. >> >> So here is that patch (with some associated refactoring and cleanups). >> This is conceptually independent of >> jk/no-looking-at-dotgit-outside-repo-final, >> though it should be fine to merge with that topic. The BUG will actually >> pass the new test, because it calls die, too. I wonder if we should die >> with a unique error code on BUGs, and catch them in test_must_fail >> similar to the way we catch signal death. >> >> [1/3]: t5000: extract nongit function to test-lib-functions.sh >> [2/3]: index-pack: complain when --stdin is used outside of a repo >> [3/3]: t: use nongit() function where applicable > > I think 2/3 is a good change to ensure we get a reasonable error for > "index-pack --stdin", and 3/3 is a very good cleanup. Both of them > of course are enabled by 1/3. > > We still fail "nongit git index-pack tmp.pack" with a BUG: though. Wait. This only happens with a stalled-and-to-be-discarded topic on 'pu'. Please don't waste time digging it (yet).
Re: index-pack outside of repository?
On Thu, Dec 15, 2016 at 08:37:28PM -0500, Jeff King wrote: > But if this case really is just "if (from_stdin)" that's quite easy, > too. So here is that patch (with some associated refactoring and cleanups). This is conceptually independent of jk/no-looking-at-dotgit-outside-repo-final, though it should be fine to merge with that topic. The BUG will actually pass the new test, because it calls die, too. I wonder if we should die with a unique error code on BUGs, and catch them in test_must_fail similar to the way we catch signal death. [1/3]: t5000: extract nongit function to test-lib-functions.sh [2/3]: index-pack: complain when --stdin is used outside of a repo [3/3]: t: use nongit() function where applicable builtin/index-pack.c | 2 ++ t/t1308-config-set.sh| 10 ++ t/t5000-tar-tree.sh | 14 -- t/t5300-pack-object.sh | 15 +++ t/t9100-git-svn-basic.sh | 17 ++--- t/t9902-completion.sh| 7 +-- t/test-lib-functions.sh | 14 ++ 7 files changed, 36 insertions(+), 43 deletions(-) -Peff
Re: index-pack outside of repository?
On Thu, Dec 15, 2016 at 04:13:38PM -0800, Junio C Hamano wrote: > > So I think complaining to the user is the right thing to do here. I > > started to write a patch to have index-pack notice when it needs a repo > > and doesn't have one, but the logic is actually a bit unclear. Do we > > need to complain early _just_ when --stdin is specified, or does that > > miss somes cases? Likewise, are there cases where --stdin can operate > > without a repo? I couldn't think of any. > > I think there are two and only two major modes; --stdin wants to put > the result in the repository it is working on, while the other mode > takes a filename to deposit the result in, so the latter does not > technically need a repository. OK. That's easy to check for, then. Reverse-engineering that logic from the actual calls in index-pack.c:final() is complicated. But certainly basing it on --stdin is what I would have expected. > > That strategy _might_ be a problem for some programs, which would want > > to notice the issue early before doing work. But it seems like a > > reasonable outcome for index-pack. Thoughts? > > That is, once we know which codepaths should require a repository, I > think it is reasonable to add a check that is done earlier than the > place where we currently try to see where we have one (which could > be deep in the callchain). But we are all human and can miss things, > so the BUG() thing is probably fine. We are cooking it exactly because > we would want to find such corner cases we missed, no? Right, that was my original intent in adding the BUG(): to catch unhandled cases, and then do the appropriate thing earlier. I was just questioning whether the appropriate thing in some cases might be dying at the BUG(), just with a more friendly message. That has the benefit of being very easy to implement, and never wrong (e.g., forbidding a case that actually _doesn't_ need to look at the repo). But if this case really is just "if (from_stdin)" that's quite easy, too. -Peff
Re: index-pack outside of repository?
Jeff Kingwrites: > In older versions of git will just blindly write into > ".git/objects/pack", even though there's no repository there. > > So I think complaining to the user is the right thing to do here. I > started to write a patch to have index-pack notice when it needs a repo > and doesn't have one, but the logic is actually a bit unclear. Do we > need to complain early _just_ when --stdin is specified, or does that > miss somes cases? Likewise, are there cases where --stdin can operate > without a repo? I couldn't think of any. I think there are two and only two major modes; --stdin wants to put the result in the repository it is working on, while the other mode takes a filename to deposit the result in, so the latter does not technically need a repository. > I'm actually wondering if the way it calls die() in 'next' is a pretty > reasonable way for things to work in general. It happens when we lazily > try to ask for the repository directory. So we don't have to replicate > logic to say "are we going to need a repo"; at the moment we need it, we > notice we don't have it and die. The only problem is that it says "BUG" > and not "this operation must be run in a git repository". Isn't what we currently have is a good way to discover which codepaths we missed to add a check to issue the latter error? > That strategy _might_ be a problem for some programs, which would want > to notice the issue early before doing work. But it seems like a > reasonable outcome for index-pack. Thoughts? That is, once we know which codepaths should require a repository, I think it is reasonable to add a check that is done earlier than the place where we currently try to see where we have one (which could be deep in the callchain). But we are all human and can miss things, so the BUG() thing is probably fine. We are cooking it exactly because we would want to find such corner cases we missed, no?
index-pack outside of repository?
Running git on 'next', you can trigger a BUG: $ cd /some/repo $ git pack-objects --all --stdout /tmp/foo.pack $ cd /not-a-git-repo $ git index-pack --stdin