Re: index-pack outside of repository?

2016-12-16 Thread Junio C Hamano
Jeff King  writes:

> Ah, I only checked that it did not do anything terrible like write into
> ".git". But it looks like it still looks at the git_dir value as part of
> the collision check.
>
> Here's a patch, on top of the rest of the series.

Thanks for a quick turnaround, as always.

> -- >8 --
> Subject: [PATCH] index-pack: skip collision check when not in repository
>
> You can run "git index-pack path/to/foo.pack" outside of a
> repository to generate an index file, or just to verify the
> contents. There's no point in doing a collision check, since
> we obviously do not have any objects to collide with.
>
> The current code will blindly look in .git/objects based on
> the result of setup_git_env(). That effectively gives us the
> right answer (since we won't find any objects), but it's a
> waste of time, and it conflicts with our desire to
> eventually get rid of the "fallback to .git" behavior of
> setup_git_env().
>
> Signed-off-by: Jeff King 
> ---
> I didn't do a test, as this doesn't really have any user-visible
> behavior change.
>
> I guess technically if you had a non-repo with
> ".git/objects/12/3456..." in it we would erroneously read it, but that's
> kind of bizarre. The interesting test is that when merged with
> jk/no-looking-at-dotgit-outside-repo-final, the test in t5300 doesn't
> die.

Yes.

>
>  builtin/index-pack.c | 10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/builtin/index-pack.c b/builtin/index-pack.c
> index d450a6ada2..f4b87c6c9f 100644
> --- a/builtin/index-pack.c
> +++ b/builtin/index-pack.c
> @@ -787,13 +787,15 @@ static void sha1_object(const void *data, struct 
> object_entry *obj_entry,
>   const unsigned char *sha1)
>  {
>   void *new_data = NULL;
> - int collision_test_needed;
> + int collision_test_needed = 0;
>  
>   assert(data || obj_entry);
>  
> - read_lock();
> - collision_test_needed = has_sha1_file_with_flags(sha1, HAS_SHA1_QUICK);
> - read_unlock();
> + if (startup_info->have_repository) {
> + read_lock();
> + collision_test_needed = has_sha1_file_with_flags(sha1, 
> HAS_SHA1_QUICK);
> + read_unlock();
> + }
>  
>   if (collision_test_needed && !data) {
>   read_lock();


Re: index-pack outside of repository?

2016-12-16 Thread Jeff King
On Fri, Dec 16, 2016 at 10:54:13AM -0800, Junio C Hamano wrote:

> I am tempted to suggest an intermediate step that comes before
> b1ef400eec ("setup_git_env: avoid blind fall-back to ".git"",
> 2016-10-20), which is the attached, and publish that as part of an
> official release.  That way, we'll see what is broken without
> hurting people too much (unless they or their scripts care about
> extra message given to the standard error stream).  I suspect that
> released Git has a slightly larger user base than what is cooked on
> 'next'.
> 
>  environment.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/environment.c b/environment.c
> index 0935ec696e..88f857331e 100644
> --- a/environment.c
> +++ b/environment.c
> @@ -167,8 +167,11 @@ static void setup_git_env(void)
>   const char *replace_ref_base;
>  
>   git_dir = getenv(GIT_DIR_ENVIRONMENT);
> - if (!git_dir)
> + if (!git_dir) {
> + if (!startup_info->have_repository)
> + warning("BUG: please report this at 
> git@vger.kernel.org");
>   git_dir = DEFAULT_GIT_DIR_ENVIRONMENT;
> + }

Yes, I think this is a nice way to ease into the change. I wish I had
thought of it when doing the original series, and we could have shipped
it in v2.11. :)

-Peff


Re: index-pack outside of repository?

2016-12-16 Thread Jeff King
On Fri, Dec 16, 2016 at 10:01:49AM -0800, Junio C Hamano wrote:

> >> I think 2/3 is a good change to ensure we get a reasonable error for
> >> "index-pack --stdin", and 3/3 is a very good cleanup.  Both of them
> >> of course are enabled by 1/3.
> >>
> >> We still fail "nongit git index-pack tmp.pack" with a BUG: though.
> >
> > Wait.  
> >
> > This only happens with a stalled-and-to-be-discarded topic on 'pu'.
> > Please don't waste time digging it (yet).
> 
> Don't wait ;-).  My mistake.  We can see t5300 broken with this
> change and b1ef400eec ("setup_git_env: avoid blind fall-back to
> ".git"", 2016-10-20) without anything else.  We still need to
> address it.

Ah, I only checked that it did not do anything terrible like write into
".git". But it looks like it still looks at the git_dir value as part of
the collision check.

Here's a patch, on top of the rest of the series.

-- >8 --
Subject: [PATCH] index-pack: skip collision check when not in repository

You can run "git index-pack path/to/foo.pack" outside of a
repository to generate an index file, or just to verify the
contents. There's no point in doing a collision check, since
we obviously do not have any objects to collide with.

The current code will blindly look in .git/objects based on
the result of setup_git_env(). That effectively gives us the
right answer (since we won't find any objects), but it's a
waste of time, and it conflicts with our desire to
eventually get rid of the "fallback to .git" behavior of
setup_git_env().

Signed-off-by: Jeff King 
---
I didn't do a test, as this doesn't really have any user-visible
behavior change. I guess technically if you had a non-repo with
".git/objects/12/3456..." in it we would erroneously read it, but that's
kind of bizarre. The interesting test is that when merged with
jk/no-looking-at-dotgit-outside-repo-final, the test in t5300 doesn't
die.

 builtin/index-pack.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index d450a6ada2..f4b87c6c9f 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -787,13 +787,15 @@ static void sha1_object(const void *data, struct 
object_entry *obj_entry,
const unsigned char *sha1)
 {
void *new_data = NULL;
-   int collision_test_needed;
+   int collision_test_needed = 0;
 
assert(data || obj_entry);
 
-   read_lock();
-   collision_test_needed = has_sha1_file_with_flags(sha1, HAS_SHA1_QUICK);
-   read_unlock();
+   if (startup_info->have_repository) {
+   read_lock();
+   collision_test_needed = has_sha1_file_with_flags(sha1, 
HAS_SHA1_QUICK);
+   read_unlock();
+   }
 
if (collision_test_needed && !data) {
read_lock();
-- 
2.11.0.348.g960a0b554



Re: index-pack outside of repository?

2016-12-16 Thread Junio C Hamano
Junio C Hamano  writes:

> Jeff King  writes:
> ...
>> I'm actually wondering if the way it calls die() in 'next' is a pretty
>> reasonable way for things to work in general. It happens when we lazily
>> try to ask for the repository directory. So we don't have to replicate
>> logic to say "are we going to need a repo"; at the moment we need it, we
>> notice we don't have it and die. The only problem is that it says "BUG"
>> and not "this operation must be run in a git repository".
>
> Isn't what we currently have is a good way to discover which
> codepaths we missed to add a check to issue the latter error?

I am tempted to suggest an intermediate step that comes before
b1ef400eec ("setup_git_env: avoid blind fall-back to ".git"",
2016-10-20), which is the attached, and publish that as part of an
official release.  That way, we'll see what is broken without
hurting people too much (unless they or their scripts care about
extra message given to the standard error stream).  I suspect that
released Git has a slightly larger user base than what is cooked on
'next'.

 environment.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/environment.c b/environment.c
index 0935ec696e..88f857331e 100644
--- a/environment.c
+++ b/environment.c
@@ -167,8 +167,11 @@ static void setup_git_env(void)
const char *replace_ref_base;
 
git_dir = getenv(GIT_DIR_ENVIRONMENT);
-   if (!git_dir)
+   if (!git_dir) {
+   if (!startup_info->have_repository)
+   warning("BUG: please report this at 
git@vger.kernel.org");
git_dir = DEFAULT_GIT_DIR_ENVIRONMENT;
+   }
gitfile = read_gitfile(git_dir);
git_dir = xstrdup(gitfile ? gitfile : git_dir);
if (get_common_dir(, git_dir))


Re: index-pack outside of repository?

2016-12-16 Thread Junio C Hamano
Junio C Hamano  writes:

> Junio C Hamano  writes:
>
>> Jeff King  writes:
>>
>>> On Thu, Dec 15, 2016 at 08:37:28PM -0500, Jeff King wrote:
>>>
 But if this case really is just "if (from_stdin)" that's quite easy,
 too.
>>>
>>> So here is that patch (with some associated refactoring and cleanups).
>>> This is conceptually independent of 
>>> jk/no-looking-at-dotgit-outside-repo-final,
>>> though it should be fine to merge with that topic. The BUG will actually
>>> pass the new test, because it calls die, too. I wonder if we should die
>>> with a unique error code on BUGs, and catch them in test_must_fail
>>> similar to the way we catch signal death.
>>>
>>>   [1/3]: t5000: extract nongit function to test-lib-functions.sh
>>>   [2/3]: index-pack: complain when --stdin is used outside of a repo
>>>   [3/3]: t: use nongit() function where applicable
>>
>> I think 2/3 is a good change to ensure we get a reasonable error for
>> "index-pack --stdin", and 3/3 is a very good cleanup.  Both of them
>> of course are enabled by 1/3.
>>
>> We still fail "nongit git index-pack tmp.pack" with a BUG: though.
>
> Wait.  
>
> This only happens with a stalled-and-to-be-discarded topic on 'pu'.
> Please don't waste time digging it (yet).

Don't wait ;-).  My mistake.  We can see t5300 broken with this
change and b1ef400eec ("setup_git_env: avoid blind fall-back to
".git"", 2016-10-20) without anything else.  We still need to
address it.


Re: index-pack outside of repository?

2016-12-16 Thread Junio C Hamano
Jeff King  writes:

> On Thu, Dec 15, 2016 at 08:37:28PM -0500, Jeff King wrote:
>
>> But if this case really is just "if (from_stdin)" that's quite easy,
>> too.
>
> So here is that patch (with some associated refactoring and cleanups).
> This is conceptually independent of 
> jk/no-looking-at-dotgit-outside-repo-final,
> though it should be fine to merge with that topic. The BUG will actually
> pass the new test, because it calls die, too. I wonder if we should die
> with a unique error code on BUGs, and catch them in test_must_fail
> similar to the way we catch signal death.
>
>   [1/3]: t5000: extract nongit function to test-lib-functions.sh
>   [2/3]: index-pack: complain when --stdin is used outside of a repo
>   [3/3]: t: use nongit() function where applicable

I think 2/3 is a good change to ensure we get a reasonable error for
"index-pack --stdin", and 3/3 is a very good cleanup.  Both of them
of course are enabled by 1/3.

We still fail "nongit git index-pack tmp.pack" with a BUG: though.






Re: index-pack outside of repository?

2016-12-16 Thread Junio C Hamano
Junio C Hamano  writes:

> Jeff King  writes:
>
>> On Thu, Dec 15, 2016 at 08:37:28PM -0500, Jeff King wrote:
>>
>>> But if this case really is just "if (from_stdin)" that's quite easy,
>>> too.
>>
>> So here is that patch (with some associated refactoring and cleanups).
>> This is conceptually independent of 
>> jk/no-looking-at-dotgit-outside-repo-final,
>> though it should be fine to merge with that topic. The BUG will actually
>> pass the new test, because it calls die, too. I wonder if we should die
>> with a unique error code on BUGs, and catch them in test_must_fail
>> similar to the way we catch signal death.
>>
>>   [1/3]: t5000: extract nongit function to test-lib-functions.sh
>>   [2/3]: index-pack: complain when --stdin is used outside of a repo
>>   [3/3]: t: use nongit() function where applicable
>
> I think 2/3 is a good change to ensure we get a reasonable error for
> "index-pack --stdin", and 3/3 is a very good cleanup.  Both of them
> of course are enabled by 1/3.
>
> We still fail "nongit git index-pack tmp.pack" with a BUG: though.

Wait.  

This only happens with a stalled-and-to-be-discarded topic on 'pu'.
Please don't waste time digging it (yet).


Re: index-pack outside of repository?

2016-12-15 Thread Jeff King
On Thu, Dec 15, 2016 at 08:37:28PM -0500, Jeff King wrote:

> But if this case really is just "if (from_stdin)" that's quite easy,
> too.

So here is that patch (with some associated refactoring and cleanups).
This is conceptually independent of jk/no-looking-at-dotgit-outside-repo-final,
though it should be fine to merge with that topic. The BUG will actually
pass the new test, because it calls die, too. I wonder if we should die
with a unique error code on BUGs, and catch them in test_must_fail
similar to the way we catch signal death.

  [1/3]: t5000: extract nongit function to test-lib-functions.sh
  [2/3]: index-pack: complain when --stdin is used outside of a repo
  [3/3]: t: use nongit() function where applicable

 builtin/index-pack.c |  2 ++
 t/t1308-config-set.sh| 10 ++
 t/t5000-tar-tree.sh  | 14 --
 t/t5300-pack-object.sh   | 15 +++
 t/t9100-git-svn-basic.sh | 17 ++---
 t/t9902-completion.sh|  7 +--
 t/test-lib-functions.sh  | 14 ++
 7 files changed, 36 insertions(+), 43 deletions(-)

-Peff


Re: index-pack outside of repository?

2016-12-15 Thread Jeff King
On Thu, Dec 15, 2016 at 04:13:38PM -0800, Junio C Hamano wrote:

> > So I think complaining to the user is the right thing to do here. I
> > started to write a patch to have index-pack notice when it needs a repo
> > and doesn't have one, but the logic is actually a bit unclear.  Do we
> > need to complain early _just_ when --stdin is specified, or does that
> > miss somes cases?  Likewise, are there cases where --stdin can operate
> > without a repo? I couldn't think of any.
> 
> I think there are two and only two major modes; --stdin wants to put
> the result in the repository it is working on, while the other mode
> takes a filename to deposit the result in, so the latter does not
> technically need a repository.

OK. That's easy to check for, then.  Reverse-engineering that logic from
the actual calls in index-pack.c:final() is complicated.  But certainly
basing it on --stdin is what I would have expected.

> > That strategy _might_ be a problem for some programs, which would want
> > to notice the issue early before doing work. But it seems like a
> > reasonable outcome for index-pack. Thoughts?
> 
> That is, once we know which codepaths should require a repository, I
> think it is reasonable to add a check that is done earlier than the
> place where we currently try to see where we have one (which could
> be deep in the callchain).  But we are all human and can miss things,
> so the BUG() thing is probably fine.  We are cooking it exactly because
> we would want to find such corner cases we missed, no?

Right, that was my original intent in adding the BUG(): to catch
unhandled cases, and then do the appropriate thing earlier. I was just
questioning whether the appropriate thing in some cases might be dying
at the BUG(), just with a more friendly message. That has the benefit of
being very easy to implement, and never wrong (e.g., forbidding a case
that actually _doesn't_ need to look at the repo).

But if this case really is just "if (from_stdin)" that's quite easy,
too.

-Peff


Re: index-pack outside of repository?

2016-12-15 Thread Junio C Hamano
Jeff King  writes:

> In older versions of git will just blindly write into
> ".git/objects/pack", even though there's no repository there.
>
> So I think complaining to the user is the right thing to do here. I
> started to write a patch to have index-pack notice when it needs a repo
> and doesn't have one, but the logic is actually a bit unclear.  Do we
> need to complain early _just_ when --stdin is specified, or does that
> miss somes cases?  Likewise, are there cases where --stdin can operate
> without a repo? I couldn't think of any.

I think there are two and only two major modes; --stdin wants to put
the result in the repository it is working on, while the other mode
takes a filename to deposit the result in, so the latter does not
technically need a repository.

> I'm actually wondering if the way it calls die() in 'next' is a pretty
> reasonable way for things to work in general. It happens when we lazily
> try to ask for the repository directory. So we don't have to replicate
> logic to say "are we going to need a repo"; at the moment we need it, we
> notice we don't have it and die. The only problem is that it says "BUG"
> and not "this operation must be run in a git repository".

Isn't what we currently have is a good way to discover which
codepaths we missed to add a check to issue the latter error?

> That strategy _might_ be a problem for some programs, which would want
> to notice the issue early before doing work. But it seems like a
> reasonable outcome for index-pack. Thoughts?

That is, once we know which codepaths should require a repository, I
think it is reasonable to add a check that is done earlier than the
place where we currently try to see where we have one (which could
be deep in the callchain).  But we are all human and can miss things,
so the BUG() thing is probably fine.  We are cooking it exactly because
we would want to find such corner cases we missed, no?


index-pack outside of repository?

2016-12-15 Thread Jeff King
Running git on 'next', you can trigger a BUG:

  $ cd /some/repo
  $ git pack-objects --all --stdout /tmp/foo.pack
  $ cd /not-a-git-repo
  $ git index-pack --stdin