Re: [PATCH 2/2] Revert "repository: pre-initialize hash algo pointer"

2018-03-02 Thread Duy Nguyen
On Tue, Feb 27, 2018 at 6:09 AM, Junio C Hamano  wrote:
> Duy Nguyen  writes:
>
>> Yes that's the intention. But after writing cover letter for v2 and
>> sending it out, it looks to me that this thing must stay until all our
>> code is converted to using the_hash_algo (I don't know if there are
>> more to convert or it's finished already). So an alternative is we do
>> the opposite: default to GIT_HASH_SHA1, but when an env variable is
>> set, reset it back to NULL. This env variable will _always_ be set by
>> the test suite to help us catch problems.
>
> If we were to do the "do not blindly initialize and always
> initialize explicitly for debugging" (which I doubt is a good
> direction to go in), then what you outlined above certainly is a
> less evil way to do so than what the patch under discussion did, I
> would think.

So v3 [1] should be in good condition to go to 'pu', I think?

[1] https://public-inbox.org/git/20180225111840.16421-1-pclo...@gmail.com/
-- 
Duy


Re: [PATCH 2/2] Revert "repository: pre-initialize hash algo pointer"

2018-02-26 Thread Junio C Hamano
Duy Nguyen  writes:

> Yes that's the intention. But after writing cover letter for v2 and
> sending it out, it looks to me that this thing must stay until all our
> code is converted to using the_hash_algo (I don't know if there are
> more to convert or it's finished already). So an alternative is we do
> the opposite: default to GIT_HASH_SHA1, but when an env variable is
> set, reset it back to NULL. This env variable will _always_ be set by
> the test suite to help us catch problems.

If we were to do the "do not blindly initialize and always
initialize explicitly for debugging" (which I doubt is a good
direction to go in), then what you outlined above certainly is a
less evil way to do so than what the patch under discussion did, I
would think.


Re: [PATCH 2/2] Revert "repository: pre-initialize hash algo pointer"

2018-02-23 Thread Duy Nguyen
On Sat, Feb 24, 2018 at 5:29 AM, brian m. carlson
 wrote:
>> @@ -40,5 +41,8 @@ int main(int argc, const char **argv)
>>
>>   restore_sigpipe_to_default();
>>
>> + if (getenv("GIT_HASH_FIXUP"))
>> + repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
>
> I'm lukewarm on adding this environment variable, but considering our
> history here, we had probably better.  We can always remove it after a
> few releases.

Yes that's the intention. But after writing cover letter for v2 and
sending it out, it looks to me that this thing must stay until all our
code is converted to using the_hash_algo (I don't know if there are
more to convert or it's finished already). So an alternative is we do
the opposite: default to GIT_HASH_SHA1, but when an env variable is
set, reset it back to NULL. This env variable will _always_ be set by
the test suite to help us catch problems.
-- 
Duy


Re: [PATCH 2/2] Revert "repository: pre-initialize hash algo pointer"

2018-02-23 Thread Brandon Williams
On 02/23, brian m. carlson wrote:
> On Fri, Feb 23, 2018 at 04:56:40PM +0700, Nguyễn Thái Ngọc Duy wrote:
> > diff --git a/builtin/index-pack.c b/builtin/index-pack.c
> > index 7e3e1a461c..8ee935504e 100644
> > --- a/builtin/index-pack.c
> > +++ b/builtin/index-pack.c
> > @@ -1673,6 +1673,11 @@ int cmd_index_pack(int argc, const char **argv, 
> > const char *prefix)
> > if (prefix && chdir(prefix))
> > die(_("Cannot come back to cwd"));
> >  
> > +   if (!the_hash_algo) {
> > +   warning(_("Running without a repository, assuming SHA-1 hash"));
> > +   repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
> > +   }
> 
> Is this warning going to be visible to users in the normal course of
> operation?  If so, people are probably going to find this bothersome or
> alarming.
> 
> > for (i = 1; i < argc; i++) {
> > const char *arg = argv[i];
> >  
> > diff --git a/common-main.c b/common-main.c
> > index 6a689007e7..12aec36794 100644
> > --- a/common-main.c
> > +++ b/common-main.c
> > @@ -1,6 +1,7 @@
> >  #include "cache.h"
> >  #include "exec_cmd.h"
> >  #include "attr.h"
> > +#include "repository.h"
> >  
> >  /*
> >   * Many parts of Git have subprograms communicate via pipe, expect the
> > @@ -40,5 +41,8 @@ int main(int argc, const char **argv)
> >  
> > restore_sigpipe_to_default();
> >  
> > +   if (getenv("GIT_HASH_FIXUP"))
> > +   repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
> 
> I'm lukewarm on adding this environment variable, but considering our
> history here, we had probably better.  We can always remove it after a
> few releases.

Yeah I don't know what the best thing to do here would be.  I mean we
could always just init the hash_algo for the_repository here so that we
don't ever have to worry about it, but I don't know if even that is the
right approach.

> 
> > return cmd_main(argc, argv);
> >  }
> > diff --git a/diff-no-index.c b/diff-no-index.c
> > index 0ed5f0f496..f038f665bc 100644
> > --- a/diff-no-index.c
> > +++ b/diff-no-index.c
> > @@ -241,6 +241,11 @@ void diff_no_index(struct rev_info *revs,
> > struct strbuf replacement = STRBUF_INIT;
> > const char *prefix = revs->prefix;
> >  
> > +   if (!the_hash_algo) {
> > +   warning(_("Running without a repository, assuming SHA-1 hash"));
> > +   repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
> > +   }
> 
> Again, same concern.  I can imagine scripts that will blow up loudly if
> git diff --no-index spews things to standard error.
> 
> I'm not opposed to making this more visible, but I wonder if maybe it
> should only be visible to developers or such.  The only way I can think
> of doing is that is with an advice options, but maybe there's a better
> way.
> -- 
> brian m. carlson / brian with sandals: Houston, Texas, US
> https://www.crustytoothpaste.net/~bmc | My opinion only
> OpenPGP: https://keybase.io/bk2204



-- 
Brandon Williams


Re: [PATCH 2/2] Revert "repository: pre-initialize hash algo pointer"

2018-02-23 Thread brian m. carlson
On Fri, Feb 23, 2018 at 04:56:40PM +0700, Nguyễn Thái Ngọc Duy wrote:
> diff --git a/builtin/index-pack.c b/builtin/index-pack.c
> index 7e3e1a461c..8ee935504e 100644
> --- a/builtin/index-pack.c
> +++ b/builtin/index-pack.c
> @@ -1673,6 +1673,11 @@ int cmd_index_pack(int argc, const char **argv, const 
> char *prefix)
>   if (prefix && chdir(prefix))
>   die(_("Cannot come back to cwd"));
>  
> + if (!the_hash_algo) {
> + warning(_("Running without a repository, assuming SHA-1 hash"));
> + repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
> + }

Is this warning going to be visible to users in the normal course of
operation?  If so, people are probably going to find this bothersome or
alarming.

>   for (i = 1; i < argc; i++) {
>   const char *arg = argv[i];
>  
> diff --git a/common-main.c b/common-main.c
> index 6a689007e7..12aec36794 100644
> --- a/common-main.c
> +++ b/common-main.c
> @@ -1,6 +1,7 @@
>  #include "cache.h"
>  #include "exec_cmd.h"
>  #include "attr.h"
> +#include "repository.h"
>  
>  /*
>   * Many parts of Git have subprograms communicate via pipe, expect the
> @@ -40,5 +41,8 @@ int main(int argc, const char **argv)
>  
>   restore_sigpipe_to_default();
>  
> + if (getenv("GIT_HASH_FIXUP"))
> + repo_set_hash_algo(the_repository, GIT_HASH_SHA1);

I'm lukewarm on adding this environment variable, but considering our
history here, we had probably better.  We can always remove it after a
few releases.

>   return cmd_main(argc, argv);
>  }
> diff --git a/diff-no-index.c b/diff-no-index.c
> index 0ed5f0f496..f038f665bc 100644
> --- a/diff-no-index.c
> +++ b/diff-no-index.c
> @@ -241,6 +241,11 @@ void diff_no_index(struct rev_info *revs,
>   struct strbuf replacement = STRBUF_INIT;
>   const char *prefix = revs->prefix;
>  
> + if (!the_hash_algo) {
> + warning(_("Running without a repository, assuming SHA-1 hash"));
> + repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
> + }

Again, same concern.  I can imagine scripts that will blow up loudly if
git diff --no-index spews things to standard error.

I'm not opposed to making this more visible, but I wonder if maybe it
should only be visible to developers or such.  The only way I can think
of doing is that is with an advice options, but maybe there's a better
way.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH 2/2] Revert "repository: pre-initialize hash algo pointer"

2018-02-23 Thread Junio C Hamano
Stefan Beller  writes:

> On Fri, Feb 23, 2018 at 11:50 AM, Junio C Hamano  wrote:
>> Stefan Beller  writes:
>>
>>> I wonder if there is yet another missing case in the enumeration of
>>> the previous patch:
>>> Some commands are able to operate on GIT_OBJECT_DIR instead
>>> of GIT_DIR (git repack?), which may not even explore the full git directory,
>>> and so doesn't know about the hash value.
>>
>> ... because GIT_DIR/config is not known?  "repack" is not one of
>> them, though---it needs to at least use refs as the starting point
>> so a standalone OBJECT_DIR is insufficient.
>
> Yes, I could have worded this as a question:
> Is there any command that operates on GIT_OBJECT_DIR
> without trying to discover GIT_DIR ?

If somebody finds one that would be a good argument not to pursue
the approach.  Lack of response to the question would not amount to
that much---it is possible all people who bothered to think of
overlooked something obvious, though.  "When I asked around nobody
thought of a possibly way for this to cause breakages, so let's
declare it is safe to do so and do it" is not how we want to do
things X-<.



Re: [PATCH 2/2] Revert "repository: pre-initialize hash algo pointer"

2018-02-23 Thread Stefan Beller
On Fri, Feb 23, 2018 at 11:50 AM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> I wonder if there is yet another missing case in the enumeration of
>> the previous patch:
>> Some commands are able to operate on GIT_OBJECT_DIR instead
>> of GIT_DIR (git repack?), which may not even explore the full git directory,
>> and so doesn't know about the hash value.
>
> ... because GIT_DIR/config is not known?  "repack" is not one of
> them, though---it needs to at least use refs as the starting point
> so a standalone OBJECT_DIR is insufficient.

Yes, I could have worded this as a question:
Is there any command that operates on GIT_OBJECT_DIR
without trying to discover GIT_DIR ?


Re: [PATCH 2/2] Revert "repository: pre-initialize hash algo pointer"

2018-02-23 Thread Junio C Hamano
Stefan Beller  writes:

> I wonder if there is yet another missing case in the enumeration of
> the previous patch:
> Some commands are able to operate on GIT_OBJECT_DIR instead
> of GIT_DIR (git repack?), which may not even explore the full git directory,
> and so doesn't know about the hash value.

... because GIT_DIR/config is not known?  "repack" is not one of
them, though---it needs to at least use refs as the starting point
so a standalone OBJECT_DIR is insufficient.


Re: [PATCH 2/2] Revert "repository: pre-initialize hash algo pointer"

2018-02-23 Thread Stefan Beller
On Fri, Feb 23, 2018 at 1:56 AM, Nguyễn Thái Ngọc Duy  wrote:
> This reverts commit e26f7f19b6c7485f04234946a59ab8f4fd21d6d1. The root
> problem, git clone not setting up the_hash_algo, has been fixed in the
> previous patch.
>
> As a result of the revert, some code paths that use the_hash_algo
> without initialization is revealed and fixed here. It's basically
> commands that are allowed to run without a repository. The fix here is
> not the best. We probably could figure out the hash algorithm from input
> somehow.
>
> Since this is a dangerous move and could potentially break stuff after
> release (and leads to workaround like the reverted commit), the
> workaround technically remains, but is hidden behind a new environment
> variable GIT_HASH_FIXUP. This should let the users continue to use git
> while we fix the problem.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
>  builtin/index-pack.c | 5 +
>  common-main.c| 4 
>  diff-no-index.c  | 5 +
>  repository.c | 2 +-
>  t/helper/test-dump-split-index.c | 2 ++
>  5 files changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/builtin/index-pack.c b/builtin/index-pack.c
> index 7e3e1a461c..8ee935504e 100644
> --- a/builtin/index-pack.c
> +++ b/builtin/index-pack.c
> @@ -1673,6 +1673,11 @@ int cmd_index_pack(int argc, const char **argv, const 
> char *prefix)
> if (prefix && chdir(prefix))
> die(_("Cannot come back to cwd"));
>
> +   if (!the_hash_algo) {
> +   warning(_("Running without a repository, assuming SHA-1 
> hash"));
> +   repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
> +   }
> +
> for (i = 1; i < argc; i++) {
> const char *arg = argv[i];
>
> diff --git a/common-main.c b/common-main.c
> index 6a689007e7..12aec36794 100644
> --- a/common-main.c
> +++ b/common-main.c
> @@ -1,6 +1,7 @@
>  #include "cache.h"
>  #include "exec_cmd.h"
>  #include "attr.h"
> +#include "repository.h"
>
>  /*
>   * Many parts of Git have subprograms communicate via pipe, expect the
> @@ -40,5 +41,8 @@ int main(int argc, const char **argv)
>
> restore_sigpipe_to_default();
>
> +   if (getenv("GIT_HASH_FIXUP"))
> +   repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
> +
> return cmd_main(argc, argv);
>  }
> diff --git a/diff-no-index.c b/diff-no-index.c
> index 0ed5f0f496..f038f665bc 100644
> --- a/diff-no-index.c
> +++ b/diff-no-index.c
> @@ -241,6 +241,11 @@ void diff_no_index(struct rev_info *revs,
> struct strbuf replacement = STRBUF_INIT;
> const char *prefix = revs->prefix;
>
> +   if (!the_hash_algo) {
> +   warning(_("Running without a repository, assuming SHA-1 
> hash"));
> +   repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
> +   }
> +
> diff_setup(>diffopt);
> for (i = 1; i < argc - 2; ) {
> int j;
> diff --git a/repository.c b/repository.c
> index 4ffbe9bc94..0d715f4fdb 100644
> --- a/repository.c
> +++ b/repository.c
> @@ -5,7 +5,7 @@
>
>  /* The main repository */
>  static struct repository the_repo = {
> -   NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, _index, 
> _algos[GIT_HASH_SHA1], 0, 0
> +   NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, _index, 
> NULL, 0, 0

I am not sure I agree with this defense in depth, because it would add
a lot to maintenance burden.
Specifically this part. The series that I sent out usually clashes
here as this is currently a hot area
of the code touched by many different series in flight.

However this is the long term correct thing to do? We assume no algorithm until
the repository can tell us from its config (or we default to sha1 if there is no
configuration present).

I wonder if there is yet another missing case in the enumeration of
the previous patch:
Some commands are able to operate on GIT_OBJECT_DIR instead
of GIT_DIR (git repack?), which may not even explore the full git directory,
and so doesn't know about the hash value.

In the cover letter you reference my series, but comparing the diffstats
(and looking through the patches), I would only expect this one place
to have merge conflicts, which ought to be easy to resolve.
(In my series I break the initializer into multiple lines to help the
future, too)

After some thought, I like this series.

Thanks,
Stefan