Re: [PATCH 2/2] Revert "repository: pre-initialize hash algo pointer"
On Tue, Feb 27, 2018 at 6:09 AM, Junio C Hamano wrote: > Duy Nguyen writes: > >> Yes that's the intention. But after writing cover letter for v2 and >> sending it out, it looks to me that this thing must stay until all our >> code is converted to using the_hash_algo (I don't know if there are >> more to convert or it's finished already). So an alternative is we do >> the opposite: default to GIT_HASH_SHA1, but when an env variable is >> set, reset it back to NULL. This env variable will _always_ be set by >> the test suite to help us catch problems. > > If we were to do the "do not blindly initialize and always > initialize explicitly for debugging" (which I doubt is a good > direction to go in), then what you outlined above certainly is a > less evil way to do so than what the patch under discussion did, I > would think. So v3 [1] should be in good condition to go to 'pu', I think? [1] https://public-inbox.org/git/20180225111840.16421-1-pclo...@gmail.com/ -- Duy
Re: [PATCH 2/2] Revert "repository: pre-initialize hash algo pointer"
Duy Nguyen writes: > Yes that's the intention. But after writing cover letter for v2 and > sending it out, it looks to me that this thing must stay until all our > code is converted to using the_hash_algo (I don't know if there are > more to convert or it's finished already). So an alternative is we do > the opposite: default to GIT_HASH_SHA1, but when an env variable is > set, reset it back to NULL. This env variable will _always_ be set by > the test suite to help us catch problems. If we were to do the "do not blindly initialize and always initialize explicitly for debugging" (which I doubt is a good direction to go in), then what you outlined above certainly is a less evil way to do so than what the patch under discussion did, I would think.
Re: [PATCH 2/2] Revert "repository: pre-initialize hash algo pointer"
On Sat, Feb 24, 2018 at 5:29 AM, brian m. carlson wrote: >> @@ -40,5 +41,8 @@ int main(int argc, const char **argv) >> >> restore_sigpipe_to_default(); >> >> + if (getenv("GIT_HASH_FIXUP")) >> + repo_set_hash_algo(the_repository, GIT_HASH_SHA1); > > I'm lukewarm on adding this environment variable, but considering our > history here, we had probably better. We can always remove it after a > few releases. Yes that's the intention. But after writing cover letter for v2 and sending it out, it looks to me that this thing must stay until all our code is converted to using the_hash_algo (I don't know if there are more to convert or it's finished already). So an alternative is we do the opposite: default to GIT_HASH_SHA1, but when an env variable is set, reset it back to NULL. This env variable will _always_ be set by the test suite to help us catch problems. -- Duy
Re: [PATCH 2/2] Revert "repository: pre-initialize hash algo pointer"
On 02/23, brian m. carlson wrote: > On Fri, Feb 23, 2018 at 04:56:40PM +0700, Nguyễn Thái Ngọc Duy wrote: > > diff --git a/builtin/index-pack.c b/builtin/index-pack.c > > index 7e3e1a461c..8ee935504e 100644 > > --- a/builtin/index-pack.c > > +++ b/builtin/index-pack.c > > @@ -1673,6 +1673,11 @@ int cmd_index_pack(int argc, const char **argv, > > const char *prefix) > > if (prefix && chdir(prefix)) > > die(_("Cannot come back to cwd")); > > > > + if (!the_hash_algo) { > > + warning(_("Running without a repository, assuming SHA-1 hash")); > > + repo_set_hash_algo(the_repository, GIT_HASH_SHA1); > > + } > > Is this warning going to be visible to users in the normal course of > operation? If so, people are probably going to find this bothersome or > alarming. > > > for (i = 1; i < argc; i++) { > > const char *arg = argv[i]; > > > > diff --git a/common-main.c b/common-main.c > > index 6a689007e7..12aec36794 100644 > > --- a/common-main.c > > +++ b/common-main.c > > @@ -1,6 +1,7 @@ > > #include "cache.h" > > #include "exec_cmd.h" > > #include "attr.h" > > +#include "repository.h" > > > > /* > > * Many parts of Git have subprograms communicate via pipe, expect the > > @@ -40,5 +41,8 @@ int main(int argc, const char **argv) > > > > restore_sigpipe_to_default(); > > > > + if (getenv("GIT_HASH_FIXUP")) > > + repo_set_hash_algo(the_repository, GIT_HASH_SHA1); > > I'm lukewarm on adding this environment variable, but considering our > history here, we had probably better. We can always remove it after a > few releases. Yeah I don't know what the best thing to do here would be. I mean we could always just init the hash_algo for the_repository here so that we don't ever have to worry about it, but I don't know if even that is the right approach. > > > return cmd_main(argc, argv); > > } > > diff --git a/diff-no-index.c b/diff-no-index.c > > index 0ed5f0f496..f038f665bc 100644 > > --- a/diff-no-index.c > > +++ b/diff-no-index.c > > @@ -241,6 +241,11 @@ void diff_no_index(struct rev_info *revs, > > struct strbuf replacement = STRBUF_INIT; > > const char *prefix = revs->prefix; > > > > + if (!the_hash_algo) { > > + warning(_("Running without a repository, assuming SHA-1 hash")); > > + repo_set_hash_algo(the_repository, GIT_HASH_SHA1); > > + } > > Again, same concern. I can imagine scripts that will blow up loudly if > git diff --no-index spews things to standard error. > > I'm not opposed to making this more visible, but I wonder if maybe it > should only be visible to developers or such. The only way I can think > of doing is that is with an advice options, but maybe there's a better > way. > -- > brian m. carlson / brian with sandals: Houston, Texas, US > https://www.crustytoothpaste.net/~bmc | My opinion only > OpenPGP: https://keybase.io/bk2204 -- Brandon Williams
Re: [PATCH 2/2] Revert "repository: pre-initialize hash algo pointer"
On Fri, Feb 23, 2018 at 04:56:40PM +0700, Nguyễn Thái Ngọc Duy wrote: > diff --git a/builtin/index-pack.c b/builtin/index-pack.c > index 7e3e1a461c..8ee935504e 100644 > --- a/builtin/index-pack.c > +++ b/builtin/index-pack.c > @@ -1673,6 +1673,11 @@ int cmd_index_pack(int argc, const char **argv, const > char *prefix) > if (prefix && chdir(prefix)) > die(_("Cannot come back to cwd")); > > + if (!the_hash_algo) { > + warning(_("Running without a repository, assuming SHA-1 hash")); > + repo_set_hash_algo(the_repository, GIT_HASH_SHA1); > + } Is this warning going to be visible to users in the normal course of operation? If so, people are probably going to find this bothersome or alarming. > for (i = 1; i < argc; i++) { > const char *arg = argv[i]; > > diff --git a/common-main.c b/common-main.c > index 6a689007e7..12aec36794 100644 > --- a/common-main.c > +++ b/common-main.c > @@ -1,6 +1,7 @@ > #include "cache.h" > #include "exec_cmd.h" > #include "attr.h" > +#include "repository.h" > > /* > * Many parts of Git have subprograms communicate via pipe, expect the > @@ -40,5 +41,8 @@ int main(int argc, const char **argv) > > restore_sigpipe_to_default(); > > + if (getenv("GIT_HASH_FIXUP")) > + repo_set_hash_algo(the_repository, GIT_HASH_SHA1); I'm lukewarm on adding this environment variable, but considering our history here, we had probably better. We can always remove it after a few releases. > return cmd_main(argc, argv); > } > diff --git a/diff-no-index.c b/diff-no-index.c > index 0ed5f0f496..f038f665bc 100644 > --- a/diff-no-index.c > +++ b/diff-no-index.c > @@ -241,6 +241,11 @@ void diff_no_index(struct rev_info *revs, > struct strbuf replacement = STRBUF_INIT; > const char *prefix = revs->prefix; > > + if (!the_hash_algo) { > + warning(_("Running without a repository, assuming SHA-1 hash")); > + repo_set_hash_algo(the_repository, GIT_HASH_SHA1); > + } Again, same concern. I can imagine scripts that will blow up loudly if git diff --no-index spews things to standard error. I'm not opposed to making this more visible, but I wonder if maybe it should only be visible to developers or such. The only way I can think of doing is that is with an advice options, but maybe there's a better way. -- brian m. carlson / brian with sandals: Houston, Texas, US https://www.crustytoothpaste.net/~bmc | My opinion only OpenPGP: https://keybase.io/bk2204 signature.asc Description: PGP signature
Re: [PATCH 2/2] Revert "repository: pre-initialize hash algo pointer"
Stefan Beller writes: > On Fri, Feb 23, 2018 at 11:50 AM, Junio C Hamano wrote: >> Stefan Beller writes: >> >>> I wonder if there is yet another missing case in the enumeration of >>> the previous patch: >>> Some commands are able to operate on GIT_OBJECT_DIR instead >>> of GIT_DIR (git repack?), which may not even explore the full git directory, >>> and so doesn't know about the hash value. >> >> ... because GIT_DIR/config is not known? "repack" is not one of >> them, though---it needs to at least use refs as the starting point >> so a standalone OBJECT_DIR is insufficient. > > Yes, I could have worded this as a question: > Is there any command that operates on GIT_OBJECT_DIR > without trying to discover GIT_DIR ? If somebody finds one that would be a good argument not to pursue the approach. Lack of response to the question would not amount to that much---it is possible all people who bothered to think of overlooked something obvious, though. "When I asked around nobody thought of a possibly way for this to cause breakages, so let's declare it is safe to do so and do it" is not how we want to do things X-<.
Re: [PATCH 2/2] Revert "repository: pre-initialize hash algo pointer"
On Fri, Feb 23, 2018 at 11:50 AM, Junio C Hamano wrote: > Stefan Beller writes: > >> I wonder if there is yet another missing case in the enumeration of >> the previous patch: >> Some commands are able to operate on GIT_OBJECT_DIR instead >> of GIT_DIR (git repack?), which may not even explore the full git directory, >> and so doesn't know about the hash value. > > ... because GIT_DIR/config is not known? "repack" is not one of > them, though---it needs to at least use refs as the starting point > so a standalone OBJECT_DIR is insufficient. Yes, I could have worded this as a question: Is there any command that operates on GIT_OBJECT_DIR without trying to discover GIT_DIR ?
Re: [PATCH 2/2] Revert "repository: pre-initialize hash algo pointer"
Stefan Beller writes: > I wonder if there is yet another missing case in the enumeration of > the previous patch: > Some commands are able to operate on GIT_OBJECT_DIR instead > of GIT_DIR (git repack?), which may not even explore the full git directory, > and so doesn't know about the hash value. ... because GIT_DIR/config is not known? "repack" is not one of them, though---it needs to at least use refs as the starting point so a standalone OBJECT_DIR is insufficient.
Re: [PATCH 2/2] Revert "repository: pre-initialize hash algo pointer"
On Fri, Feb 23, 2018 at 1:56 AM, Nguyễn Thái Ngọc Duy wrote: > This reverts commit e26f7f19b6c7485f04234946a59ab8f4fd21d6d1. The root > problem, git clone not setting up the_hash_algo, has been fixed in the > previous patch. > > As a result of the revert, some code paths that use the_hash_algo > without initialization is revealed and fixed here. It's basically > commands that are allowed to run without a repository. The fix here is > not the best. We probably could figure out the hash algorithm from input > somehow. > > Since this is a dangerous move and could potentially break stuff after > release (and leads to workaround like the reverted commit), the > workaround technically remains, but is hidden behind a new environment > variable GIT_HASH_FIXUP. This should let the users continue to use git > while we fix the problem. > > Signed-off-by: Nguyễn Thái Ngọc Duy > --- > builtin/index-pack.c | 5 + > common-main.c| 4 > diff-no-index.c | 5 + > repository.c | 2 +- > t/helper/test-dump-split-index.c | 2 ++ > 5 files changed, 17 insertions(+), 1 deletion(-) > > diff --git a/builtin/index-pack.c b/builtin/index-pack.c > index 7e3e1a461c..8ee935504e 100644 > --- a/builtin/index-pack.c > +++ b/builtin/index-pack.c > @@ -1673,6 +1673,11 @@ int cmd_index_pack(int argc, const char **argv, const > char *prefix) > if (prefix && chdir(prefix)) > die(_("Cannot come back to cwd")); > > + if (!the_hash_algo) { > + warning(_("Running without a repository, assuming SHA-1 > hash")); > + repo_set_hash_algo(the_repository, GIT_HASH_SHA1); > + } > + > for (i = 1; i < argc; i++) { > const char *arg = argv[i]; > > diff --git a/common-main.c b/common-main.c > index 6a689007e7..12aec36794 100644 > --- a/common-main.c > +++ b/common-main.c > @@ -1,6 +1,7 @@ > #include "cache.h" > #include "exec_cmd.h" > #include "attr.h" > +#include "repository.h" > > /* > * Many parts of Git have subprograms communicate via pipe, expect the > @@ -40,5 +41,8 @@ int main(int argc, const char **argv) > > restore_sigpipe_to_default(); > > + if (getenv("GIT_HASH_FIXUP")) > + repo_set_hash_algo(the_repository, GIT_HASH_SHA1); > + > return cmd_main(argc, argv); > } > diff --git a/diff-no-index.c b/diff-no-index.c > index 0ed5f0f496..f038f665bc 100644 > --- a/diff-no-index.c > +++ b/diff-no-index.c > @@ -241,6 +241,11 @@ void diff_no_index(struct rev_info *revs, > struct strbuf replacement = STRBUF_INIT; > const char *prefix = revs->prefix; > > + if (!the_hash_algo) { > + warning(_("Running without a repository, assuming SHA-1 > hash")); > + repo_set_hash_algo(the_repository, GIT_HASH_SHA1); > + } > + > diff_setup(&revs->diffopt); > for (i = 1; i < argc - 2; ) { > int j; > diff --git a/repository.c b/repository.c > index 4ffbe9bc94..0d715f4fdb 100644 > --- a/repository.c > +++ b/repository.c > @@ -5,7 +5,7 @@ > > /* The main repository */ > static struct repository the_repo = { > - NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, &the_index, > &hash_algos[GIT_HASH_SHA1], 0, 0 > + NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, &the_index, > NULL, 0, 0 I am not sure I agree with this defense in depth, because it would add a lot to maintenance burden. Specifically this part. The series that I sent out usually clashes here as this is currently a hot area of the code touched by many different series in flight. However this is the long term correct thing to do? We assume no algorithm until the repository can tell us from its config (or we default to sha1 if there is no configuration present). I wonder if there is yet another missing case in the enumeration of the previous patch: Some commands are able to operate on GIT_OBJECT_DIR instead of GIT_DIR (git repack?), which may not even explore the full git directory, and so doesn't know about the hash value. In the cover letter you reference my series, but comparing the diffstats (and looking through the patches), I would only expect this one place to have merge conflicts, which ought to be easy to resolve. (In my series I break the initializer into multiple lines to help the future, too) After some thought, I like this series. Thanks, Stefan
[PATCH 2/2] Revert "repository: pre-initialize hash algo pointer"
This reverts commit e26f7f19b6c7485f04234946a59ab8f4fd21d6d1. The root problem, git clone not setting up the_hash_algo, has been fixed in the previous patch. As a result of the revert, some code paths that use the_hash_algo without initialization is revealed and fixed here. It's basically commands that are allowed to run without a repository. The fix here is not the best. We probably could figure out the hash algorithm from input somehow. Since this is a dangerous move and could potentially break stuff after release (and leads to workaround like the reverted commit), the workaround technically remains, but is hidden behind a new environment variable GIT_HASH_FIXUP. This should let the users continue to use git while we fix the problem. Signed-off-by: Nguyễn Thái Ngọc Duy --- builtin/index-pack.c | 5 + common-main.c| 4 diff-no-index.c | 5 + repository.c | 2 +- t/helper/test-dump-split-index.c | 2 ++ 5 files changed, 17 insertions(+), 1 deletion(-) diff --git a/builtin/index-pack.c b/builtin/index-pack.c index 7e3e1a461c..8ee935504e 100644 --- a/builtin/index-pack.c +++ b/builtin/index-pack.c @@ -1673,6 +1673,11 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix) if (prefix && chdir(prefix)) die(_("Cannot come back to cwd")); + if (!the_hash_algo) { + warning(_("Running without a repository, assuming SHA-1 hash")); + repo_set_hash_algo(the_repository, GIT_HASH_SHA1); + } + for (i = 1; i < argc; i++) { const char *arg = argv[i]; diff --git a/common-main.c b/common-main.c index 6a689007e7..12aec36794 100644 --- a/common-main.c +++ b/common-main.c @@ -1,6 +1,7 @@ #include "cache.h" #include "exec_cmd.h" #include "attr.h" +#include "repository.h" /* * Many parts of Git have subprograms communicate via pipe, expect the @@ -40,5 +41,8 @@ int main(int argc, const char **argv) restore_sigpipe_to_default(); + if (getenv("GIT_HASH_FIXUP")) + repo_set_hash_algo(the_repository, GIT_HASH_SHA1); + return cmd_main(argc, argv); } diff --git a/diff-no-index.c b/diff-no-index.c index 0ed5f0f496..f038f665bc 100644 --- a/diff-no-index.c +++ b/diff-no-index.c @@ -241,6 +241,11 @@ void diff_no_index(struct rev_info *revs, struct strbuf replacement = STRBUF_INIT; const char *prefix = revs->prefix; + if (!the_hash_algo) { + warning(_("Running without a repository, assuming SHA-1 hash")); + repo_set_hash_algo(the_repository, GIT_HASH_SHA1); + } + diff_setup(&revs->diffopt); for (i = 1; i < argc - 2; ) { int j; diff --git a/repository.c b/repository.c index 4ffbe9bc94..0d715f4fdb 100644 --- a/repository.c +++ b/repository.c @@ -5,7 +5,7 @@ /* The main repository */ static struct repository the_repo = { - NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, &the_index, &hash_algos[GIT_HASH_SHA1], 0, 0 + NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, &the_index, NULL, 0, 0 }; struct repository *the_repository = &the_repo; diff --git a/t/helper/test-dump-split-index.c b/t/helper/test-dump-split-index.c index e44430b699..b759fe3fc1 100644 --- a/t/helper/test-dump-split-index.c +++ b/t/helper/test-dump-split-index.c @@ -12,6 +12,8 @@ int cmd_main(int ac, const char **av) struct split_index *si; int i; + setup_git_directory(); + do_read_index(&the_index, av[1], 1); printf("own %s\n", sha1_to_hex(the_index.sha1)); si = the_index.split_index; -- 2.16.1.435.g8f24da2e1a