Re: [PATCH v2 2/2] check-ignore.c, dir.c: fix segfault with '.' argument from repo root
On Tue, Feb 19, 2013 at 06:47:23PM -0800, Junio C Hamano wrote: Adam Spiers g...@adamspiers.org writes: Remove a sweep-the-issue-under-the-rug conditional in check-ignore that avoided to pass an empty string to the callchain while at it. It is a valid question to ask for check-ignore if the top-level is set to be ignored by default, even though the answer is most likely no, if only because there is currently no way to specify such an Hmm, I see very little difference between the use of most likely and the use of the words much and typically which you previously considered a sure sign that the design of the fix is iffy. Your patch were The reason why feeding empty string upsets ^^ patches were, or patch was? It's not clear which patch(es) you're referring to. hash_name() were not investigated; by punting the '.' as input, and ignoring the possibility that such a question might make sense, I can work around the segfault. I don't see how explicitly referring to the possibility can be counted as ignoring it. I do not even question if hash_name() that misbehaves on an empty string is a bug. Just make sure we do not tickle the function with a problematic input. Presumably the I here refers to anthropomorphized commit message rather than to me personally, since I did question hash_name()'s behaviour several times already. The patch you are responding to declares that hash_name() should work sensibly on an empty string, and that is the _only_ necessary change for the fix. We could keep path[0], but even without it, by fixing the hash_name(), we will no longer segfault. Yes, and as already stated, I agree that is a good thing. My most likely is about the special case ' path[0]' produces correct result, Sorry, I can't understand this. You are paraphrasing something and placing it inside quotes, but I can't find the corresponding source. I presumed it refers to this extract of your proposed patch's commit message: Remove a sweep-the-issue-under-the-rug conditional in check-ignore that avoided to pass an empty string to the callchain while at it. It is a valid question to ask for check-ignore if the top-level is set to be ignored by default, even though the answer is most likely no but I can't reconcile this extract with the paraphrase the special case ' path[0]' produces correct result. and it is likely to stay so in the near future until we update .gitignore format to allow users to say 'ignore the top by default', which is not likely to happen soon. It is not about the nature of the fix at all. Still do not see the difference? I think I *might* be beginning to see you were getting at, although my understanding is still clouded by the ambiguities detailed above. Is your point that the use of words like 'much' and 'typically' are a sure sign of iffy design _when_used_to_talk_about_fixes_ but not necessarily in other contexts? If so then it makes a bit more sense to me, even though I tend to disagree with such broadly sweeping generalizations, especially when the qualifying context is missing. That aside, your idea of looking out for bad smells not only in code but also in the spoken language contained by commit messages and design discussions is an interesting one. I will try to bear that technique in mind more consciously in the future, and see how well it serves me. The removal of the path[0] is about allowing such a question whose likeliness may be remote. In the current .gitignore format, you may not be able to say ignore the whole thing by default, so in that sense, the answer to the question this part of the code is asking when given . may always be no. Keeping the path[0] will optimize for that case. And unusual thing to ask below is to judge if answering such a question is worth optimizing for (the verdict is no, it is not a common thing to do). Yes, I understand and agree with these paragraphs. entry in the .gitignore file. But it is an unusual thing to ask and it is not worth optimizing for it by special casing at the top level of the call chain. Although I agree with your proposed patch's sentiment of avoiding sweeping this corner case under the rug, 'check-ignore .' still wouldn't match anything if for example './' was a supported mechanism for ignoring the top level. It indicates that there may be more bugs (that may not result in segv) somewhere in check-ignore codepath, if (1) echo ./ .gitignore were to say ignore everything in the tree by default., and (2) the real ignore check does work that way, but (3) git check-ignore . says we do not ignore that one. Yes, I think we are saying exactly the same thing here, although if It indicates that [...] refers to your proposed patch's commit message then I don't think it indicates the possibility of these bugs in the most obvious or explicit way. Such a bug may come from some code that is not
[PATCH v2 2/2] check-ignore.c, dir.c: fix segfault with '.' argument from repo root
Fix a corner case where check-ignore would segfault when run with the '.' argument from the top level of a repository, due to prefix_path() converting '.' into the empty string. It doesn't make much sense to call check-ignore from the top level with '.' as a parameter, since the top-level directory would never typically be ignored, but of course it should not segfault in this case. The existing code attempted to check for this case but failed due to using the wrong variable. Instead we move the check to last_exclude_matching_path(), in case other callers present or future have a similar issue. Signed-off-by: Adam Spiers g...@adamspiers.org --- builtin/check-ignore.c | 2 +- dir.c | 8 t/t0008-ignores.sh | 5 + 3 files changed, 14 insertions(+), 1 deletion(-) diff --git a/builtin/check-ignore.c b/builtin/check-ignore.c index 709535c..0240f99 100644 --- a/builtin/check-ignore.c +++ b/builtin/check-ignore.c @@ -89,7 +89,7 @@ static int check_ignore(const char *prefix, const char **pathspec) ? strlen(prefix) : 0, path); full_path = check_path_for_gitlink(full_path); die_if_path_beyond_symlink(full_path, prefix); - if (!seen[i] path[0]) { + if (!seen[i]) { exclude = last_exclude_matching_path(check, full_path, -1, dtype); if (exclude) { diff --git a/dir.c b/dir.c index 57394e4..1ae0b90 100644 --- a/dir.c +++ b/dir.c @@ -828,6 +828,14 @@ struct exclude *last_exclude_matching_path(struct path_exclude_check *check, struct exclude *exclude; /* +* name could be the empty string, e.g. if check-ignore was +* invoked from the top level with '.', prefix_path() will +* convert it into . +*/ + if (!*name) + return NULL; + + /* * we allow the caller to pass namelen as an optimization; it * must match the length of the name, as we eventually call * is_excluded() on the whole name string. diff --git a/t/t0008-ignores.sh b/t/t0008-ignores.sh index ebe7c70..9c1bde1 100755 --- a/t/t0008-ignores.sh +++ b/t/t0008-ignores.sh @@ -138,6 +138,7 @@ test_expect_success 'setup' ' cat -\EOF .gitignore one ignored-* + top-level-dir/ EOF for dir in . a do @@ -177,6 +178,10 @@ test_expect_success 'setup' ' # # test invalid inputs +test_expect_success_multi '. corner-case' '' ' + test_check_ignore . 1 +' + test_expect_success_multi 'empty command line' '' ' test_check_ignore 128 stderr_contains fatal: no path specified -- 1.8.2.rc0.18.g543d1e4 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 2/2] check-ignore.c, dir.c: fix segfault with '.' argument from repo root
Adam Spiers g...@adamspiers.org writes: Fix a corner case where check-ignore would segfault when run with the '.' argument from the top level of a repository, due to prefix_path() converting '.' into the empty string. The description does not match what I understand is happening from the original report, though. The above is more like this, no? When check-ignore is run with the '.' argument from the top level of a repository, it fed an empty string to hash_name() in name-hash.c and caused a segfault, as the function kept reading forever past the end of the string. A point to note is that it is not cleaer why it is a corner case to ask about a pathspec .. It is a valid question Is the whole tree ignored by default?, isn't it? It doesn't make much sense to call check-ignore from the top level with '.' as a parameter, since the top-level directory would never typically be ignored, And this sounds like a really bad excuse. If it were it does not make *any* sense ... because the top level is *never* ignored, then the patch is a perfectly fine optimization that happens to work around the problem, but the use of much and typically is a sure sign that the design of the fix is iffy. It also shows that the patch is not a fix, but is sweeping the problem under the rug, if there were a valid use case to set the top level to be ignored. I wonder what would happen if we removed that path[0] check from the caller, not add the assume the top is never ignored workaround, and do something like this to the location that causes segv, so that it can give an answer when asked to hash an empty string? Does the callchain that goes down to this function have other places that assume they will never see an empty string, like this function does, which I _think_ is the real issue? name-hash.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/name-hash.c b/name-hash.c index d8d25c2..942c459 100644 --- a/name-hash.c +++ b/name-hash.c @@ -24,11 +24,11 @@ static unsigned int hash_name(const char *name, int namelen) { unsigned int hash = 0x123; - do { + while (namelen--) { unsigned char c = *name++; c = icase_hash(c); hash = hash*101 + c; - } while (--namelen); + } return hash; } -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 2/2] check-ignore.c, dir.c: fix segfault with '.' argument from repo root
Junio C Hamano gits...@pobox.com writes: And this sounds like a really bad excuse. If it were it does not make *any* sense ... because the top level is *never* ignored, then the patch is a perfectly fine optimization that happens to work around the problem, but the use of much and typically is a sure sign that the design of the fix is iffy. It also shows that the patch is not a fix, but is sweeping the problem under the rug, if there were a valid use case to set the top level to be ignored. I wonder what would happen if we removed that path[0] check from the caller, not add the assume the top is never ignored workaround, and do something like this to the location that causes segv, so that it can give an answer when asked to hash an empty string? Does the callchain that goes down to this function have other places that assume they will never see an empty string, like this function does, which I _think_ is the real issue? I started to suspect that may be the right approach. Why not do this? -- 8 -- From: Junio C Hamano gits...@pobox.com Date: Tue, 19 Feb 2013 11:56:44 -0800 Subject: [PATCH] name-hash: allow hashing an empty string Usually we do not pass an empty string to the function hash_name() because we almost always ask for hash values for a path that is a candidate to be added to the index. However, check-ignore (and most likely check-attr, but I didn't check) apparently has a callchain to ask the hash value for an empty path when it was given a . from the top-level directory to ask Is the path . excluded by default? Make sure that hash_name() does not overrun the end of the given pathname even when it is empty. Remove a sweep-the-issue-under-the-rug conditional in check-ignore that avoided to pass an empty string to the callchain while at it. It is a valid question to ask for check-ignore if the top-level is set to be ignored by default, even though the answer is most likely no, if only because there is currently no way to specify such an entry in the .gitignore file. But it is an unusual thing to ask and it is not worth optimizing for it by special casing at the top level of the call chain. Signed-off-by: Adam Spiers g...@adamspiers.org Signed-off-by: Junio C Hamano gits...@pobox.com --- builtin/check-ignore.c | 2 +- name-hash.c| 4 ++-- t/t0008-ignores.sh | 5 + 3 files changed, 8 insertions(+), 3 deletions(-) diff --git a/builtin/check-ignore.c b/builtin/check-ignore.c index 709535c..0240f99 100644 --- a/builtin/check-ignore.c +++ b/builtin/check-ignore.c @@ -89,7 +89,7 @@ static int check_ignore(const char *prefix, const char **pathspec) ? strlen(prefix) : 0, path); full_path = check_path_for_gitlink(full_path); die_if_path_beyond_symlink(full_path, prefix); - if (!seen[i] path[0]) { + if (!seen[i]) { exclude = last_exclude_matching_path(check, full_path, -1, dtype); if (exclude) { diff --git a/name-hash.c b/name-hash.c index d8d25c2..942c459 100644 --- a/name-hash.c +++ b/name-hash.c @@ -24,11 +24,11 @@ static unsigned int hash_name(const char *name, int namelen) { unsigned int hash = 0x123; - do { + while (namelen--) { unsigned char c = *name++; c = icase_hash(c); hash = hash*101 + c; - } while (--namelen); + } return hash; } diff --git a/t/t0008-ignores.sh b/t/t0008-ignores.sh index ebe7c70..9c1bde1 100755 --- a/t/t0008-ignores.sh +++ b/t/t0008-ignores.sh @@ -138,6 +138,7 @@ test_expect_success 'setup' ' cat -\EOF .gitignore one ignored-* + top-level-dir/ EOF for dir in . a do @@ -177,6 +178,10 @@ test_expect_success 'setup' ' # # test invalid inputs +test_expect_success_multi '. corner-case' '' ' + test_check_ignore . 1 +' + test_expect_success_multi 'empty command line' '' ' test_check_ignore 128 stderr_contains fatal: no path specified -- 1.8.2.rc0.89.g6e4b41d -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 2/2] check-ignore.c, dir.c: fix segfault with '.' argument from repo root
On Tue, Feb 19, 2013 at 7:59 PM, Junio C Hamano gits...@pobox.com wrote: Adam Spiers g...@adamspiers.org writes: Fix a corner case where check-ignore would segfault when run with the '.' argument from the top level of a repository, due to prefix_path() converting '.' into the empty string. The description does not match what I understand is happening from the original report, though. Why not? The above is more like this, no? When check-ignore is run with the '.' argument from the top level of a repository, it fed an empty string to hash_name() in name-hash.c and caused a segfault, as the function kept reading forever past the end of the string. The only difference I can see between the two is that yours has changed the phrase order and gives a bit more information. I don't see any disagreement between them though. A point to note is that it is not cleaer why it is a corner case to ask about a pathspec .. It is a valid question Is the whole tree ignored by default?, isn't it? It's probably valid, but I'm not the best judge of that. It doesn't make much sense to me why anyone would do that, and it seems very unlikely it would be a common use case, therefore it's a corner case. The next sentence then qualifies that: It doesn't make much sense to call check-ignore from the top level with '.' as a parameter, since the top-level directory would never typically be ignored, And this sounds like a really bad excuse. An excuse for what? The final part of that sentence which you trimmed made it clear that it was not an excuse for the segfault. Your choice of wording here sounds more like a personal attack than a technical discussion - presumably unintentional, so I'll choose not to take offense. If it were it does not make *any* sense ... because the top level is *never* ignored, then the patch is a perfectly fine optimization that happens to work around the problem, but the use of much and typically is a sure sign that the design of the fix is iffy. It also shows that the patch is not a fix, but is sweeping the problem under the rug, if there were a valid use case to set the top level to be ignored. There *may* be a valid use case to set the top level to be ignored. *I* can't think of one personally, therefore it doesn't make sense to me to do so, but my intention was to avoid imposing my own personal judgement on everyone else by preventing people from doing that. However, in that case I now realise that check-ignore should report a match, and my proposed patches do not do that. Perhaps that is what you meant by sweeping the problem under the rug? I wonder what would happen if we removed that path[0] check from the caller, not add the assume the top is never ignored workaround, and do something like this to the location that causes segv, so that it can give an answer when asked to hash an empty string? name-hash.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/name-hash.c b/name-hash.c index d8d25c2..942c459 100644 --- a/name-hash.c +++ b/name-hash.c @@ -24,11 +24,11 @@ static unsigned int hash_name(const char *name, int namelen) { unsigned int hash = 0x123; - do { + while (namelen--) { unsigned char c = *name++; c = icase_hash(c); hash = hash*101 + c; - } while (--namelen); + } return hash; } Yep, that makes sense - that's what I meant when I said I was wondering whether hash_name() should do stricter input validation. Does the callchain that goes down to this function have other places that assume they will never see an empty string, like this function does, which I _think_ is the real issue? Good question. In the absence of a proper audit for similar issues, it definitely makes sense to defensively program hash_name() (and any other low-level functions which accept path strings). -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 2/2] check-ignore.c, dir.c: fix segfault with '.' argument from repo root
On Tue, Feb 19, 2013 at 02:03:01PM -0800, Junio C Hamano wrote: I started to suspect that may be the right approach. Why not do this? -- 8 -- From: Junio C Hamano gits...@pobox.com Date: Tue, 19 Feb 2013 11:56:44 -0800 Subject: [PATCH] name-hash: allow hashing an empty string Usually we do not pass an empty string to the function hash_name() because we almost always ask for hash values for a path that is a candidate to be added to the index. However, check-ignore (and most likely check-attr, but I didn't check) apparently has a callchain to ask the hash value for an empty path when it was given a . from the top-level directory to ask Is the path . excluded by default? According to a single gdb run, 'git check-attr -a .' does not hit hash_name() for me. However that naive experiment doesn't rule out the possibility of it happening if the right attributes are set, but I don't know enough about that code to comment. Make sure that hash_name() does not overrun the end of the given pathname even when it is empty. Remove a sweep-the-issue-under-the-rug conditional in check-ignore that avoided to pass an empty string to the callchain while at it. It is a valid question to ask for check-ignore if the top-level is set to be ignored by default, even though the answer is most likely Hmm, I see very little difference between the use of most likely and the use of the words much and typically which you previously considered a sure sign that the design of the fix is iffy. no, if only because there is currently no way to specify such an entry in the .gitignore file. But it is an unusual thing to ask and it is not worth optimizing for it by special casing at the top level of the call chain. Although I agree with your proposed patch's sentiment of avoiding sweeping this corner case under the rug, 'check-ignore .' still wouldn't match anything if for example './' was a supported mechanism for ignoring the top level. So, modulo the obvious advantages that defensive coding in hash_name() bring, I'm struggling to see a significant difference between any of the three patches proposed so far. All of them fix the segfault, and all would require the same amount of extra work in order to support matching against './'. But I don't have any strong objections either, so you have my approval for whichever patch you prefer to take. Thanks. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html