Re: [PATCH v2 2/2] check-ignore.c, dir.c: fix segfault with '.' argument from repo root

2013-02-20 Thread Adam Spiers
On Tue, Feb 19, 2013 at 06:47:23PM -0800, Junio C Hamano wrote:
 Adam Spiers g...@adamspiers.org writes:
  Remove a sweep-the-issue-under-the-rug conditional in check-ignore
  that avoided to pass an empty string to the callchain while at it.
  It is a valid question to ask for check-ignore if the top-level is
  set to be ignored by default, even though the answer is most likely
  no, if only because there is currently no way to specify such an
 
  Hmm, I see very little difference between the use of most likely and
  the use of the words much and typically which you previously
  considered a sure sign that the design of the fix is iffy.
 
 Your patch were The reason why feeding empty string upsets
   ^^
patches were, or patch was?  It's not clear which patch(es) you're
referring to.

 hash_name() were not investigated; by punting the '.' as input, and
 ignoring the possibility that such a question might make sense, I
 can work around the segfault.

I don't see how explicitly referring to the possibility can be counted
as ignoring it.

 I do not even question if hash_name()
 that misbehaves on an empty string is a bug. Just make sure we do
 not tickle the function with a problematic input.

Presumably the I here refers to anthropomorphized commit message
rather than to me personally, since I did question hash_name()'s
behaviour several times already.

 The patch you are responding to declares that hash_name() should
 work sensibly on an empty string, and that is the _only_ necessary
 change for the fix.  We could keep  path[0], but even without
 it, by fixing the hash_name(), we will no longer segfault.

Yes, and as already stated, I agree that is a good thing.

 My most likely is about the special case ' path[0]' produces
 correct result,

Sorry, I can't understand this.  You are paraphrasing something and
placing it inside  quotes, but I can't find the corresponding
source.  I presumed it refers to this extract of your proposed patch's
commit message:

   Remove a sweep-the-issue-under-the-rug conditional in check-ignore
that avoided to pass an empty string to the callchain while at it.
It is a valid question to ask for check-ignore if the top-level is
set to be ignored by default, even though the answer is most
likely no

but I can't reconcile this extract with the paraphrase the special
case ' path[0]' produces correct result.

 and it is likely to stay so in the near future until
 we update .gitignore format to allow users to say 'ignore the top by
 default', which is not likely to happen soon.  It is not about the
 nature of the fix at all.

 Still do not see the difference?

I think I *might* be beginning to see you were getting at, although my
understanding is still clouded by the ambiguities detailed above.  Is
your point that the use of words like 'much' and 'typically' are a
sure sign of iffy design _when_used_to_talk_about_fixes_ but not
necessarily in other contexts?  If so then it makes a bit more sense
to me, even though I tend to disagree with such broadly sweeping
generalizations, especially when the qualifying context is missing.

That aside, your idea of looking out for bad smells not only in code
but also in the spoken language contained by commit messages and
design discussions is an interesting one.  I will try to bear that
technique in mind more consciously in the future, and see how well it
serves me.

 The removal of the  path[0] is about allowing such a question
 whose likeliness may be remote.  In the current .gitignore format,
 you may not be able to say ignore the whole thing by default, so
 in that sense, the answer to the question this part of the code is
 asking when given . may always be no.  Keeping the  path[0]
 will optimize for that case.
 
 And unusual thing to ask below is to judge if answering such a
 question is worth optimizing for (the verdict is no, it is not a
 common thing to do).

Yes, I understand and agree with these paragraphs.

  entry in the .gitignore file. But it is an unusual thing to ask and
  it is not worth optimizing for it by special casing at the top level
  of the call chain.
 
  Although I agree with your proposed patch's sentiment of avoiding
  sweeping this corner case under the rug, 'check-ignore .' still
  wouldn't match anything if for example './' was a supported mechanism
  for ignoring the top level.
 
 It indicates that there may be more bugs (that may not result in
 segv) somewhere in check-ignore codepath, if (1)
 
   echo ./ .gitignore

 were to say ignore everything in the tree by default., and (2) the
 real ignore check does work that way, but (3)
 
   git check-ignore .
 
 says we do not ignore that one.

Yes, I think we are saying exactly the same thing here, although if
It indicates that [...] refers to your proposed patch's commit
message then I don't think it indicates the possibility of these bugs
in the most obvious or explicit way.

 Such a bug may come from some
 code that is not 

[PATCH v2 2/2] check-ignore.c, dir.c: fix segfault with '.' argument from repo root

2013-02-19 Thread Adam Spiers
Fix a corner case where check-ignore would segfault when run with the
'.' argument from the top level of a repository, due to prefix_path()
converting '.' into the empty string.  It doesn't make much sense to
call check-ignore from the top level with '.' as a parameter, since
the top-level directory would never typically be ignored, but of
course it should not segfault in this case.  The existing code
attempted to check for this case but failed due to using the wrong
variable.  Instead we move the check to last_exclude_matching_path(),
in case other callers present or future have a similar issue.

Signed-off-by: Adam Spiers g...@adamspiers.org
---
 builtin/check-ignore.c | 2 +-
 dir.c  | 8 
 t/t0008-ignores.sh | 5 +
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/builtin/check-ignore.c b/builtin/check-ignore.c
index 709535c..0240f99 100644
--- a/builtin/check-ignore.c
+++ b/builtin/check-ignore.c
@@ -89,7 +89,7 @@ static int check_ignore(const char *prefix, const char 
**pathspec)
? strlen(prefix) : 0, path);
full_path = check_path_for_gitlink(full_path);
die_if_path_beyond_symlink(full_path, prefix);
-   if (!seen[i]  path[0]) {
+   if (!seen[i]) {
exclude = last_exclude_matching_path(check, full_path,
 -1, dtype);
if (exclude) {
diff --git a/dir.c b/dir.c
index 57394e4..1ae0b90 100644
--- a/dir.c
+++ b/dir.c
@@ -828,6 +828,14 @@ struct exclude *last_exclude_matching_path(struct 
path_exclude_check *check,
struct exclude *exclude;
 
/*
+* name could be the empty string, e.g. if check-ignore was
+* invoked from the top level with '.', prefix_path() will
+* convert it into .
+*/
+   if (!*name)
+   return NULL;
+
+   /*
 * we allow the caller to pass namelen as an optimization; it
 * must match the length of the name, as we eventually call
 * is_excluded() on the whole name string.
diff --git a/t/t0008-ignores.sh b/t/t0008-ignores.sh
index ebe7c70..9c1bde1 100755
--- a/t/t0008-ignores.sh
+++ b/t/t0008-ignores.sh
@@ -138,6 +138,7 @@ test_expect_success 'setup' '
cat -\EOF .gitignore 
one
ignored-*
+   top-level-dir/
EOF
for dir in . a
do
@@ -177,6 +178,10 @@ test_expect_success 'setup' '
 #
 # test invalid inputs
 
+test_expect_success_multi '. corner-case' '' '
+   test_check_ignore . 1
+'
+
 test_expect_success_multi 'empty command line' '' '
test_check_ignore  128 
stderr_contains fatal: no path specified
-- 
1.8.2.rc0.18.g543d1e4

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/2] check-ignore.c, dir.c: fix segfault with '.' argument from repo root

2013-02-19 Thread Junio C Hamano
Adam Spiers g...@adamspiers.org writes:

 Fix a corner case where check-ignore would segfault when run with the
 '.' argument from the top level of a repository, due to prefix_path()
 converting '.' into the empty string.

The description does not match what I understand is happening from
the original report, though.  The above is more like this, no?

When check-ignore is run with the '.' argument from the top level of
a repository, it fed an empty string to hash_name() in name-hash.c
and caused a segfault, as the function kept reading forever past the
end of the string.

A point to note is that it is not cleaer why it is a corner case to
ask about a pathspec ..  It is a valid question Is the whole tree
ignored by default?, isn't it?

 It doesn't make much sense to
 call check-ignore from the top level with '.' as a parameter, since
 the top-level directory would never typically be ignored,

And this sounds like a really bad excuse.  If it were it does not
make *any* sense ... because the top level is *never* ignored, then
the patch is a perfectly fine optimization that happens to work
around the problem, but the use of much and typically is a sure
sign that the design of the fix is iffy.  It also shows that the
patch is not a fix, but is sweeping the problem under the rug, if
there were a valid use case to set the top level to be ignored.

I wonder what would happen if we removed that  path[0] check
from the caller, not add the assume the top is never ignored
workaround, and do something like this to the location that causes
segv, so that it can give an answer when asked to hash an empty
string?

Does the callchain that goes down to this function have other places
that assume they will never see an empty string, like this function
does, which I _think_ is the real issue?

 name-hash.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/name-hash.c b/name-hash.c
index d8d25c2..942c459 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -24,11 +24,11 @@ static unsigned int hash_name(const char *name, int namelen)
 {
unsigned int hash = 0x123;
 
-   do {
+   while (namelen--) {
unsigned char c = *name++;
c = icase_hash(c);
hash = hash*101 + c;
-   } while (--namelen);
+   }
return hash;
 }
 

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/2] check-ignore.c, dir.c: fix segfault with '.' argument from repo root

2013-02-19 Thread Junio C Hamano
Junio C Hamano gits...@pobox.com writes:

 And this sounds like a really bad excuse.  If it were it does not
 make *any* sense ... because the top level is *never* ignored, then
 the patch is a perfectly fine optimization that happens to work
 around the problem, but the use of much and typically is a sure
 sign that the design of the fix is iffy.  It also shows that the
 patch is not a fix, but is sweeping the problem under the rug, if
 there were a valid use case to set the top level to be ignored.

 I wonder what would happen if we removed that  path[0] check
 from the caller, not add the assume the top is never ignored
 workaround, and do something like this to the location that causes
 segv, so that it can give an answer when asked to hash an empty
 string?

 Does the callchain that goes down to this function have other places
 that assume they will never see an empty string, like this function
 does, which I _think_ is the real issue?

I started to suspect that may be the right approach.  Why not do this?

-- 8 --
From: Junio C Hamano gits...@pobox.com
Date: Tue, 19 Feb 2013 11:56:44 -0800
Subject: [PATCH] name-hash: allow hashing an empty string

Usually we do not pass an empty string to the function hash_name()
because we almost always ask for hash values for a path that is a
candidate to be added to the index. However, check-ignore (and most
likely check-attr, but I didn't check) apparently has a callchain
to ask the hash value for an empty path when it was given a . from
the top-level directory to ask Is the path . excluded by default?

Make sure that hash_name() does not overrun the end of the given
pathname even when it is empty.

Remove a sweep-the-issue-under-the-rug conditional in check-ignore
that avoided to pass an empty string to the callchain while at it.
It is a valid question to ask for check-ignore if the top-level is
set to be ignored by default, even though the answer is most likely
no, if only because there is currently no way to specify such an
entry in the .gitignore file. But it is an unusual thing to ask and
it is not worth optimizing for it by special casing at the top level
of the call chain.

Signed-off-by: Adam Spiers g...@adamspiers.org
Signed-off-by: Junio C Hamano gits...@pobox.com
---
 builtin/check-ignore.c | 2 +-
 name-hash.c| 4 ++--
 t/t0008-ignores.sh | 5 +
 3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/builtin/check-ignore.c b/builtin/check-ignore.c
index 709535c..0240f99 100644
--- a/builtin/check-ignore.c
+++ b/builtin/check-ignore.c
@@ -89,7 +89,7 @@ static int check_ignore(const char *prefix, const char 
**pathspec)
? strlen(prefix) : 0, path);
full_path = check_path_for_gitlink(full_path);
die_if_path_beyond_symlink(full_path, prefix);
-   if (!seen[i]  path[0]) {
+   if (!seen[i]) {
exclude = last_exclude_matching_path(check, full_path,
 -1, dtype);
if (exclude) {
diff --git a/name-hash.c b/name-hash.c
index d8d25c2..942c459 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -24,11 +24,11 @@ static unsigned int hash_name(const char *name, int namelen)
 {
unsigned int hash = 0x123;
 
-   do {
+   while (namelen--) {
unsigned char c = *name++;
c = icase_hash(c);
hash = hash*101 + c;
-   } while (--namelen);
+   }
return hash;
 }
 
diff --git a/t/t0008-ignores.sh b/t/t0008-ignores.sh
index ebe7c70..9c1bde1 100755
--- a/t/t0008-ignores.sh
+++ b/t/t0008-ignores.sh
@@ -138,6 +138,7 @@ test_expect_success 'setup' '
cat -\EOF .gitignore 
one
ignored-*
+   top-level-dir/
EOF
for dir in . a
do
@@ -177,6 +178,10 @@ test_expect_success 'setup' '
 #
 # test invalid inputs
 
+test_expect_success_multi '. corner-case' '' '
+   test_check_ignore . 1
+'
+
 test_expect_success_multi 'empty command line' '' '
test_check_ignore  128 
stderr_contains fatal: no path specified
-- 
1.8.2.rc0.89.g6e4b41d

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/2] check-ignore.c, dir.c: fix segfault with '.' argument from repo root

2013-02-19 Thread Adam Spiers
On Tue, Feb 19, 2013 at 7:59 PM, Junio C Hamano gits...@pobox.com wrote:
 Adam Spiers g...@adamspiers.org writes:

 Fix a corner case where check-ignore would segfault when run with the
 '.' argument from the top level of a repository, due to prefix_path()
 converting '.' into the empty string.

 The description does not match what I understand is happening from
 the original report, though.

Why not?

 The above is more like this, no?

 When check-ignore is run with the '.' argument from the top level of
 a repository, it fed an empty string to hash_name() in name-hash.c
 and caused a segfault, as the function kept reading forever past the
 end of the string.

The only difference I can see between the two is that yours has
changed the phrase order and gives a bit more information.  I don't
see any disagreement between them though.

 A point to note is that it is not cleaer why it is a corner case to
 ask about a pathspec ..  It is a valid question Is the whole tree
 ignored by default?, isn't it?

It's probably valid, but I'm not the best judge of that.  It doesn't
make much sense to me why anyone would do that, and it seems very
unlikely it would be a common use case, therefore it's a corner case.
The next sentence then qualifies that:

 It doesn't make much sense to
 call check-ignore from the top level with '.' as a parameter, since
 the top-level directory would never typically be ignored,

 And this sounds like a really bad excuse.

An excuse for what?  The final part of that sentence which you trimmed
made it clear that it was not an excuse for the segfault.  Your choice
of wording here sounds more like a personal attack than a technical
discussion - presumably unintentional, so I'll choose not to take
offense.

 If it were it does not
 make *any* sense ... because the top level is *never* ignored, then
 the patch is a perfectly fine optimization that happens to work
 around the problem, but the use of much and typically is a sure
 sign that the design of the fix is iffy.  It also shows that the
 patch is not a fix, but is sweeping the problem under the rug, if
 there were a valid use case to set the top level to be ignored.

There *may* be a valid use case to set the top level to be ignored.
*I* can't think of one personally, therefore it doesn't make sense to
me to do so, but my intention was to avoid imposing my own personal
judgement on everyone else by preventing people from doing that.
However, in that case I now realise that check-ignore should report a
match, and my proposed patches do not do that.  Perhaps that is what
you meant by sweeping the problem under the rug?

 I wonder what would happen if we removed that  path[0] check
 from the caller, not add the assume the top is never ignored
 workaround, and do something like this to the location that causes
 segv, so that it can give an answer when asked to hash an empty
 string?

  name-hash.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

 diff --git a/name-hash.c b/name-hash.c
 index d8d25c2..942c459 100644
 --- a/name-hash.c
 +++ b/name-hash.c
 @@ -24,11 +24,11 @@ static unsigned int hash_name(const char *name, int 
 namelen)
  {
 unsigned int hash = 0x123;

 -   do {
 +   while (namelen--) {
 unsigned char c = *name++;
 c = icase_hash(c);
 hash = hash*101 + c;
 -   } while (--namelen);
 +   }
 return hash;
  }

Yep, that makes sense - that's what I meant when I said I was
wondering whether hash_name() should do stricter input validation.

 Does the callchain that goes down to this function have other places
 that assume they will never see an empty string, like this function
 does, which I _think_ is the real issue?

Good question.  In the absence of a proper audit for similar issues,
it definitely makes sense to defensively program hash_name() (and any
other low-level functions which accept path strings).
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/2] check-ignore.c, dir.c: fix segfault with '.' argument from repo root

2013-02-19 Thread Adam Spiers
On Tue, Feb 19, 2013 at 02:03:01PM -0800, Junio C Hamano wrote:
 I started to suspect that may be the right approach.  Why not do this?
 
 -- 8 --
 From: Junio C Hamano gits...@pobox.com
 Date: Tue, 19 Feb 2013 11:56:44 -0800
 Subject: [PATCH] name-hash: allow hashing an empty string
 
 Usually we do not pass an empty string to the function hash_name()
 because we almost always ask for hash values for a path that is a
 candidate to be added to the index. However, check-ignore (and most
 likely check-attr, but I didn't check) apparently has a callchain
 to ask the hash value for an empty path when it was given a . from
 the top-level directory to ask Is the path . excluded by default?

According to a single gdb run, 'git check-attr -a .' does not hit
hash_name() for me.  However that naive experiment doesn't rule out
the possibility of it happening if the right attributes are set, but I
don't know enough about that code to comment.

 Make sure that hash_name() does not overrun the end of the given
 pathname even when it is empty.
 
 Remove a sweep-the-issue-under-the-rug conditional in check-ignore
 that avoided to pass an empty string to the callchain while at it.
 It is a valid question to ask for check-ignore if the top-level is
 set to be ignored by default, even though the answer is most likely

Hmm, I see very little difference between the use of most likely and
the use of the words much and typically which you previously
considered a sure sign that the design of the fix is iffy.

 no, if only because there is currently no way to specify such an
 entry in the .gitignore file. But it is an unusual thing to ask and
 it is not worth optimizing for it by special casing at the top level
 of the call chain.

Although I agree with your proposed patch's sentiment of avoiding
sweeping this corner case under the rug, 'check-ignore .' still
wouldn't match anything if for example './' was a supported mechanism
for ignoring the top level.  So, modulo the obvious advantages that
defensive coding in hash_name() bring, I'm struggling to see a
significant difference between any of the three patches proposed so
far.  All of them fix the segfault, and all would require the same
amount of extra work in order to support matching against './'.

But I don't have any strong objections either, so you have my approval
for whichever patch you prefer to take.  Thanks.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html