Re: [regression?] trailing slash required in .gitattributes

2013-03-23 Thread Junio C Hamano
Jeff King  writes:

> Yeah, that is a possibility, though it involves casting away some
> constness. Patch is below, which seems to work.

Hmm, because this was after I read this part:

> ... match_basename, despite taking the length
> of all of the strings we pass it, will happily use NUL-terminated
> functions like strcmp or fnmatch.

I actually meant to do that inside match_basename(), not in one
particular caller of it.  Otherwise, new callers to match_basename()
will also suffer from this broken API that pretends as if it takes
counted strings but uses strings as NUL terminated, no?

> It still feels really ugly to me, and like match_basename is misdesigned
> and should respect the lengths we pass it.

Exactly.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [regression?] trailing slash required in .gitattributes

2013-03-23 Thread Jeff King
On Fri, Mar 22, 2013 at 04:08:08PM -0700, Junio C Hamano wrote:

> Jeff King  writes:
> 
> >   if (pathlen && pathname[pathlen-1] == '/')
> >   pathlen--;
> >
> > would work. But it seems that match_basename, despite taking the length
> > of all of the strings we pass it, will happily use NUL-terminated
> > functions like strcmp or fnmatch. Converting the former to check lengths
> > should be pretty straightforward. But there is no version of fnmatch
> > that does what we want. I wonder if we using wildmatch can get around
> > this limitation.
> 
> Or save away pathname[pathlen], temporarily NUL terminate and call
> these functions?

Yeah, that is a possibility, though it involves casting away some
constness. Patch is below, which seems to work.

It still feels really ugly to me, and like match_basename is misdesigned
and should respect the lengths we pass it. Also, if it does respect the
lengths, it should be able to go much faster (e.g., in the common case,
we can drop a ton of strcmp_icase calls if we just check the lengths
beforehand). I feel like Duy was working on something like this
recently, but I don't see anything in pu.

---
diff --git a/attr.c b/attr.c
index e2f9377..bd00a78 100644
--- a/attr.c
+++ b/attr.c
@@ -663,20 +663,58 @@ static int path_matches(const char *pathname, int pathlen,
 {
const char *pattern = pat->pattern;
int prefix = pat->nowildcardlen;
+   char path_munge = 0;
+   char pattern_munge = 0;
+   int ret;
 
if ((pat->flags & EXC_FLAG_MUSTBEDIR) &&
((!pathlen) || (pathname[pathlen-1] != '/')))
return 0;
 
+   /*
+* Drop trailing slash from path, as we would want
+* an unadorned pattern like "foo" to match both the
+* file "foo" and the directory "foo/".
+*/
+   if (pathlen && pathname[pathlen-1] == '/') {
+   pathlen--;
+
+   /*
+* The match_* functions, despite taking a string length, will
+* happily read all the way up to the NUL-terminating character.
+* So we must not only shrink pathlen, but munge the buffer
+* to NUL-terminate it.
+*/
+   path_munge = pathname[pathlen];
+   ((char *)pathname)[pathlen] = '\0';
+   }
+
+   /*
+* The pattern up to patternlen will not include a
+* trailing slash, but it may still be present in the string.
+* And since the match_* functions will read up to the NUL,
+* we need to terminate the buffer.
+*/
+   pattern_munge = pattern[pat->patternlen];
+   ((char *)pattern)[pat->patternlen] = '\0';
+
if (pat->flags & EXC_FLAG_NODIR) {
-   return match_basename(basename,
- pathlen - (basename - pathname),
- pattern, prefix,
- pat->patternlen, pat->flags);
-   }
-   return match_pathname(pathname, pathlen,
- base, baselen,
- pattern, prefix, pat->patternlen, pat->flags);
+   ret = match_basename(basename,
+pathlen - (basename - pathname),
+pattern, prefix,
+pat->patternlen, pat->flags);
+   }
+   else {
+   ret = match_pathname(pathname, pathlen,
+base, baselen,
+pattern, prefix,
+pat->patternlen, pat->flags);
+   }
+
+   if (path_munge)
+   ((char *)pathname)[pathlen] = path_munge;
+   ((char *)pattern)[pat->patternlen] = pattern_munge;
+   return ret;
 }
 
 static int macroexpand_one(int attr_nr, int rem);
diff --git a/t/t5002-archive-attr-pattern.sh b/t/t5002-archive-attr-pattern.sh
index 0c847fb..3be809c 100755
--- a/t/t5002-archive-attr-pattern.sh
+++ b/t/t5002-archive-attr-pattern.sh
@@ -27,6 +27,10 @@ test_expect_success 'setup' '
echo ignored-only-if-dir/ export-ignore >>.git/info/attributes &&
git add ignored-only-if-dir &&
 
+   mkdir -p ignored-without-slash &&
+   echo ignored without slash >ignored-without-slash/foo &&
+   git add ignored-without-slash/foo &&
+   echo ignored-without-slash export-ignore >>.git/info/attributes &&
 
mkdir -p one-level-lower/two-levels-lower/ignored-only-if-dir &&
echo ignored by ignored dir 
>one-level-lower/two-levels-lower/ignored-only-if-dir/ignored-by-ignored-dir &&
@@ -49,6 +53,8 @@ test_expect_missing   
archive/ignored-ony-if-dir/ignored-by-ignored-dir
 test_expect_exists archive/not-ignored-dir/
 test_expect_missingarchive/ignored-only-if-dir/
 test_expect_missingarchive/ignored-ony-if-dir/ignored-by-ignored-dir
+test_expect_missing archive/ignored-without-slash/ &&
+test_expect_missin

Re: [regression?] trailing slash required in .gitattributes

2013-03-22 Thread Duy Nguyen
On Sat, Mar 23, 2013 at 11:18:24AM +0700, Duy Nguyen wrote:
> You can use nwildmatch() from this patch. I tested it lightly with
> t3070-wildmatch.sh, feeding the strings with no terminating NUL. It
> seems to work ok.

And valgrind spotted my faults, especially for using strchr. You would
need this on top:

-- 8< --
diff --git a/wildmatch.c b/wildmatch.c
index f97ae2a..939bac8 100644
--- a/wildmatch.c
+++ b/wildmatch.c
@@ -61,9 +61,13 @@ static int dowild(const uchar *p, const uchar *text,
for ( ; (p_ch = *p) != '\0'; text++, p++) {
int matched, match_slash, negated;
uchar t_ch, prev_ch;
-   if (text >= textend && p_ch != '*')
-   return WM_ABORT_ALL;
-   t_ch = *text;
+   if (text >= textend) {
+   if (p_ch != '*')
+   return WM_ABORT_ALL;
+   else
+   t_ch = '\0';
+   } else
+   t_ch = *text;
if ((flags & WM_CASEFOLD) && ISUPPER(t_ch))
t_ch = tolower(t_ch);
if ((flags & WM_CASEFOLD) && ISUPPER(p_ch))
@@ -115,8 +119,9 @@ static int dowild(const uchar *p, const uchar *text,
/* Trailing "**" matches everything.  Trailing 
"*" matches
 * only if there are no more slash characters. 
*/
if (!match_slash) {
-   if (strchr((char*)text, '/') != NULL)
-   return WM_NOMATCH;
+   for (;text < textend; text++)
+   if (*text == '/')
+   return WM_NOMATCH;
}
return WM_MATCH;
} else if (!match_slash && *p == '/') {
@@ -125,10 +130,11 @@ static int dowild(const uchar *p, const uchar *text,
 * with WM_PATHNAME matches the next
 * directory
 */
-   const char *slash = strchr((char*)text, '/');
-   if (!slash)
+   for (;text < textend; text++)
+   if (*text == '/')
+   break;
+   if (text == textend)
return WM_NOMATCH;
-   text = (const uchar*)slash;
/* the slash is consumed by the top-level for 
loop */
break;
}
@@ -151,7 +157,7 @@ static int dowild(const uchar *p, const uchar *text,
t_ch = tolower(t_ch);
if (t_ch == p_ch)
break;
-   t_ch = *++text;
+   t_ch = ++text < textend ? *text 
: '\0';
}
if (t_ch != p_ch)
return WM_NOMATCH;
-- 8< --
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [regression?] trailing slash required in .gitattributes

2013-03-22 Thread Duy Nguyen
On Fri, Mar 22, 2013 at 06:24:39PM -0400, Jeff King wrote:
> I'm having trouble figuring out the right solution for this.

Thanks for looking into this. It was on my todo list, but you beat me
to it :)

> But then here we'll end up feeding "foo/" to be compared with "foo",
> which we don't want. For a pattern "foo", we want to match _either_
> "foo/" or "foo". So you'd think something like:
> 
>   if (pathlen && pathname[pathlen-1] == '/')
>   pathlen--;
> 
> would work. But it seems that match_basename, despite taking the length
> of all of the strings we pass it, will happily use NUL-terminated
> functions like strcmp or fnmatch. Converting the former to check lengths
> should be pretty straightforward. But there is no version of fnmatch
> that does what we want. I wonder if we using wildmatch can get around
> this limitation.

You can use nwildmatch() from this patch. I tested it lightly with
t3070-wildmatch.sh, feeding the strings with no terminating NUL. It
seems to work ok.

-- 8< --
Subject: [PATCH] wildmatch: do not require "text" to be NUL-terminated

This may be helpful when we just want to match a part of "text".
nwildmatch can be used for this purpose.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 wildmatch.c | 25 +
 wildmatch.h | 11 +--
 2 files changed, 22 insertions(+), 14 deletions(-)

diff --git a/wildmatch.c b/wildmatch.c
index 7192bdc..f97ae2a 100644
--- a/wildmatch.c
+++ b/wildmatch.c
@@ -52,7 +52,8 @@ typedef unsigned char uchar;
 #define ISXDIGIT(c) (ISASCII(c) && isxdigit(c))
 
 /* Match pattern "p" against "text" */
-static int dowild(const uchar *p, const uchar *text, unsigned int flags)
+static int dowild(const uchar *p, const uchar *text,
+ const uchar *textend, unsigned int flags)
 {
uchar p_ch;
const uchar *pattern = p;
@@ -60,8 +61,9 @@ static int dowild(const uchar *p, const uchar *text, unsigned 
int flags)
for ( ; (p_ch = *p) != '\0'; text++, p++) {
int matched, match_slash, negated;
uchar t_ch, prev_ch;
-   if ((t_ch = *text) == '\0' && p_ch != '*')
+   if (text >= textend && p_ch != '*')
return WM_ABORT_ALL;
+   t_ch = *text;
if ((flags & WM_CASEFOLD) && ISUPPER(t_ch))
t_ch = tolower(t_ch);
if ((flags & WM_CASEFOLD) && ISUPPER(p_ch))
@@ -101,7 +103,7 @@ static int dowild(const uchar *p, const uchar *text, 
unsigned int flags)
 * both foo/bar and foo/a/bar.
 */
if (p[0] == '/' &&
-   dowild(p + 1, text, flags) == 
WM_MATCH)
+   dowild(p + 1, text, textend, flags) 
== WM_MATCH)
return WM_MATCH;
match_slash = 1;
} else
@@ -130,9 +132,7 @@ static int dowild(const uchar *p, const uchar *text, 
unsigned int flags)
/* the slash is consumed by the top-level for 
loop */
break;
}
-   while (1) {
-   if (t_ch == '\0')
-   break;
+   while (text < textend) {
/*
 * Try to advance faster when an asterisk is
 * followed by a literal. We know in this case
@@ -145,18 +145,18 @@ static int dowild(const uchar *p, const uchar *text, 
unsigned int flags)
p_ch = *p;
if ((flags & WM_CASEFOLD) && 
ISUPPER(p_ch))
p_ch = tolower(p_ch);
-   while ((t_ch = *text) != '\0' &&
+   while (text < textend &&
   (match_slash || t_ch != '/')) {
if ((flags & WM_CASEFOLD) && 
ISUPPER(t_ch))
t_ch = tolower(t_ch);
if (t_ch == p_ch)
break;
-   text++;
+   t_ch = *++text;
}
if (t_ch != p_ch)
return WM_NOMATCH;
}
-   if ((matched = dowild(p, text, flags)) != 
WM_NOMATCH) {
+   if ((matched = dowild(p, text, textend, flags)) 
!= WM_NOMATCH) {
 

Re: [regression?] trailing slash required in .gitattributes

2013-03-22 Thread Junio C Hamano
Jeff King  writes:

>   if (pathlen && pathname[pathlen-1] == '/')
>   pathlen--;
>
> would work. But it seems that match_basename, despite taking the length
> of all of the strings we pass it, will happily use NUL-terminated
> functions like strcmp or fnmatch. Converting the former to check lengths
> should be pretty straightforward. But there is no version of fnmatch
> that does what we want. I wonder if we using wildmatch can get around
> this limitation.

Or save away pathname[pathlen], temporarily NUL terminate and call
these functions?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [regression?] trailing slash required in .gitattributes

2013-03-22 Thread Jeff King
On Tue, Mar 19, 2013 at 02:10:42PM -0400, Jeff King wrote:

> > The issue bisects to 94bc671 (Add directory pattern matching to
> > attributes, 2012-12-08). That commit actually tests not only that
> > "subdir/" matches, but also that just "subdir" does not match.
> [...]
> So I think the regression is accidental. And we would want tests like
> this on top (which currently fail):
> [...]

I'm having trouble figuring out the right solution for this.

The problem is in path_matches, which used to receive just the unadorned
pathname, and now receives "path/" for directories. It now looks like
this:

> static int path_matches(const char *pathname, int pathlen,
>   const char *basename,
>   const struct pattern *pat,
>   const char *base, int baselen)
> {
>   const char *pattern = pat->pattern;
>   int prefix = pat->nowildcardlen;
> 
>   if ((pat->flags & EXC_FLAG_MUSTBEDIR) &&
>   ((!pathlen) || (pathname[pathlen-1] != '/')))
>   return 0;

This first stanza checks that a pattern like "foo/" must be matched by a
real directory. Which is fine; that's the point of adding the "/" to the
pattern.

>   if (pat->flags & EXC_FLAG_NODIR) {
>   return match_basename(basename,
> pathlen - (basename - pathname),
> pattern, prefix,
> pat->patternlen, pat->flags);
>   }
>   return match_pathname(pathname, pathlen,
> base, baselen,
> pattern, prefix, pat->patternlen, pat->flags);
> }

But then here we'll end up feeding "foo/" to be compared with "foo",
which we don't want. For a pattern "foo", we want to match _either_
"foo/" or "foo". So you'd think something like:

  if (pathlen && pathname[pathlen-1] == '/')
  pathlen--;

would work. But it seems that match_basename, despite taking the length
of all of the strings we pass it, will happily use NUL-terminated
functions like strcmp or fnmatch. Converting the former to check lengths
should be pretty straightforward. But there is no version of fnmatch
that does what we want. I wonder if we using wildmatch can get around
this limitation.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [regression?] trailing slash required in .gitattributes

2013-03-19 Thread Jeff King
On Tue, Mar 19, 2013 at 01:57:56PM -0400, Jeff King wrote:

> Prior to v1.8.1.1, if I did this:
> 
>   git init
>   echo content >foo &&
>   mkdir subdir &&
>   echo content >subdir/bar &&
>   echo "subdir export-ignore" >.gitattributes
>   git add . &&
>   git commit -m one &&
>   git archive HEAD | tar tf -
> 
> my archive would contain only "foo" and ".gitattributes", not subdir. As
> of v1.8.1.1, the attribute on subdir is ignored unless it is written
> with a trailing slash, like:
> 
>   subdir/ export-ignore
> 
> The issue bisects to 94bc671 (Add directory pattern matching to
> attributes, 2012-12-08). That commit actually tests not only that
> "subdir/" matches, but also that just "subdir" does not match.

Sorry, I mis-read the tests. They are not testing that "subdir" does not
work, only that "subdir/" will match only a directory, not a regular
file. Which does make sense.

So I think the regression is accidental. And we would want tests like
this on top (which currently fail):

diff --git a/t/t5002-archive-attr-pattern.sh b/t/t5002-archive-attr-pattern.sh
index 0c847fb..3be809c 100755
--- a/t/t5002-archive-attr-pattern.sh
+++ b/t/t5002-archive-attr-pattern.sh
@@ -27,6 +27,10 @@ test_expect_success 'setup' '
echo ignored-only-if-dir/ export-ignore >>.git/info/attributes &&
git add ignored-only-if-dir &&
 
+   mkdir -p ignored-without-slash &&
+   echo ignored without slash >ignored-without-slash/foo &&
+   git add ignored-without-slash/foo &&
+   echo ignored-without-slash export-ignore >>.git/info/attributes &&
 
mkdir -p one-level-lower/two-levels-lower/ignored-only-if-dir &&
echo ignored by ignored dir 
>one-level-lower/two-levels-lower/ignored-only-if-dir/ignored-by-ignored-dir &&
@@ -49,6 +53,8 @@ test_expect_missing   
archive/ignored-ony-if-dir/ignored-by-ignored-dir
 test_expect_exists archive/not-ignored-dir/
 test_expect_missingarchive/ignored-only-if-dir/
 test_expect_missingarchive/ignored-ony-if-dir/ignored-by-ignored-dir
+test_expect_missing archive/ignored-without-slash/ &&
+test_expect_missing archive/ignored-without-slash/foo &&
 test_expect_exists archive/one-level-lower/
 test_expect_missing
archive/one-level-lower/two-levels-lower/ignored-only-if-dir/
 test_expect_missing
archive/one-level-lower/two-levels-lower/ignored-ony-if-dir/ignored-by-ignored-dir
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [regression?] trailing slash required in .gitattributes

2013-03-19 Thread Junio C Hamano
Jeff King  writes:

> Prior to v1.8.1.1, if I did this:
>
>   git init
>   echo content >foo &&
>   mkdir subdir &&
>   echo content >subdir/bar &&
>   echo "subdir export-ignore" >.gitattributes
>   git add . &&
>   git commit -m one &&
>   git archive HEAD | tar tf -
>
> my archive would contain only "foo" and ".gitattributes", not subdir. As
> of v1.8.1.1, the attribute on subdir is ignored unless it is written
> with a trailing slash, like:
>
>   subdir/ export-ignore
>
> The issue bisects to 94bc671 (Add directory pattern matching to
> attributes, 2012-12-08). That commit actually tests not only that
> "subdir/" matches, but also that just "subdir" does not match.
>
> The commit message there is vague about the reasoning, but my
> understanding is that it was meant to harmonize gitignore and
> gitattributes, the former of which can take "dir/". I don't have a
> problem with offering "dir/" to match only directories, but what is the
> point in disallowing just "dir" to match a directory?
>
> It seems like a pointless regression to me, but I'm not clear whether it
> was intentional or not (and if it was intentional, I think we would need
> to handle it with a proper transition period, not in a maint release).

I agree that is a pointless regression.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html