[PATCH/RFC] attr: allow pattern escape using backslash

2012-11-11 Thread Nguyễn Thái Ngọc Duy
.gitattributes pattern syntax is supposed to be the same as .gitignore
(except a few things that do not make sense in attr context, but
that's a different issue). .gitignore uses fnmatch() as the matching
machinery and \ is accepted as an escape code. In theory the pattern
'foo\ bar' should match path 'foo bar' in .gitignore. Granted, no one
would write 'foo\ bar' in .gitignore when 'foo bar' should
suffice.

Regardless, 'foo\ bar attr' does not (but should) attach attr to
path foo bar because pattern/attr parse code does not understand
backslash escape. It parses the line as path 'foo\' and attributes
'bar' and 'attr'. This patch teaches attr code to recognize the
backslash in patterns (not macro names) and pass 'foo\ bar' down to
fnmatch().

This changes the attr behavior. foo\ attr, if exists in the field,
would match nothing because path foo\ is invalid in UNIX and is a
directory in Windows, which we do not accept attaching attributes
to. With this patch, that line becomes invalid and is rejected. So
it's not really bad (i.e. no silent changes in behavior).

Other subtle behavioral changes may happen. Still, I think we should
do this as it seems like a correct thing to do.

Signed-off-by: Nguyễn Thái Ngọc Duy pclo...@gmail.com
---
 It was posted [1] during rc cycles (I think) and initial feedback
 from Junio was not utterly wrong. I still think this is worth
 doing, hence this resend for discussion.

 [1] http://thread.gmane.org/gmane.comp.version-control.git/207135

 attr.c| 12 +++-
 t/t0003-attributes.sh |  5 +
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/attr.c b/attr.c
index 887a9ae..173d51d 100644
--- a/attr.c
+++ b/attr.c
@@ -221,8 +221,18 @@ static struct match_attr *parse_attr_line(const char 
*line, const char *src,
return NULL;
}
}
-   else
+   else {
is_macro = 0;
+   namelen = 0;
+   while (name[namelen] != '\0' 
+  !strchr(blank, name[namelen])) {
+   if (name[namelen] == '\\' 
+   name[namelen + 1] != '\0')
+   namelen += 2;
+   else
+   namelen++;
+   }
+   }
 
states = name + namelen;
states += strspn(states, blank);
diff --git a/t/t0003-attributes.sh b/t/t0003-attributes.sh
index febc45c..16b419e 100755
--- a/t/t0003-attributes.sh
+++ b/t/t0003-attributes.sh
@@ -24,6 +24,7 @@ test_expect_success 'setup' '
echo offon -test test
echo no notest
echo A/e/F test=A/e/F
+   echo A\\ b test=space
) .gitattributes 
(
echo g test=a/g 
@@ -196,6 +197,10 @@ test_expect_success 'root subdir attribute test' '
attr_check subdir/a/i unspecified
 '
 
+test_expect_success 'quoting in pattern' '
+   attr_check A b space
+'
+
 test_expect_success 'setup bare' '
git clone --bare . bare.git 
cd bare.git
-- 
1.7.12.1.406.g6ab07c4

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH/RFC] attr: allow pattern escape using backslash

2012-10-06 Thread Nguyễn Thái Ngọc Duy
.gitattributes pattern syntax is supposed to be the same as .gitignore
(except a few things that do not make sense in attr context, but
that's a different issue). .gitignore uses fnmatch() as the matching
machinery and \ is accepted as an escape code. In theory the pattern
'foo\ bar' should match path 'foo bar' in .gitignore. Granted, no one
would write 'foo\ bar' in .gitignore when 'foo bar' should
suffice.

Regardless, 'foo\ bar attr' does not (but should) attach attr to
path foo bar because pattern/attr parse code does not understand
backslash escape. It parses the line as path 'foo\' and attributes
'bar' and 'attr'. This patch teaches attr code to recognize the
backslash in patterns (not macro names) and pass 'foo\ bar' down to
fnmatch().

This changes the attr behavior. foo\ attr, if exists in the field,
would match nothing because path foo\ is invalid in UNIX and is a
directory in Windows, which we do not accept attaching attributes
to. With this patch, that line becomes invalid and is rejected. So
it's not really bad (i.e. no silent changes in behavior).

Other subtle behavioral changes may happen. Still, I think we should
do this as it seems like a correct thing to do.

Signed-off-by: Nguyễn Thái Ngọc Duy pclo...@gmail.com
---
 We discussed the spaces in path names in attr before and I remember
 using quotes or double quotes, even substituting spaces with
 dots, were raised. I don't remember if backslashes were mentioned.

 attr.c| 12 +++-
 t/t0003-attributes.sh |  5 +
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/attr.c b/attr.c
index 887a9ae..173d51d 100644
--- a/attr.c
+++ b/attr.c
@@ -221,8 +221,18 @@ static struct match_attr *parse_attr_line(const char 
*line, const char *src,
return NULL;
}
}
-   else
+   else {
is_macro = 0;
+   namelen = 0;
+   while (name[namelen] != '\0' 
+  !strchr(blank, name[namelen])) {
+   if (name[namelen] == '\\' 
+   name[namelen + 1] != '\0')
+   namelen += 2;
+   else
+   namelen++;
+   }
+   }
 
states = name + namelen;
states += strspn(states, blank);
diff --git a/t/t0003-attributes.sh b/t/t0003-attributes.sh
index febc45c..16b419e 100755
--- a/t/t0003-attributes.sh
+++ b/t/t0003-attributes.sh
@@ -24,6 +24,7 @@ test_expect_success 'setup' '
echo offon -test test
echo no notest
echo A/e/F test=A/e/F
+   echo A\\ b test=space
) .gitattributes 
(
echo g test=a/g 
@@ -196,6 +197,10 @@ test_expect_success 'root subdir attribute test' '
attr_check subdir/a/i unspecified
 '
 
+test_expect_success 'quoting in pattern' '
+   attr_check A b space
+'
+
 test_expect_success 'setup bare' '
git clone --bare . bare.git 
cd bare.git
-- 
1.7.12.1.406.g6ab07c4

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC] attr: allow pattern escape using backslash

2012-10-06 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy  pclo...@gmail.com writes:

 .gitattributes pattern syntax is supposed to be the same as .gitignore
 (except a few things that do not make sense in attr context, but
 that's a different issue). .gitignore uses fnmatch() as the matching
 machinery and \ is accepted as an escape code. In theory the pattern
 'foo\ bar' should match path 'foo bar' in .gitignore. Granted, no one
 would write 'foo\ bar' in .gitignore when 'foo bar' should
 suffice.

 Regardless, 'foo\ bar attr' does not (but should) attach attr to
 path foo bar because pattern/attr parse code does not understand
 backslash escape. It parses the line as path 'foo\' and attributes
 'bar' and 'attr'. This patch teaches attr code to recognize the
 backslash in patterns (not macro names) and pass 'foo\ bar' down to
 fnmatch().

 This changes the attr behavior. foo\ attr, if exists in the field,
 would match nothing because path foo\ is invalid in UNIX and is a
 directory in Windows, which we do not accept attaching attributes
 to. With this patch, that line becomes invalid and is rejected. So
 it's not really bad (i.e. no silent changes in behavior).

 Other subtle behavioral changes may happen. Still, I think we should
 do this as it seems like a correct thing to do.

 Signed-off-by: Nguyễn Thái Ngọc Duy pclo...@gmail.com
 ---
  We discussed the spaces in path names in attr before and I remember
  using quotes or double quotes, even substituting spaces with
  dots, were raised. I don't remember if backslashes were mentioned.

My knee-jerk reaction, without thinking things through, is that this
makes sense and doing the same for .gitignore would, too (even
though the one-item-per-line nature of .gitignore makes \  and  
practically equivalent).

Shouldn't we do the same for quoting fnmatch(3) metacharacters?  To
match a path component 'a' followed by an asterisk followed by 'b',
you could then write 'a\*b'.  Same for quoting the backslash itself.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC] attr: allow pattern escape using backslash

2012-10-06 Thread Nguyen Thai Ngoc Duy
On Sat, Oct 6, 2012 at 2:54 PM, Junio C Hamano gits...@pobox.com wrote:
 Shouldn't we do the same for quoting fnmatch(3) metacharacters?  To
 match a path component 'a' followed by an asterisk followed by 'b',
 you could then write 'a\*b'.  Same for quoting the backslash itself.

I think my patch does that too. The thing it does not do is optimize
this case so that we can do strcmp() instead of fnmatch() if the
entire pattern does not contain any metacharacters but backslashes. I
don't think it'll become popular enough to deserve an optimization.
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html