Re: [PATCH] dir: allow a BOM at the beginning of exclude files

2015-04-16 Thread Carlos Martín Nieto
On Thu, 2015-04-16 at 17:03 +0200, Johannes Schindelin wrote:
 Hi Carlos,
 
 On 2015-04-16 16:05, Carlos Martín Nieto wrote:
  Some text editors like Notepad or LibreOffice write an UTF-8 BOM in
  order to indicate that the file is Unicode text rather than whatever the
  current locale would indicate.
  
  If someone uses such an editor to edit a gitignore file, we are left
  with those three bytes at the beginning of the file. If we do not skip
  them, we will attempt to match a filename with the BOM as prefix, which
  won't match the files the user is expecting.
  
  ---
  
  If you're wondering how I came up with LibreOffice, I was doing a
  workshop recently and one of the participants was not content with the
  choice of vim or nano, so he opened LibreOffice to edit the gitignore
  file with confusing consequences.
  
  This codepath doesn't go as far as the config code in validating that
  we do not have a partial BOM which would mean there's some invalid
  content, but we don't really have invalid content any other way, as
  we're just dealing with a list of paths in the file.
 
 Yeah, users are entertaining!
 
 I agree that this is a good patch. *Maybe* we would need the same handling in 
 more places, in which case it might make sense to refactor the test into its 
 own function.

Yes, this was my train of thought. If we (discover that) need it in a
third place, then we can unify the test and skip. For two places I
reckoned it was fine if we duplicated a bit.

 
 In any case, though, the Git project requires a [Developer's Certificate of 
 Origin](https://github.com/git/git/blob/v2.3.5/Documentation/SubmittingPatches#L234-L277);
  Would you mind adding that?
 

Yeah, sorry. I keep forgetting to do that. I'll reply to my original
e-mail with it.

Cheers,
   cmn


--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] dir: allow a BOM at the beginning of exclude files

2015-04-16 Thread Carlos Martín Nieto
On Thu, 2015-04-16 at 16:05 +0200, Carlos Martín Nieto wrote:
 Some text editors like Notepad or LibreOffice write an UTF-8 BOM in
 order to indicate that the file is Unicode text rather than whatever the
 current locale would indicate.
 
 If someone uses such an editor to edit a gitignore file, we are left
 with those three bytes at the beginning of the file. If we do not skip
 them, we will attempt to match a filename with the BOM as prefix, which
 won't match the files the user is expecting.

Signed-off-by: Carlos Martín Nieto c...@elego.de

which I keep forgetting.

 
 ---
 
 If you're wondering how I came up with LibreOffice, I was doing a
 workshop recently and one of the participants was not content with the
 choice of vim or nano, so he opened LibreOffice to edit the gitignore
 file with confusing consequences.
 
 This codepath doesn't go as far as the config code in validating that
 we do not have a partial BOM which would mean there's some invalid
 content, but we don't really have invalid content any other way, as
 we're just dealing with a list of paths in the file.
 
  dir.c  | 8 +++-
  t/t7061-wtstatus-ignore.sh | 2 ++
  2 files changed, 9 insertions(+), 1 deletion(-)
 
 diff --git a/dir.c b/dir.c
 index 0943a81..6368247 100644
 --- a/dir.c
 +++ b/dir.c
 @@ -581,6 +581,7 @@ int add_excludes_from_file_to_list(const char *fname,
   struct stat st;
   int fd, i, lineno = 1;
   size_t size = 0;
 + static const unsigned char *utf8_bom = (unsigned char *) \xef\xbb\xbf;
   char *buf, *entry;
  
   fd = open(fname, O_RDONLY);
 @@ -617,7 +618,12 @@ int add_excludes_from_file_to_list(const char *fname,
   }
  
   el-filebuf = buf;
 - entry = buf;
 +
 + if (size = 3  !memcmp(buf, utf8_bom, 3))
 + entry = buf + 3;
 + else
 + entry = buf;
 +
   for (i = 0; i  size; i++) {
   if (buf[i] == '\n') {
   if (entry != buf + i  entry[0] != '#') {
 diff --git a/t/t7061-wtstatus-ignore.sh b/t/t7061-wtstatus-ignore.sh
 index 460789b..0a06fbf 100755
 --- a/t/t7061-wtstatus-ignore.sh
 +++ b/t/t7061-wtstatus-ignore.sh
 @@ -13,6 +13,8 @@ EOF
  
  test_expect_success 'status untracked directory with --ignored' '
   echo ignored .gitignore 
 + sed -e s/^/\xef\xbb\xbf/ .gitignore .gitignore.new 
 + mv .gitignore.new .gitignore 
   mkdir untracked 
   : untracked/ignored 
   : untracked/uncommitted 



--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] dir: allow a BOM at the beginning of exclude files

2015-04-16 Thread Johannes Schindelin
Hi Carlos,

On 2015-04-16 16:05, Carlos Martín Nieto wrote:
 Some text editors like Notepad or LibreOffice write an UTF-8 BOM in
 order to indicate that the file is Unicode text rather than whatever the
 current locale would indicate.
 
 If someone uses such an editor to edit a gitignore file, we are left
 with those three bytes at the beginning of the file. If we do not skip
 them, we will attempt to match a filename with the BOM as prefix, which
 won't match the files the user is expecting.
 
 ---
 
 If you're wondering how I came up with LibreOffice, I was doing a
 workshop recently and one of the participants was not content with the
 choice of vim or nano, so he opened LibreOffice to edit the gitignore
 file with confusing consequences.
 
 This codepath doesn't go as far as the config code in validating that
 we do not have a partial BOM which would mean there's some invalid
 content, but we don't really have invalid content any other way, as
 we're just dealing with a list of paths in the file.

Yeah, users are entertaining!

I agree that this is a good patch. *Maybe* we would need the same handling in 
more places, in which case it might make sense to refactor the test into its 
own function.

In any case, though, the Git project requires a [Developer's Certificate of 
Origin](https://github.com/git/git/blob/v2.3.5/Documentation/SubmittingPatches#L234-L277);
 Would you mind adding that?

Thanks,
Dscho
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] dir: allow a BOM at the beginning of exclude files

2015-04-16 Thread Junio C Hamano
Carlos Martín Nieto c...@elego.de writes:

 diff --git a/dir.c b/dir.c
 index 0943a81..6368247 100644
 --- a/dir.c
 +++ b/dir.c
 @@ -581,6 +581,7 @@ int add_excludes_from_file_to_list(const char *fname,
   struct stat st;
   int fd, i, lineno = 1;
   size_t size = 0;
 + static const unsigned char *utf8_bom = (unsigned char *) \xef\xbb\xbf;
   char *buf, *entry;
  
   fd = open(fname, O_RDONLY);
 @@ -617,7 +618,12 @@ int add_excludes_from_file_to_list(const char *fname,
   }
  
   el-filebuf = buf;
 - entry = buf;
 +
 + if (size = 3  !memcmp(buf, utf8_bom, 3))

OK.

 + entry = buf + 3;
 + else
 + entry = buf;
 +
   for (i = 0; i  size; i++) {
   if (buf[i] == '\n') {
   if (entry != buf + i  entry[0] != '#') {
 diff --git a/t/t7061-wtstatus-ignore.sh b/t/t7061-wtstatus-ignore.sh
 index 460789b..0a06fbf 100755
 --- a/t/t7061-wtstatus-ignore.sh
 +++ b/t/t7061-wtstatus-ignore.sh
 @@ -13,6 +13,8 @@ EOF
  
  test_expect_success 'status untracked directory with --ignored' '
   echo ignored .gitignore 
 + sed -e s/^/\xef\xbb\xbf/ .gitignore .gitignore.new 
 + mv .gitignore.new .gitignore 

Is this write literal in \xHEX on the replacement side of sed
substitution potable?  In any case, replacing the above three with
something like:

printf bomignored\n .gitignore

may be more sensible, no?

Also do we need a similar change to the attribute side, or are we
already covered and we forgot to do the same for the ignore files?


   mkdir untracked 
   : untracked/ignored 
   : untracked/uncommitted 
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] dir: allow a BOM at the beginning of exclude files

2015-04-16 Thread Jeff King
On Thu, Apr 16, 2015 at 08:39:55AM -0700, Junio C Hamano wrote:

   test_expect_success 'status untracked directory with --ignored' '
  echo ignored .gitignore 
  +   sed -e s/^/\xef\xbb\xbf/ .gitignore .gitignore.new 
  +   mv .gitignore.new .gitignore 
 
 Is this write literal in \xHEX on the replacement side of sed
 substitution potable?  In any case, replacing the above three with
 something like:
 
   printf bomignored\n .gitignore
 
 may be more sensible, no?

I'm not sure about sed, but I agree it is suspect. And note that printf
with hex codes is not portable, either You have to use octal:

  printf '\357\273\277ignored\n' .gitignore

Also, as a nit, I'd much rather see this in its own test rather than
crammed into another test_expect_success. It's much easier to diagnose
failures if the test description mentions the goal, and it is not tied
up with testing other parts that might fail.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] dir: allow a BOM at the beginning of exclude files

2015-04-16 Thread Johannes Schindelin
Hi,

On 2015-04-16 17:39, Junio C Hamano wrote:

 Also do we need a similar change to the attribute side, or are we
 already covered and we forgot to do the same for the ignore files?

I fear so: https://github.com/git/git/blob/v2.3.5/attr.c#L359-L376

As for the config, we are safe: 
https://github.com/git/git/blob/v2.3.5/config.c#L419-L439

Ciao,
Dscho
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] dir: allow a BOM at the beginning of exclude files

2015-04-16 Thread Torsten Bögershausen
On 2015-04-16 16.05, Carlos Martín Nieto wrote:
[]
May be it is easier to move this into an own function, like remove_utf8_bom() ?

  dir.c  | 8 +++-
  t/t7061-wtstatus-ignore.sh | 2 ++
  2 files changed, 9 insertions(+), 1 deletion(-)
 
 diff --git a/dir.c b/dir.c
 index 0943a81..6368247 100644
 --- a/dir.c
 +++ b/dir.c
 @@ -581,6 +581,7 @@ int add_excludes_from_file_to_list(const char *fname,
   struct stat st;
   int fd, i, lineno = 1;
   size_t size = 0;
 + static const unsigned char *utf8_bom = (unsigned char *) \xef\xbb\xbf;
Do we really need to cast here (and if, is the cast dropping the const ?)

Another suggestion, see below:
either:
static const size_t bom_len = 3;
or
static const size_t bom_len = strlen(utf8_bom);

   char *buf, *entry;
  
   fd = open(fname, O_RDONLY);
 @@ -617,7 +618,12 @@ int add_excludes_from_file_to_list(const char *fname,
   }
  
   el-filebuf = buf;
 - entry = buf;
 +
And now we can avoid magic numbers:
if (size = bom_len  !memcmp(buf, utf8_bom, bom_len))
entry = buf + bom_len;
else
entry = buf;
[]
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] dir: allow a BOM at the beginning of exclude files

2015-04-16 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 On Thu, Apr 16, 2015 at 08:39:55AM -0700, Junio C Hamano wrote:

   test_expect_success 'status untracked directory with --ignored' '
 echo ignored .gitignore 
  +  sed -e s/^/\xef\xbb\xbf/ .gitignore .gitignore.new 
  +  mv .gitignore.new .gitignore 
 
 Is this write literal in \xHEX on the replacement side of sed
 substitution potable?  In any case, replacing the above three with
 something like:
 
  printf bomignored\n .gitignore
 
 may be more sensible, no?

 I'm not sure about sed, but I agree it is suspect. And note that printf
 with hex codes is not portable, either You have to use octal:

   printf '\357\273\277ignored\n' .gitignore

 Also, as a nit, I'd much rather see this in its own test rather than
 crammed into another test_expect_success. It's much easier to diagnose
 failures if the test description mentions the goal, and it is not tied
 up with testing other parts that might fail.

Yeah, I totally agree.

Carlos, something like this squashed in, perhaps?

 t/t7061-wtstatus-ignore.sh | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/t/t7061-wtstatus-ignore.sh b/t/t7061-wtstatus-ignore.sh
index 0a06fbf..cdc0747 100755
--- a/t/t7061-wtstatus-ignore.sh
+++ b/t/t7061-wtstatus-ignore.sh
@@ -13,8 +13,6 @@ EOF
 
 test_expect_success 'status untracked directory with --ignored' '
echo ignored .gitignore 
-   sed -e s/^/\xef\xbb\xbf/ .gitignore .gitignore.new 
-   mv .gitignore.new .gitignore 
mkdir untracked 
: untracked/ignored 
: untracked/uncommitted 
@@ -22,6 +20,15 @@ test_expect_success 'status untracked directory with 
--ignored' '
test_cmp expected actual
 '
 
+test_expect_success 'same with gitignore starting with BOM' '
+   printf \357\273\277ignored\n .gitignore 
+   mkdir -p untracked 
+   : untracked/ignored 
+   : untracked/uncommitted 
+   git status --porcelain --ignored actual 
+   test_cmp expected actual
+'
+
 cat expected \EOF
 ?? .gitignore
 ?? actual
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] dir: allow a BOM at the beginning of exclude files

2015-04-16 Thread Carlos Martín Nieto
On Thu, 2015-04-16 at 10:16 -0700, Junio C Hamano wrote:
 Jeff King p...@peff.net writes:
 
  On Thu, Apr 16, 2015 at 08:39:55AM -0700, Junio C Hamano wrote:
 
test_expect_success 'status untracked directory with --ignored' '
echo ignored .gitignore 
   +sed -e s/^/\xef\xbb\xbf/ .gitignore .gitignore.new 
   +mv .gitignore.new .gitignore 
  
  Is this write literal in \xHEX on the replacement side of sed
  substitution potable?  In any case, replacing the above three with
  something like:
  
 printf bomignored\n .gitignore
  
  may be more sensible, no?
 
  I'm not sure about sed, but I agree it is suspect. And note that printf
  with hex codes is not portable, either You have to use octal:
 
printf '\357\273\277ignored\n' .gitignore
 
  Also, as a nit, I'd much rather see this in its own test rather than
  crammed into another test_expect_success. It's much easier to diagnose
  failures if the test description mentions the goal, and it is not tied
  up with testing other parts that might fail.
 
 Yeah, I totally agree.
 
 Carlos, something like this squashed in, perhaps?
 
  t/t7061-wtstatus-ignore.sh | 11 +--
  1 file changed, 9 insertions(+), 2 deletions(-)
 
 diff --git a/t/t7061-wtstatus-ignore.sh b/t/t7061-wtstatus-ignore.sh
 index 0a06fbf..cdc0747 100755
 --- a/t/t7061-wtstatus-ignore.sh
 +++ b/t/t7061-wtstatus-ignore.sh
 @@ -13,8 +13,6 @@ EOF
  
  test_expect_success 'status untracked directory with --ignored' '
   echo ignored .gitignore 
 - sed -e s/^/\xef\xbb\xbf/ .gitignore .gitignore.new 
 - mv .gitignore.new .gitignore 
   mkdir untracked 
   : untracked/ignored 
   : untracked/uncommitted 
 @@ -22,6 +20,15 @@ test_expect_success 'status untracked directory with 
 --ignored' '
   test_cmp expected actual
  '
  
 +test_expect_success 'same with gitignore starting with BOM' '
 + printf \357\273\277ignored\n .gitignore 
 + mkdir -p untracked 
 + : untracked/ignored 
 + : untracked/uncommitted 
 + git status --porcelain --ignored actual 
 + test_cmp expected actual
 +'
 +
  cat expected \EOF
  ?? .gitignore
  ?? actual
 

Yeah, that makes sense. I had something similar in my patch at one point
before going with modifying the current one.

   cmn


--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html