Re: [BUG] Git does not convert CRLF=>LF on files with \r not before \n

2015-04-22 Thread Alexandre Garnier
Indeed, when I change the gitattributes to '* text', the conversion is OK.
Thanks for all the explanations.

At first, my use case was some source files (imported from another
VCS) with CR in different contexts:
 - lines ending with CRCRLF
 - content entirely in LF or CRLF, except for some CRs that should be EOLs
 - CR in the middle of a line for no reason!
I will fix all of these files during import.
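
For the first case (lines ending with CRCRLF), a minimal sketch of such an
import-time fix, assuming a POSIX awk. The function name strip_trailing_cr is
hypothetical. It only drops the run of CRs directly before each LF and
deliberately leaves CRs inside a line alone, since those need a case-by-case
decision:

```shell
# Hypothetical helper: read stdin, drop every CR that sits directly
# before a line's LF (CRLF and CRCRLF both become LF), write to stdout.
# CRs elsewhere in the line are left untouched.
# Note: awk terminates the last line with LF even if the input lacked one.
strip_trailing_cr() {
    awk '{ sub(/\r+$/, ""); print }'
}
```

It would be used as a filter, for example
strip_trailing_cr < imported-file > fixed-file, where both file names are
placeholders.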

But while digging, I found some shell and awk scripts with CR as a valid
character in a search/replacement string. I know that the EOL should not be
CRLF in this case, but I don't know whether the same situation could arise in
DOS batch files or PowerShell scripts with CRLF EOLs.

2015-04-21 21:28 GMT+02:00 Torsten Bögershausen tbo...@web.de:
 On 2015-04-21 15.51, Alexandre Garnier wrote:
 Here is a test:

 git init -q crlf-test
 cd crlf-test
 echo '*   text=auto' > .gitattributes
 git add .gitattributes
 git commit -q -m 'Normalize EOL'
 echo -ne 'some content\r\nother \rcontent with CR\r\ncontent\r\nagain content with\r\r\n' > inline-cr.txt
 echo Working directory content:
 cat -A inline-cr.txt
 echo
 git add inline-cr.txt
 echo Indexed content:
 git show :inline-cr.txt | cat -A

 Result
 --
 File content:
 some content^M$
 other ^Mcontent with CR^M$
 content^M$
 again content with^M^M$

 Indexed content:
 some content^M$
 other ^Mcontent with CR^M$
 content^M$
 again content with^M^M$

 Expected result
 ---
 File content:
 some content^M$
 other ^Mcontent with CR^M$
 content^M$
 again content with^M^M$

 Indexed content:
 some content$
 other ^Mcontent with CR$
 content$
 again content with^M$
 # or even 'again content with$' for this last line

 If you remove the \r that are not at the end of the lines, EOL are
 converted as expected:
 File content:
 some content^M$
 other content with CR^M$
 content^M$
 again content with^M$

 Indexed content:
 some content$
 other content with CR$
 content$
 again content with$


 First of all, thanks for the info.

 The current implementation of Git does an auto-detection
 if a file is text or binary.

 For a file which is suspected to be text, it is expected to have either LF or
 CRLF as line endings, but a bare CR makes Git wonder:
 Should this still be treated as a text file?
 If yes, should the CR be kept as is, or should it be converted into LF (or
 CRLF)?

 The current implementation may simply be explained by the fact that nobody
 has so far asked to treat such a file as text, so the implementation assumes
 it to be binary.

 (Which made the code a little bit easier at the time it was written.)

 So the status today is that you can force Git to leave the CR as is
 by specifying that the file is text.

 Is there a real-life problem behind it?
 And what should happen to the CRs?





--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] Git does not convert CRLF=>LF on files with \r not before \n

2015-04-22 Thread Junio C Hamano
Alexandre Garnier zig...@gmail.com writes:

 Indeed, when changing the gitattributes for '* text', the replacement is OK.

OK.  Earlier I said:

 But it would be a bug if the same thing happens when the user
 explicitly tells us that the file has CRLF line endings, and I
 suspect we have that bug, which may want to be corrected.

but you are saying that my suspicion is incorrect and we do not have
such a bug.

Thanks for digging further.


Re: [BUG] Git does not convert CRLF=>LF on files with \r not before \n

2015-04-21 Thread Junio C Hamano
Alexandre Garnier zig...@gmail.com writes:

 echo '*   text=auto' > .gitattributes
 git add .gitattributes
 git commit -q -m 'Normalize EOL'
 echo -ne 'some content\r\nother \rcontent with CR\r\ncontent\r\nagain

With text=auto, the user instructs us to guess, and we expect either
LF or CRLF line-terminated files that are *TEXT*.  A lone CR in the
middle of a line means we cannot reliably guess: it may be an
LF-terminated file with CRs sprinkled inside the text, some of which
happen to be at the end of a line, or it may be a CRLF-terminated
file with CRs sprinkled in.  We try to preserve the user input by
not munging when we are not sure.
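
To make the ambiguity concrete, here is a minimal shell sketch that counts
CRLF pairs and bare CRs in a file. It is not the logic in convert.c, only an
illustration of the kind of statistic involved. The function name count_cr is
hypothetical, and the sketch assumes the file ends with LF:

```shell
# Hypothetical helper (not part of Git): report how many CRs in a file
# belong to a CRLF pair and how many are "bare".
# Assumes the file ends with LF.
count_cr() {
    # Every CR byte in the file.
    total_cr=$(tr -cd '\r' < "$1" | wc -c)
    # Lines whose last byte before the LF is a CR, i.e. CRLF pairs.
    crlf=$(awk '/\r$/ { n++ } END { print n + 0 }' "$1")
    echo "crlf=$((crlf)) bare_cr=$((total_cr - crlf))"
}
```

Run against the inline-cr.txt from the test, this reports crlf=4 bare_cr=2.
A file with bare_cr=0 is one where every CR can safely be read as part of a
line ending; anything else leaves the guess open.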

You are seeing the designed and intended behaviour.

But it would be a bug if the same thing happens when the user
explicitly tells us that the file has CRLF line endings, and I
suspect we have that bug, which may want to be corrected.

I've Cc'ed various people who worked on convert.c around line
endings.  I recall we saw a few other discussion threads on
text=auto and eol settings.  Stakeholders may want to have a unified
discussion to first list the issues in the current implementation
and come up with fixes for them.

Thanks.




Re: [BUG] Git does not convert CRLF=>LF on files with \r not before \n

2015-04-21 Thread Torsten Bögershausen
On 2015-04-21 15.51, Alexandre Garnier wrote:
 Here is a test:
 
 git init -q crlf-test
 cd crlf-test
 echo '*   text=auto' > .gitattributes
 git add .gitattributes
 git commit -q -m 'Normalize EOL'
 echo -ne 'some content\r\nother \rcontent with CR\r\ncontent\r\nagain content with\r\r\n' > inline-cr.txt
 echo Working directory content:
 cat -A inline-cr.txt
 echo
 git add inline-cr.txt
 echo Indexed content:
 git show :inline-cr.txt | cat -A
 
 Result
 --
 File content:
 some content^M$
 other ^Mcontent with CR^M$
 content^M$
 again content with^M^M$
 
 Indexed content:
 some content^M$
 other ^Mcontent with CR^M$
 content^M$
 again content with^M^M$
 
 Expected result
 ---
 File content:
 some content^M$
 other ^Mcontent with CR^M$
 content^M$
 again content with^M^M$
 
 Indexed content:
 some content$
 other ^Mcontent with CR$
 content$
 again content with^M$
 # or even 'again content with$' for this last line
 
 If you remove the \r that are not at the end of the lines, EOL are
 converted as expected:
 File content:
 some content^M$
 other content with CR^M$
 content^M$
 again content with^M$
 
 Indexed content:
 some content$
 other content with CR$
 content$
 again content with$
 

First of all, thanks for the info.

The current implementation of Git does an auto-detection
if a file is text or binary.

For a file which is suspected to be text, it is expected to have either LF or
CRLF as line endings, but a bare CR makes Git wonder:
Should this still be treated as a text file?
If yes, should the CR be kept as is, or should it be converted into LF (or
CRLF)?

The current implementation may simply be explained by the fact that nobody
has so far asked to treat such a file as text, so the implementation assumes
it to be binary.

(Which made the code a little bit easier at the time it was written.)

So the status today is that you can force Git to leave the CR as is
by specifying that the file is text.
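
As a sketch of what that looks like, based on the '* text' setting reported
to work in this thread (the pattern '*' applies it to every path, which may
be broader than you want):

```
# .gitattributes
# Declare every path to be text, instead of letting Git guess with text=auto.
*   text
```

With this, CRLF is converted to LF when the file is added, and a CR that is
not directly followed by LF is kept as is.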

Is there a real-life problem behind it?
And what should happen to the CRs?







[BUG] Git does not convert CRLF=>LF on files with \r not before \n

2015-04-21 Thread Alexandre Garnier
Here is a test:

git init -q crlf-test
cd crlf-test
echo '*   text=auto' > .gitattributes
git add .gitattributes
git commit -q -m 'Normalize EOL'
echo -ne 'some content\r\nother \rcontent with CR\r\ncontent\r\nagain content with\r\r\n' > inline-cr.txt
echo Working directory content:
cat -A inline-cr.txt
echo
git add inline-cr.txt
echo Indexed content:
git show :inline-cr.txt | cat -A

Result
--
File content:
some content^M$
other ^Mcontent with CR^M$
content^M$
again content with^M^M$

Indexed content:
some content^M$
other ^Mcontent with CR^M$
content^M$
again content with^M^M$

Expected result
---
File content:
some content^M$
other ^Mcontent with CR^M$
content^M$
again content with^M^M$

Indexed content:
some content$
other ^Mcontent with CR$
content$
again content with^M$
# or even 'again content with$' for this last line

If you remove the \r that are not at the end of the lines, EOL are
converted as expected:
File content:
some content^M$
other content with CR^M$
content^M$
again content with^M$

Indexed content:
some content$
other content with CR$
content$
again content with$

-- 
Alex