Re: [PATCH v2 2/2] File commited with CRLF should roundtrip diff and apply

2017-08-16 Thread Junio C Hamano
tbo...@web.de writes:

> Analyze the patch if there is a) any context line with CRLF,
> or b) if any line with CRLF is to be removed.
> Thanks to Junio C Hamano, his input became the base for the changes in t4124.
> One test case is split up into 3:
> - Detect the " a\r" line in the patch
> - Detect the "-a\r" line in the patch
> - Use LF in repo and CLRF in the worktree. (*)
>
> * This one proves that convert_to_git(&the_index,...) still needs to pass
> the &index, otherwise Git will crash.

I do not understand why you think it proves anything like that.
Forget about "in repo" when you think about "git apply" without
"--index/--cache".  There is *NO* role the index should play in that
mode.

"Otherwise Git will crash" is true, because convert_to_git() tries
to dereference the istate it is given to see if there is CR in the
blob that happens to be in the index at the path.

But step back and *think*.  It only proves that convert_to_git() you
have and/or the way read_old_data() you updated calls it after
applying these two patches are still broken.

The "blob that happens to be in the index at the path" does *NOT*
have anything to do with the file in the working tree that you are
patching.  Such an index entry may not exist (and the code would see
that there is 0 CRs and 0 LFs---so what?), or it may have a blob
full of CRLF, or it may have a blob full of CR not LF, or any random
thing that has NO relation to the file you are patching.  Why should
that random blob (which may not even exist---we are discussing "git
apply" without "--index/--cache" here) affect the outcome?  If it
does not affect the outcome, why should convert_to_git() even look
at it?

It shouldn't be calling down to "has_cr_in_index(istate, path)" that
is called from crlf_to_git() in the first place.  The check for CR
was done in the caller of convert_to_git(), i.e. read_old_data(),
long before convert_to_git() is called, and the caller knows if it
is (or it is not) dealing with data that needs crlf conversion at
that point, based on the contents of the file being patched (which,
again, does not have any relation to the blob that may or may not
happen to be in the index).  

Your updated caller is already telling convert_to_git() codepath
when it needs CRLF and it refrains from peeking in the index with
SAFE_CRLF_KEEP_CRLF flag.  The bug still in the code after applying
these two patches is that the other choice, i.e. SAFE_CRLF_FALSE,
that is passed from read_old_data() to convert_to_git() does *not*
prevent the latter from peeking into the in-core index.  

And that is why "Otherwise Git will crash".


[PATCH v2 2/2] File commited with CRLF should roundtrip diff and apply

2017-08-16 Thread tboegi
From: Torsten Bögershausen 

When a file had been commited with CRLF but now .gitattributes say
"* text=auto" (or core.autocrlf is true),
the following does not roundtrip, `git apply` fails:

printf "Added line\r\n" >>file &&
git diff >patch &&
git checkout -- . &&
git apply patch

Before applying the patch, the file from working tree is converted into the
index format (clean filter, CRLF conversion, ...)
Here, when commited with CRLF, the line endings should not be converted.

Note that `git apply --index` or `git apply --cache` doesn't call
convert_to_git() because the source material is already in index format.

Analyze the patch if there is a) any context line with CRLF,
or b) if any line with CRLF is to be removed.
In this case the patch file `patch` has mixed line endings, for a)
it looks like this (ignore the $ at the begin of the line):

$ diff --git a/one b/one
$ index 533790e..c30dea8 100644
$ --- a/one
$ +++ b/one
$ @@ -1 +1,2 @@
$  a\r
$ +b\r

And for b) it looks like this:

$ diff --git a/one b/one
$ index 533790e..485540d 100644
$ --- a/one
$ +++ b/one
$ @@ -1 +1 @@
$ -a\r
$ +b\r

If `git apply` detects that the patch itself has CRLF, (look at the line
" a\r" or "-a\r" above), the new flag has_crlf is set in "struct patch"
and two things will happen:
- read_old_data() will not convert CRLF into LF by calling
  convert_to_git(..., SAFE_CRLF_KEEP_CRLF);
- The WS_CR_AT_EOL bit is set in the "white space rule",
  CRLF are no longer treated as white space.

Thanks to Junio C Hamano, his input became the base for the changes in t4124.
One test case is split up into 3:
- Detect the " a\r" line in the patch
- Detect the "-a\r" line in the patch
- Use LF in repo and CLRF in the worktree. (*)

* This one proves that convert_to_git(&the_index,...) still needs to pass
the &index, otherwise Git will crash.

Reported-by: Anthony Sottile 
Signed-off-by: Torsten Bögershausen 
---
 apply.c  | 28 +++-
 t/t4124-apply-ws-rule.sh | 33 +++--
 2 files changed, 50 insertions(+), 11 deletions(-)

diff --git a/apply.c b/apply.c
index f2d599141d..bebb176099 100644
--- a/apply.c
+++ b/apply.c
@@ -220,6 +220,7 @@ struct patch {
unsigned int recount:1;
unsigned int conflicted_threeway:1;
unsigned int direct_to_threeway:1;
+   unsigned int has_crlf:1;
struct fragment *fragments;
char *result;
size_t resultsize;
@@ -1662,6 +1663,17 @@ static void check_whitespace(struct apply_state *state,
record_ws_error(state, result, line + 1, len - 2, state->linenr);
 }
 
+/* Check if the patch has context lines with CRLF or
+   the patch wants to remove lines with CRLF */
+static void check_old_for_crlf(struct patch *patch, const char *line, int len)
+{
+   if (len >= 2 && line[len-1] == '\n' && line[len-2] == '\r') {
+   patch->ws_rule |= WS_CR_AT_EOL;
+   patch->has_crlf = 1;
+   }
+}
+
+
 /*
  * Parse a unified diff. Note that this really needs to parse each
  * fragment separately, since the only way to know the difference
@@ -1712,11 +1724,13 @@ static int parse_fragment(struct apply_state *state,
if (!deleted && !added)
leading++;
trailing++;
+   check_old_for_crlf(patch, line, len);
if (!state->apply_in_reverse &&
state->ws_error_action == correct_ws_error)
check_whitespace(state, line, len, 
patch->ws_rule);
break;
case '-':
+   check_old_for_crlf(patch, line, len);
if (state->apply_in_reverse &&
state->ws_error_action != nowarn_ws_error)
check_whitespace(state, line, len, 
patch->ws_rule);
@@ -2268,8 +2282,11 @@ static void show_stats(struct apply_state *state, struct 
patch *patch)
add, pluses, del, minuses);
 }
 
-static int read_old_data(struct stat *st, const char *path, struct strbuf *buf)
+static int read_old_data(struct stat *st, struct patch *patch,
+const char *path, struct strbuf *buf)
 {
+   enum safe_crlf safe_crlf = patch->has_crlf ?
+   SAFE_CRLF_KEEP_CRLF : SAFE_CRLF_FALSE;
switch (st->st_mode & S_IFMT) {
case S_IFLNK:
if (strbuf_readlink(buf, path, st->st_size) < 0)
@@ -2278,7 +2295,7 @@ static int read_old_data(struct stat *st, const char 
*path, struct strbuf *buf)
case S_IFREG:
if (strbuf_read_file(buf, path, st->st_size) != st->st_size)
return error(_("unable to open or read %s"), path);
-   convert_to_git(&the_index, path, buf->buf, buf->len, buf, 0);
+   convert_to_git(&the_index, path, buf->buf, buf->len, buf, 
safe_crlf);
return 0;
default: