Re: [PATCH v9 5/8] convert: add 'working-tree-encoding' attribute

2018-03-06 Thread Eric Sunshine
On Tue, Mar 6, 2018 at 5:13 PM, Lars Schneider  wrote:
>> On 06 Mar 2018, at 21:42, Eric Sunshine  wrote:
>> On Sun, Mar 4, 2018 at 3:14 PM,   wrote:
>>> +   return xstrdup_toupper(value);
>>
>> xstrdup_toupper() allocates memory...
>>
>>> +   const char *working_tree_encoding; /* Supported encoding or default 
>>> encoding if NULL */
>>
>> ...which is assigned to 'const char *'...
>>
>>> +   ca->working_tree_encoding = git_path_check_encoding(ccheck 
>>> + 5);
>>
>> ...by this code, and eventually leaked.
>>
>> It's too bad it isn't cleaned up (freed), but looking at the callers,
>> fixing this leak would be mildly noisy (though not particularly
>> invasive). How much do we care about this leak?
>
> Hmm. You are right. That was previously handled by the encoding struct
> linked list that I removed in this iteration. I forgot about that aspect :/
> I don't like it leaking. I think I would like to reintroduce the linked
> list. This way every encoding is only once in memory. What do you think?

It's subjective, but I find the use of a linked-list just for the
purpose of not leaking these strings unnecessarily confusing.

If I was doing it, I'd probably add a conv_attrs_free() function and
call it from each function allocates a 'struct conv_attrs' (including
calling it before early returns -- which prompted my earlier comment
about it being a "mildly noisy" fix).


Re: [PATCH v9 5/8] convert: add 'working-tree-encoding' attribute

2018-03-06 Thread Lars Schneider

> On 06 Mar 2018, at 21:42, Eric Sunshine  wrote:
> 
> On Sun, Mar 4, 2018 at 3:14 PM,   wrote:
>> Git recognizes files encoded with ASCII or one of its supersets (e.g.
>> UTF-8 or ISO-8859-1) as text files. All other encodings are usually
>> interpreted as binary and consequently built-in Git text processing
>> tools (e.g. 'git diff') as well as most Git web front ends do not
>> visualize the content.
>> [...]
>> Signed-off-by: Lars Schneider 
>> ---
>> diff --git a/convert.c b/convert.c
>> @@ -978,6 +1051,25 @@ static int ident_to_worktree(const char *path, const 
>> char *src, size_t len,
>> +static const char *git_path_check_encoding(struct attr_check_item *check)
>> +{
>> +   [...]
>> +   /*
>> +* Ensure encoding names are always upper case (e.g. UTF-8) to
>> +* simplify subsequent string comparisons.
>> +*/
>> +   return xstrdup_toupper(value);
> 
> xstrdup_toupper() allocates memory...
> 
>> +}
>> @@ -1033,6 +1125,7 @@ struct conv_attrs {
>>enum crlf_action attr_action; /* What attr says */
>>enum crlf_action crlf_action; /* When no attr is set, use 
>> core.autocrlf */
>>int ident;
>> +   const char *working_tree_encoding; /* Supported encoding or default 
>> encoding if NULL */
> 
> ...which is assigned to 'const char *'...
> 
>> };
>> @@ -1064,6 +1158,7 @@ static void convert_attrs(struct conv_attrs *ca, const 
>> char *path)
>>else if (eol_attr == EOL_CRLF)
>>ca->crlf_action = CRLF_TEXT_CRLF;
>>}
>> +   ca->working_tree_encoding = git_path_check_encoding(ccheck + 
>> 5);
> 
> ...by this code, and eventually leaked.
> 
> It's too bad it isn't cleaned up (freed), but looking at the callers,
> fixing this leak would be mildly noisy (though not particularly
> invasive). How much do we care about this leak?

Hmm. You are right. That was previously handled by the encoding struct 
linked list that I removed in this iteration. I forgot about that aspect :/
I don't like it leaking. I think I would like to reintroduce the linked
list. This way every encoding is only once in memory. What do you think?


>>} else {
>>ca->drv = NULL;
>>ca->crlf_action = CRLF_UNDEFINED;
>> diff --git a/t/t0028-working-tree-encoding.sh 
>> b/t/t0028-working-tree-encoding.sh
>> @@ -0,0 +1,135 @@
>> +test_expect_success 'check $GIT_DIR/info/attributes support' '
>> +   test_when_finished "rm -f test.utf8.raw test.utf32.raw 
>> test.utf32.git" &&
> 
> It seems weird to be cleaning up files this test didn't create
> (test.utf8.raw and test.utf32.raw).

Agreed.


>> +   test_when_finished "git reset --hard HEAD" &&
>> +
>> +   echo "*.utf32 text working-tree-encoding=utf-32" 
>> >.git/info/attributes &&
>> +   git add test.utf32 &&
>> +
>> +   git cat-file -p :test.utf32 >test.utf32.git &&
>> +   test_cmp_bin test.utf8.raw test.utf32.git
>> +'
>> +
>> +test_expect_success 'check unsupported encodings' '
>> +   test_when_finished "rm -f err.out" &&
>> +   test_when_finished "git reset --hard HEAD" &&
> 
> Resetting to HEAD here is an important cleanup action, but tests don't
> usually clean up files such as 'err.out' since such detritus doesn't
> usually impact subsequent tests negatively. (Just an observation; no
> re-roll needed.)

OK. I'll fix it if I reroll.

- Lars

Re: [PATCH v9 5/8] convert: add 'working-tree-encoding' attribute

2018-03-06 Thread Eric Sunshine
On Sun, Mar 4, 2018 at 3:14 PM,   wrote:
> Git recognizes files encoded with ASCII or one of its supersets (e.g.
> UTF-8 or ISO-8859-1) as text files. All other encodings are usually
> interpreted as binary and consequently built-in Git text processing
> tools (e.g. 'git diff') as well as most Git web front ends do not
> visualize the content.
> [...]
> Signed-off-by: Lars Schneider 
> ---
> diff --git a/convert.c b/convert.c
> @@ -978,6 +1051,25 @@ static int ident_to_worktree(const char *path, const 
> char *src, size_t len,
> +static const char *git_path_check_encoding(struct attr_check_item *check)
> +{
> +   [...]
> +   /*
> +* Ensure encoding names are always upper case (e.g. UTF-8) to
> +* simplify subsequent string comparisons.
> +*/
> +   return xstrdup_toupper(value);

xstrdup_toupper() allocates memory...

> +}
> @@ -1033,6 +1125,7 @@ struct conv_attrs {
> enum crlf_action attr_action; /* What attr says */
> enum crlf_action crlf_action; /* When no attr is set, use 
> core.autocrlf */
> int ident;
> +   const char *working_tree_encoding; /* Supported encoding or default 
> encoding if NULL */

...which is assigned to 'const char *'...

>  };
> @@ -1064,6 +1158,7 @@ static void convert_attrs(struct conv_attrs *ca, const 
> char *path)
> else if (eol_attr == EOL_CRLF)
> ca->crlf_action = CRLF_TEXT_CRLF;
> }
> +   ca->working_tree_encoding = git_path_check_encoding(ccheck + 
> 5);

...by this code, and eventually leaked.

It's too bad it isn't cleaned up (freed), but looking at the callers,
fixing this leak would be mildly noisy (though not particularly
invasive). How much do we care about this leak?

> } else {
> ca->drv = NULL;
> ca->crlf_action = CRLF_UNDEFINED;
> diff --git a/t/t0028-working-tree-encoding.sh 
> b/t/t0028-working-tree-encoding.sh
> @@ -0,0 +1,135 @@
> +test_expect_success 'check $GIT_DIR/info/attributes support' '
> +   test_when_finished "rm -f test.utf8.raw test.utf32.raw 
> test.utf32.git" &&

It seems weird to be cleaning up files this test didn't create
(test.utf8.raw and test.utf32.raw).

> +   test_when_finished "git reset --hard HEAD" &&
> +
> +   echo "*.utf32 text working-tree-encoding=utf-32" 
> >.git/info/attributes &&
> +   git add test.utf32 &&
> +
> +   git cat-file -p :test.utf32 >test.utf32.git &&
> +   test_cmp_bin test.utf8.raw test.utf32.git
> +'
> +
> +test_expect_success 'check unsupported encodings' '
> +   test_when_finished "rm -f err.out" &&
> +   test_when_finished "git reset --hard HEAD" &&

Resetting to HEAD here is an important cleanup action, but tests don't
usually clean up files such as 'err.out' since such detritus doesn't
usually impact subsequent tests negatively. (Just an observation; no
re-roll needed.)

> +   echo "*.nothing text working-tree-encoding=" >>.gitattributes &&
> +   printf "nothing" >t.nothing &&
> +   git add t.nothing &&
> +
> +   echo "*.garbage text working-tree-encoding=garbage" >>.gitattributes 
> &&
> +   printf "garbage" >t.garbage &&
> +   test_must_fail git add t.garbage 2>err.out &&
> +   test_i18ngrep "fatal: failed to encode" err.out
> +'