Re: [PATCH v10 6/9] convert: add 'working-tree-encoding' attribute
Lars Schneiderwrites: > Nice catch. What do you think about this solution using is_encoding_utf8() > from utf.c? That helper was invented exactly for a case like this, I would think.
Re: [PATCH v10 6/9] convert: add 'working-tree-encoding' attribute
> On 07 Mar 2018, at 18:54, Eric Sunshinewrote: > > On Wed, Mar 7, 2018 at 12:30 PM, wrote: >> [...] >> Add an attribute to tell Git what encoding the user has defined for a >> given file. If the content is added to the index, then Git converts the >> content to a canonical UTF-8 representation. On checkout Git will >> reverse the conversion. >> >> Signed-off-by: Lars Schneider >> --- >> Documentation/gitattributes.txt | 80 +++ >> diff --git a/convert.c b/convert.c >> @@ -265,6 +266,78 @@ static int will_convert_lf_to_crlf(size_t len, struct >> text_stat *stats, >> +static const char *default_encoding = "UTF-8"; >> @@ -978,6 +1051,21 @@ static int ident_to_worktree(const char *path, const >> char *src, size_t len, >> +static const char *git_path_check_encoding(struct attr_check_item *check) >> +{ >> + const char *value = check->value; >> + >> + if (ATTR_TRUE(value) || ATTR_FALSE(value) || ATTR_UNSET(value) || >> + !strlen(value)) >> + return NULL; >> + >> + /* Don't encode to the default encoding */ >> + if (!strcasecmp(value, default_encoding)) >> + return NULL; > > As of v10, the rest of the code accepts encoding names "UTF-xx" and > "UTFxx" (case insensitive), however, this check recognizes only > "UTF-8" (case insensitive). For consistency, one would expect this > also to recognize "UTF8" (case insensitive). Nice catch. What do you think about this solution using is_encoding_utf8() from utf.c? if (is_encoding_utf8(value) && is_encoding_utf8(default_encoding)) - Lars
Re: [PATCH v10 6/9] convert: add 'working-tree-encoding' attribute
lars.schnei...@autodesk.com writes: > +static const char *git_path_check_encoding(struct attr_check_item *check) > +{ > + const char *value = check->value; > + > + if (ATTR_TRUE(value) || ATTR_FALSE(value) || ATTR_UNSET(value) || > + !strlen(value)) > + return NULL; This means that having "* working-tree-encoding" (without "=value") in your .gitattributes file is silently ignored. Thinking aloud. I wonder if that is something we would want to warn about so that the project can fix it, perhaps? Or would that become too noisy to use an older version of Git that includes this series when a newer version that defines new meanings to boolean (set/unset) values for w-t-e attribute becomes available and projects adopt it? On the other hand, with this code as-is or with an additional warning, such an "older version" of Git by definition behaves differently from such a "newer version" that does something different when the attribute is not a non-empty string, so it is quite likely that we won't be able to redefine or extend the meaning of w-t-e in a way like that. Which means that the only sensible way to make sure we _could_ later extend the meaning of w-t-e and make it behave differently when set to a non-empty string is to make it an error in _this_ "older" version. That way, of course people cannot use the "older" version on a newer project that depends on the way how "newer" Git treats w-t-e that is set to, say, "true", but we won't silently (or loudly) do wrong things to the project's data.
Re: [PATCH v10 6/9] convert: add 'working-tree-encoding' attribute
On Wed, Mar 7, 2018 at 12:30 PM,wrote: > [...] > Add an attribute to tell Git what encoding the user has defined for a > given file. If the content is added to the index, then Git converts the > content to a canonical UTF-8 representation. On checkout Git will > reverse the conversion. > > Signed-off-by: Lars Schneider > --- > Documentation/gitattributes.txt | 80 +++ > diff --git a/convert.c b/convert.c > @@ -265,6 +266,78 @@ static int will_convert_lf_to_crlf(size_t len, struct > text_stat *stats, > +static const char *default_encoding = "UTF-8"; > @@ -978,6 +1051,21 @@ static int ident_to_worktree(const char *path, const > char *src, size_t len, > +static const char *git_path_check_encoding(struct attr_check_item *check) > +{ > + const char *value = check->value; > + > + if (ATTR_TRUE(value) || ATTR_FALSE(value) || ATTR_UNSET(value) || > + !strlen(value)) > + return NULL; > + > + /* Don't encode to the default encoding */ > + if (!strcasecmp(value, default_encoding)) > + return NULL; As of v10, the rest of the code accepts encoding names "UTF-xx" and "UTFxx" (case insensitive), however, this check recognizes only "UTF-8" (case insensitive). For consistency, one would expect this also to recognize "UTF8" (case insensitive). > + > + return value; > +}