[PATCHv2] parse-options: report uncorrupted multi-byte options

2013-02-11 Thread Erik Faye-Lund
Because our command-line parser considers only one byte at a time
for short options, we incorrectly report only the first byte when
multi-byte input was provided. This makes user errors slightly
awkward to diagnose, for instance under a UTF-8 locale with a
non-English keyboard layout.

Make the reporting code report the whole argument-string when a
non-ASCII short-option is detected.

Signed-off-by: Erik Faye-Lund <kusmab...@gmail.com>
Improved-by: Jeff King <p...@peff.net>
---

Here's a second attempt at fixing error-reporting with UTF-8 encoded
input, this time without corrupting other non-ascii multi-byte
encodings.

I decided to change the text from what Jeff suggested; all we know is
that it's non-ASCII. It might be Latin-1 or some other non-ASCII,
single byte encoding. And since we're trying not to care, let's also
try to not be overly specific :)

I wasn't entirely sure who to attribute for the improvement, so I just
picked Jeff; he provided some code. That decision might not be correct,
feel free to change it.

 parse-options.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/parse-options.c b/parse-options.c
index 67e98a6..6a39446 100644
--- a/parse-options.c
+++ b/parse-options.c
@@ -461,8 +461,11 @@ int parse_options(int argc, const char **argv, const char *prefix,
 	default: /* PARSE_OPT_UNKNOWN */
 		if (ctx.argv[0][1] == '-') {
 			error("unknown option `%s'", ctx.argv[0] + 2);
-		} else {
+		} else if (isascii(*ctx.opt)) {
 			error("unknown switch `%c'", *ctx.opt);
+		} else {
+			error("unknown non-ascii option in string: `%s'",
+			      ctx.argv[0]);
 		}
 		usage_with_options(usagestr, options);
 	}
-- 
1.8.1.1
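
For readers following the thread, here is a minimal standalone sketch of the reporting branch the patch adds. It is not git's code: `is_ascii_byte` and `format_unknown_switch` are hypothetical names, the message is formatted into a buffer instead of going through git's `error()`, and the ASCII test is written as a plain high-bit check rather than relying on `isascii()`.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: nonzero when the byte is 7-bit ASCII. */
static int is_ascii_byte(char c)
{
	return ((unsigned char)c & 0x80) == 0;
}

/*
 * Hypothetical stand-in for the reporting branch.
 * arg is the whole command-line argument, opt points at the
 * offending byte inside it. Returns what snprintf() returns.
 */
static int format_unknown_switch(char *buf, size_t len,
				 const char *arg, const char *opt)
{
	assert(buf && arg && opt);
	if (is_ascii_byte(*opt))
		/* A single ASCII byte is safe to print on its own. */
		return snprintf(buf, len, "unknown switch `%c'", *opt);
	/* Printing one byte of a multi-byte sequence would corrupt it,
	 * so report the whole argument string instead. */
	return snprintf(buf, len,
			"unknown non-ascii option in string: `%s'", arg);
}
```

Called with the argument `-x` and `opt` pointing at `x`, the sketch reports only the switch character. Called with `-ø` encoded as UTF-8, `opt` points at the first byte of a two-byte sequence, so the whole argument is reported and the bytes reach the terminal intact.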

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2] parse-options: report uncorrupted multi-byte options

2013-02-11 Thread Junio C Hamano
Thanks.


Re: [PATCHv2] parse-options: report uncorrupted multi-byte options

2013-02-11 Thread Jeff King
On Tue, Feb 12, 2013 at 12:13:48AM +0100, Erik Faye-Lund wrote:

> I decided to change the text from what Jeff suggested; all we know is
> that it's non-ASCII. It might be Latin-1 or some other non-ASCII,
> single byte encoding. And since we're trying not to care, let's also
> try to not be overly specific :)

Yeah, that makes more sense (I did not put too much thought into the
original wording). Thanks.

-Peff


Re: [PATCHv2] parse-options: report uncorrupted multi-byte options

2013-02-11 Thread Junio C Hamano
Duy Nguyen <pclo...@gmail.com> writes:

> On Tue, Feb 12, 2013 at 6:13 AM, Erik Faye-Lund <kusmab...@gmail.com> wrote:
>> Because our command-line parser considers only one byte at a time
>> for short options, we incorrectly report only the first byte when
>> multi-byte input was provided. This makes user errors slightly
>> awkward to diagnose, for instance under a UTF-8 locale with a
>> non-English keyboard layout.
>>
>> Make the reporting code report the whole argument-string when a
>> non-ASCII short-option is detected.
>
> Similar cases:
>
> config.c:git_default_core_config() assumes core.commentchar is ascii.
> We should catch and report non-ascii chars, or simply accept it as a
> string.

That one is just an uninterpreted byte.  core.commentString might be
a nice extension to the concept, but it is an entirely different
category.

> builtin/update-index.c:cmd_update_index(): error("unknown switch
> '%c'", *ctx.opt);

This one is in the same category as this topic.

> builtin/apply.c:apply_one_fragment(): error(_("invalid start of line:
> '%c'"), first); where 'first' may be a part of utf-8 from a broken
> patch.

This is where the patch is expected to have either " ", "-" or "+";
again, anything else is an uninterpreted byte.  It is more like
reporting to the user's terminal the name of a file we found an
error in, when that filename is not encoded in UTF-8.




Re: [PATCHv2] parse-options: report uncorrupted multi-byte options

2013-02-11 Thread Duy Nguyen
On Tue, Feb 12, 2013 at 9:10 AM, Junio C Hamano <gits...@pobox.com> wrote:
>> Similar cases:
>>
>> config.c:git_default_core_config() assumes core.commentchar is ascii.
>> We should catch and report non-ascii chars, or simply accept it as a
>> string.
>
> That one is just an uninterpreted byte.  core.commentString might be
> a nice extension to the concept, but it is an entirely different
> category.

My point is to avoid outputting broken UTF-8 where we can. If someone
accidentally puts a multi-byte UTF-8 character in core.commentChar, it
will produce templates containing broken UTF-8 that editors might react
to, but that are hard to spot by eye. Something like this may give
sufficient protection:

diff --git a/config.c b/config.c
index aefd80b..b6f73e0 100644
--- a/config.c
+++ b/config.c
@@ -726,8 +726,11 @@ static int git_default_core_config(const char *var, const char *value)
 	if (!strcmp(var, "core.commentchar")) {
 		const char *comment;
 		int ret = git_config_string(&comment, var, value);
-		if (!ret)
+		if (!ret) {
+			if (comment[1])
+				return error("core.commentchar must be one ASCII character");
 			comment_line_char = comment[0];
+		}
 		return ret;
 	}
-- 
Duy
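
A standalone sketch of the same idea, for illustration only: the diff above rejects any value longer than one byte, which catches multi-byte UTF-8 but would still let a single non-ASCII byte (Latin-1, say) through. The hypothetical helper below, `is_single_ascii_char`, also checks the high bit. It is not part of git.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical validator: nonzero only when value is exactly one
 * byte long and that byte is 7-bit ASCII.
 */
static int is_single_ascii_char(const char *value)
{
	assert(value != NULL);
	/* Reject the empty string and anything longer than one byte. */
	if (value[0] == '\0' || value[1] != '\0')
		return 0;
	/* Reject a lone byte with the high bit set. */
	return ((unsigned char)value[0] & 0x80) == 0;
}
```

With this check, "#" and ";" are accepted, while "ø" in UTF-8 (two bytes), "é" in Latin-1 (one high-bit byte), and "//" are all rejected.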