Re: [PATCHv2] parse-options: report uncorrupted multi-byte options
On Tue, Feb 12, 2013 at 9:10 AM, Junio C Hamano wrote:
>> Similar cases:
>>
>> config.c:git_default_core_config() assumes core.commentchar is ascii.
>> We should catch and report non-ascii chars, or simply accept it as a
>> string.
>
> That one is just an uninterpreted byte.  core.commentString might be
> a nice extension to the concept, but it is an entirely different
> category.

My point is to avoid outputting broken UTF-8 if we can. If someone
accidentally puts a UTF-8 character in core.commentChar, it will
produce broken UTF-8 templates that editors might react to, but that
are hard to spot by eye. Something like this may give sufficient
protection:

diff --git a/config.c b/config.c
index aefd80b..b6f73e0 100644
--- a/config.c
+++ b/config.c
@@ -726,8 +726,11 @@ static int git_default_core_config(const char *var, const char *value)
 	if (!strcmp(var, "core.commentchar")) {
 		const char *comment;
 		int ret = git_config_string(&comment, var, value);
-		if (!ret)
+		if (!ret) {
+			if (comment[1])
+				return error("core.commentchar must be one ASCII character");
 			comment_line_char = comment[0];
+		}
 		return ret;
 	}
 
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
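A note on the check proposed above: `comment[1]` only tests that the value is at most one byte long, so a single non-ASCII byte (for example a Latin-1 character) would still be accepted even though the message says "one ASCII character", and an empty value would make `comment[1]` read one byte past the terminator. The following is a minimal standalone sketch of a stricter test, not code from git; the helper name `is_single_ascii_char` is hypothetical, and it compares the byte against 0x80 rather than relying on `isascii()`.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of git: returns 1 if `value` is exactly
 * one byte long and that byte is in the 7-bit ASCII range, 0 otherwise.
 */
static int is_single_ascii_char(const char *value)
{
	unsigned char c;

	if (!value || !value[0])
		return 0;	/* NULL or empty string */
	if (value[1])
		return 0;	/* more than one byte, e.g. a UTF-8 sequence */
	c = (unsigned char)value[0];
	return c < 0x80;	/* reject single high-bit bytes such as 0xE9 */
}
```

In the patch above, such a helper could stand in for the bare `comment[1]` test, assuming the maintainers want the error message and the check to agree.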
Re: [PATCHv2] parse-options: report uncorrupted multi-byte options
Duy Nguyen writes:

> On Tue, Feb 12, 2013 at 6:13 AM, Erik Faye-Lund wrote:
>> Because our command-line parser considers only one byte at a time
>> for short-options, we incorrectly report only the first byte when
>> multi-byte input was provided. This makes user errors slightly
>> awkward to diagnose, for instance under a UTF-8 locale and
>> non-English keyboard layouts.
>>
>> Make the reporting code report the whole argument-string when a
>> non-ASCII short-option is detected.
>
> Similar cases:
>
> config.c:git_default_core_config() assumes core.commentchar is ascii.
> We should catch and report non-ascii chars, or simply accept it as a
> string.

That one is just an uninterpreted byte.  core.commentString might be
a nice extension to the concept, but it is an entirely different
category.

> builtin/update-index.c:cmd_update_index(): error("unknown switch
> '%c'", *ctx.opt);

This one is in the same category as this topic.

> builtin/apply.c:apply_one_fragment(): error(_("invalid start of line:
> '%c'"), first); where 'first' may be a part of utf-8 from a broken
> patch.

This is where the patch is expected to have either " ", "-" or "+";
again, anything else is an uninterpreted byte.  It is more like
reporting to the user's terminal the file we found an error in, when
its filename is not encoded in UTF-8.
Re: [PATCHv2] parse-options: report uncorrupted multi-byte options
On Tue, Feb 12, 2013 at 6:13 AM, Erik Faye-Lund wrote:
> Because our command-line parser considers only one byte at a time
> for short-options, we incorrectly report only the first byte when
> multi-byte input was provided. This makes user errors slightly
> awkward to diagnose, for instance under a UTF-8 locale and
> non-English keyboard layouts.
>
> Make the reporting code report the whole argument-string when a
> non-ASCII short-option is detected.

Similar cases:

config.c:git_default_core_config() assumes core.commentchar is ascii.
We should catch and report non-ascii chars, or simply accept it as a
string.

builtin/update-index.c:cmd_update_index(): error("unknown switch
'%c'", *ctx.opt);

builtin/apply.c:apply_one_fragment(): error(_("invalid start of line:
'%c'"), first); where 'first' may be a part of utf-8 from a broken
patch.
-- 
Duy
Re: [PATCHv2] parse-options: report uncorrupted multi-byte options
On Tue, Feb 12, 2013 at 12:13:48AM +0100, Erik Faye-Lund wrote:

> I decided to change the text from what Jeff suggested; all we know is
> that it's non-ASCII. It might be Latin-1 or some other non-ASCII,
> single byte encoding. And since we're trying not to care, let's also
> try to not be overly specific :)

Yeah, that makes more sense (I did not put too much thought into the
original wording).

Thanks.

-Peff
Re: [PATCHv2] parse-options: report uncorrupted multi-byte options
Thanks.
[PATCHv2] parse-options: report uncorrupted multi-byte options
Because our command-line parser considers only one byte at a time
for short-options, we incorrectly report only the first byte when
multi-byte input was provided. This makes user errors slightly
awkward to diagnose, for instance under a UTF-8 locale and
non-English keyboard layouts.

Make the reporting code report the whole argument-string when a
non-ASCII short-option is detected.

Signed-off-by: Erik Faye-Lund
Improved-by: Jeff King
---
Here's a second attempt at fixing error-reporting with UTF-8 encoded
input, this time without corrupting other non-ascii multi-byte
encodings.

I decided to change the text from what Jeff suggested; all we know is
that it's non-ASCII. It might be Latin-1 or some other non-ASCII,
single byte encoding. And since we're trying not to care, let's also
try to not be overly specific :)

I wasn't entirely sure whom to credit for the improvement, so I just
picked Jeff; he provided some code. That decision might not be
correct, feel free to change it.

 parse-options.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/parse-options.c b/parse-options.c
index 67e98a6..6a39446 100644
--- a/parse-options.c
+++ b/parse-options.c
@@ -461,8 +461,11 @@ int parse_options(int argc, const char **argv, const char *prefix,
 	default: /* PARSE_OPT_UNKNOWN */
 		if (ctx.argv[0][1] == '-') {
 			error("unknown option `%s'", ctx.argv[0] + 2);
-		} else {
+		} else if (isascii(*ctx.opt)) {
 			error("unknown switch `%c'", *ctx.opt);
+		} else {
+			error("unknown non-ascii option in string: `%s'",
+			      ctx.argv[0]);
 		}
 		usage_with_options(usagestr, options);
 	}
-- 
1.8.1.1
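For readers who want to try the three reporting branches outside of git, here is a minimal standalone sketch of the same decision. It is not the patch itself: `format_unknown_option` is a hypothetical function that writes into a caller-supplied buffer instead of calling git's `error()`, and it tests the high bit directly instead of using `isascii()` so that it builds with plain ISO C.

```c
#include <stdio.h>

/*
 * Hypothetical stand-in for the PARSE_OPT_UNKNOWN branch: writes the
 * diagnostic into `buf` instead of calling git's error().
 * `arg` is the whole argument (e.g. "-x"), and `opt` points at the
 * short-option byte currently being examined inside `arg`.
 */
static void format_unknown_option(char *buf, size_t len,
				  const char *arg, const char *opt)
{
	if (arg[1] == '-')
		/* long option: report the name without the leading "--" */
		snprintf(buf, len, "unknown option `%s'", arg + 2);
	else if ((unsigned char)*opt < 0x80)
		/* 7-bit byte: safe to print on its own */
		snprintf(buf, len, "unknown switch `%c'", *opt);
	else
		/* high-bit byte: print the whole argument, uninterpreted */
		snprintf(buf, len,
			 "unknown non-ascii option in string: `%s'", arg);
}
```

With a UTF-8 argument such as "-é" (the bytes 0x2D 0xC3 0xA9), the last branch prints the argument unchanged, so the terminal receives a complete sequence rather than the lone lead byte 0xC3.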