Re: [PATCH] avoid non-printable characters in diagnostics (c/77620, c/77521)
On 12/31/2016 12:08 PM, Gerald Pfeifer wrote:
> On Fri, 9 Sep 2016, Martin Sebor wrote:
>> I mentioned the hex vs octal notation to invite input into which
>> of the two of them people would prefer to see used by the %qc and
>> %qs directives, and whether it's worth considering changing the
>> %qE directive to use the same notation as well, for consistency
>> (and to help with readability if there is consensus that one is
>> clearer than the other).
>
> I do think hex is the way to go, and that it would be good to be
> consistent across the board.  (All e-mail alert, but I don't think
> I saw a response to that.)
>
>> What I meant by ambiguity is for example a string like "\1234"
>> where it's not obvious where the octal sequence ends.  Is it '\1'
>> followed by "234" or '\12' followed by "34" or '\123' followed
>> by "4"?  (It's only possible to tell if one knows that GCC always
>> uses three digits for the octal character, but not everyone knows
>> that.)
>
> Agreed.  And octal notation is just not very common today, too,
> I'd argue.

Thanks.  I think the thread petered out after that and I didn't
remember to get back to it and the still outstanding %qE problem,
where GCC uses the octal base and doesn't convert the character
values to unsigned char, resulting in confusing output like that
below:

  $ echo 'constexpr int i = "\x80";' | gcc -S -Wall -Wextra -xc++ -
  <stdin>:1:19: error: invalid conversion from ‘const char*’ to ‘int’ [-fpermissive]
  <stdin>:1:19: error: ‘(int)((const char*)"\3777600")’ is not a constant expression

(The still unconfirmed bug 77573 came out of my tests of the fix
for the related bugs in the subject and tracks the wide character
part of the problem.)

Martin
Re: [PATCH] avoid non-printable characters in diagnostics (c/77620, c/77521)
On Fri, 9 Sep 2016, Martin Sebor wrote:
> I mentioned the hex vs octal notation to invite input into which
> of the two of them people would prefer to see used by the %qc and
> %qs directives, and whether it's worth considering changing the %qE
> directive to use the same notation as well, for consistency (and
> to help with readability if there is consensus that one is clearer
> than the other).

I do think hex is the way to go, and that it would be good to be
consistent across the board.  (All e-mail alert, but I don't think
I saw a response to that.)

> What I meant by ambiguity is for example a string like "\1234"
> where it's not obvious where the octal sequence ends.  Is it '\1'
> followed by "234" or '\12' followed by "34" or '\123' followed
> by "4"?  (It's only possible to tell if one knows that GCC always
> uses three digits for the octal character, but not everyone knows
> that.)

Agreed.  And octal notation is just not very common today, too,
I'd argue.

Gerald
Re: [PATCH] avoid non-printable characters in diagnostics (c/77620, c/77521)
On 09/08/2016 08:19 PM, Martin Sebor wrote:
> While working on the -Wformat-length pass I noticed that in some
> diagnostics that make use of the %qc and %qs directives GCC prints
> non-printable characters raw.  For example, it might print a newline,
> corrupting the diagnostic stream (bug 77521).  Some other diagnostics
> that try to avoid this problem by using a directive such as %x when
> the character is not printable might use the sign-extended value of
> the character, printing a very large hexadecimal value (bug 77520).
>
> The attached patch changes the pretty printer to detect non-printable
> characters in %qc and %qs directives (but not %c or %s) and print
> those in hexadecimal (via "\\x%02x").
>
> Martin
>
> PS I used hexadecimal based on what c-format.c does but now that
> I checked more carefully how %qE formats string literals I see it
> uses octal.  I think hexadecimal is preferable because it avoids
> ambiguity but I'm open to changing it to octal if there's a strong
> preference for it.  Incidentally, %qE too suffers from bug 77520
> (see below).  The patch doesn't try to fix that.
>
>   $ cat z.C && gcc z.C
>   constexpr int i = "ABC\x7f_\x80XYZ";
>   z.C:1:19: error: invalid conversion from ‘const char*’ to ‘int’ [-fpermissive]
>    constexpr int i = "ABC\x7f_\x80XYZ";
>                      ^
>   z.C:1:19: error: ‘(int)((const char*)"ABC\177_\3777600XYZ")’ is not a constant expression
>
> gcc-77520.diff
>
> PR c/77520 - wrong value for extended ASCII characters in -Wformat message
> PR c/77521 - %qc format directive should quote non-printable characters
>
> gcc/c-family/ChangeLog:
>
> 2016-09-08  Martin Sebor
>
> 	PR c/77520
> 	PR c/77521
> 	* c-format.c (argument_parser::find_format_char_info): Use %qc
> 	format directive unconditionally.
>
> gcc/ChangeLog:
>
> 2016-09-08  Martin Sebor
>
> 	PR c/77520
> 	PR c/77521
> 	* pretty-print.c (pp_quoted_string): New function.
> 	(pp_format): Call it for %c and %s directives.
>
> gcc/testsuite/ChangeLog:
>
> 2016-09-08  Martin Sebor
>
> 	PR c/77520
> 	PR c/77521
> 	* gcc.dg/pr77520.c: New test.
> 	* gcc.dg/pr77521.c: New test.
I was about to ack, then realized Joseph already did and you'd
already installed the change :-)

Jeff
Re: [PATCH] avoid non-printable characters in diagnostics (c/77620, c/77521)
On 09/09/2016 05:17 PM, Martin Sebor wrote:
> On 09/09/2016 07:59 AM, Joseph Myers wrote:
>> On Thu, 8 Sep 2016, Martin Sebor wrote:
>>> PS I used hexadecimal based on what c-format.c does but now that
>>> I checked more carefully how %qE formats string literals I see it
>>> uses octal.  I think hexadecimal is preferable because it avoids
>>> ambiguity but I'm open to changing it to octal if there's a strong
>>
>> I'm not clear what you mean about ambiguity.  In C strings, an
>> octal escape sequence has up to three characters, so if it has
>> three characters it's unambiguous, whereas a hex escape sequence
>> can have any number of characters, so if the unprintable character
>> is followed by a valid hex digit then in C you need to represent
>> that as an escape (or use string constant concatenation, etc.).
>> The patch doesn't try to do that as far as I can see.
>>
>> Now, presumably the output isn't intended to be interpreted as C
>> strings anyway (if it was, you'd need to escape " and \ as well),
>> so the patch is OK, but I don't think it avoids ambiguity (and
>> there's a clear case that it shouldn't - that if the string passed
>> to %qs is printable, it should be printed as-is even if it contains
>> escape sequences that could also result from a non-printable string
>> passed to %qs).
>
> Thank you.  I tried to be clear about it in the description of the
> changes but I see the PS caused some confusion.  Let me clarify that
> the patch has nothing to do with ambiguity (perceived or real) in
> the representation of the escape sequences.  The only purpose of the
> change is to avoid printing non-printable characters or excessively
> large escape sequences in GCC diagnostics.
>
> I mentioned the hex vs octal notation to invite input into which
> of the two of them people would prefer to see used by the %qc and
> %qs directives, and whether it's worth considering changing the %qE
> directive to use the same notation as well, for consistency (and
> to help with readability if there is consensus that one is clearer
> than the other).
> What I meant by ambiguity is for example a string like "\1234"
> where it's not obvious where the octal sequence ends.  Is it '\1'
> followed by "234" or '\12' followed by "34" or '\123' followed
> by "4"?  (It's only possible to tell if one knows that GCC always
> uses three digits for the octal character, but not everyone knows
> that.)
>
> To be clear: I'm talking about the GCC output and not necessarily
> about what the standard has to say about it.
>
> In contrast to the octal notation, I find the string "\x1234"
> clearer.  It can only mean '\x1' followed by "234" or '\x12'
> followed by "34" and I think more people will expect it to be the
> latter because representing characters using two hex digits is
> more common.  But this is just my own perception and YMMV.

Both styles are ambiguous, but isn't that an inherent problem once
we try to avoid non-printable characters by rendering them as octal
or hex sequences?  I can't make a strong argument for either style
over the other.

Jeff
Re: [PATCH] avoid non-printable characters in diagnostics (c/77620, c/77521)
On 09/09/2016 07:59 AM, Joseph Myers wrote:
> On Thu, 8 Sep 2016, Martin Sebor wrote:
>> PS I used hexadecimal based on what c-format.c does but now that
>> I checked more carefully how %qE formats string literals I see it
>> uses octal.  I think hexadecimal is preferable because it avoids
>> ambiguity but I'm open to changing it to octal if there's a strong
>
> I'm not clear what you mean about ambiguity.  In C strings, an
> octal escape sequence has up to three characters, so if it has
> three characters it's unambiguous, whereas a hex escape sequence
> can have any number of characters, so if the unprintable character
> is followed by a valid hex digit then in C you need to represent
> that as an escape (or use string constant concatenation, etc.).
> The patch doesn't try to do that as far as I can see.
>
> Now, presumably the output isn't intended to be interpreted as C
> strings anyway (if it was, you'd need to escape " and \ as well),
> so the patch is OK, but I don't think it avoids ambiguity (and
> there's a clear case that it shouldn't - that if the string passed
> to %qs is printable, it should be printed as-is even if it contains
> escape sequences that could also result from a non-printable string
> passed to %qs).

Thank you.  I tried to be clear about it in the description of the
changes but I see the PS caused some confusion.  Let me clarify that
the patch has nothing to do with ambiguity (perceived or real) in
the representation of the escape sequences.  The only purpose of the
change is to avoid printing non-printable characters or excessively
large escape sequences in GCC diagnostics.

I mentioned the hex vs octal notation to invite input into which
of the two of them people would prefer to see used by the %qc and
%qs directives, and whether it's worth considering changing the %qE
directive to use the same notation as well, for consistency (and
to help with readability if there is consensus that one is clearer
than the other).
What I meant by ambiguity is for example a string like "\1234"
where it's not obvious where the octal sequence ends.  Is it '\1'
followed by "234" or '\12' followed by "34" or '\123' followed
by "4"?  (It's only possible to tell if one knows that GCC always
uses three digits for the octal character, but not everyone knows
that.)

To be clear: I'm talking about the GCC output and not necessarily
about what the standard has to say about it.

In contrast to the octal notation, I find the string "\x1234"
clearer.  It can only mean '\x1' followed by "234" or '\x12'
followed by "34" and I think more people will expect it to be the
latter because representing characters using two hex digits is
more common.  But this is just my own perception and YMMV.

Martin
Re: [PATCH] avoid non-printable characters in diagnostics (c/77620, c/77521)
On Thu, 8 Sep 2016, Martin Sebor wrote:
> PS I used hexadecimal based on what c-format.c does but now that
> I checked more carefully how %qE formats string literals I see it
> uses octal.  I think hexadecimal is preferable because it avoids
> ambiguity but I'm open to changing it to octal if there's a strong

I'm not clear what you mean about ambiguity.  In C strings, an
octal escape sequence has up to three characters, so if it has
three characters it's unambiguous, whereas a hex escape sequence
can have any number of characters, so if the unprintable character
is followed by a valid hex digit then in C you need to represent
that as an escape (or use string constant concatenation, etc.).
The patch doesn't try to do that as far as I can see.

Now, presumably the output isn't intended to be interpreted as C
strings anyway (if it was, you'd need to escape " and \ as well),
so the patch is OK, but I don't think it avoids ambiguity (and
there's a clear case that it shouldn't - that if the string passed
to %qs is printable, it should be printed as-is even if it contains
escape sequences that could also result from a non-printable string
passed to %qs).

-- 
Joseph S. Myers
jos...@codesourcery.com