Re: [PATCH] avoid non-printable characters in diagnostics (c/77620, c/77521)

2017-01-01 Thread Martin Sebor

On 12/31/2016 12:08 PM, Gerald Pfeifer wrote:

On Fri, 9 Sep 2016, Martin Sebor wrote:

I mentioned the hex vs octal notation to invite input into which
of the two of them people would prefer to see used by the %qc and
qs directives, and whether it's worth considering changing the %qE
directive to use the same notation as well, for consistency (and
to help with readability if there is consensus that one is clearer
than the other).


I do think hex is the way to go, and that it would be good to be
consistent across the board.

(All e-mail alert, but I don't think I saw a response to that.)


What I meant by ambiguity is for example a string like "\1234"
where it's not obvious where the octal sequence ends.  Is it '\1'
followed  by "234" or '\12' followed by "34" or '\123' followed
by "4"?  (It's only possible to tell if one knows that GCC always
uses three digits for the octal character, but not everyone knows
that.)


Agreed.  And octal notation is just not very common today, too,
I'd argue.


Thanks.  I think the thread petered out after that and I didn't
remember to get back to it and the still outstanding %qE problem
where GCC uses the octal base and doesn't convert the character
values to unsigned char, resulting in confusing output like that
below:

$ echo 'constexpr int i = "\x80";' | gcc -S -Wall -Wextra -xc++ -
:1:19: error: invalid conversion from ‘const char*’ to ‘int’ 
[-fpermissive]
:1:19: error: ‘(int)((const char*)"\3777600")’ is not a 
constant expression


(The still unconfirmed bug 77573 came out of my tests of the fix
for the related bugs in the subject and tracks the wide character
part of the problem.)

Martin


Re: [PATCH] avoid non-printable characters in diagnostics (c/77620, c/77521)

2016-12-31 Thread Gerald Pfeifer
On Fri, 9 Sep 2016, Martin Sebor wrote:
> I mentioned the hex vs octal notation to invite input into which
> of the two of them people would prefer to see used by the %qc and
> qs directives, and whether it's worth considering changing the %qE
> directive to use the same notation as well, for consistency (and
> to help with readability if there is consensus that one is clearer
> than the other).

I do think hex is the way to go, and that it would be good to be
consistent across the board.

(All e-mail alert, but I don't think I saw a response to that.)

> What I meant by ambiguity is for example a string like "\1234"
> where it's not obvious where the octal sequence ends.  Is it '\1'
> followed  by "234" or '\12' followed by "34" or '\123' followed
> by "4"?  (It's only possible to tell if one knows that GCC always
> uses three digits for the octal character, but not everyone knows
> that.)

Agreed.  And octal notation is just not very common today, too,
I'd argue.

Gerald


Re: [PATCH] avoid non-printable characters in diagnostics (c/77620, c/77521)

2016-09-16 Thread Jeff Law

On 09/08/2016 08:19 PM, Martin Sebor wrote:

While working on the -Wformat-length pass I noticed that in some
diagnostics that make use of the %qc and %qs directives GCC prints
non-printable characters raw.  For example, it might print a newline,
corrupting the diagnostic stream (bug 77521).

Some other diagnostics that try to avoid this problem by using
a directive such as %x when the character is not printable might
use the sign-extended value of the character, printing a very
large hexadecimal value (bug 77520).

The attached patch changes the pretty printer to detect non-printable
characters in %qc and %qs directives (but not %c or %s) and print
those in hexadecimal (via "\\x%02x").

Martin

PS I used hexadecimal based on what c-format.c does but now that
I checked more carefully how %qE formats string literals I see it
uses octal.  I think hexadecimal is preferable because it avoids
ambiguity but I'm open to changing it to octal if there's a strong
preference for it.  Incidentally, %qE too suffers from bug 77520
(see below).  The patch doesn't try to fix that.

$ cat z.C && gcc z.C
constexpr int i = "ABC\x7f_\x80XYZ";
z.C:1:19: error: invalid conversion from ‘const char*’ to ‘int’
[-fpermissive]
 constexpr int i = "ABC\x7f_\x80XYZ";
   ^
z.C:1:19: error: ‘(int)((const char*)"ABC\177_\3777600XYZ")’ is not
a constant expression

gcc-77520.diff


PR c/77520 - wrong value for extended ASCII characters in -Wformat message
PR c/77521 - %qc format directive should quote non-printable characters

gcc/c-family/ChangeLog:
2016-09-08  Martin Sebor  

PR c/77520
PR c/77521
* c-format.c (argument_parser::find_format_char_info): Use %qc
format directive unconditionally.

gcc/ChangeLog:
2016-09-08  Martin Sebor  

PR c/77520
PR c/77521
* pretty-print.c (pp_quoted_string): New function.
(pp_format): Call it for %c and %s directives.

gcc/testsuite/ChangeLog:
2016-09-08  Martin Sebor  

PR c/77520
PR c/77521
* gcc.dg/pr77520.c: New test.
* gcc.dg/pr77521.c: New test.
I was about to ack, the realized Joseph already did and you'd already 
installed the change :-)


Jeff



Re: [PATCH] avoid non-printable characters in diagnostics (c/77620, c/77521)

2016-09-16 Thread Jeff Law

On 09/09/2016 05:17 PM, Martin Sebor wrote:

On 09/09/2016 07:59 AM, Joseph Myers wrote:

On Thu, 8 Sep 2016, Martin Sebor wrote:


PS I used hexadecimal based on what c-format.c does but now that
I checked more carefully how %qE formats string literals I see it
uses octal.  I think hexadecimal is preferable because it avoids
ambiguity but I'm open to changing it to octal if there's a strong


I'm not clear what you mean about ambiguity.  In C strings, an octal
escape sequence has up to three characters, so if it has three characters
it's unambiguous, whereas a hex escape sequence can have any number of
characters, so if the unprintable character is followed by a valid hex
digit then in C you need to represent that as an escape (or use string
constant concatenation, etc.).  The patch doesn't try to do that as
far as
I can see.

Now, presumably the output isn't intended to be interpreted as C strings
anyway (if it was, you'd need to escape " and \ as well), so the patch is
OK, but I don't think it avoids ambiguity (and there's a clear case that
it shouldn't - that if the string passed to %qs is printable, it
should be
printed as-is even if it contains escape sequences that could also result
from a non-printable string passed to %qs).


Thank you.

I tried to be clear about it in the description of the changes
but I see the PS caused some confusion.  Let me clarify that
the patch has nothing to do with with ambiguity (perceived or
real) in the representation of the escape sequences.  The only
purpose of the change is to avoid printing non-printable
characters or excessively large escape sequences in GCC
diagnostics.

I mentioned the hex vs octal notation to invite input into which
of the two of them people would prefer to see used by the %qc and
qs directives, and whether it's worth considering changing the %qE
directive to use the same notation as well, for consistency (and
to help with readability if there is consensus that one is clearer
than the other).

What I meant by ambiguity is for example a string like "\1234"
where it's not obvious where the octal sequence ends.  Is it '\1'
followed  by "234" or '\12' followed by "34" or '\123' followed
by "4"?  (It's only possible to tell if one knows that GCC always
uses three digits for the octal character, but not everyone knows
that.)  To be clear: I'm talking about the GCC output and not
necessarily about what the standard has to say about it.

In contrast to the octal notation, I find the string "\x1234"
clearer.  It can only mean '\x1' followed by "234" or '\x12'
followed by "34" and I think more people will expect it to be
the latter because representing characters using two hex digits
is more common.  But this is just my own perception and YMMV.
Both styles are ambiguous, but isn't that an inherent problem once we 
try to avoid non-printable characters by rendering them as octal or hex 
sequences?


I can't make a strong argument for either style over the other.

Jeff



Re: [PATCH] avoid non-printable characters in diagnostics (c/77620, c/77521)

2016-09-09 Thread Martin Sebor

On 09/09/2016 07:59 AM, Joseph Myers wrote:

On Thu, 8 Sep 2016, Martin Sebor wrote:


PS I used hexadecimal based on what c-format.c does but now that
I checked more carefully how %qE formats string literals I see it
uses octal.  I think hexadecimal is preferable because it avoids
ambiguity but I'm open to changing it to octal if there's a strong


I'm not clear what you mean about ambiguity.  In C strings, an octal
escape sequence has up to three characters, so if it has three characters
it's unambiguous, whereas a hex escape sequence can have any number of
characters, so if the unprintable character is followed by a valid hex
digit then in C you need to represent that as an escape (or use string
constant concatenation, etc.).  The patch doesn't try to do that as far as
I can see.

Now, presumably the output isn't intended to be interpreted as C strings
anyway (if it was, you'd need to escape " and \ as well), so the patch is
OK, but I don't think it avoids ambiguity (and there's a clear case that
it shouldn't - that if the string passed to %qs is printable, it should be
printed as-is even if it contains escape sequences that could also result
from a non-printable string passed to %qs).


Thank you.

I tried to be clear about it in the description of the changes
but I see the PS caused some confusion.  Let me clarify that
the patch has nothing to do with with ambiguity (perceived or
real) in the representation of the escape sequences.  The only
purpose of the change is to avoid printing non-printable
characters or excessively large escape sequences in GCC
diagnostics.

I mentioned the hex vs octal notation to invite input into which
of the two of them people would prefer to see used by the %qc and
qs directives, and whether it's worth considering changing the %qE
directive to use the same notation as well, for consistency (and
to help with readability if there is consensus that one is clearer
than the other).

What I meant by ambiguity is for example a string like "\1234"
where it's not obvious where the octal sequence ends.  Is it '\1'
followed  by "234" or '\12' followed by "34" or '\123' followed
by "4"?  (It's only possible to tell if one knows that GCC always
uses three digits for the octal character, but not everyone knows
that.)  To be clear: I'm talking about the GCC output and not
necessarily about what the standard has to say about it.

In contrast to the octal notation, I find the string "\x1234"
clearer.  It can only mean '\x1' followed by "234" or '\x12'
followed by "34" and I think more people will expect it to be
the latter because representing characters using two hex digits
is more common.  But this is just my own perception and YMMV.

Martin


Re: [PATCH] avoid non-printable characters in diagnostics (c/77620, c/77521)

2016-09-09 Thread Joseph Myers
On Thu, 8 Sep 2016, Martin Sebor wrote:

> PS I used hexadecimal based on what c-format.c does but now that
> I checked more carefully how %qE formats string literals I see it
> uses octal.  I think hexadecimal is preferable because it avoids
> ambiguity but I'm open to changing it to octal if there's a strong

I'm not clear what you mean about ambiguity.  In C strings, an octal 
escape sequence has up to three characters, so if it has three characters 
it's unambiguous, whereas a hex escape sequence can have any number of 
characters, so if the unprintable character is followed by a valid hex 
digit then in C you need to represent that as an escape (or use string 
constant concatenation, etc.).  The patch doesn't try to do that as far as 
I can see.

Now, presumably the output isn't intended to be interpreted as C strings 
anyway (if it was, you'd need to escape " and \ as well), so the patch is 
OK, but I don't think it avoids ambiguity (and there's a clear case that 
it shouldn't - that if the string passed to %qs is printable, it should be 
printed as-is even if it contains escape sequences that could also result 
from a non-printable string passed to %qs).

-- 
Joseph S. Myers
jos...@codesourcery.com