Re: toupper and warnings

2021-05-06 Thread Robert Elz
Date:Thu, 06 May 2021 12:52:36 -0700
From:"Greg A. Woods" 
Message-ID:  


  | Yeah, "Undefined Behaviour" should be undefined -- i.e. removed from the
  | spec -- i.e. become either fully defined or at least implementation
  | defined.  It is not helpful at all -- it was a very VERY bad idea.

Not really possible.   To become implementation defined, the implementation
needs to be able to specify what happens (even if different from what other
implementations specify for the same thing).   Sometimes that's  not possible,
and what happens depends upon things outside the control of the implementation.
Eg: accessing an array out of bounds might just return random data from some
other data structure, or it might generate a segmentation violation - it all
depends upon how far out of bounds the access was, and where in the memory map
the array in question happened to be placed.   There's no way to define what
will happen - even worse on an embedded system, running with no memory 
management or privilege separation, the access might hit on memory mapped I/O
control, or CPU control registers, and do almost anything.

  | E.g. for ctype.h interfaces the spec should just say that values outside
  | the recognized range will simply be truncated as if by assignment to an
  | unsigned char.

That might have been a good idea, perhaps, if it had been specified that
way initially - only perhaps because it means penalizing good code with
meaningless extra checks or no-op data manipulations (&0xFF or whatever)
that do nothing for it except make the code run slower, just so bad
code behaves in some kind of predictable (but probably still incorrect) way.

But it wasn't specified like that.

And standards bodies are not legislatures - they don't (or shouldn't) go
defining how things should be, and then attempt to force implementations
to obey.

Rather, they set out what is known to work on all implementations (just
omitting ones with admitted bugs which should be fixed), so that
applications will know what they need to do to correctly use the interfaces
provided, and what they should not do, as the results would either be
unspecified (or implementation defined) or even simply undefined.
They also make it clear what a new implementation needs to implement in
order to be compatible with the other existing implementations, so that
applications which work with other implementations will also work with
the new one.

  | What I am pretty sure of though is that there's a vast difference
  | between the massive number of warnings spit out by the compiler vs. the
  | relatively low number of actual cases of passing values outside of -1..255.
  | We certainly wouldn't want to claim UB and abort for all of the warnings!

It is certainly true that the compiler is guessing when it issues one of
these warnings, in some cases it cannot know what the range of value will
be at run time, in others its analysis functionality is simply not up to
the task.   So a lot of false warnings occur - for some of the warnings
the vast majority look to be bogus (which is annoying) for others a warning
most commonly means a problem exists.

kre



Re: toupper and warnings

2021-05-06 Thread tlaronde
On Thu, May 06, 2021 at 12:52:36PM -0700, Greg A. Woods wrote:
>[...] 
> Well I don't actually do that.
> 
> Which is to say my implementation's goal is to turn them into properly
> promoted values and access the flags array in such a way that this does
> not and cannot "cause" UB.  My implementation carefully tests for a -1
> and then if it is not -1 it uses a cast and assignment to an unsigned
> char, the value of which is then used to access the array.  I.e. totally
> safe _and_ UB-free.
> 

I guess you mean testing not against -1 but testing against EOF? Because
strictly speaking, EOF could be whatever integer value not in the
unsigned char range (and this is why too casting to unsigned char is
necessary because -1 is in the range of a signed char; after all, not
considering EOF for other usages, in this case EOF could be an
impossible signed or unsigned char value).
-- 
Thierry Laronde 
 http://www.kergis.com/
http://kertex.kergis.com/
   http://www.sbfa.fr/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C


Re: toupper and warnings

2021-05-06 Thread Greg A. Woods
At Thu, 06 May 2021 07:06:43 -0400, Greg Troxel  wrote:
Subject: Re: toupper and warnings
>
> Johnny Billquist  writes:
>
> >> See CAVEATS in ctype(3).
> >
> > Right. But is gcc really smart enough to understand at compile time if
> > something else than -1 is the negative value, and that toupper in fact
> > is more limited than what the signature says?
> >
> > The *signature* of the function is int toupper(int). If you pass a
> > char to that, I can't see that there would ever be a warning about any
> > problems.

Well the warning is actually an almost accidental side effect due to the
implementation as a macro and due to the way the particular way the
macro is expanded into an array reference.  The potential to access an
array with a negative index is obvious to the compiler in the current
standard NetBSD implementation.

Note too that actually implementing some of the ctype.h style interfaces
as real functions on a system using signed chars could be problematic as
they could then not internally distinguish between -1 and 0xFF when
passed a plain un-cast signed char value (a value of 0xFF is of course a
valid character, but due to the default integer promotions for all
function parameters the sign will be extended and the result will be
(int) -1).  This matters for iscntrl(), isalpha(), and potentially also
toupper() and tolower() depending on exactly how the return value is
used.


> To answer Greg Woods, It is not "conservative" to silently accept
> UB-causing inputs and turn them into UB array reads.

Well I don't actually do that.

Which is to say my implementation's goal is to turn them into properly
promoted values and access the flags array in such a way that this does
not and cannot "cause" UB.  My implementation carefully tests for a -1
and then if it is not -1 it uses a cast and assignment to an unsigned
char, the value of which is then used to access the array.  I.e. totally
safe _and_ UB-free.

Thus I take the conservative approach of not causing anything bad or
unexpected to happen, including not causing UBSAN to abort the program.

The only bad thing is that I do is to silently coerce truly badly
behaving code into continuing to work without complaint, even from
UBSAN.

My test code for my implementation is here:

https://github.com/robohack/experiments/blob/master/tctype.c

So far it's only been tested with a Clang and a range of GCC versions.

I can send the patch that adds it to -current too if you'd like to see
it in-situ.


> Another approach would be for NetBSD to adust the code to bounds check
> before indexing and do something if the value is out of bounds.

Yes, and my implementation could also easily be adjusted to do that.

>   Options
> are returning false, returning a random bit, calling abort(3), trying to
> send all of the user's files to a random IP addreess, and so on.  All
> are allowed by the spec :-)

Yeah, "Undefined Behaviour" should be undefined -- i.e. removed from the
spec -- i.e. become either fully defined or at least implementation
defined.  It is not helpful at all -- it was a very VERY bad idea.

E.g. for ctype.h interfaces the spec should just say that values outside
the recognized range will simply be truncated as if by assignment to an
unsigned char.

> Probably returning false with an envvar option to abort is best, similar
> to how UB from mutex ops is handled.

On the other hand perhaps this would better be a job for the UBSAN
module to do, but perhaps that's just because I've become much more
comfortable with using UBSAN with Clang in recent years.  SIGILL for the
losers.  I'm not sure how my implementation could be so hooked into
UBSAN tests though, especially in a less compiler-dependent way.

>   When we started with abort it
> seemed there was a vast amount of incorrect code written and debugged on
> other operating systems that allow code specified to hav UB.

I haven't had quite that experience, though there have been some rare
but stunning examples of problematic code -- though all I can remember
were actually caught while porting the code to run on NetBSD/alpha, or
in a few cases when porting to platforms that actually SIGBUS on
unaligned accesses (like sparc IIRC?).

What I am pretty sure of though is that there's a vast difference
between the massive number of warnings spit out by the compiler vs. the
relatively low number of actual cases of passing values outside of -1..255.
We certainly wouldn't want to claim UB and abort for all of the warnings!

--
Greg A. Woods 

Kelowna, BC +1 250 762-7675   RoboHack 
Planix, Inc.  Avoncote Farms 


pgpWHDkukp9tx.pgp
Description: OpenPGP Digital Signature


Re: toupper and warnings

2021-05-06 Thread Greg Troxel

Johnny Billquist  writes:

> On 2021-05-06 13:06, Greg Troxel wrote:
>>
>> Johnny Billquist  writes:
>>
 See CAVEATS in ctype(3).
>>>
>>> Right. But is gcc really smart enough to understand at compile time if
>>> something else than -1 is the negative value, and that toupper in fact
>>> is more limited than what the signature says?
>>>
>>> The *signature* of the function is int toupper(int). If you pass a
>>> char to that, I can't see that there would ever be a warning about any
>>> problems.
>>
>> The signature is that it takes an int, but the specification is that if
>> the value of the int is other than EOF or something representable as
>> unsigned char (projecting to he implementation, meaning -1 is ok and
>> 0..255 is ok), then you get UB.
>
> Right. My question is just how on earth gcc would know this? There is
> nothing in the actual declaration that tells (or even can) tell
> this. So this would then (again) be an example of gcc knowing more
> about the function than is actually visible.

What is going on is that there is a header file that defines toupper as
a macro (#define).After expanding that, gcc sees code that is using
a char (which is signed) as an array subscript, which can reasonably be
expected to have the possibility of out-of-bounds reads.  gcc is AFAIK
not bringing knowledge of toupper.

> What if I wrote my own function called toupper, which was defined for
> the full range of an int? Would gcc then understand that this is a
> different toupper that it shouldn't warn about?

toppper is specified by C99.  So yes, you could implement a version that
had safer behavior when it is formally undefined.  If gcc had the
specification expressed somehow, and could do static analysis, and gave
a warning that "call to toupper could lead to udefined behavior", then I
think that would be great.  But it isn't doing that -- and that's more
like UBsan than a compiler anyway.

A lot of trouble is caused by people writing code  that's ok with the
implementation in front of them, but that ventures into UB per the
standard.   So I don't think such code should be accomodated in general.


signature.asc
Description: PGP signature


Re: toupper and warnings

2021-05-06 Thread Johnny Billquist

On 2021-05-06 13:06, Greg Troxel wrote:


Johnny Billquist  writes:


See CAVEATS in ctype(3).


Right. But is gcc really smart enough to understand at compile time if
something else than -1 is the negative value, and that toupper in fact
is more limited than what the signature says?

The *signature* of the function is int toupper(int). If you pass a
char to that, I can't see that there would ever be a warning about any
problems.


The signature is that it takes an int, but the specification is that if
the value of the int is other than EOF or something representable as
unsigned char (projecting to he implementation, meaning -1 is ok and
0..255 is ok), then you get UB.


Right. My question is just how on earth gcc would know this? There is 
nothing in the actual declaration that tells (or even can) tell this. So 
this would then (again) be an example of gcc knowing more about the 
function than is actually visible. What if I wrote my own function 
called toupper, which was defined for the full range of an int? Would 
gcc then understand that this is a different toupper that it shouldn't 
warn about?


  Johnny

--
Johnny Billquist  || "I'm on a bus
  ||  on a psychedelic trip
email: b...@softjar.se ||  Reading murder books
pdp is alive! ||  tryin' to stay hip" - B. Idol


Re: toupper and warnings

2021-05-06 Thread Greg Troxel

Johnny Billquist  writes:

>> See CAVEATS in ctype(3).
>
> Right. But is gcc really smart enough to understand at compile time if
> something else than -1 is the negative value, and that toupper in fact
> is more limited than what the signature says?
>
> The *signature* of the function is int toupper(int). If you pass a
> char to that, I can't see that there would ever be a warning about any
> problems.

The signature is that it takes an int, but the specification is that if
the value of the int is other than EOF or something representable as
unsigned char (projecting to he implementation, meaning -1 is ok and
0..255 is ok), then you get UB.

The point of UB in specifications is to admit efficient implementations
(or to end arguments about existing implementations).  The NetBSD
implementation uses the value to index an array.   So if you have

   char c, d;
   c = getchar();
   d = toupper(c);

then it is entirely possible to cause an out-of-bounds read.  The
warning is coming from the code in the implementation of ctype -- which
is a macro -- that does this read.  In reading the code that leads to
the warning, it is easy to conclude by inspection that the calling code
is incorrect.  

> The fact that toupper in fact is only defined for -1..255 is hardly
> visible in the signature of the function.

True but I don't tnink that has much to do with this.   We have a
conforming implementation, and the compiler is pointing out a different
kind of UB that results from the function being invoked in a way that
causes UB.


To follow up, adding -Wsystem-headers brought the warning back.   clang
also warns.


To answer Greg Woods, It is not "conservative" to silently accept
UB-causing inputs and turn them into UB array reads.

Another approach would be for NetBSD to adust the code to bounds check
before indexing and do something if the value is out of bounds.  Options
are returning false, returning a random bit, calling abort(3), trying to
send all of the user's files to a random IP addreess, and so on.  All
are allowed by the spec :-)

Probably returning false with an envvar option to abort is best, similar
to how UB from mutex ops is handled.   When we started with abort it
seemed there was a vast amount of incorrect code written and debugged on
other operating systems that allow code specified to hav UB.


signature.asc
Description: PGP signature


Re: toupper and warnings

2021-05-06 Thread Johnny Billquist

On 2021-05-06 09:33, Martin Husemann wrote:

On Thu, May 06, 2021 at 09:26:33AM +0200, Johnny Billquist wrote:

I'm not sure I understand why you'd get a warning in the first place.
toupper should take an int as input. Which is signed.
If you pass in a char, it should always fit and work.
Why should you get a warning?


Because the function requires an unsigned char or EOF and is undefined
for everything else.

See CAVEATS in ctype(3).


Right. But is gcc really smart enough to understand at compile time if 
something else than -1 is the negative value, and that toupper in fact 
is more limited than what the signature says?


The *signature* of the function is int toupper(int). If you pass a char 
to that, I can't see that there would ever be a warning about any problems.


The fact that toupper in fact is only defined for -1..255 is hardly 
visible in the signature of the function.


  Johnny

--
Johnny Billquist  || "I'm on a bus
  ||  on a psychedelic trip
email: b...@softjar.se ||  Reading murder books
pdp is alive! ||  tryin' to stay hip" - B. Idol


Re: toupper and warnings

2021-05-06 Thread Martin Husemann
On Thu, May 06, 2021 at 09:26:33AM +0200, Johnny Billquist wrote:
> I'm not sure I understand why you'd get a warning in the first place.
> toupper should take an int as input. Which is signed.
> If you pass in a char, it should always fit and work.
> Why should you get a warning?

Because the function requires an unsigned char or EOF and is undefined
for everything else.

See CAVEATS in ctype(3).

Martin


Re: toupper and warnings

2021-05-06 Thread Johnny Billquist

On 2021-05-06 02:26, Greg A. Woods wrote:

At Wed, 5 May 2021 20:33:30 - (UTC), chris...@astron.com (Christos Zoulas) 
wrote:
Subject: Re: toupper and warnings


gcc hides warnings from system headers by default (my guess is because
the Linux system headers have a concealed firearms license).

Our bsd.sys.mk turns it on... Try gcc -Wall -Wsystem-headers


Does that really make the difference?

(I can't quickly test it at the moment as I've fixed my test system such
that I don't ever get these warnings from the ctype macros.  I.e. I went
the other way and chose to make my implementation's "undefined
behaviour" to be to behave as conservative as possible and allow the
caller to use these macros in the traditional naive way without fear and
without noisy warnings from picky compilers when char is signed.)


I'm not sure I understand why you'd get a warning in the first place.
toupper should take an int as input. Which is signed.
If you pass in a char, it should always fit and work.
Why should you get a warning?

  Johnny

--
Johnny Billquist  || "I'm on a bus
  ||  on a psychedelic trip
email: b...@softjar.se ||  Reading murder books
pdp is alive! ||  tryin' to stay hip" - B. Idol


Re: toupper and warnings

2021-05-05 Thread Greg A. Woods
At Wed, 5 May 2021 20:33:30 - (UTC), chris...@astron.com (Christos Zoulas) 
wrote:
Subject: Re: toupper and warnings
>
> gcc hides warnings from system headers by default (my guess is because
> the Linux system headers have a concealed firearms license).
>
> Our bsd.sys.mk turns it on... Try gcc -Wall -Wsystem-headers

Does that really make the difference?

(I can't quickly test it at the moment as I've fixed my test system such
that I don't ever get these warnings from the ctype macros.  I.e. I went
the other way and chose to make my implementation's "undefined
behaviour" to be to behave as conservative as possible and allow the
caller to use these macros in the traditional naive way without fear and
without noisy warnings from picky compilers when char is signed.)

> I've been considering to make it the default for our native gcc,
> but then again I am against gratuitous user facing changes...

Note that -Wsystem-headers doesn't go well with -Wundef on NetBSD yet.

I find -Wundef to be quite helpful in writing more robust CPP
expressions -- it's saved me more than once from embarrassingly
non-portable ones, though I didn't start using it nearly soon enough.

--
Greg A. Woods 

Kelowna, BC +1 250 762-7675   RoboHack 
Planix, Inc.  Avoncote Farms 


pgpzulPfCalY4.pgp
Description: OpenPGP Digital Signature


Re: toupper and warnings

2021-05-05 Thread Christos Zoulas
In article ,
Greg Troxel   wrote:
>-=-=-=-=-=-
>
>
>We used to get warnings when toupper was called with char, because it is
>UB pe POSIX if the value is negative.
>
>I am not seeing this any more on NetBSD 9.
>
>I wonder if this is intended, and if the UB is safely avoided somhow, or
>if the warning is just missing, or ?
>
>example program that I think should get a UB warning; doesn't on
>netbsd-9.
>
>
>#include 
>#include 
>
>int
>main()
>{
>  char c, d;
>
>  c = getchar();
>
>  d = toupper(c);
>
>  if (d != c)
>return 1;
>  else
>return 0;
>}

gcc hides warnings from system headers by default (my guess is because
the Linux system headers have a concealed firearms license).

Our bsd.sys.mk turns it on... Try gcc -Wall -Wsystem-headers

I've been considering to make it the default for our native gcc,
but then again I am against gratuitous user facing changes...

christos



toupper and warnings

2021-05-05 Thread Greg Troxel

We used to get warnings when toupper was called with char, because it is
UB pe POSIX if the value is negative.

I am not seeing this any more on NetBSD 9.

I wonder if this is intended, and if the UB is safely avoided somhow, or
if the warning is just missing, or ?

example program that I think should get a UB warning; doesn't on
netbsd-9.


#include 
#include 

int
main()
{
  char c, d;

  c = getchar();

  d = toupper(c);

  if (d != c)
return 1;
  else
return 0;
}


signature.asc
Description: PGP signature