Re: toupper and warnings
Date:Thu, 06 May 2021 12:52:36 -0700 From:"Greg A. Woods" Message-ID: | Yeah, "Undefined Behaviour" should be undefined -- i.e. removed from the | spec -- i.e. become either fully defined or at least implementation | defined. It is not helpful at all -- it was a very VERY bad idea. Not really possible. To become implementation defined, the implementation needs to be able to specify what happens (even if different from what other implementations specify for the same thing). Sometimes that's not possible, and what happens depends upon things outside the control of the implementation. Eg: accessing an array out of bounds might just return random data from some other data structure, or it might generate a segmentation violation - it all depends upon how far out of bounds the access was, and where in the memory map the array in question happened to be placed. There's no way to define what will happen - even worse on an embedded system, running with no memory management or privilege separation, the access might hit on memory mapped I/O control, or CPU control registers, and do almost anything. | E.g. for ctype.h interfaces the spec should just say that values outside | the recognized range will simply be truncated as if by assignment to an | unsigned char. That might have been a good idea, perhaps, if it had been specified that way initially - only perhaps because it means penalizing good code with meaningless extra checks or no-op data manipulations (&0xFF or whatever) that do nothing for it except make the code run slower, just so bad code behaves in some kind of predictable (but probably still incorrect) way. But it wasn't specified like that. And standards bodies are not legislatures - they don't (or shouldn't) go defining how things should be, and then attempt to force implementations to obey. Rather, they set out what is known to work on all implementations (just omitting ones with admitted bugs which should be fixed), so that applications will know what they need to do to correctly use the interfaces provided, and what they should not do, as the results would either be unspecified (or implementation defined) or even simply undefined. They also make it clear what a new implementation needs to implement in order to be compatible with the other existing implementations, so that applications which work with other implementations will also work with the new one. | What I am pretty sure of though is that there's a vast difference | between the massive number of warnings spit out by the compiler vs. the | relatively low number of actual cases of passing values outside of -1..255. | We certainly wouldn't want to claim UB and abort for all of the warnings! It is certainly true that the compiler is guessing when it issues one of these warnings, in some cases it cannot know what the range of value will be at run time, in others its analysis functionality is simply not up to the task. So a lot of false warnings occur - for some of the warnings the vast majority look to be bogus (which is annoying) for others a warning most commonly means a problem exists. kre
Re: toupper and warnings
On Thu, May 06, 2021 at 12:52:36PM -0700, Greg A. Woods wrote: >[...] > Well I don't actually do that. > > Which is to say my implementation's goal is to turn them into properly > promoted values and access the flags array in such a way that this does > not and cannot "cause" UB. My implementation carefully tests for a -1 > and then if it is not -1 it uses a cast and assignment to an unsigned > char, the value of which is then used to access the array. I.e. totally > safe _and_ UB-free. > I guess you mean testing not against -1 but testing against EOF? Because strictly speaking, EOF could be whatever integer value not in the unsigned char range (and this is why too casting to unsigned char is necessary because -1 is in the range of a signed char; after all, not considering EOF for other usages, in this case EOF could be an impossible signed or unsigned char value). -- Thierry Laronde http://www.kergis.com/ http://kertex.kergis.com/ http://www.sbfa.fr/ Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C
Re: toupper and warnings
At Thu, 06 May 2021 07:06:43 -0400, Greg Troxel wrote: Subject: Re: toupper and warnings > > Johnny Billquist writes: > > >> See CAVEATS in ctype(3). > > > > Right. But is gcc really smart enough to understand at compile time if > > something else than -1 is the negative value, and that toupper in fact > > is more limited than what the signature says? > > > > The *signature* of the function is int toupper(int). If you pass a > > char to that, I can't see that there would ever be a warning about any > > problems. Well the warning is actually an almost accidental side effect due to the implementation as a macro and due to the way the particular way the macro is expanded into an array reference. The potential to access an array with a negative index is obvious to the compiler in the current standard NetBSD implementation. Note too that actually implementing some of the ctype.h style interfaces as real functions on a system using signed chars could be problematic as they could then not internally distinguish between -1 and 0xFF when passed a plain un-cast signed char value (a value of 0xFF is of course a valid character, but due to the default integer promotions for all function parameters the sign will be extended and the result will be (int) -1). This matters for iscntrl(), isalpha(), and potentially also toupper() and tolower() depending on exactly how the return value is used. > To answer Greg Woods, It is not "conservative" to silently accept > UB-causing inputs and turn them into UB array reads. Well I don't actually do that. Which is to say my implementation's goal is to turn them into properly promoted values and access the flags array in such a way that this does not and cannot "cause" UB. My implementation carefully tests for a -1 and then if it is not -1 it uses a cast and assignment to an unsigned char, the value of which is then used to access the array. I.e. totally safe _and_ UB-free. Thus I take the conservative approach of not causing anything bad or unexpected to happen, including not causing UBSAN to abort the program. The only bad thing is that I do is to silently coerce truly badly behaving code into continuing to work without complaint, even from UBSAN. My test code for my implementation is here: https://github.com/robohack/experiments/blob/master/tctype.c So far it's only been tested with a Clang and a range of GCC versions. I can send the patch that adds it to -current too if you'd like to see it in-situ. > Another approach would be for NetBSD to adust the code to bounds check > before indexing and do something if the value is out of bounds. Yes, and my implementation could also easily be adjusted to do that. > Options > are returning false, returning a random bit, calling abort(3), trying to > send all of the user's files to a random IP addreess, and so on. All > are allowed by the spec :-) Yeah, "Undefined Behaviour" should be undefined -- i.e. removed from the spec -- i.e. become either fully defined or at least implementation defined. It is not helpful at all -- it was a very VERY bad idea. E.g. for ctype.h interfaces the spec should just say that values outside the recognized range will simply be truncated as if by assignment to an unsigned char. > Probably returning false with an envvar option to abort is best, similar > to how UB from mutex ops is handled. On the other hand perhaps this would better be a job for the UBSAN module to do, but perhaps that's just because I've become much more comfortable with using UBSAN with Clang in recent years. SIGILL for the losers. I'm not sure how my implementation could be so hooked into UBSAN tests though, especially in a less compiler-dependent way. > When we started with abort it > seemed there was a vast amount of incorrect code written and debugged on > other operating systems that allow code specified to hav UB. I haven't had quite that experience, though there have been some rare but stunning examples of problematic code -- though all I can remember were actually caught while porting the code to run on NetBSD/alpha, or in a few cases when porting to platforms that actually SIGBUS on unaligned accesses (like sparc IIRC?). What I am pretty sure of though is that there's a vast difference between the massive number of warnings spit out by the compiler vs. the relatively low number of actual cases of passing values outside of -1..255. We certainly wouldn't want to claim UB and abort for all of the warnings! -- Greg A. Woods Kelowna, BC +1 250 762-7675 RoboHack Planix, Inc. Avoncote Farms pgpWHDkukp9tx.pgp Description: OpenPGP Digital Signature
Re: toupper and warnings
Johnny Billquist writes: > On 2021-05-06 13:06, Greg Troxel wrote: >> >> Johnny Billquist writes: >> See CAVEATS in ctype(3). >>> >>> Right. But is gcc really smart enough to understand at compile time if >>> something else than -1 is the negative value, and that toupper in fact >>> is more limited than what the signature says? >>> >>> The *signature* of the function is int toupper(int). If you pass a >>> char to that, I can't see that there would ever be a warning about any >>> problems. >> >> The signature is that it takes an int, but the specification is that if >> the value of the int is other than EOF or something representable as >> unsigned char (projecting to he implementation, meaning -1 is ok and >> 0..255 is ok), then you get UB. > > Right. My question is just how on earth gcc would know this? There is > nothing in the actual declaration that tells (or even can) tell > this. So this would then (again) be an example of gcc knowing more > about the function than is actually visible. What is going on is that there is a header file that defines toupper as a macro (#define).After expanding that, gcc sees code that is using a char (which is signed) as an array subscript, which can reasonably be expected to have the possibility of out-of-bounds reads. gcc is AFAIK not bringing knowledge of toupper. > What if I wrote my own function called toupper, which was defined for > the full range of an int? Would gcc then understand that this is a > different toupper that it shouldn't warn about? toppper is specified by C99. So yes, you could implement a version that had safer behavior when it is formally undefined. If gcc had the specification expressed somehow, and could do static analysis, and gave a warning that "call to toupper could lead to udefined behavior", then I think that would be great. But it isn't doing that -- and that's more like UBsan than a compiler anyway. A lot of trouble is caused by people writing code that's ok with the implementation in front of them, but that ventures into UB per the standard. So I don't think such code should be accomodated in general. signature.asc Description: PGP signature
Re: toupper and warnings
On 2021-05-06 13:06, Greg Troxel wrote: Johnny Billquist writes: See CAVEATS in ctype(3). Right. But is gcc really smart enough to understand at compile time if something else than -1 is the negative value, and that toupper in fact is more limited than what the signature says? The *signature* of the function is int toupper(int). If you pass a char to that, I can't see that there would ever be a warning about any problems. The signature is that it takes an int, but the specification is that if the value of the int is other than EOF or something representable as unsigned char (projecting to he implementation, meaning -1 is ok and 0..255 is ok), then you get UB. Right. My question is just how on earth gcc would know this? There is nothing in the actual declaration that tells (or even can) tell this. So this would then (again) be an example of gcc knowing more about the function than is actually visible. What if I wrote my own function called toupper, which was defined for the full range of an int? Would gcc then understand that this is a different toupper that it shouldn't warn about? Johnny -- Johnny Billquist || "I'm on a bus || on a psychedelic trip email: b...@softjar.se || Reading murder books pdp is alive! || tryin' to stay hip" - B. Idol
Re: toupper and warnings
Johnny Billquist writes: >> See CAVEATS in ctype(3). > > Right. But is gcc really smart enough to understand at compile time if > something else than -1 is the negative value, and that toupper in fact > is more limited than what the signature says? > > The *signature* of the function is int toupper(int). If you pass a > char to that, I can't see that there would ever be a warning about any > problems. The signature is that it takes an int, but the specification is that if the value of the int is other than EOF or something representable as unsigned char (projecting to he implementation, meaning -1 is ok and 0..255 is ok), then you get UB. The point of UB in specifications is to admit efficient implementations (or to end arguments about existing implementations). The NetBSD implementation uses the value to index an array. So if you have char c, d; c = getchar(); d = toupper(c); then it is entirely possible to cause an out-of-bounds read. The warning is coming from the code in the implementation of ctype -- which is a macro -- that does this read. In reading the code that leads to the warning, it is easy to conclude by inspection that the calling code is incorrect. > The fact that toupper in fact is only defined for -1..255 is hardly > visible in the signature of the function. True but I don't tnink that has much to do with this. We have a conforming implementation, and the compiler is pointing out a different kind of UB that results from the function being invoked in a way that causes UB. To follow up, adding -Wsystem-headers brought the warning back. clang also warns. To answer Greg Woods, It is not "conservative" to silently accept UB-causing inputs and turn them into UB array reads. Another approach would be for NetBSD to adust the code to bounds check before indexing and do something if the value is out of bounds. Options are returning false, returning a random bit, calling abort(3), trying to send all of the user's files to a random IP addreess, and so on. All are allowed by the spec :-) Probably returning false with an envvar option to abort is best, similar to how UB from mutex ops is handled. When we started with abort it seemed there was a vast amount of incorrect code written and debugged on other operating systems that allow code specified to hav UB. signature.asc Description: PGP signature
Re: toupper and warnings
On 2021-05-06 09:33, Martin Husemann wrote: On Thu, May 06, 2021 at 09:26:33AM +0200, Johnny Billquist wrote: I'm not sure I understand why you'd get a warning in the first place. toupper should take an int as input. Which is signed. If you pass in a char, it should always fit and work. Why should you get a warning? Because the function requires an unsigned char or EOF and is undefined for everything else. See CAVEATS in ctype(3). Right. But is gcc really smart enough to understand at compile time if something else than -1 is the negative value, and that toupper in fact is more limited than what the signature says? The *signature* of the function is int toupper(int). If you pass a char to that, I can't see that there would ever be a warning about any problems. The fact that toupper in fact is only defined for -1..255 is hardly visible in the signature of the function. Johnny -- Johnny Billquist || "I'm on a bus || on a psychedelic trip email: b...@softjar.se || Reading murder books pdp is alive! || tryin' to stay hip" - B. Idol
Re: toupper and warnings
On Thu, May 06, 2021 at 09:26:33AM +0200, Johnny Billquist wrote: > I'm not sure I understand why you'd get a warning in the first place. > toupper should take an int as input. Which is signed. > If you pass in a char, it should always fit and work. > Why should you get a warning? Because the function requires an unsigned char or EOF and is undefined for everything else. See CAVEATS in ctype(3). Martin
Re: toupper and warnings
On 2021-05-06 02:26, Greg A. Woods wrote: At Wed, 5 May 2021 20:33:30 - (UTC), chris...@astron.com (Christos Zoulas) wrote: Subject: Re: toupper and warnings gcc hides warnings from system headers by default (my guess is because the Linux system headers have a concealed firearms license). Our bsd.sys.mk turns it on... Try gcc -Wall -Wsystem-headers Does that really make the difference? (I can't quickly test it at the moment as I've fixed my test system such that I don't ever get these warnings from the ctype macros. I.e. I went the other way and chose to make my implementation's "undefined behaviour" to be to behave as conservative as possible and allow the caller to use these macros in the traditional naive way without fear and without noisy warnings from picky compilers when char is signed.) I'm not sure I understand why you'd get a warning in the first place. toupper should take an int as input. Which is signed. If you pass in a char, it should always fit and work. Why should you get a warning? Johnny -- Johnny Billquist || "I'm on a bus || on a psychedelic trip email: b...@softjar.se || Reading murder books pdp is alive! || tryin' to stay hip" - B. Idol
Re: toupper and warnings
At Wed, 5 May 2021 20:33:30 - (UTC), chris...@astron.com (Christos Zoulas) wrote: Subject: Re: toupper and warnings > > gcc hides warnings from system headers by default (my guess is because > the Linux system headers have a concealed firearms license). > > Our bsd.sys.mk turns it on... Try gcc -Wall -Wsystem-headers Does that really make the difference? (I can't quickly test it at the moment as I've fixed my test system such that I don't ever get these warnings from the ctype macros. I.e. I went the other way and chose to make my implementation's "undefined behaviour" to be to behave as conservative as possible and allow the caller to use these macros in the traditional naive way without fear and without noisy warnings from picky compilers when char is signed.) > I've been considering to make it the default for our native gcc, > but then again I am against gratuitous user facing changes... Note that -Wsystem-headers doesn't go well with -Wundef on NetBSD yet. I find -Wundef to be quite helpful in writing more robust CPP expressions -- it's saved me more than once from embarrassingly non-portable ones, though I didn't start using it nearly soon enough. -- Greg A. Woods Kelowna, BC +1 250 762-7675 RoboHack Planix, Inc. Avoncote Farms pgpzulPfCalY4.pgp Description: OpenPGP Digital Signature
Re: toupper and warnings
In article , Greg Troxel wrote: >-=-=-=-=-=- > > >We used to get warnings when toupper was called with char, because it is >UB pe POSIX if the value is negative. > >I am not seeing this any more on NetBSD 9. > >I wonder if this is intended, and if the UB is safely avoided somhow, or >if the warning is just missing, or ? > >example program that I think should get a UB warning; doesn't on >netbsd-9. > > >#include >#include > >int >main() >{ > char c, d; > > c = getchar(); > > d = toupper(c); > > if (d != c) >return 1; > else >return 0; >} gcc hides warnings from system headers by default (my guess is because the Linux system headers have a concealed firearms license). Our bsd.sys.mk turns it on... Try gcc -Wall -Wsystem-headers I've been considering to make it the default for our native gcc, but then again I am against gratuitous user facing changes... christos
toupper and warnings
We used to get warnings when toupper was called with char, because it is UB pe POSIX if the value is negative. I am not seeing this any more on NetBSD 9. I wonder if this is intended, and if the UB is safely avoided somhow, or if the warning is just missing, or ? example program that I think should get a UB warning; doesn't on netbsd-9. #include #include int main() { char c, d; c = getchar(); d = toupper(c); if (d != c) return 1; else return 0; } signature.asc Description: PGP signature