Re: NULL pointer arithmetic issues
> I wrote the following rant some time ago and posted it somewhere
> I'll throw it in here for some more fuel

> NO MORE "undefined behaviour"!!! Pick something sane and stick to it!

> The problem with modern "Standard" C is that instead of refining
> the definition of the abstract machine to match the most common
> and/or logical behaviour of existing implementations, the standards
> committee chose to throw the baby out with the bath water and make
> whole swaths of conditions into so-called "undefined behaviour"
> conditions.

Unfortunately for your argument, they did this because there are
"existing implementations" that disagree severely over the points in
question. A spec that mandated such things as the "pointers are really
just memory addresses" model you sketch below would, at best, simply
get ignored by implementors on machines that don't match it.

Perhaps that's what you'd want. Personally, I prefer the actual
choice.

> An excellent example are the data-flow optimizations that are now
> commonly abused to elide security/safety-sensitive code:

>	int
>	foo(struct bar *p)
>	{
>		char *lp = p->s;
>
>		if (p == NULL || lp == NULL) {
>			return -1;
>		}

This code is, and always has been, broken; it is accessing p->s before
it knows that p isn't nil. If you're really unlucky you'll be on a
machine where there are device registers at address 0 and you'll poke a
device register with that read. If you're less lucky you'll be on
MS-DOS or a PDP-11 or some such and silently and harmlessly get a
meaningless value for lp. If you're lucky you'll get a segfault or
moral equivalent.

Anyone who thinks this sort of sloppiness is appropriate in
security/safety-sensitive code please stay far, far away from anything
that might run on my machines.

Yes, an optimizer _might_ defer the fetch of lp, but it also might not,
for any of many reasons; relying on its doing so is extremely brittle,
most definitely not appropriate for anything security/safety-sensitive.

That said, I do agree that simply dropping the p==NULL check but
preserving the fetch of lp is, if anything, even more broken; it is
gross abuse of the latitude permitted by the undefined-behaviour rules.
But that is a quality-of-implementation issue.

> Worse yet this example stems from actual Linux kernel code [...]

Good gods. I'm gladder than ever I don't run Linux.

> [...], yet again any programmer worth their salt knows that the
> address of a field in a struct is simply the sum of the struct's
> base address and the offset of the field, [...]

That's what a mediocre C programmer thinks. A good one knows there is
a difference between the abstract machine and the implementation and
realizes that, while that is a common implementation, it is far from
the only possible one, and it is inappropriate to rely on it being an
accurate description (except in code not intended to be portable, like
a kernel's pmap layer).

> Worst of all consider this example:

>	size_t o = offsetof(p, s);

> And then consider an extremely common example of "offsetof()" [...]

Such an implementation of offsetof() is nonportable, exactly because it
assumes things like your sketch based on the "pointers are just memory
addresses" model. Providing it in application code constitutes
nonportable code, just as much as assuming shorts are 18 bits does.
(What, you mean you're not on a 36-bit machine? What sort of weird
hardware are you using?)

An implementation may provide it, yes, if - IF! - it knows the
associated compiler handles that code such that offsetof() returns the
correct result. But what would you expect it to do in, say, Zeta-C?
Or do you think Zeta-C should not exist?

> or possibly (for those who know that pointers are not always "just"
> integers):

> #define offsetof(type, member) ((size_t)(unsigned long)((&((type *)0)->member) - (type *)0))

That has never worked in C since, oh, I dunno, V7? and probably never
will; it tries to subtract pointers that point to different types.

What I think of as the usual implementation along those lines would be
something like

	((size_t)((char *)&((type *)0)->member - (char *)0))

(note the lack of an intermediate cast to unsigned long; size_t may be
wider than unsigned long, though admittedly it's unlikely offsetof()
will need to return a value greater than the largest unsigned long).

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
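The ordering problem discussed in this message can be sketched as follows. This is only an illustration: the layout of `struct bar` is an assumption (the thread never shows it), and the function is a reworked form of the quoted `foo()` that tests `p` before fetching `p->s`, so it does not depend on what an optimizer chooses to defer.

```c
#include <stddef.h>

/* Hypothetical layout; the thread never shows the real struct bar. */
struct bar {
    char *s;
};

/*
 * Same job as the quoted foo(), but p is tested before p->s is read,
 * so no member is ever fetched through a null pointer.
 */
int
foo(struct bar *p)
{
    char *lp;

    if (p == NULL)
        return -1;

    lp = p->s;              /* p is known to be non-null here */
    if (lp == NULL)
        return -1;

    lp[0] = '\0';
    return 0;
}
```

Calling `foo(NULL)` returns -1 without touching memory, and a caller that needs a member offset can use the `offsetof()` supplied by `<stddef.h>` instead of a home-grown macro.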
Re: NULL pointer arithmetic issues
I very much agree that pointer arithmetic MUST NOT be "undefined", even
if it includes "NULL" and/or "0".

The warning that begat this thread is insane!

Note I say this as someone who is very empathetic to implementers who
might try to make C work in any strange hardware systems where "null"
pointers are not actually all zeros in the hardware. I hope to be one.

At Mon, 24 Feb 2020 14:41:26 +0100, Kamil Rytarowski wrote:
Subject: Re: NULL pointer arithmetic issues
>
> Please join the C committee as a voting member or at least submit papers
> with language changes. Complaining here won't change anything.
>
> (Out of people in the discussion, I am involved in wg14 discussions and
> submit papers.)

If you are active on the wg14 committee, perhaps you can be convinced
to argue on "our" behalf? [0.5 :-)]

I wrote the following rant some time ago and posted it somewhere
(probably on G+ because I don't find it now with a quick search).

I'll throw it in here for some more fuel

NO MORE "undefined behaviour"!!! Pick something sane and stick to it!

The problem with modern "Standard" C is that instead of refining the
definition of the abstract machine to match the most common and/or
logical behaviour of existing implementations, the standards committee
chose to throw the baby out with the bath water and make whole swaths
of conditions into so-called "undefined behaviour" conditions.

An excellent example are the data-flow optimizations that are now
commonly abused to elide security/safety-sensitive code:

	int
	foo(struct bar *p)
	{
		char *lp = p->s;

		if (p == NULL || lp == NULL) {
			return -1;
		}
		lp[0] = '\0';

		return 0;
	}

Any programmer worth their salt will assume the compiler can calculate
the offset of 's' at compile time and thus anyone ignorant of C's new
"undefined behaviour" rules will guess that at worst some location on
the stack will be assigned a value pulled from low memory (if that
doesn't cause a SIGSEGV), but more likely the de-reference of 'p' won't
happen right away because we all know that any optimizer worth its
salt SHOULD defer it until the first use of 'lp', perhaps not even
allocating any stack space for 'lp' at all!

Worse yet this example stems from actual Linux kernel code like this:

	static int
	podhd_try_init(struct usb_interface *interface,
		       struct usb_line6_podhd *podhd)
	{
		struct usb_line6 *line6 = &podhd->line6;

		if ((interface == NULL) || (podhd == NULL))
			return ENODEV;
	}

Here some language-lawyer-wannabees [[LLWs]] might try in vain to argue
over the interpretation of "dereferencing", yet again any programmer
worth their salt knows that the address of a field in a struct is
simply the sum of the struct's base address and the offset of the
field, the latter of which the compiler obviously knows at compile
time, and adding a value to a NULL pointer should never be considered
invalid or undefined! [[ You have to start from somewhere, after all.
Why not zero? ]]

(I suspect the LLWs are being misled by the congruence between "a->b"
and "(*a).b".)

Worst of all consider this example:

	void *
	foo(struct bar *p)
	{
		size_t o = offsetof(p, s);

		if (s == NULL)
			return NULL;
	}

And then consider an extremely common example of "offsetof()" which
might very well appear in a legacy application's own code because it
pre-dated <stddef.h>, though indeed this very definition has been used
in <stddef.h> by several standard compiler implementations, and indeed
it was specifically allowed in general by ISO C90 (and only more
recently denied by C11, sort of):

	#define offsetof(type, member) ((size_t)(unsigned long)(&((type *)0)->member))

or possibly (for those who know that pointers are not always "just"
integers):

	#define offsetof(type, member) ((size_t)(unsigned long)((&((type *)0)->member) - (type *)0))

Here we have very effectively and entirely hidden the fact that the
'->' operator is used with 's'.

Any sane person with some understanding of programming languages should
agree that it is wrong to assume that calculating the address of an
lvalue "evaluates" that lvalue. In C the '->' and '[]' operators are
arithmetic operators, not (immediately and on their own) memory access
operators.

Sadly C's new undefined behaviour rules as interpreted by some compiler
maintainers now allow the compiler to STUPIDLY assume that since the
programmer has knowingly put a supposed de-reference of a pointer on
the first line of the function, then any comparisons of that pointer
with NULL further on are OBVIOUSLY never ever going to be true
Re: NULL pointer arithmetic issues
> Date: Mon, 24 Feb 2020 11:42:01 +0100
> From: Kamil Rytarowski
>
> Forbidding NULL pointer arithmetic is not just for C purists trolls. It
> is now in C++ mainstream and already in C2x draft.
>
> The newer C standard will most likely (already accepted by the
> committee) adopt nullptr on par with nullptr from C++. In C++ we can
> "#define NULL nullptr" and possibly the same will be possible in C.
>
> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2394.pdf
>
> This will change all arithmetic code operating on NULL into syntax error.

Arithmetic on bare NULL is already an error, flagged by the options
-Wpointer-arith -Werror which we already use, and arithmetic on the
proposed nullptr will remain so. This question is not about that, or
about syntax.

The question is whether it is realistic to imagine that a compiler we
will ever use to build the kernel -- particularly with the option
-fno-delete-null-pointer-checks as we already use to build the kernel
with gcc -- will actually meaningfully distinguish the fragments

	char *x = NULL;
	return x;

and

	char *x = NULL;
	return x + 0;

Will two programs that differ only by this fragment actually behave
differently on any serious C implementation we use in NetBSD, ignoring
the pedantry of ubsan?

(The question is the same if you substitute the proposed nullptr for
NULL; it's about the meaning of + on a null pointer, not whether the
program is syntactically written with the letters `NULL' or `nullptr'.)

The second program technically has undefined behaviour because in,
e.g., C99 6.5.6 `Additive operators', the meaning of + is defined on
pointer/integer operands only when the pointer is to an object in an
array and the sum stays within the array or points one past the end --
in other words, there's nothing in C99 formally defining what x + 0
means when x is a null pointer.

Why is the standard written this way? I surmise that it's because
technically there exist implementations such as Zeta-C where a
`pointer' is not simply a virtual address in a machine register but
actually a pair of a Lisp array and an index into it.

NetBSD does not run on such implementations. Corners of the standard
that serve _only_ to accommodate such implementations are not relevant
to NetBSD on their own.

The standard is also technically written so that a null pointer is not
necessarily stored as all bits zero in memory, so

	char *x;
	memset(&x, 0, sizeof x);
	return x;

is not guaranteed to return a null pointer. However, NetBSD only runs
on C implementations where it actually is guaranteed to return a null
pointer, and we rely on this pervasively.

If we make _only_ the assumptions that the standard formally
guarantees, then ubsan would be right to object that

	char *x;
	memset(&x, 0, sizeof x);
	return x == NULL ? 0 : *(char *)x;

has undefined behaviour. But in NetBSD this is guaranteed to return 0
and so if ubsan flagged it we would treat that as a useless false alarm
that detracts from the value of ubsan as a tool.

If you can present a compelling argument that C implementations which
are _relevant to NetBSD_ -- not merely technically allowed by the
letter of the standard like Zeta-C -- will actually behave differently
from how I described, please present that. Otherwise please find a way
to suppress the false alarm in the tool so it doesn't waste any more
time.

(And please do the same for memcpy(x,NULL,0)/memcpy(NULL,y,0)!)
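The memset() point above can be tried out with a small sketch. It assumes, as the message says NetBSD does, an implementation where the all-bits-zero pattern is a null pointer; the standard alone does not promise that, and the function name is made up for this illustration.

```c
#include <stddef.h>
#include <string.h>

/*
 * Returns 1 if a pointer whose storage was cleared with memset()
 * compares equal to NULL on this implementation.
 * Assumption: all-bits-zero is a valid (null) pointer representation,
 * which the C standard itself does not guarantee.
 */
int
zeroed_pointer_is_null(void)
{
    char *x;

    memset(&x, 0, sizeof x);    /* all bits zero, by representation */
    return x == NULL;           /* compares against a null pointer constant */
}
```

On the flat-address-space platforms the message has in mind this is expected to return 1; an implementation with a different null representation could legitimately return 0.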
Re: NULL pointer arithmetic issues
>>	int one(void) { return(1); }
>> then (one()-one()) is not [a null pointer constant]
> As you say, it's an integer expression. And I read that "or" part as
> just an expression, which this is. So I believe it is a valid way to
> creating something that can be converted to a NULL pointer.

C99 words it as

       [#3] An integer constant expression with the value 0, or
       such an expression cast to type void *, is called a null
       pointer constant.

That little word "such" is important. As I read it, that means a null
pointer constant is "[a]n integer constant expression with the value 0,
or an integer constant expression with the value 0 cast to type void *"
(not just "an integer expression with the value 0 cast to...").

> if (expression) statement; shall execute statement if expression not
> equals 0, according to the standard.

Yes:

       [#2] In both forms, the first substatement is executed if
       the expression compares unequal to 0. In the else form, the
       second substatement is executed if the expression compares
       equal to 0. If the first substatement is reached via a
       label, the second substatement is not executed.

> So, where does that leave this code:

> char *p;
> if (p) foo();

> p is not an integer. How do you compare it to 0?

The same way you do in

	if (p != 0) foo();

How else?

I would admittedly prefer slightly more verbose wording, saying
explicitly that the 0 to which the control expression is compared is,
when applicable, a null pointer constant. (Comparison of a pointer
with a null pointer constant is specifically permitted for == and != -
see 6.5.9.)

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
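A short sketch of the point about control expressions: `if (p)`, `if (p != 0)` and `if (p != NULL)` all compare `p` against a null pointer, so they always agree. The function name is hypothetical and exists only for this illustration.

```c
#include <stddef.h>

/*
 * Counts how many of the three spellings consider p non-null.
 * Because each one compares p with a null pointer, the result
 * is either 0 (p is null) or 3 (p is not null), never in between.
 */
int
count_nonnull_tests(const char *p)
{
    int n = 0;

    if (p)              /* control expression compared unequal to 0 */
        n++;
    if (p != 0)         /* 0 here is a null pointer constant */
        n++;
    if (p != NULL)      /* NULL is a null pointer constant too */
        n++;
    return n;
}
```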
Re: NULL pointer arithmetic issues
On 2020-02-25 02:12, Mouse wrote:
>> Oh. And I actually do not believe it has to be a constant.
>
> You are correct; it does not need to be a simple constant.
>
>> The text says "integer constant expression with the value 0, or such
>> an expression..."
>
> Yes. (void *)(1-1) is a valid null pointer constant. So, on an
> all-ASCII system, is (0x1f+(3*5)-'.'). But, in the presence of
>
>	int one(void) { return(1); }
>
> then (one()-one()) is not - it is an integer expression with value
> zero, but it is not an integer _constant_ expression. It's entirely
> possible that (int *)(one()-one()) will produce a different pointer
> from (int *)(1-1) - the latter is a null pointer; the former might or
> might not be, depending on the implementation.

As you say, it's an integer expression. And I read that "or" part as
just an expression, which this is. So I believe it is a valid way to
creating something that can be converted to a NULL pointer.

Also:

if (expression) statement;

shall execute statement if expression not equals 0, according to the
standard.

So, where does that leave this code:

char *p;
.
.
if (p) foo();

p is not an integer. How do you compare it to 0?

  Johnny

--
Johnny Billquist                  || "I'm on a bus
                                  ||  on a psychedelic trip
email: b...@softjar.se            ||  Reading murder books
pdp is alive!                     ||  tryin' to stay hip" - B. Idol
Re: NULL pointer arithmetic issues
> Oh. And I actually do not believe it has to be a constant.

You are correct; it does not need to be a simple constant.

> The text says "integer constant expression with the value 0, or such
> an expression..."

Yes. (void *)(1-1) is a valid null pointer constant. So, on an
all-ASCII system, is (0x1f+(3*5)-'.'). But, in the presence of

	int one(void) { return(1); }

then (one()-one()) is not - it is an integer expression with value
zero, but it is not an integer _constant_ expression. It's entirely
possible that (int *)(one()-one()) will produce a different pointer
from (int *)(1-1) - the latter is a null pointer; the former might or
might not be, depending on the implementation. Similarly,

	int i;
	i = 0;
	if ((int *)i == (int *)0) ... else ...

may test unequal.

(I have a very fuzzy memory that says POSIX may impose additional
restrictions that might affect this; I'm talking strictly about C99
here.)

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
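The distinction drawn here between a constant zero and a run-time zero can be written down as a sketch. The helper names are made up, and the cast through `intptr_t` (an optional C99 type) is an addition that only avoids integer-size warnings on 64-bit compilers; the message itself casts the `int` directly.

```c
#include <stddef.h>
#include <stdint.h>

int
one(void)
{
    return 1;
}

/*
 * (1 - 1) is an integer constant expression with value 0, hence a
 * null pointer constant: converting it yields a null pointer on
 * every conforming implementation.
 */
int *
from_constant(void)
{
    return (int *)(1 - 1);
}

/*
 * one() - one() is zero only at run time, so this is an ordinary
 * integer-to-pointer conversion with an implementation-defined
 * result. Many implementations produce a null pointer here, but
 * C99 does not require it.
 */
int *
from_runtime(void)
{
    return (int *)(intptr_t)(one() - one());
}
```

Only the result of `from_constant()` may be portably assumed to compare equal to NULL.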
Re: NULL pointer arithmetic issues
On Mon, Feb 24, 2020 at 05:35:22PM -0500, Mouse wrote:
> > Unless I remember wrong, older C standards explicitly say that the
> > integer 0 can be converted to a pointer, and that will be the NULL
> > pointer, and a NULL pointer cast as an integer shall give the value
> > 0.
>
> The only one I have anything close to a copy of is C99, for which I
> have a very late draft.
>
> Based on that:
>
> You are not quite correct. Any integer may be converted to a pointer,
> and any pointer may be converted to an integer - but the mapping is
> entirely implementation-dependent, except in the integer->pointer
> direction when the integer is a "null pointer constant", defined as
> "[a]n integer constant expression with the value 0" (or such an
> expression cast to void *, though not if we're talking specifically
> about integers), in which case "the resulting pointer, called a null
> pointer, is guaranteed to compare unequal to a pointer to any object or
> function". You could have meant that, but what you wrote could also be
> taken as applying to the _run-time_ integer value 0, which C99's
> promise does not apply to. (Quotes are from 6.3.2.3.)
>
> I don't think there is any promise that converting a null pointer of
> any type back to an integer will necessarily produce a zero integer.
>

The wording was the same for C89 and there is this paragraph in K&R
(second edition, p 102):

"Pointers and integers are not interchangeable. Zero is the sole
exception: the constant zero may be assigned to a pointer, and a
pointer may be compared with the constant zero. The symbolic constant
NULL is often used in place of zero, as a mnemonic to indicate more
clearly that this is a special value for a pointer. [...]"

I interpret this (the paragraph above and the standard) as: in
comparing a pointer to the constant zero, the constant zero is
converted to a pointer of NULL value, thus comparing pointer to pointer
and not comparing an integer value (the integer value of the pointer)
to an integer value (0).

So defining NULL as the casting of 0 is (was?) in the C standard, the
actual value of the expression i.e. of an incorrect (NULL) pointer
being implementation defined.

FWIW,
--
Thierry Laronde
http://www.kergis.com/
http://www.sbfa.fr/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C
Re: NULL pointer arithmetic issues
> On Feb 24, 2020, at 4:08 PM, Kamil Rytarowski wrote: > > NULL in C is expected to be harmonized with nullptr from C++. This is insanity. C++ is a cesspit where dreams of elegant code go to die. -- thorpej ...proud owner of a "C++ Barf Bag" that hangs outside his office...
Re: NULL pointer arithmetic issues
On 24.02.2020 23:35, Mouse wrote:
>> Unless I remember wrong, older C standards explicitly say that the
>> integer 0 can be converted to a pointer, and that will be the NULL
>> pointer, and a NULL pointer cast as an integer shall give the value
>> 0.
>
> The only one I have anything close to a copy of is C99, for which I
> have a very late draft.
>
> Based on that:
>
> You are not quite correct. Any integer may be converted to a pointer,
> and any pointer may be converted to an integer - but the mapping is
> entirely implementation-dependent, except in the integer->pointer
> direction when the integer is a "null pointer constant", defined as
> "[a]n integer constant expression with the value 0" (or such an
> expression cast to void *, though not if we're talking specifically
> about integers), in which case "the resulting pointer, called a null
> pointer, is guaranteed to compare unequal to a pointer to any object or
> function". You could have meant that, but what you wrote could also be
> taken as applying to the _run-time_ integer value 0, which C99's
> promise does not apply to. (Quotes are from 6.3.2.3.)
>
> I don't think there is any promise that converting a null pointer of
> any type back to an integer will necessarily produce a zero integer.
>
> /~\ The ASCII                             Mouse
> \ / Ribbon Campaign
>  X  Against HTML                mo...@rodents-montreal.org
> / \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
>

$ cat test.cpp
#include

int
main(int argc, char **argv)
{
	if (((char *)0)[argc])
		return 1;
	else
		return 0;
}
$ g++ test.cpp
$ ./a.out
Memory fault (core dumped)

And some variations:

$ g++ test.cpp
test.cpp: In function ‘int main(int, char**)’:
test.cpp:6:15: warning: converting NULL to non-pointer type [-Wconversion-null]
  if (NULL[argc])
               ^
test.cpp:6:15: error: invalid types ‘long int[int]’ for array subscript

$ g++ test.cpp
test.cpp: In function ‘int main(int, char**)’:
test.cpp:6:18: error: invalid types ‘std::nullptr_t[int]’ for array subscript
  if (nullptr[argc])
                  ^

NULL in C is expected to be harmonized with nullptr from C++.

We still can store NULL/nullptr in variables as before and there is no
change in the produced code. The only change is on the syntax level as
we can catch more bugs earlier.

Whenever a compiler will be smart enough to deduce that the code is
nullptr[0] it will raise an error.
Re: NULL pointer arithmetic issues
On 2020-02-24 21:24, Kamil Rytarowski wrote:
> On 24.02.2020 21:18, Mouse wrote:
>>>> If we use 0x0, it can be a valid pointer.
>>
>>>> If we use NULL, it's not expected to work and will eventually
>>>> generate a syntax error.
>>
>> Then someone has severely broken compatibility with older versions of
>> C. 0x0 and (when one of the suitable #includes has been done) NULL
>> have both historically been perfectly good null pointer constants.
>>
>> Also...syntax error? Really? _Syntax_ error?? I'd really like to see
>> what they've done to the grammar to lead to that; I'm having trouble
>> imagining how that would be done.
>>
>
> The process of evaluation of the NULL semantics is not a recent thing.
>
> Not so long ago, still in NetBSD times, it was a general practice to
> allow dereferencing the NULL pointer and expect zeroed bytes over there.
>
> We still maintain compatibility with this behavior (originated as a hack
> in PDP11) in older NetBSD releases (NetBSD-0.9 Franz Lisp binaries
> depend on this).

Really? I thought we usually do not have anything mapped at address 0
to explicitly catch any dereferencing of NULL pointers.

But yes, on the PDP11 this was/is not the case. Memory space is too
precious to allow some of it to be wasted for this... (Even if there is
a comment about it in 2.11BSD, bemoaning this fact...)

  Johnny

--
Johnny Billquist                  || "I'm on a bus
                                  ||  on a psychedelic trip
email: b...@softjar.se            ||  Reading murder books
pdp is alive!                     ||  tryin' to stay hip" - B. Idol
Re: NULL pointer arithmetic issues
On 2020-02-25 00:24, Johnny Billquist wrote:
> On 2020-02-24 23:35, Mouse wrote:
>>> Unless I remember wrong, older C standards explicitly say that the
>>> integer 0 can be converted to a pointer, and that will be the NULL
>>> pointer, and a NULL pointer cast as an integer shall give the value
>>> 0.
>>
>> The only one I have anything close to a copy of is C99, for which I
>> have a very late draft.
>>
>> Based on that:
>>
>> You are not quite correct. Any integer may be converted to a pointer,
>> and any pointer may be converted to an integer - but the mapping is
>> entirely implementation-dependent, except in the integer->pointer
>> direction when the integer is a "null pointer constant", defined as
>> "[a]n integer constant expression with the value 0" (or such an
>> expression cast to void *, though not if we're talking specifically
>> about integers), in which case "the resulting pointer, called a null
>> pointer, is guaranteed to compare unequal to a pointer to any object or
>> function". You could have meant that, but what you wrote could also be
>> taken as applying to the _run-time_ integer value 0, which C99's
>> promise does not apply to. (Quotes are from 6.3.2.3.)
>>
>> I don't think there is any promise that converting a null pointer of
>> any type back to an integer will necessarily produce a zero integer.
>
> Maybe we are reading things differently...?
>
> Looking at 6.3.2.3...
>
> As far as I read, paragraph 3 says:
>
> "An integer constant expression with the value 0, or such an expression
> cast to type void *, is called a null pointer constant.55) If a null
> pointer constant is converted to a pointer type, the resulting pointer,
> called a null pointer, is guaranteed to compare unequal to a pointer to
> any object or function."
>
> Essentially, the integer constant 0 can be casted to a pointer, and that
> pointer is then a null pointer constant, also called a null pointer.
>
> And footnote 55 says:

Oh. And I actually do not believe it has to be a constant.
The text says "integer constant expression with the value 0, or such an
expression..."

So either a constant expression, or just an expression, which gives a
0, can be cast to a pointer, and that will be the NULL pointer.

(I realized when reading, that I might have implied that it only
applied to constants, which I think it does not.)

But I might have misunderstood everything, of course...

  Johnny

--
Johnny Billquist                  || "I'm on a bus
                                  ||  on a psychedelic trip
email: b...@softjar.se            ||  Reading murder books
pdp is alive!                     ||  tryin' to stay hip" - B. Idol
Re: NULL pointer arithmetic issues
On 2020-02-24 23:35, Mouse wrote:
>> Unless I remember wrong, older C standards explicitly say that the
>> integer 0 can be converted to a pointer, and that will be the NULL
>> pointer, and a NULL pointer cast as an integer shall give the value
>> 0.
>
> The only one I have anything close to a copy of is C99, for which I
> have a very late draft.
>
> Based on that:
>
> You are not quite correct. Any integer may be converted to a pointer,
> and any pointer may be converted to an integer - but the mapping is
> entirely implementation-dependent, except in the integer->pointer
> direction when the integer is a "null pointer constant", defined as
> "[a]n integer constant expression with the value 0" (or such an
> expression cast to void *, though not if we're talking specifically
> about integers), in which case "the resulting pointer, called a null
> pointer, is guaranteed to compare unequal to a pointer to any object or
> function". You could have meant that, but what you wrote could also be
> taken as applying to the _run-time_ integer value 0, which C99's
> promise does not apply to. (Quotes are from 6.3.2.3.)
>
> I don't think there is any promise that converting a null pointer of
> any type back to an integer will necessarily produce a zero integer.

Maybe we are reading things differently...?

Looking at 6.3.2.3...

As far as I read, paragraph 3 says:

"An integer constant expression with the value 0, or such an expression
cast to type void *, is called a null pointer constant.55) If a null
pointer constant is converted to a pointer type, the resulting pointer,
called a null pointer, is guaranteed to compare unequal to a pointer to
any object or function."

Essentially, the integer constant 0 can be cast to a pointer, and that
pointer is then a null pointer constant, also called a null pointer.

And footnote 55 says:

"The macro NULL is defined in <stddef.h> (and other headers) as a null
pointer constant; see 7.17."

So, 0 cast as a pointer gives a NULL pointer.

And paragraph 6 says:

"Any pointer type may be converted to an integer type. Except as
previously specified, the result is implementation-defined. If the
result cannot be represented in the integer type, the behavior is
undefined. The result need not be in the range of values of any integer
type."

And I can only read the "previously specified" to refer to the
equivalence between a NULL pointer and integer 0, because nothing
before paragraph 6 talks about pointer to integer, so I can't see how
it can be read as something more specific than all the things mentioned
in the previous 6 paragraphs.

  Johnny

--
Johnny Billquist                  || "I'm on a bus
                                  ||  on a psychedelic trip
email: b...@softjar.se            ||  Reading murder books
pdp is alive!                     ||  tryin' to stay hip" - B. Idol
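For the pointer-to-integer direction that paragraph 6 covers, one thing that can be checked without settling the disagreement is a round trip through `uintptr_t`. This sketch assumes the implementation provides that optional C99 type; the function name is made up, and the code deliberately makes no claim about the numeric value a null pointer converts to, which is the point in dispute.

```c
#include <stdint.h>

/*
 * Converts p to an integer and back, then reports whether the
 * result still compares equal to p. C99 promises this round trip
 * for valid pointers to void when uintptr_t exists; the integer
 * value in between is implementation-defined.
 */
int
roundtrip_ok(void *p)
{
    uintptr_t bits = (uintptr_t)p;  /* value is implementation-defined */
    void *back = (void *)bits;

    return back == p;
}
```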
Re: NULL pointer arithmetic issues
On 2020-02-24 21:18, Mouse wrote:
>> If we use 0x0, it can be a valid pointer.
>
>> If we use NULL, it's not expected to work and will eventually
>> generate a syntax error.
>
> Then someone has severely broken compatibility with older versions of
> C. 0x0 and (when one of the suitable #includes has been done) NULL
> have both historically been perfectly good null pointer constants.
>
> Also...syntax error? Really? _Syntax_ error?? I'd really like to see
> what they've done to the grammar to lead to that; I'm having trouble
> imagining how that would be done.

Unless I remember wrong, older C standards explicitly say that the
integer 0 can be converted to a pointer, and that will be the NULL
pointer, and a NULL pointer cast as an integer shall give the value 0.

This is also used in such things as code traversing linked lists to
check for the end of the list...

And the C standard also explicitly allows the NULL pointer to not be
represented by something with all bits cleared. Only that casting
to/from integers has a very defined behavior.

  Johnny

--
Johnny Billquist                  || "I'm on a bus
                                  ||  on a psychedelic trip
email: b...@softjar.se            ||  Reading murder books
pdp is alive!                     ||  tryin' to stay hip" - B. Idol
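The linked-list idiom mentioned above looks roughly like this. `struct node` and `list_length()` are hypothetical names for this sketch; the loop only compares the pointer with a null pointer constant and never converts it to an integer.

```c
#include <stddef.h>

/* Hypothetical singly linked list node. */
struct node {
    int value;
    struct node *next;
};

/* Walks the list until the null pointer that marks its end. */
int
list_length(const struct node *head)
{
    int n = 0;

    for (const struct node *p = head; p != NULL; p = p->next)
        n++;
    return n;
}
```

Writing the loop condition as plain `p` or `p != 0` behaves the same way, whatever bit pattern the implementation uses for a null pointer.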
Re: NULL pointer arithmetic issues
> Unless I remember wrong, older C standards explicitly say that the
> integer 0 can be converted to a pointer, and that will be the NULL
> pointer, and a NULL pointer cast as an integer shall give the value
> 0.

The only one I have anything close to a copy of is C99, for which I
have a very late draft.

Based on that:

You are not quite correct. Any integer may be converted to a pointer,
and any pointer may be converted to an integer - but the mapping is
entirely implementation-dependent, except in the integer->pointer
direction when the integer is a "null pointer constant", defined as
"[a]n integer constant expression with the value 0" (or such an
expression cast to void *, though not if we're talking specifically
about integers), in which case "the resulting pointer, called a null
pointer, is guaranteed to compare unequal to a pointer to any object or
function". You could have meant that, but what you wrote could also be
taken as applying to the _run-time_ integer value 0, which C99's
promise does not apply to. (Quotes are from 6.3.2.3.)

I don't think there is any promise that converting a null pointer of
any type back to an integer will necessarily produce a zero integer.

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
Re: NULL pointer arithmetic issues
>>> If we use 0x0, it can be a valid pointer.
>>> If we use NULL, it's not expected to work and [...]
>> Then someone has severely broken compatibility with older versions
>> of C. [...]
> The process of evaluation of the NULL semantics is not a recent
> thing.

No, it's not. But I was talking about the equivalence of 0x0 and NULL
(in pointer contexts), not about what happens when you indirect through
the result of converting either to a specific object pointer type.

If 0x0 and NULL do different things in a pointer context, someone has
severely broken backward compatibility, regardless of what either of
the "different things" is. There's a _lot_ of code that depends on the
(historically promised by the spec) assurance that any of the various
historically specified ways of spelling a null pointer constant,
including 0x0 and, with a suitable #include, NULL, _is_ a null pointer
constant.

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
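The equivalence this message relies on can be illustrated with a few initializations. This is a sketch against C99 as quoted elsewhere in the thread, not against the C2x draft under discussion, and the function name is made up.

```c
#include <stddef.h>

/*
 * Initializes pointers from four spellings of a null pointer
 * constant and reports whether they all compare equal, as C99
 * requires of any two null pointers.
 */
int
spellings_agree(void)
{
    char *a = 0;            /* decimal constant zero */
    char *b = 0x0;          /* hexadecimal constant zero */
    char *c = NULL;         /* macro from <stddef.h> */
    char *d = (void *)0;    /* constant zero cast to void * */

    return a == b && b == c && c == d;
}
```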
Re: NULL pointer arithmetic issues
Kamil Rytarowski wrote:
>We still maintain compatibility with this behavior (originated as a hack
>in PDP11) in older NetBSD releases (NetBSD-0.9 Franz Lisp binaries
>depend on this).

I presume this was built for NetBSD/vax. A 68k build of Franz Lisp
wouldn't try to dereference a NULL pointer.
Re: NULL pointer arithmetic issues
On 24.02.2020 21:18, Mouse wrote:
>>> If we use 0x0, it can be a valid pointer.
>
>>> If we use NULL, it's not expected to work and will eventually
>>> generate a syntax error.
>
> Then someone has severely broken compatibility with older versions of
> C. 0x0 and (when one of the suitable #includes has been done) NULL
> have both historically been perfectly good null pointer constants.
>
> Also...syntax error? Really? _Syntax_ error?? I'd really like to see
> what they've done to the grammar to lead to that; I'm having trouble
> imagining how that would be done.
>

The process of evaluation of the NULL semantics is not a recent thing.

Not so long ago, still in NetBSD times, it was general practice to
allow dereferencing the NULL pointer and to expect zeroed bytes there.

We still maintain compatibility with this behavior (which originated as
a hack on the PDP-11) in older NetBSD releases (NetBSD-0.9 Franz Lisp
binaries depend on this).
Re: NULL pointer arithmetic issues
>> If we use 0x0, it can be a valid pointer.
>> If we use NULL, it's not expected to work and will eventually
>> generate a syntax error.

Then someone has severely broken compatibility with older versions of
C. 0x0 and (when one of the suitable #includes has been done) NULL
have both historically been perfectly good null pointer constants.

Also...syntax error? Really? _Syntax_ error?? I'd really like to see
what they've done to the grammar to lead to that; I'm having trouble
imagining how that would be done.
re: NULL pointer arithmetic issues
> > Nonsense, I think it's fair to classify that as a bug. That sort of
> > stuff is *not* supposed to happen if -ffreestanding is passed to the
> > compiler.
>
> If we use 0x0, it can be a valid pointer.
>
> If we use NULL, it's not expected to work and will eventually generate a
> syntax error.

this has not been true in GCC for a while now -- 2 years ago i had to
write this code to stop GCC from emitting "trap" when accessing the 0x0
pointer, from powerpc/oea/oea_machdep.c:

	/*
	 * Load pointer with 0 behind GCC's back, otherwise it will
	 * emit a "trap" instead.
	 */
	static __inline__ uintptr_t
	zero_value(void)
	{
		uintptr_t dont_tell_gcc;

		__asm volatile ("li %0, 0" : "=r"(dont_tell_gcc) :);
		return dont_tell_gcc;
	}

.mrg.
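The asm above is PowerPC-specific. A portable approximation of the same idea is to read the zero from a volatile object, so the compiler cannot treat it as a known constant. To be clear, this is an assumption for illustration and not what oea_machdep.c does: the names hidden_zero and zero_value_portable are hypothetical, and the sketch cannot promise that any particular GCC version leaves a later access through the resulting pointer alone.

```c
#include <assert.h>
#include <stdint.h>

/* The volatile qualifier forces a real load on every read, so the
 * optimizer cannot propagate the constant 0 into the caller. */
static volatile uintptr_t hidden_zero = 0;

/* Hypothetical portable stand-in for zero_value(). */
uintptr_t
zero_value_portable(void)
{
	return hidden_zero;
}

/* The value is still zero at run time; only the compiler's static
 * knowledge of it is removed. */
void
check_zero_value(void)
{
	assert(zero_value_portable() == 0);
}
```

The trade-off against the inline asm is one extra memory load per call in exchange for not depending on a particular instruction set.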
Re: NULL pointer arithmetic issues
Kamil Rytarowski wrote:
>On 24.02.2020 13:41, Joerg Sonnenberger wrote:
>> On Mon, Feb 24, 2020 at 11:42:01AM +0100, Kamil Rytarowski wrote:
>>> Forbidding NULL pointer arithmetic is not just for C purist trolls. It
>>> is now in C++ mainstream and already in C2x draft.
>>
>> This is not true. NULL pointer arithmetic and nullptr arithmetic are
>> *very* different things. Do not conflate them.
>>
>> Joerg
>>
>
>As noted, they are allowed to be practically the same in C++. The C
>proposal (n2394) NULL is marked as deprecated and NULL should be set to
>nullptr.

This is just a proposal; once it becomes part of the standard we can
worry about it. I agree with the rest of the people that, for now, we
should change the sanitizer so that these cases do not produce errors,
rather than making the code more complicated just to make the
sanitizer happy.

christos
Re: NULL pointer arithmetic issues
On 24.02.2020 15:35, Don Lee wrote:
>
>> On Feb 24, 2020, at 8:05 AM, Mouse wrote:
>>
>>>>> RUST is better defined than C and is indeed used in OS development
>>>>> these days
>>>> ...so? I don't see how this is related to the rest of the
>>>> discussion.
>>> As C is considered as not suitable for OS development,
>>
>> Once again, there is no such language as C. There is a family of
>> closely related languages collectively called C.
>>
>> But it's actually the compiler, not the language.
>>
>>> there is an escape plan, already with a successful story in this
>>> domain.
>>
>> There's another one, and one that doesn't require the complete rewrite
>> a switch as drastic as C->rust would: various compilers (including
>> older versions of the gcc family) that don't think it reasonable to
>> take clear code and language-lawyer it into broken executables.
>>
> We need to be mindful of the gargantuan body of code written in “C”,
> expecting the “old” behavior, much of it no longer having any sort of support.
>
> Software lives almost as long as government programs.
>
> -dgl-
>

While we are at it: a CHERI CPU can catch invalid intermediate values
(an invalid pointer, before it is dereferenced). This is something that
breaks a lot of old C code. tcpdump (which still preserves ifdefs for
MSDOS) was reworked to remove these types of bugs.

https://www.cl.cam.ac.uk/~dc552/papers/asplos15-memory-safe-c.pdf
Re: NULL pointer arithmetic issues
> On Feb 24, 2020, at 8:05 AM, Mouse wrote:
>
>>>> RUST is better defined than C and is indeed used in OS development
>>>> these days
>>> ...so? I don't see how this is related to the rest of the
>>> discussion.
>> As C is considered as not suitable for OS development,
>
> Once again, there is no such language as C. There is a family of
> closely related languages collectively called C.
>
> But it's actually the compiler, not the language.
>
>> there is an escape plan, already with a successful story in this
>> domain.
>
> There's another one, and one that doesn't require the complete rewrite
> a switch as drastic as C->rust would: various compilers (including
> older versions of the gcc family) that don't think it reasonable to
> take clear code and language-lawyer it into broken executables.
>

We need to be mindful of the gargantuan body of code written in “C”,
expecting the “old” behavior, much of it no longer having any sort of
support.

Software lives almost as long as government programs.

-dgl-
Re: NULL pointer arithmetic issues
On 24.02.2020 15:04, Jason Thorpe wrote:
>
>> On Feb 24, 2020, at 4:22 AM, Kamil Rytarowski wrote:
>>
>> A compiler once being smart enough can introduce ILL/SEGV traps into
>> code that performs operations on NULL pointers. This has already bitten us
>> when we were registering a handler at address 0x0 for the kernel code,
>> GCC changed the operation into a cpu trap. (IIRC it was in the sparc code.)
>
> Nonsense, I think it's fair to classify that as a bug. That sort of stuff is
> *not* supposed to happen if -ffreestanding is passed to the compiler.
>
> -- thorpej
>

If we use 0x0, it can be a valid pointer.

If we use NULL, it's not expected to work and will eventually generate a
syntax error.

UBSan as a runtime tool tries to indirectly catch the latter with the
former and is prone to some rare false positives (so far not reported).

If a compiler is too smart for 0x0 pointers, transforming them to abort
traps, it is a compiler bug. I noted that this already happens.

On 24.02.2020 15:05, Mouse wrote:
> (3) If you have reason to think the C committee would be interested in
> having me as a member, let me know whom to talk to. I might or might
> not actually end up interested in joining, but I'd like more info.

http://www.open-std.org/jtc1/sc22/wg14/
Re: NULL pointer arithmetic issues
> On Feb 24, 2020, at 4:22 AM, Kamil Rytarowski wrote:
>
> A compiler once being smart enough can introduce ILL/SEGV traps into
> code that performs operations on NULL pointers. This has already bitten us
> when we were registering a handler at address 0x0 for the kernel code,
> GCC changed the operation into a cpu trap. (IIRC it was in the sparc code.)

Nonsense, I think it's fair to classify that as a bug. That sort of
stuff is *not* supposed to happen if -ffreestanding is passed to the
compiler.

-- thorpej
Re: NULL pointer arithmetic issues
>> C is not a language. C is a family of closely related languages.
>> Some of them are suitable for OS implementation. It appears some of
>> the more recent ones are not, but this does not mean the older ones
>> also aren't.
> From my perception the trend is reversed. Things that were undefined
> or unspecified in older revisions of C, are more clearly defined now.

You seem to be confusing "clearly defined" with "useful".

Modern members of the C family may indeed be more clearly defined.
That is not the problem. The problem is...hmm, actually, I misspoke
upthread. It is not the language that is the problem; it is the
compiler. Unless - and I find this highly unlikely - there is
something in the latest versions of C _requiring_ the compiler to
perform these unexpected transformations, the language itself is fine;
it is the compiler that is at fault, in that it chooses to take
advantage of the not-forbidden-by-the-spec latitude to "optimize" code
in unexpected ways.

>> [...]; it is not a compiler's place to take the position of "ha ha,
>> the code you wrote is clear but I can find a way to lawyer it into
>> formally undefined behaviour, so I'm going to transform it into
>> something I know damn well you didn't expect".
> Please join the C committee as a voting member or at least submit
> papers with language changes. Complaining here won't change
> anything.

(1) It might get NetBSD to stop trying to insist on using a compiler
that is not suitable for the purpose.

(2) As I realized above, it's not the language that's the problem.

(3) If you have reason to think the C committee would be interested in
having me as a member, let me know whom to talk to. I might or might
not actually end up interested in joining, but I'd like more info.

>>> RUST is better defined than C and is indeed used in OS development
>>> these days
>> ...so? I don't see how this is related to the rest of the
>> discussion.
> As C is considered as not suitable for OS development,

Once again, there is no such language as C. There is a family of
closely related languages collectively called C.

But it's actually the compiler, not the language.

> there is an escape plan, already with a successful story in this
> domain.

There's another one, and one that doesn't require the complete rewrite
a switch as drastic as C->rust would: various compilers (including
older versions of the gcc family) that don't think it reasonable to
take clear code and language-lawyer it into broken executables.
Re: NULL pointer arithmetic issues
On 24.02.2020 14:03, Mouse wrote:
>>>> It is now in C++ mainstream and already in C2x draft.
>>> Then those are not suitable languages for OS implementations.
>> This battle is lost for C
>
> C is not a language. C is a family of closely related languages.
>

If we treat C as gnu89, gnu99, gnu11, k&r, etc., this is true.

> Some of them are suitable for OS implementation. It appears some of
> the more recent ones are not, but this does not mean the older ones
> also aren't.
>

From my perception the trend is reversed. Things that were undefined
or unspecified in older revisions of C are more clearly defined now.

> Undefined behaviour as a way of describing differences between
> implementations, things that it limits portability to depend on, is
> useful. Undefined behaviour as a license-by-fiat for compilers to
> unnecessarily transform code in unexpected ways is not. Software
> languages and their compilers exist to serve their users, not the other
> way around; it is not a compiler's place to take the position of "ha
> ha, the code you wrote is clear but I can find a way to lawyer it into
> formally undefined behaviour, so I'm going to transform it into
> something I know damn well you didn't expect".
>

Please join the C committee as a voting member or at least submit
papers with language changes. Complaining here won't change anything.
(Of the people in this discussion, I am involved in wg14 discussions
and submit papers.)

>> RUST is better defined than C and is indeed used in OS development
>> these days
>
> ...so? I don't see how this is related to the rest of the discussion.
>

As C is considered as not suitable for OS development, there is an
escape plan, already with a successful story in this domain.
Re: NULL pointer arithmetic issues
On 24.02.2020 13:41, Joerg Sonnenberger wrote:
> On Mon, Feb 24, 2020 at 11:42:01AM +0100, Kamil Rytarowski wrote:
>> Forbidding NULL pointer arithmetic is not just for C purist trolls. It
>> is now in C++ mainstream and already in C2x draft.
>
> This is not true. NULL pointer arithmetic and nullptr arithmetic are
> *very* different things. Do not conflate them.
>
> Joerg
>

As noted, they are allowed to be practically the same in C++. In the C
proposal (n2394), NULL is marked as deprecated and NULL should be set
to nullptr.
Re: NULL pointer arithmetic issues
>>> It is now in C++ mainstream and already in C2x draft.
>> Then those are not suitable languages for OS implementations.
> This battle is lost for C

C is not a language. C is a family of closely related languages.
Some of them are suitable for OS implementation. It appears some of
the more recent ones are not, but this does not mean the older ones
also aren't.

Undefined behaviour as a way of describing differences between
implementations, things that it limits portability to depend on, is
useful. Undefined behaviour as a license-by-fiat for compilers to
unnecessarily transform code in unexpected ways is not. Software
languages and their compilers exist to serve their users, not the other
way around; it is not a compiler's place to take the position of "ha
ha, the code you wrote is clear but I can find a way to lawyer it into
formally undefined behaviour, so I'm going to transform it into
something I know damn well you didn't expect".

> RUST is better defined than C and is indeed used in OS development
> these days

...so? I don't see how this is related to the rest of the discussion.
Re: NULL pointer arithmetic issues
On Mon, Feb 24, 2020 at 11:42:01AM +0100, Kamil Rytarowski wrote:
> Forbidding NULL pointer arithmetic is not just for C purist trolls. It
> is now in C++ mainstream and already in C2x draft.

This is not true. NULL pointer arithmetic and nullptr arithmetic are
*very* different things. Do not conflate them.

Joerg
Re: NULL pointer arithmetic issues
On 24.02.2020 12:14, Mouse wrote:
>> Forbidding NULL pointer arithmetic is not just for C purist trolls.
>> It is now in C++ mainstream and already in C2x draft.
>
> Then those are not suitable languages for OS implementations.
>
> I'm with campbell and mrg on this one. It is not appropriate to twist
> NetBSD's code into a pretzel to work around "bugs" created by language
> committees deciding to give compilers new latitude to "optimize"
> meaningful code into trash.
>

This battle is lost for C and cannot be fought by a downstream user of
a C compiler (at some point Matt Thomas insisted on getting the kernel
buildable with C++ and patched it for this).

A compiler, once it is smart enough, can introduce ILL/SEGV traps into
code that performs operations on NULL pointers. This has already
bitten us: when we were registering a handler at address 0x0 for the
kernel code, GCC changed the operation into a CPU trap. (IIRC it was in
the sparc code.)

Looking at it from the proper perspective, the only NULL + 0 arithmetic
reported in the rump kernel is performed by the pserialize macros. Once
we patch them, the problem can go away. So the claim about twisting the
kernel code, or about churn, is an exaggeration.

RUST is better defined than C and is indeed used in OS development
these days (there are startups doing OS development in RUST, e.g.
https://github.com/oxidecomputer).
Re: NULL pointer arithmetic issues
> Forbidding NULL pointer arithmetic is not just for C purist trolls.
> It is now in C++ mainstream and already in C2x draft.

Then those are not suitable languages for OS implementations.

I'm with campbell and mrg on this one. It is not appropriate to twist
NetBSD's code into a pretzel to work around "bugs" created by language
committees deciding to give compilers new latitude to "optimize"
meaningful code into trash.
Re: NULL pointer arithmetic issues
On 24.02.2020 05:03, Taylor R Campbell wrote:
>> Date: Sun, 23 Feb 2020 22:51:08 +0100
>> From: Kamil Rytarowski
>>
>> On 23.02.2020 20:08, Taylor R Campbell wrote:
>>>> Date: Sat, 22 Feb 2020 17:25:42 +0100
>>>> From: Kamil Rytarowski
>>>>
>>>> What's the proper approach to address this issue?
>>>
>>> What do these reports mean?
>>>
>>> UBSan: Undefined Behavior in
>>> /usr/src/sys/rump/net/lib/libnet/../../../../netinet6/in6.c:2351:2, pointer
>>> expression with base 0 overflowed to 0
>>
>> We added 0 to a NULL pointer.
>>
>> They can be triggered by code like:
>>
>> char *p = NULL;
>> p += 0;
>
> It seems to me the proper approach is to teach the tool to accept
> this, and to avoid cluttering the tree with churn to work around the
> tool's deficiency, unless there's actually a serious compelling
> argument -- beyond a language-lawyering troll -- that (char *)NULL + 0
> is meaningfully undefined.
>
> We already assume, for example, that memset(...,0,...) is the same as
> initialization to null pointers where the object in question is a
> pointer or has pointers as subobjects.
>

Forbidding NULL pointer arithmetic is not just for C purist trolls. It
is now in C++ mainstream and already in C2x draft.

The newer C standard will most likely (already accepted by the
committee) adopt nullptr on par with nullptr from C++. In C++ we can
"#define NULL nullptr" and possibly the same will be possible in C.

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2394.pdf

This will change all arithmetic code operating on NULL into a syntax
error.

> I think we should treat memcpy(NULL,NULL,0) similarly and tell the
> tool `no, on NetBSD that really is defined and we're not interested in
> hearing about theoretical nasal demons from armchair language
> lawyers'.
>

memcpy(3) and other string functions are different. It is undefined if
we just run it with memcpy(rand(), rand(), 0) and the first two
arguments point to invalid memory. memcpy(0, 0, x) has another issue:
the overlapping memory makes it undefined.

In theory memcpy(x,y,z) where x or y is 0 is valid whenever we map 0x0
in the address space, but that is so rare that GCC declares these
arguments as nonnull.
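For code that wants to stay clear of both reports discussed above (the p += 0 case and memcpy(NULL, NULL, 0)) regardless of how the standards argument ends, one option is to skip the operation when the offset or length is zero. This is only a sketch under that assumption: ptr_advance and copy_bytes are hypothetical helpers, not the pserialize change mentioned in the thread, and the alternative several posters prefer (teaching the sanitizer to accept these cases) needs no code change at all.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Advance base by off bytes. The addition is skipped when off is 0, so
 * a null base never takes part in pointer arithmetic. */
char *
ptr_advance(char *base, size_t off)
{
	return off == 0 ? base : base + off;
}

/* Copy n bytes. The call is skipped when n is 0, so null arguments
 * never reach memcpy(), whose parameters GCC declares nonnull. */
void *
copy_bytes(void *dst, const void *src, size_t n)
{
	if (n != 0)
		memcpy(dst, src, n);
	return dst;
}

/* Both zero-sized operations on null pointers are now plain no-ops. */
void
check_zero_sized_ops(void)
{
	assert(ptr_advance(NULL, 0) == NULL);
	assert(copy_bytes(NULL, NULL, 0) == NULL);
}
```

The cost is one extra comparison per call, which is the kind of churn the thread is arguing about; whether that is worth it is exactly the disagreement above.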