Re: Avoid UB in pslist.h (NULL + 0)

2020-03-24 Thread Kamil Rytarowski
On 24.03.2020 14:30, Kamil Rytarowski wrote:
> (3) Patch Clang to start optimizing on NULL + in C so we can return to
> points (1) and (2).
> 

I have received a feedback that the particular NULL + 0 issue is
intended to be reported to the C committee as a defect.

I appreciate this approach. If there is an intention to tune the common
interpretation of the C code, it's better to collaborate with the C
committee directly.



signature.asc
Description: OpenPGP digital signature


Re: Avoid UB in pslist.h (NULL + 0)

2020-03-24 Thread Kamil Rytarowski
On 24.03.2020 07:43, Taylor R Campbell wrote:
>> Date: Sun, 22 Mar 2020 03:30:56 +0100
>> From: Kamil Rytarowski 
>>
>> On 22.03.2020 01:50, Taylor R Campbell wrote:
>>> So far, after several weeks of discussion, nobody has presented a case
>>> that there is a credible thread of a compiler actually misbehaving in
>>> this scenario.
>>
>> There are no public declarations (that was requested on a local ML) but
>> according to my IRC talks, there were plans to start optimizing on NULL
>> + 0 operations, but is was/is considered as a chicken-egg issue. There
>> is a time span now to alert users and
> 
> Perhaps it is not a chicken-egg issue, but pointless abuse of leeway
> in the standard.  So far the public declarations are that Clang
> developers realized they _could_ abuse the leeway this way but they
> _chose not to_; that essentially everyone involved sees such
> `optimization' as abuse; that there are no public reasons stated for
> why the C standard diverges from the C++ standard on this note.
> 
> If you have contrary information, please cite it specifically.
> 

I have nothing to add. I'm personally agnostic to both sides.

>> Both languages can be synced here as the incompatibility was revealed
>> and discussed. Personally I wouldn't be surprised to see adoption of the
>> C behavior as nullptr + 0 is invalid in C++ (and it will be likely
>> invalid in future C).
> 
> The fragment nullptr + 0 is invalid for an unrelated reason: nullptr
> is a pointer to an incomplete type.  We've gone over this already.
> 
> It's not helpful to keep bringing it up: at best it's a distraction,
> and at worst it gives the impression you might not understand basic
> aspects of C and C++ -- an impression which, if true, would make it
> all the more important that you _not_ go around tweaking things to
> appease sanitizers before discussing and understanding them.
> 
>> So far in reply I get 'just noise from the tool' so maybe my messages
>> were not clear enough and reaching authorities (from the clang community
>> and C committee) not credible enough.
> 
> You successfully presented a case that it is technically undefined
> behaviour.  This is not in dispute -- a month ago I cited the specific
> place where the C standard neglects to define it[*] -- so I don't
> understand why you continue to argue that case. 
> 
> [*] https://mail-index.NetBSD.org/tech-kern/2020/02/25/msg026105.html
> 

I reported that these issues are triggered in userland rump (+ in the
pslist.h ATF tests). Rump has userland assumptions with compiler flags.

I already declarared that using assembly code is not planned to be touched.

>>> (b) Change how we invoke ubsan and the compiler by passing
>>> -fno-delete-null-pointer-checks to clang.  joerg objected to this
>>> but I don't recall the details off the top of my head; joerg, can
>>> you expand on your argument against this, and which alternative
>>> you would prefer?
>>
>> I tried to enable -fno-delete-null-pointer-check for RUMPKERNEL, but I
>> was asked to revert... after first getting acknowledge from you that it
>> is a good idea...
> 
> This is an inaccurate representation of what happened.  What I said is
> (https://mail-index.NetBSD.org/tech-kern/2020/03/08/msg026125.html):
> 
>   `Adding -fno-delete-null-pointer-checks for clang too sounds
>sensible to me in general, but please check with joerg first.  It
>remains unclear to me that it's necessary here.'
> 
> I asked you to revert because:
> (a) the discussion was still ongoing without a conclusion,
> (b) it remained unclear whether the change was necessary,

I noted that it is necessary at least for memcpy(NULL, NULL, X)-like
usage in rump. Even if we ignore NULL + 0.

The kernel code is already prebuilt with
-fno-delete-null-pointer-checks, but not userland rump. This causes
slight incompatibilities.

I don't see what's unclear. I k

> (c) you failed to check with joerg like I asked, and

I checked earlier that joerg disagrees with GCC and he has its
interpretation of the standard that memcpy(NULL, NULL, 0)-like
optimization is wrong.

> (d) joerg objected.
> 

And GCC developers do not support this (me neither).

> I appreciate that sometimes it takes longer than you or I might like
> to get a clear explanation out of joerg, but it is also fatiguing to
> have to correct misrepresentations of positions in long meandering
> threads in order to ensure changes you are itching to make -- or have
> already made despite ongoing discussion -- are justified and correct.
> 
>>> (c) Change how we invoke ubsan, but just ubsan -- not the compiler.
>>
>> UBSan is an integral part of a compiler.
> 
> What I meant is:  Change the options we pass for builds with the
> sanitizer enabled, but not the options we pass for normal builds.  In
> other words, either pass _both_ -fsanitize=undefined and
> -fno-delete-null-pointer-checks (if MKSANITIZER=yes), or _neither_ of
> them (if MKSANITIZER=no).
> 

This could be integrated into 

Re: Avoid UB in pslist.h (NULL + 0)

2020-03-24 Thread Taylor R Campbell
> Date: Sun, 22 Mar 2020 03:30:56 +0100
> From: Kamil Rytarowski 
> 
> On 22.03.2020 01:50, Taylor R Campbell wrote:
> > So far, after several weeks of discussion, nobody has presented a case
> > that there is a credible thread of a compiler actually misbehaving in
> > this scenario.
> 
> There are no public declarations (that was requested on a local ML) but
> according to my IRC talks, there were plans to start optimizing on NULL
> + 0 operations, but is was/is considered as a chicken-egg issue. There
> is a time span now to alert users and

Perhaps it is not a chicken-egg issue, but pointless abuse of leeway
in the standard.  So far the public declarations are that Clang
developers realized they _could_ abuse the leeway this way but they
_chose not to_; that essentially everyone involved sees such
`optimization' as abuse; that there are no public reasons stated for
why the C standard diverges from the C++ standard on this note.

If you have contrary information, please cite it specifically.

> Both languages can be synced here as the incompatibility was revealed
> and discussed. Personally I wouldn't be surprised to see adoption of the
> C behavior as nullptr + 0 is invalid in C++ (and it will be likely
> invalid in future C).

The fragment nullptr + 0 is invalid for an unrelated reason: nullptr
is a pointer to an incomplete type.  We've gone over this already.

It's not helpful to keep bringing it up: at best it's a distraction,
and at worst it gives the impression you might not understand basic
aspects of C and C++ -- an impression which, if true, would make it
all the more important that you _not_ go around tweaking things to
appease sanitizers before discussing and understanding them.

> So far in reply I get 'just noise from the tool' so maybe my messages
> were not clear enough and reaching authorities (from the clang community
> and C committee) not credible enough.

You successfully presented a case that it is technically undefined
behaviour.  This is not in dispute -- a month ago I cited the specific
place where the C standard neglects to define it[*] -- so I don't
understand why you continue to argue that case.  Inline asm is
technically undefined by the standard too but it would be worse than
useless if ubsan warned about every instance of that.

[*] https://mail-index.NetBSD.org/tech-kern/2020/02/25/msg026105.html

> > (b) Change how we invoke ubsan and the compiler by passing
> > -fno-delete-null-pointer-checks to clang.  joerg objected to this
> > but I don't recall the details off the top of my head; joerg, can
> > you expand on your argument against this, and which alternative
> > you would prefer?
> 
> I tried to enable -fno-delete-null-pointer-check for RUMPKERNEL, but I
> was asked to revert... after first getting acknowledge from you that it
> is a good idea...

This is an inaccurate representation of what happened.  What I said is
(https://mail-index.NetBSD.org/tech-kern/2020/03/08/msg026125.html):

  `Adding -fno-delete-null-pointer-checks for clang too sounds
   sensible to me in general, but please check with joerg first.  It
   remains unclear to me that it's necessary here.'

I asked you to revert because:
(a) the discussion was still ongoing without a conclusion,
(b) it remained unclear whether the change was necessary,
(c) you failed to check with joerg like I asked, and
(d) joerg objected.

I appreciate that sometimes it takes longer than you or I might like
to get a clear explanation out of joerg, but it is also fatiguing to
have to correct misrepresentations of positions in long meandering
threads in order to ensure changes you are itching to make -- or have
already made despite ongoing discussion -- are justified and correct.

> > (c) Change how we invoke ubsan, but just ubsan -- not the compiler.
> 
> UBSan is an integral part of a compiler.

What I meant is:  Change the options we pass for builds with the
sanitizer enabled, but not the options we pass for normal builds.  In
other words, either pass _both_ -fsanitize=undefined and
-fno-delete-null-pointer-checks (if MKSANITIZER=yes), or _neither_ of
them (if MKSANITIZER=no).

> If we introduce code that relies on UB assumptions it is rather opposite
> of comprehensible code. Handling UB imposes handling corner-cases that
> is rather a good style of programming.

The kernel relies on UB assumptions everywhere, because it's a _part_
of the C implementation -- large swaths of the kernel are responsible
for driving a physical machine to implement the C abstract machine.

As the maintainers of the kernel, we have to distinguish the UB that
actually may have bad consequences, not blindly reject all UB because
the letter of the standard doesn't define it in the C abstract
machine.

> > (f) Ditch ubsan.  Does it have enough _true_ alarms, detecting actual
> > bugs, to make it worth keeping, or are we wasting a lot of time to
> > appease a tool that isn't actually giving us much value?
> 
> 1. Signed 

Re: Avoid UB in pslist.h (NULL + 0)

2020-03-21 Thread Kamil Rytarowski
On 22.03.2020 01:50, Taylor R Campbell wrote:
>> Date: Sun, 22 Mar 2020 00:03:57 +0100
>> From: Kamil Rytarowski 
>>
>> I propose to change the fun(pointer + 0) logic with fun(pointer, 0).
> 
> I don't think this is a good approach -- it requires modifying code
> further and further away from the relevant part.
> 
> 
> But let's step back a moment.
> 
> So far, after several weeks of discussion, nobody has presented a case
> that there is a credible thread of a compiler actually misbehaving in
> this scenario.
> 

There are no public declarations (that was requested on a local ML) but
according to my IRC talks, there were plans to start optimizing on NULL
+ 0 operations, but is was/is considered as a chicken-egg issue. There
is a time span now to alert users and

It was also confirmed by two people that clang or a C compiler in
general is allowed to optimize the code.

> Yes, it is technically undefined behaviour -- nobody disputes that --
> but so is lots of other stuff that NetBSD relies on such as inline
> asm.  In C++, it is explicitly _not_ undefined behaviour; Clang
> specifically stopped short of exploiting it as undefined behaviour in
> C; nobody presented reasons why it should be undefined in C but
> defined in C++.
> 

Our kernel/rump code is mostly in C or at most a common subset of C and
C++. As the languages are not fully compatible in details, it's safe to
assume the C behavior that is valid for every C++ program, rather than
C++ one that is not valid for C.

Both languages can be synced here as the incompatibility was revealed
and discussed. Personally I wouldn't be surprised to see adoption of the
C behavior as nullptr + 0 is invalid in C++ (and it will be likely
invalid in future C).

> So this all serves to work around false alarms from ubsan -- not
> actual bugs, just noise from the tool.  Presumably the reason we use
> ubsan at all is that it helps find actual bugs -- true alarms.
> There's a few ways we might approach the false alarms:
> 

Out of 3 cases that I was requested, 3 of them were confirmed to be real
bugs (alignment, memcpy(NULL,NULL, 0), ptr + 0). Two of the cases can
generate crashing code now. One of them was researched by the clang
developers and C committee and confirmed that a compiler can impose
assumptions that would break misdesigned code.

I presented examples for the two UB issues that they are harmful now.

So far in reply I get 'just noise from the tool' so maybe my messages
were not clear enough and reaching authorities (from the clang community
and C committee) not credible enough.

> (a) Ignore them.  It's what we've been doing so far.
> 

This is not true. We already run our kernel with GCC with 0 UBSan
reports and perform syzbot fuzzing. We catch real problems there.

It already happened.

We are reaching to the point of running all ATF reports under UBSan.

If we cannot suppress noise from a tool (any kind of it), then the tool
is not much usable for signaling real problems.

> (b) Change how we invoke ubsan and the compiler by passing
> -fno-delete-null-pointer-checks to clang.  joerg objected to this
> but I don't recall the details off the top of my head; joerg, can
> you expand on your argument against this, and which alternative
> you would prefer?
> 

I tried to enable -fno-delete-null-pointer-check for RUMPKERNEL, but I
was asked to revert... after first getting acknowledge from you that it
is a good idea...

> (c) Change how we invoke ubsan, but just ubsan -- not the compiler.
> There's a cost to this: if we diverge from how we invoke the
> compiler, we might be disabling true alarms from ubsan that
> reflect how the compiler will actually behave.
> 

UBSan is an integral part of a compiler.

> (d) Patch ubsan to disable the false alarms.  This incurs a
> maintenance burden, but maybe leaves less of a sharp edge to trip
> on than setting compiler flags in the makefile -- updates that
> change the behaviour are more likely to lead to merge conflicts.
> 

I'm fine to patch the tool to disable false alarms or rather report them
upstream.

There are some indeed false positives sometimes when we depend on
assumptions that are imposed after compilation, during linking phase. In
these cases we disable sanitizing as there is no way to teach the tool.

> (e) Patch our own code to suppress the false alarms.  The cost to this
> churn is that it can introduce bugs of its own, and make the code
> harder to understand, and the complexity may become obsolete in
> the next version of the tool but will remain a Chesterton's fence
> (`why is there a dummy argument in all these functions? can we get
> rid of it?').
> 

Every single commit can introduce bugs on their own.

If we introduce code that relies on UB assumptions it is rather opposite
of comprehensible code. Handling UB imposes handling corner-cases that
is rather a good style of programming.

"complexity may become obsolete in the next version of 

Re: Avoid UB in pslist.h (NULL + 0)

2020-03-21 Thread Joerg Sonnenberger
On Sun, Mar 22, 2020 at 12:50:16AM +, Taylor R Campbell wrote:
> (b) Change how we invoke ubsan and the compiler by passing
> -fno-delete-null-pointer-checks to clang.  joerg objected to this
> but I don't recall the details off the top of my head; joerg, can
> you expand on your argument against this, and which alternative
> you would prefer?

I objected to using it *in general*. The committed change *always*
adding it, not just when using sanitizers.

Joerg


Re: Avoid UB in pslist.h (NULL + 0)

2020-03-21 Thread Taylor R Campbell
> Date: Sun, 22 Mar 2020 00:03:57 +0100
> From: Kamil Rytarowski 
> 
> I propose to change the fun(pointer + 0) logic with fun(pointer, 0).

I don't think this is a good approach -- it requires modifying code
further and further away from the relevant part.


But let's step back a moment.

So far, after several weeks of discussion, nobody has presented a case
that there is a credible thread of a compiler actually misbehaving in
this scenario.

Yes, it is technically undefined behaviour -- nobody disputes that --
but so is lots of other stuff that NetBSD relies on such as inline
asm.  In C++, it is explicitly _not_ undefined behaviour; Clang
specifically stopped short of exploiting it as undefined behaviour in
C; nobody presented reasons why it should be undefined in C but
defined in C++.

So this all serves to work around false alarms from ubsan -- not
actual bugs, just noise from the tool.  Presumably the reason we use
ubsan at all is that it helps find actual bugs -- true alarms.
There's a few ways we might approach the false alarms:

(a) Ignore them.  It's what we've been doing so far.

(b) Change how we invoke ubsan and the compiler by passing
-fno-delete-null-pointer-checks to clang.  joerg objected to this
but I don't recall the details off the top of my head; joerg, can
you expand on your argument against this, and which alternative
you would prefer?

(c) Change how we invoke ubsan, but just ubsan -- not the compiler.
There's a cost to this: if we diverge from how we invoke the
compiler, we might be disabling true alarms from ubsan that
reflect how the compiler will actually behave.

(d) Patch ubsan to disable the false alarms.  This incurs a
maintenance burden, but maybe leaves less of a sharp edge to trip
on than setting compiler flags in the makefile -- updates that
change the behaviour are more likely to lead to merge conflicts.

(e) Patch our own code to suppress the false alarms.  The cost to this
churn is that it can introduce bugs of its own, and make the code
harder to understand, and the complexity may become obsolete in
the next version of the tool but will remain a Chesterton's fence
(`why is there a dummy argument in all these functions? can we get
rid of it?').

(f) Ditch ubsan.  Does it have enough _true_ alarms, detecting actual
bugs, to make it worth keeping, or are we wasting a lot of time to
appease a tool that isn't actually giving us much value?

From memory, most of the changes I've seen to appease ubsan are to
replace (1 << n) by (1u << n) when n=31.  Some projects take
advantage of -ftrapv and use signed integer types to express that
overflow is a mistake.  Some projects want two's complement
everywhere and use -fwrapv.

We're not committed one way or another; maybe we should keep it
that way, and ubsan helps us to do so; maybe it would be easier to
commit to -fwrapv.  There is a cost to patching all of these
shifts -- it makes merging from upstreams more painful.

Other issues that I've seen raised by ubsan in practice, from
memory: memcpy(NULL,NULL,0), floating-point division by zero,
alignment issues.  The first two are non-issues in real
implementations; the last suggests that maybe the tests we're
running on, e.g., sparc64 aren't very extensive.

Maybe someone can present an argument that ubsan is worth the pain
of patching and working around false alarms and merging local
changes into upstream updates.  It's not a priori clear to me --
I'm not proposing that we ditch ubsan; I'm just saying I'm not the
one to argue the case that we should use it as a bug-finding tool
and take on the costs of working around its many false alarms as a
rule.

First, I'd like to see a clearer argument about these options before
we move forward with more churn to suppress false alarms.


_After_ that argument, if we choose (e), patching our code, I would
like to see a way to get the static type checking in the pslist macros
(which has detected real bugs, at least in the container_of version
and probably also for the pslist macros) that doesn't require nonlocal
changes.  Originally (for container_of) I did something like

(sizeof( - ), ...)

to verify that a and b have compatible types, but some compilers (gcc,
I think) objected to a left-hand expression with no side effects in a
comma operator.  So I switched to

(expression yielding a pointer) + 0*sizeof( - )

to suppress that warning.  However, coverity didn't like something
about the sizeof and patching coverity is not an option for us, so we
defined __validate_container_of(...) to expand to 0 for coverity and
to 0*sizeof( - ) for non-coverity -- which is fine because we
still get the type-checking for normal builds.

This is still a bit of a kludge -- it only works when some
subexpression is a pointer to a complete type, although since the
point is to assert that types are compatible that