Re: svn commit: r327495 - head/usr.sbin/rpcbind

2018-02-04 Thread Konstantin Belousov
On Mon, Feb 05, 2018 at 07:53:03AM +1100, Bruce Evans wrote:
> On Sun, 4 Feb 2018, Konstantin Belousov wrote:
> 
> > On Sun, Feb 04, 2018 at 04:15:16PM +1100, Bruce Evans wrote:
> >> sig_atomic_t is no better than plain int.  This behaviour now makes 
> >> complete
> >> sense.  It is just like the undefined behaviour with the ctype functions,
> >> except since we own terminate_wfd we can guarantee that it doesn't change
> >> while the handler is active (and is valid when the handler is entered).
> >> We could also use atomic ops.  However, the C standard doesn't require
> >> anything that we do to work (except maybe in C11, atomic ops might be
> >> explicitly or implicitly specifed to work for things like this).
> >
> > Atomics are atomic WRT the signal handlers as well, the usual guarantees
> > of no torn writes and no out of air values on read hold.  Since FreeBSD
> > memory model, as documented in atomic(9), claims that naturally aligned
> > machine-native integer types are atomic without special declarations on
> > access, all guarantees for the handler accesses are already provided.
> 
> C11's precise wording is:
> 
>  [for async signals] the behavior is undefined if the signal handler
>  refers to any object with static or thread storage duration that
>  is not a lock-free atomic object other than by assigning a value
>  to an object declared as volatile sig_atomic_t
> 
> i.e., the same as in C99 except the behaviour is not specifically undefined
> for accesses to lock-free atomic objects.
> 
> Do we document atomics in userland?  C11 atomics are too hard for me.
> "lock-free atomic object" is a technical term and I don't know of any
> userland documentation that associates this term with naive ideas of
> atomics.  atomic(9) doesn't mention this either.
C11 atomics are not exactly same as FreeBSD atomics, and they are implemented
by different API.  Our atomic(9) API works same in kernel and in userland.

A (C11) lock-free atomic is the object for which the atomic_is_lock_free()
returns true.

> 
> > C11 also has a tool to ensure weaker than usual consistency guarantee,
> > only between the thread and a signal handler executing in the context of
> > the thread.  I do not see it useful in the discussed case.
> 
> Is that just a check for the case if !async signals (ones that are the
> result of raise() and abort()?).
Probably it was too obscure.  I mean atomic_signal_fence().
___
svn-src-head@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"


Re: svn commit: r327495 - head/usr.sbin/rpcbind

2018-02-04 Thread Bruce Evans

On Sun, 4 Feb 2018, Konstantin Belousov wrote:


On Sun, Feb 04, 2018 at 04:15:16PM +1100, Bruce Evans wrote:

sig_atomic_t is no better than plain int.  This behaviour now makes complete
sense.  It is just like the undefined behaviour with the ctype functions,
except since we own terminate_wfd we can guarantee that it doesn't change
while the handler is active (and is valid when the handler is entered).
We could also use atomic ops.  However, the C standard doesn't require
anything that we do to work (except maybe in C11, atomic ops might be
explicitly or implicitly specifed to work for things like this).


Atomics are atomic WRT the signal handlers as well, the usual guarantees
of no torn writes and no out of air values on read hold.  Since FreeBSD
memory model, as documented in atomic(9), claims that naturally aligned
machine-native integer types are atomic without special declarations on
access, all guarantees for the handler accesses are already provided.


C11's precise wording is:

[for async signals] the behavior is undefined if the signal handler
refers to any object with static or thread storage duration that
is not a lock-free atomic object other than by assigning a value
to an object declared as volatile sig_atomic_t

i.e., the same as in C99 except the behaviour is not specifically undefined
for accesses to lock-free atomic objects.

Do we document atomics in userland?  C11 atomics are too hard for me.
"lock-free atomic object" is a technical term and I don't know of any
userland documentation that associates this term with naive ideas of
atomics.  atomic(9) doesn't mention this either.


C11 also has a tool to ensure weaker than usual consistency guarantee,
only between the thread and a signal handler executing in the context of
the thread.  I do not see it useful in the discussed case.


Is that just a check for the case if !async signals (ones that are the
result of raise() and abort()?).

Bruce
___
svn-src-head@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"


Re: svn commit: r327495 - head/usr.sbin/rpcbind

2018-02-04 Thread Konstantin Belousov
On Sun, Feb 04, 2018 at 04:15:16PM +1100, Bruce Evans wrote:
> sig_atomic_t is no better than plain int.  This behaviour now makes complete
> sense.  It is just like the undefined behaviour with the ctype functions,
> except since we own terminate_wfd we can guarantee that it doesn't change
> while the handler is active (and is valid when the handler is entered).
> We could also use atomic ops.  However, the C standard doesn't require
> anything that we do to work (except maybe in C11, atomic ops might be
> explicitly or implicitly specifed to work for things like this).

Atomics are atomic WRT the signal handlers as well, the usual guarantees
of no torn writes and no out of air values on read hold.  Since FreeBSD
memory model, as documented in atomic(9), claims that naturally aligned
machine-native integer types are atomic without special declarations on
access, all guarantees for the handler accesses are already provided.

C11 also has a tool to ensure weaker than usual consistency guarantee,
only between the thread and a signal handler executing in the context of
the thread.  I do not see it useful in the discussed case.
___
svn-src-head@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"


Re: svn commit: r327495 - head/usr.sbin/rpcbind

2018-02-04 Thread Bruce Evans

On Sat, 3 Feb 2018, Conrad Meyer wrote:


On Sat, Feb 3, 2018 at 6:46 AM, Bruce Evans  wrote:

On Tue, 2 Jan 2018, Conrad Meyer wrote:

...

Today I learned from the POSIX list that the pselect() function is designed
to avoid this bug.  It takes a signal mask arg (like msleep() in the
kernel).

I haven't noticed any function related to poll() or kevent that takes a
signal mask.


There is the similar function ppoll(), although it complies only with
the Linux standard, not POSIX.


ppoll() is relatively new in FreeBSD (2014-11-13).  Its man page says
that it is first appeared in 11.0, but it is now in 10.x, and according
to bsd-family-tree it actually appeared in the 10.2 release more than a
year before it appeared in the 11.0 release:
- 2014-11-13  pre-11.0 source tree commit
- 2014-12-21  10.x source tree commit
- 2015-08-13  10.2 release
- 2016-10-10  11.0 release.


With kevent, you can simply mask all (or most) signals and watch on a
signal event.  Conversion to kevent is more complicated, though.


Programs broken by buggy conversion:
...
- rpcbind(8).  This is not interactive and normally doesn't use ttys
  which might block.  However, the -d flag makes it do fprintf() to
  stderr.  This may block forever (e.g., for flow control), and you
  would like to be able to kill it using a signal.  But write() will
  restart.  rpcbind also uses plain signal() and doesn't know anything
  about SA_RESTART.


This was not broken by conversion -- it was already broken in this
case.  If the signal delivery raced with an fprintf, we ended up
invoking the stdio atexit handlers via exit(3) call in terminate(),
which of course encountered corrupted state.  Now it is broken in a
slightly different way, yes, if signal delivery races fprintf *and*
fprintf is blocked in flow control.  This might happen with a slow
serial for stderr but seems extraordinarily unlikely otherwise.


Unsafe signal handlers are broken by their existence, but sometimes work.

It is unlikely for ttys too.  I forgot that stdio is clueless about ttys
so it doesn't wait for output to drain even for stderr.  So the output is
normally buffered in the kernel and write() returns.


...

perror()
is broken as designed since it uses stdio, so it is unsafe in signal
handlers.  The err() family copies this bug.  Even *s*printf() is not
required to be safe in signal handlers.  I would fix the latter first.


It does seem like the printf family of routines should be
signal-handler safe.  Unfortunately, they are implemented in terms of
the unsafe stdio streams :-(.


Even sprintf() uses the generic function __vfprintf() on a fake FILE.  I
think it has problems with locales.  No ctype functions are in the list
of async-signal-safe functions in at least the 2001 version of POSIX.  It
is easy to see why they might not be safe.  They access lots of static
storage, and setlocale() or a simpler function that changes state might
run in another or just a nested signal handler, so locking is required...

So the safe sprintf() would probably have to use the C locale.  The locale
is already passed to __vfprintf(), as needed for sprintf_l().


...

Modified: head/usr.sbin/rpcbind/rpcbind.c
...
terminate(int signum)
...
+   wr = write(terminate_wfd, &c, 1);



Not async-signal-safe.  Acccesses in signal handlers to objects with
static storage duration give undefined behaviour except for assignment to
objects of type volatile sig_atomic_t, but the access to terminate_wfd is
a read and the type is plain int.


The type can be changed to volatile sig_atomic_t if you believe plain
int will trigger nonsensical compiler behavior.  The value is
initialized once, before the signal handler is registered, so unless
the compiler does something totally insane it should be fine on all
architectures FreeBSD runs on.


sig_atomic_t is no better than plain int.  This behaviour now makes complete
sense.  It is just like the undefined behaviour with the ctype functions,
except since we own terminate_wfd we can guarantee that it doesn't change
while the handler is active (and is valid when the handler is entered).
We could also use atomic ops.  However, the C standard doesn't require
anything that we do to work (except maybe in C11, atomic ops might be
explicitly or implicitly specifed to work for things like this).


+   if (wr < 1)
+   _exit(2);



Best to not check for write errors, since the error handling of using
_exit()
is worse than none.  It loses stdio flushing to handle an almost-harmless
error.  The main problem with keeping everything in a safe handler is that
it
is impossible to keep stdio flushing there and we would prefer to not lost
the stdio flushing.


I don't necessarily agree.  If the write fails, we missed the signal
telling us to terminate the program and will never exit.  That said,
how would the write ever fail?


We still set the flag, so can only fail to exit() in the race case,
but the write() failure affects all cases.

I can't see how the wr

Re: svn commit: r327495 - head/usr.sbin/rpcbind

2018-02-03 Thread Conrad Meyer
On Sat, Feb 3, 2018 at 6:46 AM, Bruce Evans  wrote:
> On Tue, 2 Jan 2018, Conrad Meyer wrote:
>> ...
> Today I learned from the POSIX list that the pselect() function is designed
> to avoid this bug.  It takes a signal mask arg (like msleep() in the
> kernel).
>
> I haven't noticed any function related to poll() or kevent that takes a
> signal mask.

There is the similar function ppoll(), although it complies only with
the Linux standard, not POSIX.

With kevent, you can simply mask all (or most) signals and watch on a
signal event.  Conversion to kevent is more complicated, though.

> Programs broken by buggy conversion:
> ...
> - rpcbind(8).  This is not interactive and normally doesn't use ttys
>   which might block.  However, the -d flag makes it do fprintf() to
>   stderr.  This may block forever (e.g., for flow control), and you
>   would like to be able to kill it using a signal.  But write() will
>   restart.  rpcbind also uses plain signal() and doesn't know anything
>   about SA_RESTART.

This was not broken by conversion -- it was already broken in this
case.  If the signal delivery raced with an fprintf, we ended up
invoking the stdio atexit handlers via exit(3) call in terminate(),
which of course encountered corrupted state.  Now it is broken in a
slightly different way, yes, if signal delivery races fprintf *and*
fprintf is blocked in flow control.  This might happen with a slow
serial for stderr but seems extraordinarily unlikely otherwise.

> ...
> But this is too hard for most programs.  I think it is best to try to write
> safe signal handlers.  Unfortunately, APIs support this negatively.

Agreed.  The signal abstraction is pretty awful.  I think the safest
way to handle them is to block them entirely, then watch using
kqueue()/kevent().  That way you never have to deal with signal
context.  But that kind of conversion is more work.  You also have to
deal with EINTR or be ok with blocking signal handling indefinitely.

> perror()
> is broken as designed since it uses stdio, so it is unsafe in signal
> handlers.  The err() family copies this bug.  Even *s*printf() is not
> required to be safe in signal handlers.  I would fix the latter first.

It does seem like the printf family of routines should be
signal-handler safe.  Unfortunately, they are implemented in terms of
the unsafe stdio streams :-(.

> Then
> try to change the API of warn() and warnx() to be safe.  err() can't be
> change since it has to keep calling exit(), but it is easy to use the safe
> warn() followed by _exit() stdio is avoided, and important to know that it
> is avoided.

That sounds nice.  I'm on board with that.

>> ... Non-functional change highlighted ...
>
> I think this is too specialized and complicated.
>
> ...
>
> 1 millisecond is usually too small, but 1 second seems reasonable for
> most cases.  The timeout is only used after rarely-lost races unless it
> is small.

Feel free to change it to 1s yourself, if you think it is important.

>> Modified: head/usr.sbin/rpcbind/rpcbind.c
>>
>> ==
>> --- head/usr.sbin/rpcbind/rpcbind.c Tue Jan  2 16:50:57 2018
>> (r327494)
>> +++ head/usr.sbin/rpcbind/rpcbind.c Tue Jan  2 17:25:13 2018
>> (r327495)
>> ...
>> @@ -761,8 +774,13 @@ rbllist_add(rpcprog_t prog, rpcvers_t vers, struct
>> net
>> static void
>> terminate(int signum)
>> {
>> +   char c = '\0';
>> +   ssize_t wr;
>>
>> doterminate = signum;
>> +   wr = write(terminate_wfd, &c, 1);
>
>
> Not async-signal-safe.  Acccesses in signal handlers to objects with
> static storage duration give undefined behaviour except for assignment to
> objects of type volatile sig_atomic_t, but the access to terminate_wfd is
> a read and the type is plain int.

The type can be changed to volatile sig_atomic_t if you believe plain
int will trigger nonsensical compiler behavior.  The value is
initialized once, before the signal handler is registered, so unless
the compiler does something totally insane it should be fine on all
architectures FreeBSD runs on.

>> +   if (wr < 1)
>> +   _exit(2);
>
>
> Best to not check for write errors, since the error handling of using
> _exit()
> is worse than none.  It loses stdio flushing to handle an almost-harmless
> error.  The main problem with keeping everything in a safe handler is that
> it
> is impossible to keep stdio flushing there and we would prefer to not lost
> the stdio flushing.

I don't necessarily agree.  If the write fails, we missed the signal
telling us to terminate the program and will never exit.  That said,
how would the write ever fail?

Conrad
___
svn-src-head@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"


Re: svn commit: r327495 - head/usr.sbin/rpcbind

2018-02-03 Thread Bruce Evans

On Tue, 2 Jan 2018, Conrad Meyer wrote:


Log:
 rpcbind: Fix race in signal termination


I have yet to see any application that was correctly converted from using
unsafe signal handlers to safe signal handlers that just set a flag.  That
is without even noticing this race problem before.


 If a signal was delivered while the main thread was not in poll(2) and after
 check was performed, we could reenter poll and never detect termination. Fix
 this with the pipefd trick.  (This race was introduced very recently, in
 r327482.)


poll() at least returns if the signal occurs after it is called.  Most other
syscalls related to i/o restart after a syscall when SA_RESTART is configured
for the syscall.  SA_RESTART is the default for BSD since handling EINTR after
any syscall is too hard.  Simple conversions from using unsafe conversions
break i/o by not turning off SA_RESTART (top(1) and rpcbind(8) are examples
-- details below).  Unsimple conversions do turn off SA_RESTART but this is
complicated and tends to give bugs (ping(8) is an example).  This race bug
shows that even turning off SA_RESTART doesn't help.  Since it doesn't help,
it just tends to gives bugs from its complications and shouldn't be used.

Today I learned from the POSIX list that the pselect() function is designed
to avoid this bug.  It takes a signal mask arg (like msleep() in the kernel).
The flag should only be checked while signals are masked, and signals must
not be unmasked before making any syscall that might sleep, but applications
rarely bother to mask signals before checking the flag, and there aren't
many i/o functions that take a signal mask.

I haven't noticed any function related to poll() or kevent that takes a
signal mask.

Programs broken by buggy conversion:
- top(1).  Run it interactively and type 's' followed by any non-null
  valid input (say '1').  Then try to abort it using ^C or ^\.  Neither
  works, because read() restarts.  You have to enter a newline to get
  read() to return.  The flag is then checked and top exits with the
  bogus exit status 0 and no message.  top still uses plain signal()
  so gets the default (SA_RESTART on BSD systems and possibly bugs
  from !SA_RESTART on non-BSD systems).  Races on all systems when the
  signal arrives between checking the flag and calling read().
- rpcbind(8).  This is not interactive and normally doesn't use ttys
  which might block.  However, the -d flag makes it do fprintf() to
  stderr.  This may block forever (e.g., for flow control), and you
  would like to be able to kill it using a signal.  But write() will
  restart.  rpcbind also uses plain signal() and doesn't know anything
  about SA_RESTART.
- ping(8).  This does know about SA_RESTART.  However, turning off
  SA_RESTART gives lots of EINTRs that few BSD programs and libraries
  handle properly and fewer libraries document their handling.  This
  caused hangs for some unreachable addresses.  The resolver library
  handled EINTR (from select() IIRC) by restarting, so ^C didn't work.
  It is unclear if either returning or restarting after EINTR is correct
  in a library function.  This was fixed long ago and I didn't know of
  any other similar bugs in ping.  It would be hard to check all syscalls
  in ping and library functions that it calls.  Now I know that there
  must be lots of races because turning off SA_RESTART doesn't really work.

I used to think that correct conversion required only the following modest
changes:
- don't change SA_RESTART or otherwise fiddle with signal handling global
- find all syscalls in the program and libraries that might block, and add
  turn off SA_RESTART around them.  Possibly also change other signal handling
  locally.

But this doesn't work since the race can be lost before making the syscalls.

Next try: add timeouts using alarm() or itimers to every syscall that might
block.  I think this fixes the race before calling syscalls too.

But this is too hard for most programs.  I think it is best to try to write
safe signal handlers.  Unfortunately, APIs support this negatively.  perror()
is broken as designed since it uses stdio, so it is unsafe in signal
handlers.  The err() family copies this bug.  Even *s*printf() is not
required to be safe in signal handlers.  I would fix the latter first.  Then
try to change the API of warn() and warnx() to be safe.  err() can't be
change since it has to keep calling exit(), but it is easy to use the safe
warn() followed by _exit() stdio is avoided, and important to know that it
is avoided.


...
Modified: head/usr.sbin/rpcbind/rpcb_svc_com.c
==
--- head/usr.sbin/rpcbind/rpcb_svc_com.cTue Jan  2 16:50:57 2018
(r327494)
+++ head/usr.sbin/rpcbind/rpcb_svc_com.cTue Jan  2 17:25:13 2018
(r327495)

@@ -1130,23 +1133,26 @@ my_svc_run(void)
fprintf(stderr, ">\n");
}
#endif
-   swit

Re: svn commit: r327495 - head/usr.sbin/rpcbind

2018-01-02 Thread Conrad Meyer
On Tue, Jan 2, 2018 at 9:25 AM, Conrad Meyer  wrote:
> Author: cem
> Date: Tue Jan  2 17:25:13 2018
> New Revision: 327495
> URL: https://svnweb.freebsd.org/changeset/base/327495
>
> Log:
>   rpcbind: Fix race in signal termination
>
>   If a signal was delivered while the main thread was not in poll(2) and after
>   check was performed, we could reenter poll and never detect termination. Fix
>   this with the pipefd trick.  (This race was introduced very recently, in
>   r327482.)
>
>   PR:   224503
>   Reported by:  kib
>   Reviewed by:  kib, markj
>   Sponsored by: Dell EMC Isilon

I forgot:
Differential Revision: https://reviews.freebsd.org/D13732

Thanks,
Conrad
___
svn-src-head@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"