David, Thank you for detailed explanation. Could we put it as a comment right into the code?
-Dmitry On 2015-11-16 07:03, David Holmes wrote: > On 13/11/2015 11:38 PM, David Holmes wrote: >> On 13/11/2015 7:53 PM, Thomas Stüfe wrote: >>> Hi Goetz, >>> >>> sorry for not looking at this earlier. This is a nice cleanup. Some >>> remarks: >>> >>> http://cr.openjdk.java.net/~goetz/webrevs/8141529-NSIG/webrev.01/src/os/aix/vm/os_aix.cpp.udiff.html >>> >>> >>> >>> + if (sig > MAX2(SIGSEGV, SIGBUS) && // See 4355769. >>> + sig < NSIG) { // Must be legal signal and fit >>> into sigflags[]. >>> >>> I do not like much the MAX2() construct. I would like it better to >>> explicitly check whether the SR signal is one of the "forbidden" ones >>> the VM uses. >> >> I must confess I had not looked into 4355769 but this check seems rather >> spurious. It is not at all clear to me what signals could be used here - > > Okay should have looked into 4355769. The problem is how multiple > pending signals are handled. It seems that in the past (no idea if still > true) pending signals were handled in signal-number order (lowest > first), not FIFO. The problem scenario is this: > - thread accesses a null pointer in compiled Java code and the SEGV > handler will cause NPE to be thrown > - at the same time as the SEGV is being raised the thread is also hit > with the SR signal to suspend it. > - the SR signal will be delivered first and the SR handler starts to run > - with signals unblocked. > - the SEGV then gets delivered to the thread in the SR handler, and the > regular signal handler is run > - the regular signal handler tries to detect if we're running in Java > code so it can post the NPE, but the presence of the SR handler causes > that check to fail - so we abort thinking it is a real SEGV. > > I don't know how much of that is still true today. It seems strange to > me that a kill based directed signal can usurp a synchronous signal. > > Anyway the fix, rather workaround, for that problem, was to ensure that > the SR_signum is greater than any potential synchronous signal the VM > cares about. Why SIGBUS was included there I don't know give that: > > a) it is already a lower signal number than SIGUSR1, SIGSEGV and SIGUSR2 > b) we don't deliberately generate and use SIGBUS ... though perhaps > unsafe-fetch needs to be considered. > > A better fix in my opinion, and as mentioned in the bug, would have been > to disable delivery of SEGV whilst the SR handler is executing. But we > start to touch on some grey areas of the POSIX spec there, and likely > the implementation too. > > So I suggest that for this cleanup we simply leave this logic exactly as > is. > > Thanks, > David > >> other than SIGUSR1 or SIGUSR2 (if -Xrs is specified), or else a >> real-time signal (modulo discussion below). Hijacking anything else >> seems rather suspect. >> >>> Maybe keep a mask defined centrally for each platform which contains >>> signals the VM needs for itself ? >> >> Such masks already exist. >> >>> +sigset_t os::Aix::sigs = { 0 }; >>> >>> I would not initialize the signal set this way. sigset_t is an opaque >>> type; the only way to initialize it is with one of sigemptyset() or >>> sigfillset(). >> >> Good catch - I overlooked that. >> >>> http://cr.openjdk.java.net/~goetz/webrevs/8141529-NSIG/webrev.01/src/os/aix/vm/os_aix.hpp.udiff.html >>> >>> >>> >>> + static struct sigaction sigact[NSIG]; // saved preinstalled >>> sigactions >>> + static sigset_t sigs; // mask of signals that have >>> >>> + static int sigflags[NSIG]; >>> >>> I know this is not in the scope of your change, but I would like to see >>> those removed from os::Aix and put into os_aix.cpp at static filescope. >>> There is no need at all to export those, and you would get rid of the >>> signal.h dependency you know have when including os_aix.hpp. >>> >>> >>> http://cr.openjdk.java.net/~goetz/webrevs/8141529-NSIG/webrev.01/src/os/bsd/vm/jsig.c.udiff.html >>> >>> >>> http://cr.openjdk.java.net/~goetz/webrevs/8141529-NSIG/webrev.01/src/os/bsd/vm/os_bsd.cpp.udiff.html >>> >>> >>> >>> On BSD, we have realtime signals. >>> >>> http://fxr.watson.org/fxr/source/sys/signal.h >>> #define SIGRTMAX 126 >>> and NSIG does not contain them: >>> #define NSIG 32 >>> >>> The max. possible signal number would be 126, which unfortunately does >>> not even fit into a 64bit mask. >> >> So this simply limits the signal choice to not be a real-time signal - >> same as today. >> >>> So the code in jsig.c is broken for the case that someone wants to >>> register realtime signals, if the VM were to ever use realtime signals >>> itself, which now is not the case. >>> >>> The same is true for os_bsd.cpp, where signal chaining will not work if >>> the application did have handler for real time signals pre-installed >>> before jvm is loaded. >> >> Chaining is only used when the JVM will catch signals. Aren't all the >> real-time signals going to be blocked by the VM by default and so >> chaining is not needed as no handler will exist in the VM ?? (Unless a >> real-time signal is supplied for SR_signum) >> >> I must admit I don't know if any of this code actually works for >> real-time signals. >> >>> Solaris: >>> >>> The only platform where NSIG is missing? >>> >>> Here, we calculate the max. signal number dynamically in os_solaris.cpp, >>> presumably because SIGRTMAX is not a constant and can be changed using >>> system configuration. But then, on Linux we have the same situation >>> (SIGRTMAX is dynamic) and there we do not go through the trouble of >>> calculating the max. signal number dynamically. Instead we just use >>> NSIG=64 and rely on the fact that NSIG is larger than the largest >>> possible dynamic value for SIGRTMAX. >> >> Linux ensures that _NSIG (and thus NSIG) includes all the real-time >> signals. But libc can expose a subset and steal some for its own use. >> >>> Solaris does not seem to have NSIG defined, but I am sure there is also >>> a max. possible value for SIGRTMAX (the default seems to be 48). So, one >>> could probably safely define NSIG for Solaris too, so that we have NSIG >>> defined on all Posix platforms. >> >> Solaris doesn't have any of this SR_signum related code. A more general >> cleanup of signal related code would potentially involve a lot of >> cleanup. >> >> David >> ----- >> >>> >>> On Thu, Nov 12, 2015 at 8:24 PM, Lindenmaier, Goetz >>> <goetz.lindenma...@sap.com <mailto:goetz.lindenma...@sap.com>> wrote: >>> >>> Hi David, Dmitry, >>> >>> I've come up with a new webrev: >>> http://cr.openjdk.java.net/~goetz/webrevs/8141529-NSIG/webrev.01/ >>> >>> I hit on some more issues: >>> - As proposed, I replaced MAXSIGNUM by NSIG >>> - On AIX, NSIG=255. Therefore storing bits in a word does not >>> work. >>> I'm now using bitset functionality from signal.h as it's done >>> in other places. >>> sigset_t is >> NSIG on linux, so it's no good idea to use it >>> there. >>> >>> >>> Why do we not do this on all platforms, provided sigset_t contains all >>> signals (incl. realtime signals) ? >>> >>> - In the os files I found another bit vector that now is too >>> small: sigs. >>> I adapted that, too. Removed the dead declaration of this on >>> solaris. >>> >>> Best regards, >>> Goetz. >>> >>> >>> >>> Kind Regards, Thomas >>> >>> > -----Original Message----- >>> > From: Dmitry Samersoff [mailto:dmitry.samers...@oracle.com >>> <mailto:dmitry.samers...@oracle.com>] >>> > Sent: Donnerstag, 12. November 2015 10:05 >>> > To: Lindenmaier, Goetz; David Holmes; hotspot-runtime- >>> > d...@openjdk.java.net <mailto:d...@openjdk.java.net>; >>> serviceability-dev >>> > Subject: Re: RFR(M): 8141529: Fix handling of _JAVA_SR_SIGNUM >>> > >>> > Goetz, >>> > >>> > *BSD including OS X also defines NSIG (just checked) and if my >>> memory is >>> > not bogus, AIX defines it too. >>> > >>> > So you may consider to use NSIG on all platform. >>> > >>> > -Dmitry >>> > >>> > On 2015-11-12 11:36, Lindenmaier, Goetz wrote: >>> > > OK I'll change it to NSIG. That's used in other places in >>> os_linux, too. >>> > > So it's really more consistent. >>> > > >>> > > Best regards, >>> > > Goetz >>> > > >>> > >> -----Original Message----- >>> > >> From: Dmitry Samersoff [mailto:dmitry.samers...@oracle.com >>> <mailto:dmitry.samers...@oracle.com>] >>> > >> Sent: Donnerstag, 12. November 2015 09:22 >>> > >> To: David Holmes; Lindenmaier, Goetz; hotspot-runtime- >>> > >> d...@openjdk.java.net <mailto:d...@openjdk.java.net>; >>> serviceability-dev >>> > >> Subject: Re: RFR(M): 8141529: Fix handling of _JAVA_SR_SIGNUM >>> > >> >>> > >> David, >>> > >> >>> > >> I think it's better to use NSIG (without underscore) defined >>> in signal.h >>> > >> >>> > >> -Dmitry >>> > >> >>> > >> >>> > >> On 2015-11-12 10:35, David Holmes wrote: >>> > >>> Hi Goetz, >>> > >>> >>> > >>> Adding in serviceability-dev >>> > >>> >>> > >>> On 9/11/2015 6:22 PM, Lindenmaier, Goetz wrote: >>> > >>>> Hi, >>> > >>>> >>> > >>>> The environment variable _JAVA_SR_SIGNUM can be set to a >>> signal >>> > >> number >>> > >>>> do be used by the JVM's suspend/resume mechanism. >>> > >>>> >>> > >>>> If set, a signal handler is installed and the current signal >>> handler >>> > >>>> is saved to an array. >>> > >>>> On linux, this array had size MAXSIGNUM=32, and >>> _JAVA_SR_SIGNUM >>> > >> was >>> > >>>> allowed >>> > >>>> to range up to _NSIG=65. This could cause memory corruption. >>> > >>>> >>> > >>>> Further, in jsig.c, an unsinged int is used to set a bit for >>> signals. >>> > >>>> This also >>> > >>>> is too small, as only 32 signals can be supported. >>> Further, the >>> > >>>> signals are mapped >>> > >>>> wrong to these bits. '0' is not a valid signal, but '32' >>> was. 1<<32 >>> > >>>> happens to map to >>> > >>>> zero, so the signal could be stored, but this probably was >>> not >>> > >>>> intended that way. >>> > >>>> >>> > >>>> This change increases MAXSIGNUM to 65 on linux, and to 64 on >>> aix. It >>> > >>>> introduces >>> > >>>> proper checking of the signal read from the env var, and >>> issues a >>> > >>>> warning if it >>> > >>>> does not use the signal set. It adapts the data types in >>> jisig.c >>> > >>>> properly. >>> > >>>> >>> > >>>> Please review this change. I please need a sponsor. >>> > >>>> >>> http://cr.openjdk.java.net/~goetz/webrevs/8141529-NSIG/webrev.00 >>> > >>> >>> > >>> This all sounds very good to me. (I must find out why Solaris >>> is not >>> > >>> involved here :) ). >>> > >>> >>> > >>> On Linux you didn't add the bounds check to >>> os::Linux::set_our_sigflags. >>> > >>> >>> > >>> I'm also wondering about documenting where we are >>> determining the >>> > >>> maximum from? Is it simply _NSIG on some/all distributions? >>> And I see >>> > >>> _NSIG is supposed to be the biggest signal number + one. Also >>> linux >>> > >>> defines NSIG = _NSIG so which should we be using? >>> > >>> >>> > >>> Thanks, >>> > >>> David >>> > >>> >>> > >>>> Best regards, >>> > >>>> Goetz. >>> > >>>> >>> > >> >>> > >> >>> > >> -- >>> > >> Dmitry Samersoff >>> > >> Oracle Java development team, Saint Petersburg, Russia >>> > >> * I would love to change the world, but they won't give me the >>> sources. >>> > >>> > >>> > -- >>> > Dmitry Samersoff >>> > Oracle Java development team, Saint Petersburg, Russia >>> > * I would love to change the world, but they won't give me the >>> sources. >>> >>> -- Dmitry Samersoff Oracle Java development team, Saint Petersburg, Russia * I would love to change the world, but they won't give me the sources.