Goetz, On 2015-11-16 13:32, Lindenmaier, Goetz wrote: > Hi Dmitry, > > I think this is a rather complex issue which is better documented in the > bug than in the code. The code already references the bugID. > Also, I would have to copy the explanation into three files ... > I you don't object I'll leave the comment as is.
I'm OK with it. Leave it as is. -Dmitry > > Best regards, > Goetz. > > -----Original Message----- > From: Dmitry Samersoff [mailto:dmitry.samers...@oracle.com] > Sent: Montag, 16. November 2015 09:23 > To: David Holmes; Thomas Stüfe; Lindenmaier, Goetz > Cc: serviceability-dev; hotspot-runtime-...@openjdk.java.net > Subject: Re: RFR(M): 8141529: Fix handling of _JAVA_SR_SIGNUM > > David, > > Thank you for detailed explanation. Could we put it as a comment right > into the code? > > -Dmitry > > On 2015-11-16 07:03, David Holmes wrote: >> On 13/11/2015 11:38 PM, David Holmes wrote: >>> On 13/11/2015 7:53 PM, Thomas Stüfe wrote: >>>> Hi Goetz, >>>> >>>> sorry for not looking at this earlier. This is a nice cleanup. Some >>>> remarks: >>>> >>>> http://cr.openjdk.java.net/~goetz/webrevs/8141529-NSIG/webrev.01/src/os/aix/vm/os_aix.cpp.udiff.html >>>> >>>> >>>> >>>> + if (sig > MAX2(SIGSEGV, SIGBUS) && // See 4355769. >>>> + sig < NSIG) { // Must be legal signal and fit >>>> into sigflags[]. >>>> >>>> I do not like much the MAX2() construct. I would like it better to >>>> explicitly check whether the SR signal is one of the "forbidden" ones >>>> the VM uses. >>> >>> I must confess I had not looked into 4355769 but this check seems rather >>> spurious. It is not at all clear to me what signals could be used here - >> >> Okay should have looked into 4355769. The problem is how multiple >> pending signals are handled. It seems that in the past (no idea if still >> true) pending signals were handled in signal-number order (lowest >> first), not FIFO. The problem scenario is this: >> - thread accesses a null pointer in compiled Java code and the SEGV >> handler will cause NPE to be thrown >> - at the same time as the SEGV is being raised the thread is also hit >> with the SR signal to suspend it. >> - the SR signal will be delivered first and the SR handler starts to run >> - with signals unblocked. >> - the SEGV then gets delivered to the thread in the SR handler, and the >> regular signal handler is run >> - the regular signal handler tries to detect if we're running in Java >> code so it can post the NPE, but the presence of the SR handler causes >> that check to fail - so we abort thinking it is a real SEGV. >> >> I don't know how much of that is still true today. It seems strange to >> me that a kill based directed signal can usurp a synchronous signal. >> >> Anyway the fix, rather workaround, for that problem, was to ensure that >> the SR_signum is greater than any potential synchronous signal the VM >> cares about. Why SIGBUS was included there I don't know give that: >> >> a) it is already a lower signal number than SIGUSR1, SIGSEGV and SIGUSR2 >> b) we don't deliberately generate and use SIGBUS ... though perhaps >> unsafe-fetch needs to be considered. >> >> A better fix in my opinion, and as mentioned in the bug, would have been >> to disable delivery of SEGV whilst the SR handler is executing. But we >> start to touch on some grey areas of the POSIX spec there, and likely >> the implementation too. >> >> So I suggest that for this cleanup we simply leave this logic exactly as >> is. >> >> Thanks, >> David >> >>> other than SIGUSR1 or SIGUSR2 (if -Xrs is specified), or else a >>> real-time signal (modulo discussion below). Hijacking anything else >>> seems rather suspect. >>> >>>> Maybe keep a mask defined centrally for each platform which contains >>>> signals the VM needs for itself ? >>> >>> Such masks already exist. >>> >>>> +sigset_t os::Aix::sigs = { 0 }; >>>> >>>> I would not initialize the signal set this way. sigset_t is an opaque >>>> type; the only way to initialize it is with one of sigemptyset() or >>>> sigfillset(). >>> >>> Good catch - I overlooked that. >>> >>>> http://cr.openjdk.java.net/~goetz/webrevs/8141529-NSIG/webrev.01/src/os/aix/vm/os_aix.hpp.udiff.html >>>> >>>> >>>> >>>> + static struct sigaction sigact[NSIG]; // saved preinstalled >>>> sigactions >>>> + static sigset_t sigs; // mask of signals that have >>>> >>>> + static int sigflags[NSIG]; >>>> >>>> I know this is not in the scope of your change, but I would like to see >>>> those removed from os::Aix and put into os_aix.cpp at static filescope. >>>> There is no need at all to export those, and you would get rid of the >>>> signal.h dependency you know have when including os_aix.hpp. >>>> >>>> >>>> http://cr.openjdk.java.net/~goetz/webrevs/8141529-NSIG/webrev.01/src/os/bsd/vm/jsig.c.udiff.html >>>> >>>> >>>> http://cr.openjdk.java.net/~goetz/webrevs/8141529-NSIG/webrev.01/src/os/bsd/vm/os_bsd.cpp.udiff.html >>>> >>>> >>>> >>>> On BSD, we have realtime signals. >>>> >>>> http://fxr.watson.org/fxr/source/sys/signal.h >>>> #define SIGRTMAX 126 >>>> and NSIG does not contain them: >>>> #define NSIG 32 >>>> >>>> The max. possible signal number would be 126, which unfortunately does >>>> not even fit into a 64bit mask. >>> >>> So this simply limits the signal choice to not be a real-time signal - >>> same as today. >>> >>>> So the code in jsig.c is broken for the case that someone wants to >>>> register realtime signals, if the VM were to ever use realtime signals >>>> itself, which now is not the case. >>>> >>>> The same is true for os_bsd.cpp, where signal chaining will not work if >>>> the application did have handler for real time signals pre-installed >>>> before jvm is loaded. >>> >>> Chaining is only used when the JVM will catch signals. Aren't all the >>> real-time signals going to be blocked by the VM by default and so >>> chaining is not needed as no handler will exist in the VM ?? (Unless a >>> real-time signal is supplied for SR_signum) >>> >>> I must admit I don't know if any of this code actually works for >>> real-time signals. >>> >>>> Solaris: >>>> >>>> The only platform where NSIG is missing? >>>> >>>> Here, we calculate the max. signal number dynamically in os_solaris.cpp, >>>> presumably because SIGRTMAX is not a constant and can be changed using >>>> system configuration. But then, on Linux we have the same situation >>>> (SIGRTMAX is dynamic) and there we do not go through the trouble of >>>> calculating the max. signal number dynamically. Instead we just use >>>> NSIG=64 and rely on the fact that NSIG is larger than the largest >>>> possible dynamic value for SIGRTMAX. >>> >>> Linux ensures that _NSIG (and thus NSIG) includes all the real-time >>> signals. But libc can expose a subset and steal some for its own use. >>> >>>> Solaris does not seem to have NSIG defined, but I am sure there is also >>>> a max. possible value for SIGRTMAX (the default seems to be 48). So, one >>>> could probably safely define NSIG for Solaris too, so that we have NSIG >>>> defined on all Posix platforms. >>> >>> Solaris doesn't have any of this SR_signum related code. A more general >>> cleanup of signal related code would potentially involve a lot of >>> cleanup. >>> >>> David >>> ----- >>> >>>> >>>> On Thu, Nov 12, 2015 at 8:24 PM, Lindenmaier, Goetz >>>> <goetz.lindenma...@sap.com <mailto:goetz.lindenma...@sap.com>> wrote: >>>> >>>> Hi David, Dmitry, >>>> >>>> I've come up with a new webrev: >>>> http://cr.openjdk.java.net/~goetz/webrevs/8141529-NSIG/webrev.01/ >>>> >>>> I hit on some more issues: >>>> - As proposed, I replaced MAXSIGNUM by NSIG >>>> - On AIX, NSIG=255. Therefore storing bits in a word does not >>>> work. >>>> I'm now using bitset functionality from signal.h as it's done >>>> in other places. >>>> sigset_t is >> NSIG on linux, so it's no good idea to use it >>>> there. >>>> >>>> >>>> Why do we not do this on all platforms, provided sigset_t contains all >>>> signals (incl. realtime signals) ? >>>> >>>> - In the os files I found another bit vector that now is too >>>> small: sigs. >>>> I adapted that, too. Removed the dead declaration of this on >>>> solaris. >>>> >>>> Best regards, >>>> Goetz. >>>> >>>> >>>> >>>> Kind Regards, Thomas >>>> >>>> > -----Original Message----- >>>> > From: Dmitry Samersoff [mailto:dmitry.samers...@oracle.com >>>> <mailto:dmitry.samers...@oracle.com>] >>>> > Sent: Donnerstag, 12. November 2015 10:05 >>>> > To: Lindenmaier, Goetz; David Holmes; hotspot-runtime- >>>> > d...@openjdk.java.net <mailto:d...@openjdk.java.net>; >>>> serviceability-dev >>>> > Subject: Re: RFR(M): 8141529: Fix handling of _JAVA_SR_SIGNUM >>>> > >>>> > Goetz, >>>> > >>>> > *BSD including OS X also defines NSIG (just checked) and if my >>>> memory is >>>> > not bogus, AIX defines it too. >>>> > >>>> > So you may consider to use NSIG on all platform. >>>> > >>>> > -Dmitry >>>> > >>>> > On 2015-11-12 11:36, Lindenmaier, Goetz wrote: >>>> > > OK I'll change it to NSIG. That's used in other places in >>>> os_linux, too. >>>> > > So it's really more consistent. >>>> > > >>>> > > Best regards, >>>> > > Goetz >>>> > > >>>> > >> -----Original Message----- >>>> > >> From: Dmitry Samersoff [mailto:dmitry.samers...@oracle.com >>>> <mailto:dmitry.samers...@oracle.com>] >>>> > >> Sent: Donnerstag, 12. November 2015 09:22 >>>> > >> To: David Holmes; Lindenmaier, Goetz; hotspot-runtime- >>>> > >> d...@openjdk.java.net <mailto:d...@openjdk.java.net>; >>>> serviceability-dev >>>> > >> Subject: Re: RFR(M): 8141529: Fix handling of _JAVA_SR_SIGNUM >>>> > >> >>>> > >> David, >>>> > >> >>>> > >> I think it's better to use NSIG (without underscore) defined >>>> in signal.h >>>> > >> >>>> > >> -Dmitry >>>> > >> >>>> > >> >>>> > >> On 2015-11-12 10:35, David Holmes wrote: >>>> > >>> Hi Goetz, >>>> > >>> >>>> > >>> Adding in serviceability-dev >>>> > >>> >>>> > >>> On 9/11/2015 6:22 PM, Lindenmaier, Goetz wrote: >>>> > >>>> Hi, >>>> > >>>> >>>> > >>>> The environment variable _JAVA_SR_SIGNUM can be set to a >>>> signal >>>> > >> number >>>> > >>>> do be used by the JVM's suspend/resume mechanism. >>>> > >>>> >>>> > >>>> If set, a signal handler is installed and the current signal >>>> handler >>>> > >>>> is saved to an array. >>>> > >>>> On linux, this array had size MAXSIGNUM=32, and >>>> _JAVA_SR_SIGNUM >>>> > >> was >>>> > >>>> allowed >>>> > >>>> to range up to _NSIG=65. This could cause memory corruption. >>>> > >>>> >>>> > >>>> Further, in jsig.c, an unsinged int is used to set a bit for >>>> signals. >>>> > >>>> This also >>>> > >>>> is too small, as only 32 signals can be supported. >>>> Further, the >>>> > >>>> signals are mapped >>>> > >>>> wrong to these bits. '0' is not a valid signal, but '32' >>>> was. 1<<32 >>>> > >>>> happens to map to >>>> > >>>> zero, so the signal could be stored, but this probably was >>>> not >>>> > >>>> intended that way. >>>> > >>>> >>>> > >>>> This change increases MAXSIGNUM to 65 on linux, and to 64 on >>>> aix. It >>>> > >>>> introduces >>>> > >>>> proper checking of the signal read from the env var, and >>>> issues a >>>> > >>>> warning if it >>>> > >>>> does not use the signal set. It adapts the data types in >>>> jisig.c >>>> > >>>> properly. >>>> > >>>> >>>> > >>>> Please review this change. I please need a sponsor. >>>> > >>>> >>>> http://cr.openjdk.java.net/~goetz/webrevs/8141529-NSIG/webrev.00 >>>> > >>> >>>> > >>> This all sounds very good to me. (I must find out why Solaris >>>> is not >>>> > >>> involved here :) ). >>>> > >>> >>>> > >>> On Linux you didn't add the bounds check to >>>> os::Linux::set_our_sigflags. >>>> > >>> >>>> > >>> I'm also wondering about documenting where we are >>>> determining the >>>> > >>> maximum from? Is it simply _NSIG on some/all distributions? >>>> And I see >>>> > >>> _NSIG is supposed to be the biggest signal number + one. Also >>>> linux >>>> > >>> defines NSIG = _NSIG so which should we be using? >>>> > >>> >>>> > >>> Thanks, >>>> > >>> David >>>> > >>> >>>> > >>>> Best regards, >>>> > >>>> Goetz. >>>> > >>>> >>>> > >> >>>> > >> >>>> > >> -- >>>> > >> Dmitry Samersoff >>>> > >> Oracle Java development team, Saint Petersburg, Russia >>>> > >> * I would love to change the world, but they won't give me the >>>> sources. >>>> > >>>> > >>>> > -- >>>> > Dmitry Samersoff >>>> > Oracle Java development team, Saint Petersburg, Russia >>>> > * I would love to change the world, but they won't give me the >>>> sources. >>>> >>>> > > -- Dmitry Samersoff Oracle Java development team, Saint Petersburg, Russia * I would love to change the world, but they won't give me the sources.