On 16/11/2015 7:45 PM, Thomas Stüfe wrote:
Hi David,

On Mon, Nov 16, 2015 at 5:03 AM, David Holmes <david.hol...@oracle.com> wrote:

    On 13/11/2015 11:38 PM, David Holmes wrote:

        On 13/11/2015 7:53 PM, Thomas Stüfe wrote:

            Hi Goetz,

            sorry for not looking at this earlier. This is a nice cleanup.
            Some remarks:

            
http://cr.openjdk.java.net/~goetz/webrevs/8141529-NSIG/webrev.01/src/os/aix/vm/os_aix.cpp.udiff.html


            +    if (sig > MAX2(SIGSEGV, SIGBUS) &&  // See 4355769.
            +        sig < NSIG) {                   // Must be legal signal and fit into sigflags[].

            I do not much like the MAX2() construct. I would prefer to
            explicitly check whether the SR signal is one of the "forbidden"
            ones the VM uses.
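
            A minimal sketch of what such an explicit check could look like
            (the helper name and the exact signal list are made up here for
            illustration; this is not existing HotSpot code):

            // Hypothetical explicit check instead of MAX2(SIGSEGV, SIGBUS):
            static bool is_vm_internal_signal(int sig) {
              return sig == SIGSEGV || sig == SIGBUS  || sig == SIGILL ||
                     sig == SIGFPE  || sig == SIGUSR1 || sig == SIGUSR2;
            }

            if (sig > 0 && sig < NSIG && !is_vm_internal_signal(sig)) {
              // accept sig as the SR signal
            }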


        I must confess I had not looked into 4355769 but this check
        seems rather
        spurious. It is not at all clear to me what signals could be
        used here -


    Okay, I should have looked into 4355769. The problem is how multiple
    pending signals are handled. It seems that in the past (no idea if
    still true) pending signals were handled in signal-number order
    (lowest first), not FIFO. The problem scenario is this:
    - the thread accesses a null pointer in compiled Java code, and the
    SEGV handler will cause an NPE to be thrown
    - at the same time as the SEGV is being raised, the thread is also
    hit with the SR signal to suspend it
    - the SR signal is delivered first and the SR handler starts to
    run - with signals unblocked
    - the SEGV then gets delivered to the thread in the SR handler, and
    the regular signal handler is run
    - the regular signal handler tries to detect if we're running in
    Java code so it can post the NPE, but the presence of the SR handler
    causes that check to fail - so we abort, thinking it is a real SEGV.

    I don't know how much of that is still true today. It seems strange
    to me that a kill-based directed signal can usurp a synchronous signal.

    Anyway the fix, or rather workaround, for that problem was to ensure
    that the SR_signum is greater than any potential synchronous signal
    the VM cares about. Why SIGBUS was included there I don't know, given
    that:

    a) it is already a lower signal number than SIGUSR1, SIGSEGV and SIGUSR2
    b) we don't deliberately generate and use SIGBUS ... though perhaps
    unsafe-fetch needs to be considered.

    A better fix in my opinion, and as mentioned in the bug, would have
    been to disable delivery of SEGV whilst the SR handler is executing.
    But we start to touch on some grey areas of the POSIX spec there,
    and likely the implementation too.
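
    Purely as an illustration of that idea (this is not what HotSpot does,
    and per the discussion below blocking a synchronous signal is itself
    questionable; SR_handler/SR_signum are the names used in the platform
    files, the rest is a sketch):

    // Sketch: ask the kernel to hold SIGSEGV pending while SR_handler runs.
    struct sigaction act;
    memset(&act, 0, sizeof(act));        // needs <signal.h>, <string.h>
    act.sa_sigaction = SR_handler;       // assuming the sa_sigaction signature
    act.sa_flags = SA_SIGINFO | SA_RESTART;
    sigemptyset(&act.sa_mask);
    sigaddset(&act.sa_mask, SIGSEGV);    // blocked for the duration of the handler
    sigaction(SR_signum, &act, NULL);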


How would this work? I think the process just dies immediately if a
synchronous signal occurs while being blocked. At least that is why we
needed to fix https://bugs.openjdk.java.net/browse/JDK-8065895.

POSIX specifies that it is undefined if such a signal is blocked - hence
my reference to grey areas of the spec and implementation. It looks like
once upon a time it didn't cause the process to die immediately, but
perhaps now it does. If so we can't block it. But it may also be that
the original delivery-order problem is no longer possible either. Hard
to know - hence my suggestion to just leave it as is, at least for now.

Cheers,
David

Kind Regards, Thomas


    So I suggest that for this cleanup we simply leave this logic
    exactly as is.

    Thanks,
    David


        other than SIGUSR1 or SIGUSR2 (if -Xrs is specified), or else a
        real-time signal (modulo discussion below). Hijacking anything else
        seems rather suspect.

            Maybe keep a centrally defined mask for each platform which
            contains the signals the VM needs for itself?


        Such masks already exist.

            +sigset_t os::Aix::sigs = { 0 };

            I would not initialize the signal set this way. sigset_t is an
            opaque type; the only way to initialize it is with one of
            sigemptyset() or sigfillset().
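
            For illustration, a portable initialization with the standard
            POSIX calls would look something like this (sketch only):

            sigset_t sigs;
            sigemptyset(&sigs);          // instead of "= { 0 }"
            sigaddset(&sigs, SIGUSR2);   // then record signals as needed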


        Good catch - I overlooked that.

            
http://cr.openjdk.java.net/~goetz/webrevs/8141529-NSIG/webrev.01/src/os/aix/vm/os_aix.hpp.udiff.html


            +  static struct sigaction sigact[NSIG]; // saved preinstalled sigactions
            +  static sigset_t sigs;                 // mask of signals that have

            +  static int sigflags[NSIG];
            I know this is not in the scope of your change, but I would like
            to see those removed from os::Aix and put into os_aix.cpp at
            static file scope. There is no need at all to export those, and
            you would get rid of the signal.h dependency you now have when
            including os_aix.hpp.
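
            For illustration, the suggestion amounts to declaring them at
            file scope in os_aix.cpp instead of as os::Aix members (a sketch
            of the idea, not the actual change; the comments are mine):

            // In os_aix.cpp only - nothing exported through os_aix.hpp:
            static struct sigaction sigact[NSIG]; // saved preinstalled sigactions
            static sigset_t sigs;                 // signals with a VM-installed handler
            static int sigflags[NSIG];            // per-signal flags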


            
http://cr.openjdk.java.net/~goetz/webrevs/8141529-NSIG/webrev.01/src/os/bsd/vm/jsig.c.udiff.html

            
http://cr.openjdk.java.net/~goetz/webrevs/8141529-NSIG/webrev.01/src/os/bsd/vm/os_bsd.cpp.udiff.html


            On BSD, we have realtime signals.

            http://fxr.watson.org/fxr/source/sys/signal.h
            #define SIGRTMAX 126
            and NSIG does not contain them:
            #define NSIG 32

            The max. possible signal number would be 126, which unfortunately
            does not even fit into a 64-bit mask.


        So this simply limits the signal choice to not be a real-time
        signal -
        same as today.

            So the code in jsig.c is broken for the case that someone wants
            to register realtime signals, if the VM were to ever use realtime
            signals itself, which is currently not the case.

            The same is true for os_bsd.cpp, where signal chaining will not
            work if the application had a handler for real-time signals
            pre-installed before the JVM is loaded.


        Chaining is only used when the JVM will catch signals. Aren't all
        the real-time signals going to be blocked by the VM by default, and
        so chaining is not needed as no handler will exist in the VM?
        (Unless a real-time signal is supplied for SR_signum.)

        I must admit I don't know if any of this code actually works for
        real-time signals.

            Solaris:

            The only platform where NSIG is missing?

            Here, we calculate the max. signal number dynamically in
            os_solaris.cpp, presumably because SIGRTMAX is not a constant and
            can be changed using system configuration. But then, on Linux we
            have the same situation (SIGRTMAX is dynamic) and there we do not
            go through the trouble of calculating the max. signal number
            dynamically. Instead we just use NSIG=64 and rely on the fact
            that NSIG is larger than the largest possible dynamic value for
            SIGRTMAX.


        Linux ensures that _NSIG (and thus NSIG) includes all the real-time
        signals. But libc can expose a subset and steal some for its own
        use.
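
        For example, a standalone snippet makes that visible on a typical
        glibc system (the values in the comment are typical, not guaranteed;
        NSIG needs the default, non-strict feature-test macros):

        /* Typically prints "NSIG=65 SIGRTMIN=34 SIGRTMAX=64" on glibc/Linux:
           NSIG covers the whole kernel range, but glibc keeps the first few
           real-time signals for its own use, so SIGRTMIN is raised. */
        #include <signal.h>
        #include <stdio.h>

        int main(void) {
          printf("NSIG=%d SIGRTMIN=%d SIGRTMAX=%d\n", NSIG, SIGRTMIN, SIGRTMAX);
          return 0;
        }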

            Solaris does not seem to have NSIG defined, but I am sure there
            is also a max. possible value for SIGRTMAX (the default seems to
            be 48). So, one could probably safely define NSIG for Solaris
            too, so that we have NSIG defined on all Posix platforms.


        Solaris doesn't have any of this SR_signum related code. A more
        general cleanup of signal-related code would potentially involve a
        lot of work.

        David
        -----


            On Thu, Nov 12, 2015 at 8:24 PM, Lindenmaier, Goetz
            <goetz.lindenma...@sap.com> wrote:

                 Hi David, Dmitry,

                 I've come up with a new webrev:
            http://cr.openjdk.java.net/~goetz/webrevs/8141529-NSIG/webrev.01/

                 I hit on some more issues:
                   - As proposed, I replaced MAXSIGNUM by NSIG.
                   - On AIX, NSIG=255.  Therefore storing bits in a word does
                     not work.  I'm now using bitset functionality from
                     signal.h, as it's done in other places.
                     sigset_t is >> NSIG on linux, so it's not a good idea to
                     use it there.


            Why do we not do this on all platforms, provided sigset_t
            contains all signals (incl. realtime signals)?
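
            To illustrate the point (sketch only, not code from the webrev):
            with NSIG=255 a fixed-width word cannot hold one bit per signal,
            whereas the sigset_t calls have no such limit:

            // unsigned long bits = 0;
            // bits |= 1UL << sig;         // breaks once sig >= 64
            sigset_t seen;
            sigemptyset(&seen);
            sigaddset(&seen, sig);         // fine for any valid sig < NSIG
            if (sigismember(&seen, sig)) { /* ... */ }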

                   - In the os files I found another bit vector that now is
                     too small: sigs.  I adapted that, too.  Removed the dead
                     declaration of this on solaris.

                 Best regards,
                    Goetz.



            Kind Regards, Thomas

                 > -----Original Message-----
                   > From: Dmitry Samersoff [mailto:dmitry.samers...@oracle.com]
                   > Sent: Thursday, 12 November 2015 10:05
                  > To: Lindenmaier, Goetz; David Holmes; hotspot-runtime-
                   > d...@openjdk.java.net; serviceability-dev
                  > Subject: Re: RFR(M): 8141529: Fix handling of
            _JAVA_SR_SIGNUM
                  >
                  > Goetz,
                  >
                   > *BSD including OS X also defines NSIG (just checked) and
                   > if my memory is not bogus, AIX defines it too.
                  >
                   > So you may consider using NSIG on all platforms.
                  >
                  > -Dmitry
                  >
                  > On 2015-11-12 11:36, Lindenmaier, Goetz wrote:
                   > > OK I'll change it to NSIG.  That's used in other places
                   > > in os_linux, too.  So it's really more consistent.
                  > >
                  > > Best regards,
                  > >   Goetz
                  > >
                  > >> -----Original Message-----
                   > >> From: Dmitry Samersoff [mailto:dmitry.samers...@oracle.com]
                   > >> Sent: Thursday, 12 November 2015 09:22
                   > >> To: David Holmes; Lindenmaier, Goetz; hotspot-runtime-
                   > >> d...@openjdk.java.net; serviceability-dev
                  > >> Subject: Re: RFR(M): 8141529: Fix handling of
            _JAVA_SR_SIGNUM
                  > >>
                  > >> David,
                  > >>
                   > >> I think it's better to use NSIG (without underscore)
                   > >> defined in signal.h.
                  > >>
                  > >> -Dmitry
                  > >>
                  > >>
                  > >> On 2015-11-12 10:35, David Holmes wrote:
                  > >>> Hi Goetz,
                  > >>>
                  > >>> Adding in serviceability-dev
                  > >>>
                  > >>> On 9/11/2015 6:22 PM, Lindenmaier, Goetz wrote:
                  > >>>> Hi,
                  > >>>>
                   > >>>> The environment variable _JAVA_SR_SIGNUM can be set
                   > >>>> to a signal number to be used by the JVM's
                   > >>>> suspend/resume mechanism.
                  > >>>>
                   > >>>> If set, a signal handler is installed and the
                   > >>>> current signal handler is saved to an array.
                   > >>>> On linux, this array had size MAXSIGNUM=32, and
                   > >>>> _JAVA_SR_SIGNUM was allowed to range up to _NSIG=65.
                   > >>>> This could cause memory corruption.
                  > >>>>
                   > >>>> Further, in jsig.c, an unsigned int is used to set a
                   > >>>> bit for signals.  This also is too small, as only 32
                   > >>>> signals can be supported.  Further, the signals are
                   > >>>> mapped wrongly to these bits.  '0' is not a valid
                   > >>>> signal, but '32' was.  1<<32 happens to map to zero,
                   > >>>> so the signal could be stored, but this probably was
                   > >>>> not intended that way.
                  > >>>>
                   > >>>> This change increases MAXSIGNUM to 65 on linux, and
                   > >>>> to 64 on aix.  It introduces proper checking of the
                   > >>>> signal read from the env var, and issues a warning if
                   > >>>> it does not use the signal set.  It adapts the data
                   > >>>> types in jsig.c properly.
                  > >>>>
                   > >>>> Please review this change.  I need a sponsor, please.
                  > >>>>
            http://cr.openjdk.java.net/~goetz/webrevs/8141529-NSIG/webrev.00
                  > >>>
                   > >>> This all sounds very good to me.  (I must find out
                   > >>> why Solaris is not involved here :) ).
                  > >>>
                   > >>> On Linux you didn't add the bounds check to
                   > >>> os::Linux::set_our_sigflags.
                  > >>>
                   > >>> I'm also wondering about documenting where we are
                   > >>> determining the maximum from?  Is it simply _NSIG on
                   > >>> some/all distributions?  And I see _NSIG is supposed
                   > >>> to be the biggest signal number + one.  Also linux
                   > >>> defines NSIG = _NSIG, so which should we be using?
                  > >>>
                  > >>> Thanks,
                  > >>> David
                  > >>>
                  > >>>> Best regards,
                  > >>>>    Goetz.
                  > >>>>
                  > >>
                  > >>
                  > >> --
                  > >> Dmitry Samersoff
                   > >> Oracle Java development team, Saint Petersburg, Russia
                   > >> * I would love to change the world, but they won't
                   > >> give me the sources.
                  >
                  >
                  > --
                  > Dmitry Samersoff
                  > Oracle Java development team, Saint Petersburg, Russia
                   > * I would love to change the world, but they won't give
                   > me the sources.


