Re: RFR 7162400: Intermittent java.io.IOException: Bad file number during HotSpotVirtualMachine.executeCommand

Olivier Lagneau Fri, 12 Jul 2013 06:25:32 -0700

Hi Peter,

Don't have time to review it, but just noticed that Copyright headersare not updated (2013).


My 0.005 cents help,
Olivier.

Peter Allwin said  on date 7/11/2013 4:14 PM:

Hello,

Thank you everyone for the feedback, I've incorporated the recommendations
into a new revision:

http://cr.openjdk.java.net/~allwin/7162400/webrev.02/

Changes:
  - Fixed speling misteaks
  - Added jtreg regression test using Mikael's excellent suggestion of
-XX:+PauseAtStartup, tested locally on linux and solaris.
  - Reverted use of MAX_PATH+1 vs. UNIX_MAX_PATH


Also thanks to Christian Törnqvist for helping out with the jtreg test!

/peter

-----Original Message-----
From: Mikael Gerdin [mailto:mikael.ger...@oracle.com]
Sent: Tuesday, July 9, 2013 7:13 PM
To: Peter Allwin
Cc: serviceability-dev@openjdk.java.net; hotspot-runtime-
d...@openjdk.java.net
Subject: Re: RFR 7162400: Intermittent java.io.IOException: Bad file

number

during HotSpotVirtualMachine.executeCommand

On 07/09/2013 05:25 PM, Peter Allwin wrote:

Mikael,

That's a good point, unfortunately attach uses os::get_temp_directory
which is hardcoded to use /tmp. We could add a whitebox API to allow
us to override this but now we're on the border to noreg-hard land again

IMO.

Any other opinions on this?

Can you use the "-XX:+PauseAtStartup" vm flag it will create a
vm.paused.<pid>  file in the current work directory. You could extract the

pid,

touch the correct attach file in /tmp and then remove the vm.paused to let
the VM resume.

I didn't check if PauseAtStartup stops the VM early enough though.

An alternate, even more hacky approach is to do something like (in bash):
(bash -c 'echo $$; touch /tmp/.java_pid$$; exec java -version') Where you
can extract the pid of the subshell process with $$ and then exec into the
java launcher and keep the same pid (at least on Linux, unsure about the
Solaris launcher).

/Mikael


Thanks!

/peter

-----Original Message-----
From: Mikael Gerdin [mailto:mikael.ger...@oracle.com]
Sent: Tuesday, July 9, 2013 2:49 PM
To: Peter Allwin
Cc: serguei.spit...@oracle.com; daniel.daughe...@oracle.com;
serviceability-dev@openjdk.java.net; hotspot-runtime-
d...@openjdk.java.net
Subject: Re: RFR 7162400: Intermittent java.io.IOException: Bad file

number

during HotSpotVirtualMachine.executeCommand

Peter,

On 2013-07-09 14:25, Peter Allwin wrote:

Hello!

It is reproducible by letting the test create .java_pid* files for
all possible process id's on the system, setting correct access
flags, launching the target VM and attempting to connect. There are
some caveats though but it should be doable.

I'll convert the repro script to JTREG and add it to the webrev.

It's probably not a good idea to have a test which taints the system
with

stale

.java_pid* files.
If the test execution times out and the script isn't allowed to clean
up I imagine that other subsequent executions could fail.
Is there a way to tell the attach api to use a specific directory so
you

won't

need to taint /tmp?

/Mikael

Thanks for the reviews!

/peter

*From:*serguei.spit...@oracle.com
[mailto:serguei.spit...@oracle.com]
*Sent:* Tuesday, July 9, 2013 1:26 AM
*To:* daniel.daughe...@oracle.com
*Cc:* Peter Allwin; serviceability-dev@openjdk.java.net;
hotspot-runtime-...@openjdk.java.net
*Subject:* Re: RFR 7162400: Intermittent java.io.IOException: Bad
file number during HotSpotVirtualMachine.executeCommand

Ok, thanks!

Peter, did you manage to reproduce this issue with your script?
If so, then, please, include it into the bug report and remove the
"noreg-sqe" label.

It is Ok if you did not reproduce it, though.

Thanks,
Serguei


On 7/8/13 4:20 PM, Daniel D. Daugherty wrote:

      I definitely don't insist... :-)

      BTW, I noticed this in Peter's e-mail:

      >  Testing:
      >  JPRT, reproducing script on Solaris, Linux.

      so maybe Peter already has this covered with "reproducing

script"...

      Dan

      On 7/8/13 5:07 PM, serguei.spit...@oracle.com
      <mailto:serguei.spit...@oracle.com>  wrote:

          Dan,

          Dan, thank you for the recommendation.
          But I'm still not sure it is a right thing to do.
          Even though, there are multiple test cases associated with

this

          bug they
          can not be used to verify that fix because an additional

condition

          must be present as well.
          This condition is a presence of stale door file which is not
          that easy to reproduce.

          However, if you insist then I can change the lable to the
          "noreg-sqe"
          with the corresponding comment.

          Thanks,
          Serguei


          On 7/8/13 3:46 PM, Daniel D. Daugherty wrote:

              Serguei,

              There are a number of existing tests associated with this
              bug. I don't
              think that 'noreg-hard' is the right label. I think
              'noreg-sqe' is
              the right one:

              noreg-sqe
                   Change can be verified by running an existing SQE

test

              suite; the bug
                   should identify the suite and the specific test

case(s).

              Dan

              On 7/8/13 12:59 PM, serguei.spit...@oracle.com
              <mailto:serguei.spit...@oracle.com>  wrote:

                  Peter,

                  I've added the label "noreg-hard" with the comment to
                  the report.
                  It is not easy to reproduce the issue and demonstrate
                  the fix in a regression test.

                  Thanks,
                  Serguei


                  On 7/8/13 11:36 AM, serguei.spit...@oracle.com
                  <mailto:serguei.spit...@oracle.com>  wrote:

                      Hi Peter,

                      The fix looks good.

                      Thanks,
                      Serguei

                      On 7/8/13 6:54 AM, Peter Allwin wrote:

                          Hello!

                          Looking for reviews of this change:

http://cr.openjdk.java.net/~allwin/7162400/webrev.01/

<http://cr.openjdk.java.net/%7Eallwin/7162400/webrev.01/>

                          For CR:


http://bugs.sun.com/view_bug.do?bug_id=7162400


https://jbs.oracle.com/bugs/browse/JDK-7162400

                          Summary:

                          This change addresses an issue in the Attach

API

                          on Solaris, Linux and BSD where an attaching
                          application can receive IOExceptions such as
                          "Bad file number" (Solaris), "Connection
                          refused" (Linux/BSD), or "well-known file is

not

                          secure".

                          The attach process uses a file in the

temporary

                          directory as a door (Solaris) or domain

socket

                          (Linux,BSD) to communicate with the VM. In
                          certain circumstances stale files can be left

in

                          the file system which can cause the attaching
                          application to believe that the VM is ready

to

                          receive a connection when it's not. With this
                          change the stale file will be removed during

VM

                          startup.

                          Note that there is still an issue if we don't
                          have permission to remove the stale file, the
                          attaching process will fail to connect.

                          Testing:

                          JPRT, reproducing script on Solaris, Linux.

                          Credits:

                          Thanks to Staffan Larsen who worked on this
                          issue with me.

                          Regards,


                          Peter

Re: RFR 7162400: Intermittent java.io.IOException: Bad file number during HotSpotVirtualMachine.executeCommand

Reply via email to