I'll add an assert on the return value and fix the if-formatting before the push if that's OK.
Thanks again for the reviews! /peter > -----Original Message----- > From: David Holmes [mailto:david.hol...@oracle.com] > Sent: Friday, July 12, 2013 4:31 AM > To: Peter Allwin > Cc: Dmitry Samersoff; serviceability-dev@openjdk.java.net; hotspot- > runtime-...@openjdk.java.net > Subject: Re: RFR 7162400: Intermittent java.io.IOException: Bad file number > during HotSpotVirtualMachine.executeCommand > > On 12/07/2013 1:17 AM, Dmitry Samersoff wrote: > > Peter, > > > > As UNIX_PATH_MAX is sort (108 bytes) we *must* check return value of > > snprintf (e.g. ll.446 of attachListener_linux.cpp) and make sure it's > > less or equal UNIX_PATH_MAX, otherwise we unlink wrong file in case of > > UNIX_PATH_MAX overflow. > > As all the parts of the path are fixed in length I think an assert would suffice. > > Also minor style nit: > > if(ret > > needs space after "if". > > Thanks, > David > > > -Dmitry > > > > > > On 2013-07-11 18:14, Peter Allwin wrote: > >> Hello, > >> > >> Thank you everyone for the feedback, I've incorporated the > >> recommendations into a new revision: > >> > >> http://cr.openjdk.java.net/~allwin/7162400/webrev.02/ > >> > >> Changes: > >> - Fixed speling misteaks > >> - Added jtreg regression test using Mikael's excellent suggestion > >> of -XX:+PauseAtStartup, tested locally on linux and solaris. > >> - Reverted use of MAX_PATH+1 vs. UNIX_MAX_PATH > >> > >> > >> Also thanks to Christian Törnqvist for helping out with the jtreg test! > >> > >> /peter > >> > >>> -----Original Message----- > >>> From: Mikael Gerdin [mailto:mikael.ger...@oracle.com] > >>> Sent: Tuesday, July 9, 2013 7:13 PM > >>> To: Peter Allwin > >>> Cc: serviceability-dev@openjdk.java.net; hotspot-runtime- > >>> d...@openjdk.java.net > >>> Subject: Re: RFR 7162400: Intermittent java.io.IOException: Bad file > >> number > >>> during HotSpotVirtualMachine.executeCommand > >>> > >>> On 07/09/2013 05:25 PM, Peter Allwin wrote: > >>>> Mikael, > >>>> > >>>> That's a good point, unfortunately attach uses > >>>> os::get_temp_directory which is hardcoded to use /tmp. We could add > >>>> a whitebox API to allow us to override this but now we're on the > >>>> border to noreg-hard land again > >>> IMO. > >>>> > >>>> Any other opinions on this? > >>> > >>> Can you use the "-XX:+PauseAtStartup" vm flag it will create a > >>> vm.paused.<pid> file in the current work directory. You could > >>> extract the > >> pid, > >>> touch the correct attach file in /tmp and then remove the vm.paused > >>> to let the VM resume. > >>> > >>> I didn't check if PauseAtStartup stops the VM early enough though. > >>> > >>> An alternate, even more hacky approach is to do something like (in > bash): > >>> (bash -c 'echo $$; touch /tmp/.java_pid$$; exec java -version') > >>> Where you can extract the pid of the subshell process with $$ and > >>> then exec into the java launcher and keep the same pid (at least on > >>> Linux, unsure about the Solaris launcher). > >>> > >>> /Mikael > >>> > >>>> > >>>> > >>>> Thanks! > >>>> > >>>> /peter > >>>> > >>>>> -----Original Message----- > >>>>> From: Mikael Gerdin [mailto:mikael.ger...@oracle.com] > >>>>> Sent: Tuesday, July 9, 2013 2:49 PM > >>>>> To: Peter Allwin > >>>>> Cc: serguei.spit...@oracle.com; daniel.daughe...@oracle.com; > >>>>> serviceability-dev@openjdk.java.net; hotspot-runtime- > >>>>> d...@openjdk.java.net > >>>>> Subject: Re: RFR 7162400: Intermittent java.io.IOException: Bad > >>>>> file > >>>> number > >>>>> during HotSpotVirtualMachine.executeCommand > >>>>> > >>>>> Peter, > >>>>> > >>>>> On 2013-07-09 14:25, Peter Allwin wrote: > >>>>>> Hello! > >>>>>> > >>>>>> It is reproducible by letting the test create .java_pid* files > >>>>>> for all possible process id's on the system, setting correct > >>>>>> access flags, launching the target VM and attempting to connect. > >>>>>> There are some caveats though but it should be doable. > >>>>>> > >>>>>> I'll convert the repro script to JTREG and add it to the webrev. > >>>>> > >>>>> It's probably not a good idea to have a test which taints the > >>>>> system with > >>>> stale > >>>>> .java_pid* files. > >>>>> If the test execution times out and the script isn't allowed to > >>>>> clean up I imagine that other subsequent executions could fail. > >>>>> Is there a way to tell the attach api to use a specific directory > >>>>> so you > >>>> won't > >>>>> need to taint /tmp? > >>>>> > >>>>> /Mikael > >>>>> > >>>>>> > >>>>>> Thanks for the reviews! > >>>>>> > >>>>>> /peter > >>>>>> > >>>>>> *From:*serguei.spit...@oracle.com > >>>>>> [mailto:serguei.spit...@oracle.com] > >>>>>> *Sent:* Tuesday, July 9, 2013 1:26 AM > >>>>>> *To:* daniel.daughe...@oracle.com > >>>>>> *Cc:* Peter Allwin; serviceability-dev@openjdk.java.net; > >>>>>> hotspot-runtime-...@openjdk.java.net > >>>>>> *Subject:* Re: RFR 7162400: Intermittent java.io.IOException: Bad > >>>>>> file number during HotSpotVirtualMachine.executeCommand > >>>>>> > >>>>>> Ok, thanks! > >>>>>> > >>>>>> Peter, did you manage to reproduce this issue with your script? > >>>>>> If so, then, please, include it into the bug report and remove > >>>>>> the "noreg-sqe" label. > >>>>>> > >>>>>> It is Ok if you did not reproduce it, though. > >>>>>> > >>>>>> Thanks, > >>>>>> Serguei > >>>>>> > >>>>>> > >>>>>> On 7/8/13 4:20 PM, Daniel D. Daugherty wrote: > >>>>>> > >>>>>> I definitely don't insist... :-) > >>>>>> > >>>>>> BTW, I noticed this in Peter's e-mail: > >>>>>> > >>>>>> > Testing: > >>>>>> > JPRT, reproducing script on Solaris, Linux. > >>>>>> > >>>>>> so maybe Peter already has this covered with "reproducing > >> script"... > >>>>>> > >>>>>> Dan > >>>>>> > >>>>>> On 7/8/13 5:07 PM, serguei.spit...@oracle.com > >>>>>> <mailto:serguei.spit...@oracle.com> wrote: > >>>>>> > >>>>>> Dan, > >>>>>> > >>>>>> Dan, thank you for the recommendation. > >>>>>> But I'm still not sure it is a right thing to do. > >>>>>> Even though, there are multiple test cases associated > >>>>>> with > >> this > >>>>>> bug they > >>>>>> can not be used to verify that fix because an > >>>>>> additional > >>>> condition > >>>>>> must be present as well. > >>>>>> This condition is a presence of stale door file which is not > >>>>>> that easy to reproduce. > >>>>>> > >>>>>> However, if you insist then I can change the lable to the > >>>>>> "noreg-sqe" > >>>>>> with the corresponding comment. > >>>>>> > >>>>>> Thanks, > >>>>>> Serguei > >>>>>> > >>>>>> > >>>>>> On 7/8/13 3:46 PM, Daniel D. Daugherty wrote: > >>>>>> > >>>>>> Serguei, > >>>>>> > >>>>>> There are a number of existing tests associated with this > >>>>>> bug. I don't > >>>>>> think that 'noreg-hard' is the right label. I think > >>>>>> 'noreg-sqe' is > >>>>>> the right one: > >>>>>> > >>>>>> noreg-sqe > >>>>>> Change can be verified by running an existing > >>>>>> SQE > >> test > >>>>>> suite; the bug > >>>>>> should identify the suite and the specific > >>>>>> test > >>>> case(s). > >>>>>> > >>>>>> Dan > >>>>>> > >>>>>> On 7/8/13 12:59 PM, serguei.spit...@oracle.com > >>>>>> <mailto:serguei.spit...@oracle.com> wrote: > >>>>>> > >>>>>> Peter, > >>>>>> > >>>>>> I've added the label "noreg-hard" with the comment to > >>>>>> the report. > >>>>>> It is not easy to reproduce the issue and demonstrate > >>>>>> the fix in a regression test. > >>>>>> > >>>>>> Thanks, > >>>>>> Serguei > >>>>>> > >>>>>> > >>>>>> On 7/8/13 11:36 AM, serguei.spit...@oracle.com > >>>>>> <mailto:serguei.spit...@oracle.com> wrote: > >>>>>> > >>>>>> Hi Peter, > >>>>>> > >>>>>> The fix looks good. > >>>>>> > >>>>>> Thanks, > >>>>>> Serguei > >>>>>> > >>>>>> On 7/8/13 6:54 AM, Peter Allwin wrote: > >>>>>> > >>>>>> Hello! > >>>>>> > >>>>>> Looking for reviews of this change: > >>>>>> > >>>>>> > >>>> http://cr.openjdk.java.net/~allwin/7162400/webrev.01/ > >>>>>> > >>>>>> <http://cr.openjdk.java.net/%7Eallwin/7162400/webrev.01/> > >>>>>> > >>>>>> For CR: > >>>>>> > >>>>>> > >>>>>> http://bugs.sun.com/view_bug.do?bug_id=7162400 > >>>>>> > >>>>>> > >>>>>> https://jbs.oracle.com/bugs/browse/JDK-7162400 > >>>>>> > >>>>>> Summary: > >>>>>> > >>>>>> This change addresses an issue in the > >>>>>> Attach > >> API > >>>>>> on Solaris, Linux and BSD where an attaching > >>>>>> application can receive IOExceptions such as > >>>>>> "Bad file number" (Solaris), "Connection > >>>>>> refused" (Linux/BSD), or "well-known > >>>>>> file is > >> not > >>>>>> secure". > >>>>>> > >>>>>> The attach process uses a file in the > >> temporary > >>>>>> directory as a door (Solaris) or domain > >> socket > >>>>>> (Linux,BSD) to communicate with the VM. In > >>>>>> certain circumstances stale files can > >>>>>> be left > >> in > >>>>>> the file system which can cause the attaching > >>>>>> application to believe that the VM is > >>>>>> ready > >> to > >>>>>> receive a connection when it's not. With this > >>>>>> change the stale file will be removed > >>>>>> during > >> VM > >>>>>> startup. > >>>>>> > >>>>>> Note that there is still an issue if we don't > >>>>>> have permission to remove the stale file, the > >>>>>> attaching process will fail to connect. > >>>>>> > >>>>>> Testing: > >>>>>> > >>>>>> JPRT, reproducing script on Solaris, Linux. > >>>>>> > >>>>>> Credits: > >>>>>> > >>>>>> Thanks to Staffan Larsen who worked on this > >>>>>> issue with me. > >>>>>> > >>>>>> Regards, > >>>>>> > >>>>>> > >>>>>> Peter > >>>>>> > >>>> > >> > >> > > > >