Re: (11) RFR (XS): 8205966: [testbug] New Nestmates JDI test times out with Xcomp on sparc
On 6/07/2018 3:29 PM, Mikael Vidstedt wrote: Looks good. Nice speedup! Thanks for looking at it Mikael! Still the second longest test in com/sun/jdi at 21 minutes!!! David Cheers, Mikael On Jul 5, 2018, at 10:21 PM, David Holmes wrote: Bug: https://bugs.openjdk.java.net/browse/JDK-8205966 webrev: http://cr.openjdk.java.net/~dholmes/8205966/webrev/ One of the @run variants was taking around 15x longer to execute. That variant uses the InMemoryJavaCompiler which involves a lot of classes and code execution. The test was enabling method entry event generation for all of main, resulting in the massive slowdown. The fix is to add a new breakpoint() function that gets called after the in-memory compilation setup is done, and we initially run the test to that point before enabling the events. The problem @run now only takes 2x the other tests and so should avoid the timeouts. Testing: mach5 tier4 solaris-sparc mach5 tier 1-3 Thanks, David
Re: (11) RFR (XS): 8205966: [testbug] New Nestmates JDI test times out with Xcomp on sparc
Looks good. Nice speedup! Cheers, Mikael > On Jul 5, 2018, at 10:21 PM, David Holmes wrote: > > Bug: https://bugs.openjdk.java.net/browse/JDK-8205966 > webrev: http://cr.openjdk.java.net/~dholmes/8205966/webrev/ > > One of the @run variants was taking around 15x longer to execute. That > variant uses the InMemoryJavaCompiler which involves a lot of classes and > code execution. The test was enabling method entry event generation for all > of main, resulting in the massive slowdown. > > The fix is to add a new breakpoint() function that gets called after the > in-memory compilation setup is done, and we initially run the test to that > point before enabling the events. > > The problem @run now only takes 2x the other tests and so should avoid the > timeouts. > > Testing: mach5 tier4 solaris-sparc > mach5 tier 1-3 > > Thanks, > David
(11) RFR (XS): 8205966: [testbug] New Nestmates JDI test times out with Xcomp on sparc
Bug: https://bugs.openjdk.java.net/browse/JDK-8205966 webrev: http://cr.openjdk.java.net/~dholmes/8205966/webrev/ One of the @run variants was taking around 15x longer to execute. That variant uses the InMemoryJavaCompiler which involves a lot of classes and code execution. The test was enabling method entry event generation for all of main, resulting in the massive slowdown. The fix is to add a new breakpoint() function that gets called after the in-memory compilation setup is done, and we initially run the test to that point before enabling the events. The problem @run now only takes 2x the other tests and so should avoid the timeouts. Testing: mach5 tier4 solaris-sparc mach5 tier 1-3 Thanks, David
Re: RFR (S) 8205608: Fix 'frames()' in ThreadReferenceImpl.c to prevent quadratic runtime behavior
On 7/5/18 19:17, serguei.spit...@oracle.com wrote: Hi Ralf, I have some questions. http://cr.openjdk.java.net/~simonis/webrevs/2018/8205608/src/jdk.jdwp.agent/share/native/libjdwp/ThreadReferenceImpl.c.frames.html 288 if (length != count) { 289 error = JVMTI_ERROR_INTERNAL; 290 } The above looks as incorrect because it does not account for the startIndex value. Also, if the condition is true then it should skip the loop at the lines 292-303. 292 for (fnum = 0; (fnum < count) && (error == JVMTI_ERROR_NONE); ++fnum) { 293 WITH_LOCAL_REFS(env, 1) { 294 jclass clazz; 295 error = methodClass(frames[fnum].method, &clazz); 296 297 if (error == JVMTI_ERROR_NONE) { 298 FrameID frame = createFrameID(thread, fnum + startIndex); 299 outStream_writeFrameID(out, frame); 300 writeCodeLocation(out, clazz, frames[fnum].method, frames[fnum].location); 301 } 302 } END_WITH_LOCAL_REFS(env); 303 } It seems, there has to be a break in the loop at lines 292-303 above if the condition at the line 297 is not true. Please, find a similarity in the original code lines 303-304: 303 if (error != JVMTI_ERROR_NONE) 304 break; Also, it looks as incorrect to send the packages if the such an error was identified. But this issue exists in the original implementation as well. Please, skip the entire last comment as incorrect. (Thanks to David Holmes for the catch!) I've overlooked the error check in the loop header. The part about lines 303-304 is incorrect too. Thanks, Serguei On 7/5/18 15:37, Chris Plummer wrote: Hi Ralf, Overall looks good, but I do have a few comments and questions. Please update the copyright. What testing have you done? +1 How long does this test take to run. What happens if for some reason SOE is never thrown? It's not clear to me what the script would do in this case. In answer to the ShellScaffold.sh question, there is already work underway to convert to pure java tests. See JDK-8201652. I'm not certain if it is ok for you to just submit this new shell script, or if should be rewritten in pure java. Most of the work to convert the scripts has already been done but was put on hold. Maybe Serguei can comment and guide you on how it would be done in java. The test has to be the TestScaffold based. You can find many examples in the jdk/com/sun/jdi folder. Thanks, Serguei thanks, Chris On 7/3/18 3:43 AM, Schmelter, Ralf wrote: Hi All, Please review the fix for the bughttps://bugs.openjdk.java.net/browse/JDK-8205608 . The webref is athttp://cr.openjdk.java.net/~simonis/webrevs/2018/8205608/ . This fixes the quadratic runtime (in the number of frames) of the frames() method, making it linear instead. It uses additional memory proportional to the number of frames on the stack. I've included a jtreg test, which would time out in the old implementation (since it takes minutes to get the stack frames). I'm not sure how useful this is. Best regards, Ralf Schmelter
Re: RFR (S) 8205608: Fix 'frames()' in ThreadReferenceImpl.c to prevent quadratic runtime behavior
Hi Ralf, I have some questions. http://cr.openjdk.java.net/~simonis/webrevs/2018/8205608/src/jdk.jdwp.agent/share/native/libjdwp/ThreadReferenceImpl.c.frames.html 288 if (length != count) { 289 error = JVMTI_ERROR_INTERNAL; 290 } The above looks as incorrect because it does not account for the startIndex value. Also, if the condition is true then it should skip the loop at the lines 292-303. 292 for (fnum = 0; (fnum < count) && (error == JVMTI_ERROR_NONE); ++fnum) { 293 WITH_LOCAL_REFS(env, 1) { 294 jclass clazz; 295 error = methodClass(frames[fnum].method, &clazz); 296 297 if (error == JVMTI_ERROR_NONE) { 298 FrameID frame = createFrameID(thread, fnum + startIndex); 299 outStream_writeFrameID(out, frame); 300 writeCodeLocation(out, clazz, frames[fnum].method, frames[fnum].location); 301 } 302 } END_WITH_LOCAL_REFS(env); 303 } It seems, there has to be a break in the loop at lines 292-303 above if the condition at the line 297 is not true. Please, find a similarity in the original code lines 303-304: 303 if (error != JVMTI_ERROR_NONE) 304 break; Also, it looks as incorrect to send the packages if the such an error was identified. But this issue exists in the original implementation as well. On 7/5/18 15:37, Chris Plummer wrote: Hi Ralf, Overall looks good, but I do have a few comments and questions. Please update the copyright. What testing have you done? +1 How long does this test take to run. What happens if for some reason SOE is never thrown? It's not clear to me what the script would do in this case. In answer to the ShellScaffold.sh question, there is already work underway to convert to pure java tests. See JDK-8201652. I'm not certain if it is ok for you to just submit this new shell script, or if should be rewritten in pure java. Most of the work to convert the scripts has already been done but was put on hold. Maybe Serguei can comment and guide you on how it would be done in java. The test has to be the TestScaffold based. You can find many examples in the jdk/com/sun/jdi folder. Thanks, Serguei thanks, Chris On 7/3/18 3:43 AM, Schmelter, Ralf wrote: Hi All, Please review the fix for the bughttps://bugs.openjdk.java.net/browse/JDK-8205608 . The webref is athttp://cr.openjdk.java.net/~simonis/webrevs/2018/8205608/ . This fixes the quadratic runtime (in the number of frames) of the frames() method, making it linear instead. It uses additional memory proportional to the number of frames on the stack. I've included a jtreg test, which would time out in the old implementation (since it takes minutes to get the stack frames). I'm not sure how useful this is. Best regards, Ralf Schmelter
Re: RFR: JDK-8206007: nsk/jdb/exclude001 test is taking a long time on some builds
Hi Gary, One thing is not clear. The 8206007 is linked to the 8197938 which tags this test in the ProblemList.txt. This line is removed: -vmTestbase/nsk/jdb/exclude/exclude001/exclude001.java 8197938 windows-all but the bug 8197938 is still open. Is it intentional or some kind of a typo? Or maybe we have to close the 8197938 as a dup of 8206007? Otherwise, the fix looks good to me. Thank you for the extra testing! Thanks, Serguei On 7/5/18 07:48, Gary Adams wrote: A simple test run using "exclude none" shows 625K methods are being observed. The bulk of those methods were due to the last class accessed in the test - VirtualMachineManager. It's not important that this particular call is used. The test is simply demonstrating that filters work for other packages than java and javax. This proposed fix uses a simpler lookup for GregorianCalendar. Issue: https://bugs.openjdk.java.net/browse/JDK-8206007 Webrev: http://cr.openjdk.java.net/~gadams/8206007/webrev.00/
Re: RFR (11): 8205878: pthread_getcpuclockid is expected to return 0 code
Hi David, Looks good. Regarding the test being in a package, looks like this was the convention for the nsk tests, so that's why I noted it. thanks, Chris On 7/5/18 3:40 PM, David Holmes wrote: Hi Chris, Thanks for looking at this. Updated webrev: http://cr.openjdk.java.net/~dholmes/8205878/webrev.v2/ Only real changes in ji05t001.c. (And fixed typo in the new test) More below ... On 6/07/2018 7:55 AM, Chris Plummer wrote: Hi David, Solaris problems aside, overall it looks fine. Some minor things I noted: I noticed that exitCode is never modified in agentA() or agentB(), so there isn't much point to having it. If you reach the bottom of the function, it passed, so PASSED can be returned. The code would be more clear if it did this. As-is it is implied that you can reach the bottom when it fails. I resisted any and all urges to do any kind of unrelated code cleanup in the tests - once you start you may end up doing a full rewrite. Is detaching the threads along the failure paths really needed? exit() is called, so this would seem to make it unnecessary. You're right that isn't necessary. I'll remove the changes from before the exits in ji05t001.c I prefer assignments not to be embedded inside the "if" condition. The DetachCurrentThread code in THREAD_return() is much more readable than the similar code in agentA() and agentB(). It's an existing style already used in that test e.g. 287 if ((res = 288 JNI_ENV_PTR(vm)->AttachCurrentThread( 289 JNI_ENV_ARG(vm, (void **) &env), (void *) 0)) != 0) { and I don't mind it, so I'd prefer not to change it. In the test: 54 // Generally as long as we don't crash of throw unexpected 55 // exceptions then the test passes. In some cases we know exactly "of" should be "or". Well spotted. Thanks. Shouldn't you be catching exceptions for all the Thread methods you are calling? Otherwise the test will exit if one is thrown, and the above comment indicates that you don't want this. I'm not expecting there to be any exceptions from any of the called methods. That would potentially indicate a problem in handling the terminated native thread, so would indicate a test failure. Don't we normally put these tests in a package? Doesn't seem to be any hard and fast rule. I only uses packages when they are important for the test. In runtime we have 905 java files and only 116 have a package statement. It varies elsewhere. Thanks, David thanks, Chris On 7/5/18 2:58 AM, David Holmes wrote: Solaris compiler complains about doing a return from inside a do-while loop. I'll have to rework part of the fix tomorrow. David On 5/07/2018 6:19 PM, David Holmes wrote: Bug: https://bugs.openjdk.java.net/browse/JDK-8205878 Webrev: http://cr.openjdk.java.net/~dholmes/8205878/webrev/ Problem: The tests create native threads that attach to the VM through JNI_AttachCurrentThread but which then terminate without detaching themselves. When the VM exits and we're using Flight Recorder "dumponexit" this leads to a call to VM_PrintThreads that in part wants to print the per-thread CPU usage. When we encounter the threads that have terminated already the low level pthread_getcpuclockid calls returns ESRCH but the code doesn't expect that and so fails an assert in debug mode and can SEGV in product mode. Solution: Serviceability-side: fix the tests Change the tests so that the threads detach before terminating. The two tests are (surprisingly) written in completely different styles, so the solution also takes on two different styles. Runtime-side: make the VM more robust in the fact of JNI attached threads that terminate before detaching, and add a regression test I took a good look at the low-level code for interacting with arbitrary threads and as far as I can see the problem only exists for this one case of pthread_getcpuclockid on Linux. Elsewhere the potential for a library call failure just reports an error value (such as -1 for the cpu time used). So the fix is simply to allow for ESRCH when calling pthread_getcpuclockid and return -1 for the cpu usage in that case. I created a new regression test to create a new native thread, attach it and then let it terminate while still attached. The java code then calls various Thread and ThreadMXBean functions on it to ensure there are no crashes or unexpected exceptions. Testing: - old tests with fixed run-time - old run-time with fixed tests - mach tier4 (which exposed the problem - that's where we enable Flight recorder for the tests) [in progress] - mach5 tier 1-3 for good measure [in progress] - new regression test Thanks, David
Re: RFR (11): 8205878: pthread_getcpuclockid is expected to return 0 code
Hi Chris, Thanks for looking at this. Updated webrev: http://cr.openjdk.java.net/~dholmes/8205878/webrev.v2/ Only real changes in ji05t001.c. (And fixed typo in the new test) More below ... On 6/07/2018 7:55 AM, Chris Plummer wrote: Hi David, Solaris problems aside, overall it looks fine. Some minor things I noted: I noticed that exitCode is never modified in agentA() or agentB(), so there isn't much point to having it. If you reach the bottom of the function, it passed, so PASSED can be returned. The code would be more clear if it did this. As-is it is implied that you can reach the bottom when it fails. I resisted any and all urges to do any kind of unrelated code cleanup in the tests - once you start you may end up doing a full rewrite. Is detaching the threads along the failure paths really needed? exit() is called, so this would seem to make it unnecessary. You're right that isn't necessary. I'll remove the changes from before the exits in ji05t001.c I prefer assignments not to be embedded inside the "if" condition. The DetachCurrentThread code in THREAD_return() is much more readable than the similar code in agentA() and agentB(). It's an existing style already used in that test e.g. 287 if ((res = 288 JNI_ENV_PTR(vm)->AttachCurrentThread( 289 JNI_ENV_ARG(vm, (void **) &env), (void *) 0)) != 0) { and I don't mind it, so I'd prefer not to change it. In the test: 54 // Generally as long as we don't crash of throw unexpected 55 // exceptions then the test passes. In some cases we know exactly "of" should be "or". Well spotted. Thanks. Shouldn't you be catching exceptions for all the Thread methods you are calling? Otherwise the test will exit if one is thrown, and the above comment indicates that you don't want this. I'm not expecting there to be any exceptions from any of the called methods. That would potentially indicate a problem in handling the terminated native thread, so would indicate a test failure. Don't we normally put these tests in a package? Doesn't seem to be any hard and fast rule. I only uses packages when they are important for the test. In runtime we have 905 java files and only 116 have a package statement. It varies elsewhere. Thanks, David thanks, Chris On 7/5/18 2:58 AM, David Holmes wrote: Solaris compiler complains about doing a return from inside a do-while loop. I'll have to rework part of the fix tomorrow. David On 5/07/2018 6:19 PM, David Holmes wrote: Bug: https://bugs.openjdk.java.net/browse/JDK-8205878 Webrev: http://cr.openjdk.java.net/~dholmes/8205878/webrev/ Problem: The tests create native threads that attach to the VM through JNI_AttachCurrentThread but which then terminate without detaching themselves. When the VM exits and we're using Flight Recorder "dumponexit" this leads to a call to VM_PrintThreads that in part wants to print the per-thread CPU usage. When we encounter the threads that have terminated already the low level pthread_getcpuclockid calls returns ESRCH but the code doesn't expect that and so fails an assert in debug mode and can SEGV in product mode. Solution: Serviceability-side: fix the tests Change the tests so that the threads detach before terminating. The two tests are (surprisingly) written in completely different styles, so the solution also takes on two different styles. Runtime-side: make the VM more robust in the fact of JNI attached threads that terminate before detaching, and add a regression test I took a good look at the low-level code for interacting with arbitrary threads and as far as I can see the problem only exists for this one case of pthread_getcpuclockid on Linux. Elsewhere the potential for a library call failure just reports an error value (such as -1 for the cpu time used). So the fix is simply to allow for ESRCH when calling pthread_getcpuclockid and return -1 for the cpu usage in that case. I created a new regression test to create a new native thread, attach it and then let it terminate while still attached. The java code then calls various Thread and ThreadMXBean functions on it to ensure there are no crashes or unexpected exceptions. Testing: - old tests with fixed run-time - old run-time with fixed tests - mach tier4 (which exposed the problem - that's where we enable Flight recorder for the tests) [in progress] - mach5 tier 1-3 for good measure [in progress] - new regression test Thanks, David
Re: RFR (S) 8205608: Fix 'frames()' in ThreadReferenceImpl.c to prevent quadratic runtime behavior
Hi Ralf, Overall looks good, but I do have a few comments and questions. Please update the copyright. What testing have you done? How long does this test take to run. What happens if for some reason SOE is never thrown? It's not clear to me what the script would do in this case. In answer to the ShellScaffold.sh question, there is already work underway to convert to pure java tests. See JDK-8201652. I'm not certain if it is ok for you to just submit this new shell script, or if should be rewritten in pure java. Most of the work to convert the scripts has already been done but was put on hold. Maybe Serguei can comment and guide you on how it would be done in java. thanks, Chris On 7/3/18 3:43 AM, Schmelter, Ralf wrote: Hi All, Please review the fix for the bughttps://bugs.openjdk.java.net/browse/JDK-8205608 . The webref is athttp://cr.openjdk.java.net/~simonis/webrevs/2018/8205608/ . This fixes the quadratic runtime (in the number of frames) of the frames() method, making it linear instead. It uses additional memory proportional to the number of frames on the stack. I've included a jtreg test, which would time out in the old implementation (since it takes minutes to get the stack frames). I'm not sure how useful this is. Best regards, Ralf Schmelter
Re: RFR (11): 8205878: pthread_getcpuclockid is expected to return 0 code
On 5/07/2018 7:58 PM, David Holmes wrote: Solaris compiler complains about doing a return from inside a do-while loop. I'll have to rework part of the fix tomorrow. Webrev updated in-place. The only change is to the makefile to disable a warning: + ifeq ($(TOOLCHAIN_TYPE), solstudio) + BUILD_HOTSPOT_JTREG_LIBRARIES_CFLAGS_libji06t001 += -erroff=E_END_OF_LOOP_CODE_NOT_REACHED + endif + David - David On 5/07/2018 6:19 PM, David Holmes wrote: Bug: https://bugs.openjdk.java.net/browse/JDK-8205878 Webrev: http://cr.openjdk.java.net/~dholmes/8205878/webrev/ Problem: The tests create native threads that attach to the VM through JNI_AttachCurrentThread but which then terminate without detaching themselves. When the VM exits and we're using Flight Recorder "dumponexit" this leads to a call to VM_PrintThreads that in part wants to print the per-thread CPU usage. When we encounter the threads that have terminated already the low level pthread_getcpuclockid calls returns ESRCH but the code doesn't expect that and so fails an assert in debug mode and can SEGV in product mode. Solution: Serviceability-side: fix the tests Change the tests so that the threads detach before terminating. The two tests are (surprisingly) written in completely different styles, so the solution also takes on two different styles. Runtime-side: make the VM more robust in the fact of JNI attached threads that terminate before detaching, and add a regression test I took a good look at the low-level code for interacting with arbitrary threads and as far as I can see the problem only exists for this one case of pthread_getcpuclockid on Linux. Elsewhere the potential for a library call failure just reports an error value (such as -1 for the cpu time used). So the fix is simply to allow for ESRCH when calling pthread_getcpuclockid and return -1 for the cpu usage in that case. I created a new regression test to create a new native thread, attach it and then let it terminate while still attached. The java code then calls various Thread and ThreadMXBean functions on it to ensure there are no crashes or unexpected exceptions. Testing: - old tests with fixed run-time - old run-time with fixed tests - mach tier4 (which exposed the problem - that's where we enable Flight recorder for the tests) [in progress] - mach5 tier 1-3 for good measure [in progress] - new regression test Thanks, David
Re: RFR (11): 8205878: pthread_getcpuclockid is expected to return 0 code
Hi David, Solaris problems aside, overall it looks fine. Some minor things I noted: I noticed that exitCode is never modified in agentA() or agentB(), so there isn't much point to having it. If you reach the bottom of the function, it passed, so PASSED can be returned. The code would be more clear if it did this. As-is it is implied that you can reach the bottom when it fails. Is detaching the threads along the failure paths really needed? exit() is called, so this would seem to make it unnecessary. I prefer assignments not to be embedded inside the "if" condition. The DetachCurrentThread code in THREAD_return() is much more readable than the similar code in agentA() and agentB(). In the test: 54 // Generally as long as we don't crash of throw unexpected 55 // exceptions then the test passes. In some cases we know exactly "of" should be "or". Shouldn't you be catching exceptions for all the Thread methods you are calling? Otherwise the test will exit if one is thrown, and the above comment indicates that you don't want this. Don't we normally put these tests in a package? thanks, Chris On 7/5/18 2:58 AM, David Holmes wrote: Solaris compiler complains about doing a return from inside a do-while loop. I'll have to rework part of the fix tomorrow. David On 5/07/2018 6:19 PM, David Holmes wrote: Bug: https://bugs.openjdk.java.net/browse/JDK-8205878 Webrev: http://cr.openjdk.java.net/~dholmes/8205878/webrev/ Problem: The tests create native threads that attach to the VM through JNI_AttachCurrentThread but which then terminate without detaching themselves. When the VM exits and we're using Flight Recorder "dumponexit" this leads to a call to VM_PrintThreads that in part wants to print the per-thread CPU usage. When we encounter the threads that have terminated already the low level pthread_getcpuclockid calls returns ESRCH but the code doesn't expect that and so fails an assert in debug mode and can SEGV in product mode. Solution: Serviceability-side: fix the tests Change the tests so that the threads detach before terminating. The two tests are (surprisingly) written in completely different styles, so the solution also takes on two different styles. Runtime-side: make the VM more robust in the fact of JNI attached threads that terminate before detaching, and add a regression test I took a good look at the low-level code for interacting with arbitrary threads and as far as I can see the problem only exists for this one case of pthread_getcpuclockid on Linux. Elsewhere the potential for a library call failure just reports an error value (such as -1 for the cpu time used). So the fix is simply to allow for ESRCH when calling pthread_getcpuclockid and return -1 for the cpu usage in that case. I created a new regression test to create a new native thread, attach it and then let it terminate while still attached. The java code then calls various Thread and ThreadMXBean functions on it to ensure there are no crashes or unexpected exceptions. Testing: - old tests with fixed run-time - old run-time with fixed tests - mach tier4 (which exposed the problem - that's where we enable Flight recorder for the tests) [in progress] - mach5 tier 1-3 for good measure [in progress] - new regression test Thanks, David
Re: RFR: JDK-8206007: nsk/jdb/exclude001 test is taking a long time on some builds
Hi Gary, The changes look good. How much is the reducing execution by? thanks, Chris On 7/5/18 7:48 AM, Gary Adams wrote: A simple test run using "exclude none" shows 625K methods are being observed. The bulk of those methods were due to the last class accessed in the test - VirtualMachineManager. It's not important that this particular call is used. The test is simply demonstrating that filters work for other packages than java and javax. This proposed fix uses a simpler lookup for GregorianCalendar. Issue: https://bugs.openjdk.java.net/browse/JDK-8206007 Webrev: http://cr.openjdk.java.net/~gadams/8206007/webrev.00/
RFR: JDK-8206007: nsk/jdb/exclude001 test is taking a long time on some builds
A simple test run using "exclude none" shows 625K methods are being observed. The bulk of those methods were due to the last class accessed in the test - VirtualMachineManager. It's not important that this particular call is used. The test is simply demonstrating that filters work for other packages than java and javax. This proposed fix uses a simpler lookup for GregorianCalendar. Issue: https://bugs.openjdk.java.net/browse/JDK-8206007 Webrev: http://cr.openjdk.java.net/~gadams/8206007/webrev.00/
Re: RFR (11): 8205878: pthread_getcpuclockid is expected to return 0 code
Solaris compiler complains about doing a return from inside a do-while loop. I'll have to rework part of the fix tomorrow. David On 5/07/2018 6:19 PM, David Holmes wrote: Bug: https://bugs.openjdk.java.net/browse/JDK-8205878 Webrev: http://cr.openjdk.java.net/~dholmes/8205878/webrev/ Problem: The tests create native threads that attach to the VM through JNI_AttachCurrentThread but which then terminate without detaching themselves. When the VM exits and we're using Flight Recorder "dumponexit" this leads to a call to VM_PrintThreads that in part wants to print the per-thread CPU usage. When we encounter the threads that have terminated already the low level pthread_getcpuclockid calls returns ESRCH but the code doesn't expect that and so fails an assert in debug mode and can SEGV in product mode. Solution: Serviceability-side: fix the tests Change the tests so that the threads detach before terminating. The two tests are (surprisingly) written in completely different styles, so the solution also takes on two different styles. Runtime-side: make the VM more robust in the fact of JNI attached threads that terminate before detaching, and add a regression test I took a good look at the low-level code for interacting with arbitrary threads and as far as I can see the problem only exists for this one case of pthread_getcpuclockid on Linux. Elsewhere the potential for a library call failure just reports an error value (such as -1 for the cpu time used). So the fix is simply to allow for ESRCH when calling pthread_getcpuclockid and return -1 for the cpu usage in that case. I created a new regression test to create a new native thread, attach it and then let it terminate while still attached. The java code then calls various Thread and ThreadMXBean functions on it to ensure there are no crashes or unexpected exceptions. Testing: - old tests with fixed run-time - old run-time with fixed tests - mach tier4 (which exposed the problem - that's where we enable Flight recorder for the tests) [in progress] - mach5 tier 1-3 for good measure [in progress] - new regression test Thanks, David
RFR (11): 8205878: pthread_getcpuclockid is expected to return 0 code
Bug: https://bugs.openjdk.java.net/browse/JDK-8205878 Webrev: http://cr.openjdk.java.net/~dholmes/8205878/webrev/ Problem: The tests create native threads that attach to the VM through JNI_AttachCurrentThread but which then terminate without detaching themselves. When the VM exits and we're using Flight Recorder "dumponexit" this leads to a call to VM_PrintThreads that in part wants to print the per-thread CPU usage. When we encounter the threads that have terminated already the low level pthread_getcpuclockid calls returns ESRCH but the code doesn't expect that and so fails an assert in debug mode and can SEGV in product mode. Solution: Serviceability-side: fix the tests Change the tests so that the threads detach before terminating. The two tests are (surprisingly) written in completely different styles, so the solution also takes on two different styles. Runtime-side: make the VM more robust in the fact of JNI attached threads that terminate before detaching, and add a regression test I took a good look at the low-level code for interacting with arbitrary threads and as far as I can see the problem only exists for this one case of pthread_getcpuclockid on Linux. Elsewhere the potential for a library call failure just reports an error value (such as -1 for the cpu time used). So the fix is simply to allow for ESRCH when calling pthread_getcpuclockid and return -1 for the cpu usage in that case. I created a new regression test to create a new native thread, attach it and then let it terminate while still attached. The java code then calls various Thread and ThreadMXBean functions on it to ensure there are no crashes or unexpected exceptions. Testing: - old tests with fixed run-time - old run-time with fixed tests - mach tier4 (which exposed the problem - that's where we enable Flight recorder for the tests) [in progress] - mach5 tier 1-3 for good measure [in progress] - new regression test Thanks, David