Hi Serguei,
> The implementation looks good to me.
Thanks.
> But I do not understand what the test is doing with all this counters and
> recursions.
The @comment gives an explanation: the target thread builds a stack as large as
possible to prolong the unsafe stackwalk. This is done by means of recursion.
> For instance, these fragments:
> 86 recursions = 0;
> 87 try {
> 88 recursiveMethod(1<<20);
> 89 } catch (StackOverflowError e) {
> 90 msg("Caught StackOverflowError as expected");
> 91 }
> 92 int target_depth = recursions-100; // spaces are missed
> around the '-' sigh
Here we calibrate the test for maximum stack depth. We measure how large the
stack can grow by calling recursiveMethod() until a StackOverflowError is
raised. We use recursions - 100 as target_depth to avoid the StackOverflowError
during the actual test.
> It is not obvious that the 'recursion' is updated in the recursiveMethod.
> I would suggestto make it more explicit:
> recursiveMethod(M);
> int target_depth = M - 100;
> Then the variable 'recursions' can be removed or become local.
> This method will be:
> 47 private static final int M = 1 << 20;
> ...
> 121 public long recursiveMethod(int depth) {
> 123 if (depth == 0) {
> 124 notifyAgentToGetLocalAndWaitShortly(M - 100,
> waitTimeInNativeAfterNotify);
> 126 } else {
> 127 recursiveMethod(--depth);
> 128 }
> 129 }
I don't think this would work. A StackOverflowError would be raised
before notifyAgentToGetLocalAndWaitShortly() is called.
> At least, he test is missing the comments explaining all these.
The arguments to the msg() method serve as documentation too. I would not want
to repeat the msg strings in comments.
Thanks, Richard.
-------------------------------------------------------------
From: [email protected] <[email protected]>
Sent: Montag, 10. August 2020 19:22
To: David Holmes <[email protected]>; Reingruber, Richard
<[email protected]>; [email protected]
Subject: Re: RFR(S) 8249293: Unsafe stackwalk in
VM_GetOrSetLocal::doit_prologue()
Hi Richard and David,
The implementation looks good to me.
But I do not understand what the test is doing with all this counters and
recursions.
For instance, these fragments:
86 recursions = 0;
87 try {
88 recursiveMethod(1<<20);
89 } catch (StackOverflowError e) {
90 msg("Caught StackOverflowError as expected");
91 }
92 int target_depth = recursions-100; // spaces are missed around
the '-' sigh
It is not obvious that the 'recursion' is updated in the recursiveMethod.
I would suggestto make it more explicit:
recursiveMethod(M);
int target_depth = M - 100;
Then the variable 'recursions' can be removed or become local.
This method will be:
47 private static final int M = 1 << 20;
...
121 public long recursiveMethod(int depth) {
123 if (depth == 0) {
124 notifyAgentToGetLocalAndWaitShortly(M - 100,
waitTimeInNativeAfterNotify);
126 } else {
127 recursiveMethod(--depth);
128 }
129 }
At least, he test is missing the comments explaining all these.
Thanks,
Serguei
On 8/9/20 22:35, David Holmes wrote:
Hi Richard,
On 31/07/2020 5:28 pm, Reingruber, Richard wrote:
Hi,
I rebase the fix after JDK-8250042.
New webrev: http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.2/
The general fix for this seems good. A minor nit:
588 if (!is_assignable(signature, ob_k, Thread::current())) {
You know that the current thread is the VMThread so can use
VMThread::vm_thread().
Similarly for this existing code:
694 Thread* current_thread = Thread::current();
---
Looking at the test code ... I'm less clear on exactly what is happening and
the use of spin-waits raises some red-flags for me in terms of test reliability
on different platforms. The "while (--waitCycles > 0)" loop in particular
offers no certainty that the agent thread is executing anything in particular.
And the use of the spin_count as a guide to future waiting time seems somewhat
arbitrary. In all seriousness I got a headache trying to work out how the test
was expecting to operate. Some parts could be simplified using raw monitors, I
think. But there's no sure way to know the agent thread is in the midst of the
stackwalk when the target thread wants to leave the native code. So I
understand what you are trying to achieve here, I'm just not sure how reliably
it will actually achieve it.
test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/libGetLocalWithoutSuspendTest.cpp
32 static volatile jlong spinn_count = 0;
Using a 64-bit counter seems like it will be a problem on 32-bit systems.
Should be spin_count not spinn_count.
36 // Agent thread waits for value != 0, then performas the JVMTI call to get
local variable.
typo: performas
Thanks,
David
-----
Thanks, Richard.
-----Original Message-----
From: serviceability-dev mailto:[email protected] On
Behalf Of Reingruber, Richard
Sent: Montag, 27. Juli 2020 09:45
To: mailto:[email protected];
mailto:[email protected]
Subject: [CAUTION] RE: RFR(S) 8249293: Unsafe stackwalk in
VM_GetOrSetLocal::doit_prologue()
Hi Serguei,
> I tested it on Linux and Windows but not yet on MacOS.
The test succeeded now on all platforms.
Thanks, Richard.
-----Original Message-----
From: Reingruber, Richard
Sent: Freitag, 24. Juli 2020 15:04
To: mailto:[email protected];
mailto:[email protected]
Subject: RE: RFR(S) 8249293: Unsafe stackwalk in
VM_GetOrSetLocal::doit_prologue()
Hi Serguei,
The fix itself looks good to me.
thanks for looking at the fix.
I still need another look at new test.
Could you, please, convert the agent of new test to C++?
It will make it a little bit simpler.
Sure, here is the new webrev.1 with a C++ version of the test agent:
http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.1/
I tested it on Linux and Windows but not yet on MacOS.
Thanks,
Richard.
-----Original Message-----
From: mailto:[email protected] mailto:[email protected]
Sent: Freitag, 24. Juli 2020 00:00
To: Reingruber, Richard mailto:[email protected];
mailto:[email protected]
Subject: Re: RFR(S) 8249293: Unsafe stackwalk in
VM_GetOrSetLocal::doit_prologue()
Hi Richard,
Thank you for filing the CR and taking care about it!
The fix itself looks good to me.
I still need another look at new test.
Could you, please, convert the agent of new test to C++?
It will make it a little bit simpler.
Thanks,
Serguei
On 7/20/20 01:15, Reingruber, Richard wrote:
Hi,
please help review this fix for VM_GetOrSetLocal. It moves the unsafe stackwalk
from the vm
operation prologue before the safepoint into the doit() method executed at the
safepoint.
Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.0/index.html
Bug: https://bugs.openjdk.java.net/browse/JDK-8249293
According to the JVMTI spec on local variable access it is not required to
suspend the target thread
T [1]. The operation will simply fail with JVMTI_ERROR_NO_MORE_FRAMES if T is
executing
bytecodes. It will succeed though if T is blocked because of synchronization or
executing some native
code.
The issue is that in the latter case the stack walk in
VM_GetOrSetLocal::doit_prologue() to prepare
the access to the local variable is unsafe, because it is done before the
safepoint and it races
with T returning to execute bytecodes making its stack not walkable. The
included test shows that
this can crash the VM if T wins the race.
Manual testing:
- new test
test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/GetLocalWithoutSuspendTest.java
- test/hotspot/jtreg/vmTestbase/nsk/jvmti
- test/hotspot/jtreg/serviceability/jvmti
Nightly regression tests @SAP: JCK and JTREG, also in Xcomp mode, SPECjvm2008,
SPECjbb2015,
Renaissance Suite, SAP specific tests with fastdebug and release builds on all
platforms
Thanks, Richard.
[1] https://docs.oracle.com/en/java/javase/14/docs/specs/jvmti.html#local