On 7/4/19 5:38 AM, David Holmes wrote:
Hi Erik,
On 4/07/2019 6:08 pm, Erik Österlund wrote:
Hi David,
When you run without TLH, this copying mechanism is used to
synchronize the safepoint while JavaThreads are running. The
interpreter doesn't emit any polls then. Instead it clobbers the
dispatch table. JavaThreads will be reading from the dispatch table
while it is being (non-atomically) modified. That could crash. For
example with the Solaris + studio + SPARC - TLH configuration, the
compiler will almost certainly emit a memcpy (this transformation has
been observed in practice), the memcpy will use BIS instructions
(observed in practice) for performance, with out-of-thin-air values
(observed in practice), and the JavaThreads will occasionally crash
during safepoint synchronization due to said out-of-thin-air values.
So I guess the problem might be larger back when TLH was not default.
But this seems conceptually wrong.
I always thought there were two dispatch tables and we simply switched
between them - not copied anything!
Based on my search back into the TeamWare repos, it looks like we have
always copied the table...
Dan
David
/Erik
On 2019-07-04 09:17, David Holmes wrote:
Hi Erik,
On 4/07/2019 5:10 pm, Erik Österlund wrote:
Hi Dan,
Thanks for picking this up. The change looks good.
However, when reviewing this, I looked at the code for actually
restoring the table (ignore/notice safepoints). It copies the
dispatch table for the interpreter. There is a comment stating it
is important the copying is atomic for MT-safety, and I can
definitely see why. However, the copying the line after that
comment is in fact not atomic.
Is it assuming "atomicity" by virtue of executing at a safepoint?
David
-----
Here is the copying code in templateInterpreter.cpp:
static inline void copy_table(address* from, address* to, int size) {
// Copy non-overlapping tables. The copy has to occur word wise
for MT safety.
while (size-- > 0) *to++ = *from++;
}
Copying using a loop of non-volatile loads and stores can and
definitely will on some compilers turn into memcpy calls instead as
the compiler (correctly) considers that an equivalent
transformation. And memcpy does not guarantee atomicity. Indeed on
some platforms it is not atomic. On some platforms it will even
enjoy out-of-thin-air values. Perhaps Copy::disjoint_words_atomic()
would be a better choice for atomic word copying. If not, at the
very least we should use Atomic::load/store here.
Having said that, the fix for that issue seems like a separate RFE,
because it has been sitting there for a lot longer than TLH has
been around.
Thanks,
/Erik
On 2019-07-04 04:04, Daniel D. Daugherty wrote:
Greetings,
Robbin recently discovered this issue with Thread Local
Handshakes. Since
he's not available at the moment, I'm handling the issue:
JDK-8227117 normal interpreter table is not restored after
single stepping with TLH
https://bugs.openjdk.java.net/browse/JDK-8227117
When using Thread Local Handshakes, the normal interpreter table is
not restored after single stepping. This issue is caused by the
VM_ChangeSingleStep VM-op relying on SafepointSynchronize::end() to
restore the normal interpreter table for the "off" case.
Prior to Thread Local Handshakes, this was a valid assumption to
make.
SafepointSynchronize::end() has been refactored into
disarm_safepoint() and it only calls Interpreter::ignore_safepoints()
on the global safepoint branch. That matches up with the call to
Interpreter::notice_safepoints() that is also on the global safepoint
branch.
The solution is for the VM_ChangeSingleStep VM-op for the "off" case
to call Interpreter::ignore_safepoints() directly.
Here's the webrev URL:
http://cr.openjdk.java.net/~dcubed/8227117-webrev/0_for_jdk14/
The fix is just a small addition to VM_ChangeSingleStep::doit():
if (_on) {
Interpreter::notice_safepoints();
+ } else {
+ Interpreter::ignore_safepoints();
}
Everything else is just new logging support for future debugging of
interpreter table management and single stepping.
Tested this fix with Mach5 Tier[1-3] on the standard Oracle
platforms.
Mach5 Tier[4-6] on standard Oracle platforms is running now.
Thanks, in advance, for questions, comments or suggestions.
Dan