Re: RFR: 8284161: Implementation of Virtual Threads (Preview) [v6]

2022-04-26 Thread Erik Österlund
On Mon, 25 Apr 2022 13:19:49 GMT, Alan Bateman  wrote:

>> This is the implementation of JEP 425: Virtual Threads (Preview); TBD which 
>> JDK version to target.
>> 
>> We will refresh this PR periodically to pick up changes and fixes from the 
>> loom repo.
>> 
>> Most of the new mechanisms in the HotSpot VM are disabled by default and 
>> require running with `--enable-preview` to enable.
>> 
>> The patch has support for x64 and aarch64 on the usual operating systems 
>> (Linux, macOS, and Windows). There are stubs (calling Unimplemented) for 
>> zero and some of the other ports. Additional ports can be contributed via 
>> PRs against the fibers branch in the loom repo.
>> 
>> There are changes in many areas. To reduce notifications/mails, the labels 
>> have been trimmed down for now to hotspot, serviceability and core-libs. 
>> We'll add the complete set of labels when the PR is further along.
>> 
>> The changes include a refresh of java.util.concurrent and ForkJoinPool from 
>> Doug Lea's CVS. These changes will probably be proposed and integrated in 
>> advance of this PR.
>> 
>> The changes include some non-exposed and low-level infrastructure to support 
>> the (in draft) JEPs for Structured Concurrency and Scope Locals. This is to 
>> make life a bit easier and avoid having to separate VM changes and juggle 
>> branches at this time.
>
> Alan Bateman has updated the pull request with a new target base due to a 
> merge or a rebase. The pull request now contains ten commits:
> 
>  - Refresh cf561073f48fad58e931a5285b92629aa83c9bd1
>  - Merge with jdk-19+19
>  - Refresh
>  - Refresh
>  - Refresh
>  - Refresh
>  - Merge with jdk-19+18
>  - Refresh
>  - Initial push

I have done a long and extensive hands-on review of most of the JVM code 
(excluding most of the JVMTI changes). There is still room for improvement, 
but I think this is in good enough shape to be integrated now, plus or minus 
some patches in the ether between the fibers branch and this PR which I'm sure 
will pop up soon. So this looks good to me. Ship it!

-

Marked as reviewed by eosterlund (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/8166


Re: RFR: 8265150: AsyncGetCallTrace crashes on ResourceMark

2021-11-29 Thread Erik Österlund
On Tue, 30 Nov 2021 02:37:47 GMT, Coleen Phillimore  wrote:

> This change seems to keep the test case in the bug from crashing in the 
> ResourceMark destructor.  We have a ResourceMark during stack walking in 
> AsyncGetCallTrace.  Also, RegisterMap during jvmti shouldn't process oops; 
> fix courtesy of @fisk.
> Testing tier1-6 in progress.

Looks good.

-

Marked as reviewed by eosterlund (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6606


Integrated: 8276696: ParallelObjectIterator freed at the wrong time in VM_HeapDumper

2021-11-23 Thread Erik Österlund
On Mon, 22 Nov 2021 13:49:02 GMT, Erik Österlund  wrote:

> The VM_HeapDumper code uses a C heap allocated ParallelObjectIterator. It is 
> constructed right before running a parallel operation with a work gang, but 
> freed in the destructor of the VM_HeapDumper. This means it is created on one 
> thread and deleted on another thread. This becomes a bit problematic when a 
> parallel object iterator implementation uses a ThreadsListHandle (which is 
> indeed the case for ZGC). This patch changes ParallelObjectIterator to be a 
> StackObj, carrying a ParallelObjectIteratorImpl object, which is never 
> exposed publicly. This ensures that construction and destruction of the 
> internal object iterator is scoped like RAII objects, hence complying with 
> how ThreadsListHandle is supposed to be used.

This pull request has now been integrated.

Changeset: f4dc03ea
Author:    Erik Österlund 
URL:       https://git.openjdk.java.net/jdk/commit/f4dc03ea6de327425ff265c3d2ec16ea7b0e1634
Stats: 70 lines in 15 files changed: 35 ins; 11 del; 24 mod

8276696: ParallelObjectIterator freed at the wrong time in VM_HeapDumper

Reviewed-by: pliden, stefank

-

PR: https://git.openjdk.java.net/jdk/pull/6501


Re: RFR: 8276696: ParallelObjectIterator freed at the wrong time in VM_HeapDumper

2021-11-23 Thread Erik Österlund
On Tue, 23 Nov 2021 09:42:10 GMT, Per Liden  wrote:

>> The VM_HeapDumper code uses a C heap allocated ParallelObjectIterator. It is 
>> constructed right before running a parallel operation with a work gang, but 
>> freed in the destructor of the VM_HeapDumper. This means it is created on 
>> one thread and deleted on another thread. This becomes a bit problematic 
>> when a parallel object iterator implementation uses a ThreadsListHandle 
>> (which is indeed the case for ZGC). This patch changes 
>> ParallelObjectIterator to be a StackObj, carrying a 
>> ParallelObjectIteratorImpl object, which is never exposed publicly. This 
>> ensures that construction and destruction of the internal object iterator is 
>> scoped like RAII objects, hence complying with how ThreadsListHandle is 
>> supposed to be used.
>
> Looks good.

Thanks for the reviews, @pliden and @stefank!

-

PR: https://git.openjdk.java.net/jdk/pull/6501


RFR: 8276696: ParallelObjectIterator freed at the wrong time in VM_HeapDumper

2021-11-22 Thread Erik Österlund
The VM_HeapDumper code uses a C heap allocated ParallelObjectIterator. It is 
constructed right before running a parallel operation with a work gang, but 
freed in the destructor of the VM_HeapDumper. This means it is created on one 
thread and deleted on another thread. This becomes a bit problematic when a 
parallel object iterator implementation uses a ThreadsListHandle (which is 
indeed the case for ZGC). This patch changes ParallelObjectIterator to be a 
StackObj, carrying a ParallelObjectIteratorImpl object, which is never exposed 
publicly. This ensures that construction and destruction of the internal object 
iterator is scoped like RAII objects, hence complying with how 
ThreadsListHandle is supposed to be used.

-

Commit messages:
 - 8276696: ParallelObjectIterator freed at the wrong time in VM_HeapDumper

Changes: https://git.openjdk.java.net/jdk/pull/6501/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6501&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8276696
  Stats: 70 lines in 15 files changed: 35 ins; 11 del; 24 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6501.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6501/head:pull/6501

PR: https://git.openjdk.java.net/jdk/pull/6501


Re: RFR: 8272788: Nonleaf ranked locks should not be safepoint_check_never [v2]

2021-08-25 Thread Erik Österlund
On Mon, 23 Aug 2021 13:03:10 GMT, Coleen Phillimore  wrote:

>> I moved nonleaf ranked locks to be leaf (or leaf+something).  Many of the 
>> leaf locks are safepoint_check_never.  Segregating this rank into safepoint 
>> checking and non-safepoint checking is left for a future RFE.
>> Tested with tier1-3.  Tier 4-6 testing in progress.
>
> Coleen Phillimore has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Remove JfrStream_lock and rerun JFR tests.

So you are enforcing the relationship between safepoint checking and rank. In a 
way we already had that for special locks. I like it.

-

Marked as reviewed by eosterlund (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/5203


Re: RFR: 8265148: StackWatermarkSet being updated during AsyncGetCallTrace

2021-05-31 Thread Erik Österlund
On Tue, 1 Jun 2021 04:23:43 GMT, Leonid Mesnik  wrote:

> Thank you for your prompt response. I meant that it makes sense to update 
> comments for RegisterMap to mention this. Currently, the doc says how to use 
> it and how to disable update:
> 
> "Updating of the RegisterMap can be turned off by instantiating the
> 
> //  register map as: RegisterMap map(thread, false);"
> 
> But it says nothing about why and how  process_frames should be set.
> 
> It might make sense to put this info there so anyone could easily find and 
> read it. I think it is better to put it there rather than in forte.cpp.

Yes I agree - that does make sense.

-

PR: https://git.openjdk.java.net/jdk/pull/4217


Re: RFR: 8265148: StackWatermarkSet being updated during AsyncGetCallTrace

2021-05-31 Thread Erik Österlund
On Tue, 1 Jun 2021 03:00:23 GMT, Leonid Mesnik  wrote:

> Let me check with Erik if it makes sense to put more generic comments about 
> the usage of stackwatermark frame processing in RegisterMap, frames etc.  
> They can't be updated in an arbitrary thread state. It makes sense to 
> describe this info in the stack watermarking code.

You could say something generic like "StackWatermark can only be used at 
points where the stack can be parsed by the GC", or something like that.

-

PR: https://git.openjdk.java.net/jdk/pull/4217


Re: RFR: 8265148: StackWatermarkSet being updated during AsyncGetCallTrace

2021-05-28 Thread Erik Österlund
On Thu, 27 May 2021 04:01:17 GMT, Leonid Mesnik  wrote:

> 8265148: StackWatermarkSet being updated during AsyncGetCallTrace

Looks good, but please test with ZGC before integrating.

-

Marked as reviewed by eosterlund (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/4217


Re: RFR: 8267464: Circular-dependency resilient inline headers [v2]

2021-05-24 Thread Erik Österlund
On Fri, 21 May 2021 09:11:09 GMT, Stefan Karlsson  wrote:

>> I'd like to propose a small adjustment to how we write .inline.hpp files, in 
>> the hope to reduce include problems because of circular dependencies between 
>> inline headers.
>> 
>> This is a small, contrived example to show the problem:
>> 
>> // a.hpp
>> #pragma once
>> 
>> void a1();
>> void a2();
>> 
>> 
>> // a.inline.hpp
>> #pragma once
>> 
>> #include "a.hpp"
>> #include "b.inline.hpp"
>> 
>> inline void a1() {
>>   b1();
>> }
>> 
>> inline void a2() {
>> }
>> 
>> 
>> // b.hpp
>> #pragma once
>> 
>> void b1();
>> void b2();
>> 
>> 
>> // b.inline.hpp
>> #pragma once
>> 
>> #include "a.inline.hpp"
>> #include "b.hpp"
>> 
>> inline void b1() {
>> }
>> 
>> inline void b2() {
>>   a1();
>> }
>> 
>> The following code compiles fine:
>> 
>> // a.cpp
>> #include "a.inline.hpp"
>> 
>> int main() {
>>   a1();
>> 
>>   return 0;
>> }
>> 
>> But the following:
>> 
>> // b.cpp
>> #include "b.inline.hpp"
>> 
>> int main() {
>>   b1();
>> 
>>   return 0;
>> }
>> 
>> 
>> fails with the this error message:
>> 
>> In file included from b.inline.hpp:3,
>>  from b.cpp:1:
>> a.inline.hpp: In function ‘void a1()’:
>> a.inline.hpp:8:3: error: ‘b1’ was not declared in this scope; did you mean 
>> ‘a1’?
>> 
>> We can see the problem with g++ -E:
>> 
>> # 1 "a.inline.hpp" 1
>> # 1 "a.hpp" 1
>> 
>> void a1();
>> void a2();
>> 
>> # 4 "a.inline.hpp" 2
>> 
>> inline void a1() {
>>   b1();
>> }
>> 
>> inline void a2() {
>> }
>> 
>> # 4 "b.inline.hpp" 2
>> # 1 "b.hpp" 1
>> 
>> void b1();
>> void b2();
>> 
>> # 5 "b.inline.hpp" 2
>> 
>> inline void b1() {
>> }
>> 
>> inline void b2() {
>>   a1();
>> }
>> 
>> # 2 "b.cpp" 2
>> 
>> int main() {
>>   b1();
>> 
>>   return 0;
>> }
>> 
>> 
>> b1() is called before it has been declared. b.inline.hpp inlined 
>> a.inline.hpp, which uses functions declared in b.hpp. However, at that point 
>> we've not included enough of b.inline.hpp to have declared b1().
>> 
>> This is a small and easy example. In the JVM the circular dependencies 
>> usually involve more than two files, often from different sub-systems 
>> of the JVM. We have so far solved this by restructuring the code, making 
>> functions out-of-line, creating proxy objects, etc. However, the more we use 
>> templates, the more this is going to become a recurring nuisance. And in 
>> experiments with lambdas, which require templates, this very quickly showed 
>> up as a problem.
>> 
>> I propose that we solve most (all?) of these issues by adding a rule that 
>> states that .inline.hpp files should start by including the corresponding 
>> .hpp, and that the .hpp should contain the declarations of all externally 
>> used parts of the .inline.hpp file. This should guarantee that the 
>> declarations are always present before any subsequent include can create a 
>> circular include dependency back to the original file.
>> 
>> In the example above, b.inline.hpp would become:
>> 
>> // b.inline.hpp
>> #pragma once
>> 
>> #include "b.hpp"
>> #include "a.inline.hpp"
>> 
>> inline void b1() {
>> }
>> 
>> inline void b2() {
>>   a1();
>> }
>> 
>> and now both a.cpp and b.cpp compiles. The generated output now looks like 
>> this:
>> 
>> # 1 "b.inline.hpp" 1
>> # 1 "b.hpp" 1
>> 
>> void b1();
>> void b2();
>> 
>> # 4 "b.inline.hpp" 2
>> # 1 "a.inline.hpp" 1
>> 
>> # 1 "a.hpp" 1
>> 
>> void a1();
>> void a2();
>> 
>> # 4 "a.inline.hpp" 2
>> 
>> inline void a1() {
>>   b1();
>> }
>> 
>> inline void a2() {
>> }
>> 
>> # 5 "b.inline.hpp" 2
>> 
>> inline void b1() {
>> }
>> 
>> inline void b2() {
>>   a1();
>> }
>> 
>> # 2 "b.cpp" 2
>> 
>> int main() {
>>   b1();
>> 
>>   return 0;
>> }
>> 
>> The declarations come first, and the compiler is happy. 
>> 
>> An alternative to this, proposed by other HotSpot GC devs, has been to 
>> always include all .hpp files first, and then have a section of 
>> .inline.hpp includes. I've experimented with that as well, but I think it 
>> requires more maintenance and vigilance from *users* of .inline.hpp files 
>> (add the .hpp include to the first section, add the .inline.hpp include to 
>> the second). The proposed structure only requires extra care from *writers* 
>> of .inline.hpp files. All others can include .hpp and .inline.hpp as we've 
>> been doing for a long time now.
>> 
>> Some notes about this patch:
>> 1) Some externally-used declarations have been placed in .inline.hpp files, 
>> and not in the .hpp. Those should be moved. I have not done that.
>> 2) Some .inline.hpp files don't have a corresponding .hpp file. I could 
>> either try to fix that in this patch, or we could take care of that as 
>> separate patches later.
>> 3) The style I chose was to add the .hpp include and an extra blank line, 
>> separating it from the rest of the includes. I think I like that style, 
>> because it makes it easy for those writing .inline.hpp files to recognize 
>> this pattern. And it removes the confusion about why this include isn't 

Re: RFR: 8259008: ArithmeticException was thrown at "Monitor Cache Dump" on HSDB [v3]

2021-01-25 Thread Erik Österlund
On Sat, 23 Jan 2021 03:23:58 GMT, Yasumasa Suenaga  wrote:

>> I saw the exception as following when I chose "Monitor Cache Dump" menu on 
>> HSDB:
>> 
>> Exception in thread "AWT-EventQueue-0" java.lang.ExceptionInInitializerError
>> at 
>> jdk.hotspot.agent/sun.jvm.hotspot.ui.MonitorCacheDumpPanel.dumpOn(MonitorCacheDumpPanel.java:92)
>> at 
>> jdk.hotspot.agent/sun.jvm.hotspot.ui.MonitorCacheDumpPanel.<init>(MonitorCacheDumpPanel.java:59)
>> at 
>> jdk.hotspot.agent/sun.jvm.hotspot.HSDB.showMonitorCacheDumpPanel(HSDB.java:1556)
>> at 
>> jdk.hotspot.agent/sun.jvm.hotspot.HSDB$16.actionPerformed(HSDB.java:338)
>> at 
>> java.desktop/javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:1972)
>> at 
>> java.desktop/javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2313)
>> at 
>> java.desktop/javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:405)
>> at 
>> java.desktop/javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:262)
>> at 
>> java.desktop/javax.swing.AbstractButton.doClick(AbstractButton.java:374)
>> at 
>> java.desktop/javax.swing.plaf.basic.BasicMenuItemUI.doClick(BasicMenuItemUI.java:1022)
>> at 
>> java.desktop/javax.swing.plaf.basic.BasicMenuItemUI$Handler.mouseReleased(BasicMenuItemUI.java:1066)
>> at 
>> java.desktop/java.awt.Component.processMouseEvent(Component.java:6617)
>> at 
>> java.desktop/javax.swing.JComponent.processMouseEvent(JComponent.java:3342)
>> at java.desktop/java.awt.Component.processEvent(Component.java:6382)
>> at java.desktop/java.awt.Container.processEvent(Container.java:2264)
>> at 
>> java.desktop/java.awt.Component.dispatchEventImpl(Component.java:4993)
>> at 
>> java.desktop/java.awt.Container.dispatchEventImpl(Container.java:2322)
>> at java.desktop/java.awt.Component.dispatchEvent(Component.java:4825)
>> at 
>> java.desktop/java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4934)
>> at 
>> java.desktop/java.awt.LightweightDispatcher.processMouseEvent(Container.java:4563)
>> at 
>> java.desktop/java.awt.LightweightDispatcher.dispatchEvent(Container.java:4504)
>> at 
>> java.desktop/java.awt.Container.dispatchEventImpl(Container.java:2308)
>> at java.desktop/java.awt.Window.dispatchEventImpl(Window.java:2773)
>> at java.desktop/java.awt.Component.dispatchEvent(Component.java:4825)
>> at 
>> java.desktop/java.awt.EventQueue.dispatchEventImpl(EventQueue.java:772)
>> at java.desktop/java.awt.EventQueue$4.run(EventQueue.java:721)
>> at java.desktop/java.awt.EventQueue$4.run(EventQueue.java:715)
>> at 
>> java.base/java.security.AccessController.doPrivileged(AccessController.java:391)
>> at 
>> java.base/java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:85)
>> at 
>> java.base/java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:95)
>> at java.desktop/java.awt.EventQueue$5.run(EventQueue.java:745)
>> at java.desktop/java.awt.EventQueue$5.run(EventQueue.java:743)
>> at 
>> java.base/java.security.AccessController.doPrivileged(AccessController.java:391)
>> at 
>> java.base/java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:85)
>> at 
>> java.desktop/java.awt.EventQueue.dispatchEvent(EventQueue.java:742)
>> at 
>> java.desktop/java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:203)
>> at 
>> java.desktop/java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:124)
>> at 
>> java.desktop/java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:113)
>> at 
>> java.desktop/java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:109)
>> at 
>> java.desktop/java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
>> at 
>> java.desktop/java.awt.EventDispatchThread.run(EventDispatchThread.java:90)
>> Caused by: java.lang.ArithmeticException: / by zero
>> at 
>> jdk.hotspot.agent/sun.jvm.hotspot.runtime.ObjectSynchronizer.initialize(ObjectSynchronizer.java:55)
>> at 
>> jdk.hotspot.agent/sun.jvm.hotspot.runtime.ObjectSynchronizer$1.update(ObjectSynchronizer.java:40)
>> at 
>> jdk.hotspot.agent/sun.jvm.hotspot.runtime.VM.registerVMInitializedObserver(VM.java:578)
>> at 
>> jdk.hotspot.agent/sun.jvm.hotspot.runtime.ObjectSynchronizer.<clinit>(ObjectSynchronizer.java:38)
>> ... 41 more
>> 
>> This exception seems to be caused by 
>> [JDK-8253064](https://bugs.openjdk.java.net/browse/JDK-8253064). It has 
>> changed `ObjectSynchronizer`, but it has not changed SA code.
>
> Yasumasa Suenaga has updated the pull request 

Re: RFR: 8231627: runtime/ErrorHandling/ThreadsListHandleInErrorHandlingTest.java fails because error occurred during printing all threads

2020-12-24 Thread Erik Österlund
On Thu, 24 Dec 2020 22:01:38 GMT, Erik Österlund  wrote:

>> A small robustness fix in ThreadsSMRSupport::print_info_on() to reduce the
>> likelihood of crashes during error reporting. Uses Threads_lock->try_lock()
>> for safety and restricts some reporting to when the Threads_lock has been
>> acquired (depends on JDK-8256383). Uses a ThreadsListHandle to make
>> the current ThreadsList safe for reporting (depends on JDK-8258284). Also
>> detects when the system ThreadsList (_java_thread_list) has changed and
>> will warn that some of the reported info may now be stale.
>> 
>> Two existing tests have been updated to reflect the use of a 
>> ThreadsListHandle
>> in ThreadsSMRSupport::print_info_on(). Mach5 Tier[1-6] testing has no 
>> regressions.
>
> Looks good. We have something similar in the previous GC log code during 
> error reporting.

> @fisk - Thanks for the review! And Merry Christmas Eve!!

Merry Christmas to you too Dan!

-

PR: https://git.openjdk.java.net/jdk/pull/1891


Re: RFR: 8231627: runtime/ErrorHandling/ThreadsListHandleInErrorHandlingTest.java fails because error occurred during printing all threads

2020-12-24 Thread Erik Österlund
On Thu, 24 Dec 2020 17:33:21 GMT, Daniel D. Daugherty  
wrote:

> A small robustness fix in ThreadsSMRSupport::print_info_on() to reduce the
> likelihood of crashes during error reporting. Uses Threads_lock->try_lock()
> for safety and restricts some reporting to when the Threads_lock has been
> acquired (depends on JDK-8256383). Uses a ThreadsListHandle to make
> the current ThreadsList safe for reporting (depends on JDK-8258284). Also
> detects when the system ThreadsList (_java_thread_list) has changed and
> will warn that some of the reported info may now be stale.
> 
> Two existing tests have been updated to reflect the use of a ThreadsListHandle
> in ThreadsSMRSupport::print_info_on(). Mach5 Tier[1-6] testing has no 
> regressions.

Looks good. We have something similar in the previous GC log code during error 
reporting.

-

Marked as reviewed by eosterlund (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/1891


Re: RFR: 8253064: monitor list simplifications and getting rid of TSM [v6]

2020-11-11 Thread Erik Österlund
On Wed, 11 Nov 2020 15:23:15 GMT, Daniel D. Daugherty  
wrote:

>> Our int types are really confused.  AvgMonitorsPerThreadEstimate is defined 
>> as an intx which is intptr_t and the range of it is 0..max_jint which is 
>> 0..0x7fffffff. jint is long on windows (the problematic type) and int on 
>> unix.  Since this is a new declaration, it probably should be something 
>> other than jint but what?
>> At any rate, it should be declared as 'static'.
>
> @coleenp - Nice catch on the missing 'static'.

I typically use size_t for entities that can scale with the size of the 
machine's memory, so I don't have to think about whether there are enough bits. 
Could AvgMonitorsPerThreadEstimate be uintx instead of intx? And then maybe we 
don't need to declare a range, as the natural range of the uintx seems 
perfectly valid.

-

PR: https://git.openjdk.java.net/jdk/pull/642


Re: RFR: 8253064: monitor list simplifications and getting rid of TSM [v4]

2020-11-10 Thread Erik Österlund
On Tue, 10 Nov 2020 02:35:17 GMT, Daniel D. Daugherty  
wrote:

>> Changes from @fisk and @dcubed-ojdk to:
>> 
>> - simplify ObjectMonitor list management
>> - get rid of Type-Stable Memory (TSM)
>> 
>> This change has been tested with Mach5 Tier[1-3],4,5,6,7,8; no new 
>> regressions.
>> Aurora Perf runs have also been done (DaCapo-h2, Quick Startup/Footprint,
>> SPECjbb2015-Tuned-G1, SPECjbb2015-Tuned-ParGC, Volano)
>>   - a few minor regressions (<= -0.24%)
>>   - Volano is 6.8% better
>> 
>> Eric C. has also done promotion perf runs on these bits and says "the 
>> results look fine".
>
> Daniel D. Daugherty has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   dholmes-ora - convert inner while loop to do-while loop in 
> unlink_deflated().

Looks very good! Thanks for picking this up and taking it all the way!

-

Marked as reviewed by eosterlund (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/642


Re: RFR: 8253064: monitor list simplifications and getting rid of TSM [v2]

2020-11-09 Thread Erik Österlund
On Sat, 7 Nov 2020 17:22:21 GMT, Daniel D. Daugherty  wrote:

>> src/hotspot/share/runtime/synchronizer.cpp line 1520:
>> 
>>> 1518: // deflated in this cycle.
>>> 1519: size_t deleted_count = 0;
>>> 1520: for (ObjectMonitor* monitor: delete_list) {
>> 
>> I didn't realize C++ has a "foreach" loop construct! Is this in our allowed 
>> C++ usage?
>
> @fisk - this one is for you... :-)

Yeah, this is one of the new cool features we can use. I thought it was 
allowed, because it is neither in the excluded nor the undecided list of 
features in our doc/hotspot-style.md file.

-

PR: https://git.openjdk.java.net/jdk/pull/642


Re: RFR: 8253064: monitor list simplifications and getting rid of TSM [v2]

2020-11-09 Thread Erik Österlund
On Sat, 7 Nov 2020 17:11:42 GMT, Daniel D. Daugherty  wrote:

>> src/hotspot/share/runtime/synchronizer.cpp line 94:
>> 
>>> 92:   // Find next live ObjectMonitor.
>>> 93:   ObjectMonitor* next = m;
>>> 94:   while (next != NULL && next->is_being_async_deflated()) {
>> 
>> Nit: This loop seems odd. Given we know m is_being_async_deflated, this 
>> should either be a do/while loop, or else we should initialize:
>> 
>> ObjectMonitor* next = m->next_om();
>> 
>> and dispense with the awkwardly named next_next.
>
> @fisk - I'm leaving this one for you for now.

Changing it to a do/while loop makes sense. The while condition is always true 
on the first iteration, so a while loop and a do/while loop are equivalent 
here. If you find the do/while loop easier to read, then that sounds good to me.

-

PR: https://git.openjdk.java.net/jdk/pull/642


Re: RFR: 8253064: monitor list simplifications and getting rid of TSM [v2]

2020-11-09 Thread Erik Österlund
On Sat, 7 Nov 2020 17:01:42 GMT, Daniel D. Daugherty  wrote:

>> src/hotspot/share/runtime/objectMonitor.cpp line 380:
>> 
>>> 378:   if (event.should_commit()) {
>>> 379: event.set_monitorClass(object()->klass());
>>> 380: event.set_address((uintptr_t)this);
>> 
>> This looks wrong - the event should refer to the Object whose monitor we 
>> have entered, not the OM itself.
>
> I noticed that in my preliminary review of Erik's changes. He checked
> with the JFR guys and they said it just needed to be an address and
> does not have to refer to the Object.
> 
> @fisk - can you think of a comment we should add here?

We could write something along the lines of "An address that is 'unique 
enough', such that events close in time and with the same address are likely 
(but not guaranteed) to belong to the same object". This uniqueness property 
has always been more of a heuristic thing than anything else, as deflation 
shuffles the addresses around. Taking the this pointer vs an offset into the 
this pointer does however serve the exact same purpose; there was never any 
correlation to the contents of the object field.

>> src/hotspot/share/runtime/objectMonitor.cpp line 1472:
>> 
>>> 1470:   event->set_monitorClass(monitor->object()->klass());
>>> 1471:   event->set_timeout(timeout);
>>> 1472:   event->set_address((uintptr_t)monitor);
>> 
>> Again the event should refer to the Object, not the OM.
>
> I noticed that in my preliminary review of Erik's changes. He checked
> with the JFR guys and they said it just needed to be an address and
> does not have to refer to the Object.
> 
> @fisk - can you think of a comment we should add here?

I wrote one in the section above, hope it is useful.

-

PR: https://git.openjdk.java.net/jdk/pull/642


Integrated: 8255452: Doing GC during JVMTI MethodExit event posting breaks return oop

2020-11-05 Thread Erik Österlund
On Thu, 29 Oct 2020 12:44:58 GMT, Erik Österlund  wrote:

> The imasm::remove_activation() call does not deal with safepoints very well. 
> However, when the MethodExit JVMTI event is being called, we call into the 
> runtime in the middle of remove_activation(). If the value being returned is 
> an object type, then the top-of-stack contains the oop. However, the GC does 
> not traverse said oop in any oop map, because it is simply not expected that 
> we safepoint in the middle of remove_activation().
> 
> The JvmtiExport::post_method_exit() function we end up calling, reads the 
> top-of-stack oop, and puts it in a handle. Then it calls JVMTI callbacks, 
> that eventually call Java and a bunch of stuff that safepoints. So after the 
> JVMTI callback, we can expect the top-of-stack oop to be broken. 
> Unfortunately, when we continue, we therefore end up returning a broken oop.
> 
> Notably, the fact that InterpreterRuntime::post_method_exit is a JRT_ENTRY, 
> is wrong, as we can safepoint on the way back to Java, which will break the 
> return oop in a similar way. So this patch makes it a JRT_BLOCK_ENTRY, moving 
> the transition to VM and back, into a block of code that is protected against 
> GC. Before the JRT_BLOCK is called, we stash away the return oop, and after 
> the JRT_BLOCK_END, we restore the top-of-stack oop. In the path when 
> InterpreterRuntime::post_method_exit is called when throwing an exception, we 
> don't have the same problem of retaining an oop result, and hence the 
> JRT_BLOCK/JRT_BLOCK_END section is not performed in this case; the logic is 
> the same as before for this path. 
> 
> This is a JVMTI bug that has probably been around for a long time. It crashes 
> with all GCs, but was discovered recently after concurrent stack processing, 
> as StefanK has been running better GC stressing code in JVMTI, and the bug 
> reproduced more easily with concurrent stack processing, as the timings were 
> a bit different. The following reproducer failed pretty much 100% of the time:
> while true; do make test JTREG="RETAIN=all" 
> TEST=test/hotspot/jtreg/vmTestbase/nsk/jdi/MethodExitEvent/returnValue/returnValue003/returnValue003.java
>  TEST_OPTS_JAVA_OPTIONS="-XX:+UseZGC -Xmx2g -XX:ZCollectionInterval=0.0001 
> -XX:ZFragmentationLimit=0.01 -XX:+VerifyOops -XX:+ZVerifyViews -Xint" ; done 
> 
> With my fix I can run this repeatedly without any more failures. I have also 
> sanity checked the patch by running tiers 1-5, so that it does not introduce 
> any new issues of its own. I have also used Stefan's nice external GC 
> stressing-with-jcmd technique that was used to trigger crashes with other 
> GCs, to make sure said crashes no longer reproduce either.

This pull request has now been integrated.

Changeset: 3a02578b
Author:    Erik Österlund 
URL:   https://git.openjdk.java.net/jdk/commit/3a02578b
Stats: 58 lines in 3 files changed: 42 ins; 12 del; 4 mod

8255452: Doing GC during JVMTI MethodExit event posting breaks return oop

Reviewed-by: coleenp, dlong, rrich, sspitsyn

-

PR: https://git.openjdk.java.net/jdk/pull/930


Re: RFR: 8212879: Make JVMTI TagMap table concurrent [v4]

2020-11-05 Thread Erik Österlund
On Wed, 4 Nov 2020 13:22:57 GMT, Coleen Phillimore  wrote:

>> For the GCs that call the num_dead notification in a pause it is much worse 
>> than what we had. As I pointed out elsewhere, it used to be that tagmap 
>> processing was all-in-one, as a single serial subtask taken by the first 
>> thread that reached it in WeakProcessor processing. Other threads would find 
>> that subtask taken and move on to processing oopstores in parallel with the 
>> tagmap processing. Now everything except the oopstorage-based clearing of 
>> dead entries is a single threaded serial task done by the VMThread, after 
>> all the parallel WeakProcessor work is done, because that's where the 
>> num-dead callbacks are invoked. WeakProcessor's parallel oopstorage 
>> processing doesn't have a way to do the num-dead callbacks by the last 
>> thread out of each parallel oopstorage processing. Instead it's left to the 
>> end, on the assumption that the callbacks are relatively cheap.  But that 
>> could still be much worse than the old code, since the tagmap oopstorage 
>> could be late in the order of processing, and so still effectively be a 
>> serial subtask after all the parallel subtasks are done or mostly done.
>
> Yes, you are right that the processing will be done serially and not by a 
> parallel worker thread.  This is could spawn a new GC worker thread to 
> process the posts, as you suggest.  We could do that if we find a customer 
> that has a complaint about the pause time of this processing.

So both before and now, this task is a single threaded task. The difference is 
that before, that single threaded task could be performed in parallel to other 
tasks. So if the table is small, you probably won't be able to notice any 
difference, as a small table implies there is not much to do. And if the table 
is large, you still probably won't be able to notice any difference, as a 
large table implies it will dominate the pause with both the old and new 
approach. Any difference at all is bounded at 2x processing time, as it was 
serial both before and after. But now, if we have a perfectly medium balanced 
table, we can at the very worst observe a theoretical 2x worse processing of 
this JVMTI 
table. I think that if we truly did care about this difference, and that it is 
important to keep this code as well performed as possible, then we would not 
have a serial phase for this at all. The fact that this has been serial 
suggests to me that it is not a path that is critical, and therefore I don't 
think op
 timizing the theoretical max 2x worse processing times for perfectly medium 
sized JVMTI tag map tables, is worth the hassle. At least I can't see why this 
would be of any importance.

-

PR: https://git.openjdk.java.net/jdk/pull/967


Re: RFR: 8212879: Make JVMTI TagMap table concurrent [v4]

2020-11-05 Thread Erik Österlund
On Wed, 4 Nov 2020 13:32:07 GMT, Coleen Phillimore  wrote:

>> I know of nothing that leads to "presumably during GC" being a requirement. 
>> Having all pending events of some type occur before that type of event is 
>> disabled seems like a reasonable requirement, but just means that event 
>> disabling also requires the table to be "up to date", in the sense that any 
>> GC-cleared entries need to be dealt with. That can be handled just like 
>> other operations that use the table contents, rather than during the GC.  
>> That is, use post_dead_object_on_vm_thread if there are or might be any 
>> pending dead objects, before disabling the event.
>
> Ok, so there were many test failures with other approaches.  Having GC 
> trigger the posting was the most reliable way to post the events when the 
> tests (and presumably the jvmti customers) expected the events to be posted.  
> We could revisit during event disabling if a customer complains about GC 
> pause times.

The point of this change was not necessarily to be lazy about updating the 
tagmap, until someone uses it. The point was to get rid of the last annoying 
serial GC phase. Doing it all lazily would certainly also achieve that. But it 
would also lead to situations where no event is ever posted from GC to GC. So 
you would get the event 20 GCs later, which might come as a surprise. It did 
come as a surprise to some tests, so it is reasonable to assume it would come 
as a surprise to users too. And I don't think we want such surprises unless we 
couldn't deal with them. And we can.

-

PR: https://git.openjdk.java.net/jdk/pull/967


Re: RFR: 8212879: Make JVMTI TagMap table concurrent [v4]

2020-11-05 Thread Erik Österlund
On Tue, 3 Nov 2020 21:14:04 GMT, Coleen Phillimore  wrote:

>> src/hotspot/share/prims/jvmtiTagMap.cpp line 3018:
>> 
>>> 3016: }
>>> 3017: // Later GC code will relocate the oops, so defer rehashing 
>>> until then.
>>> 3018: tag_map->_needs_rehashing = true;
>> 
>> This is wrong for some collectors. I think all collectors ought to be 
>> calling set_needs_rehashing in appropriate places, and it can't be be 
>> correctly piggybacked on the num-dead callback. (See discussion above for 
>> that function.)
>> 
>> For example, G1 remark pause does weak processing (including weak 
>> oopstorage) and will call the num-dead callback, but does not move objects, 
>> so does not require tagmap rehashing.
>> 
>> (I think CMS oldgen remark may have been similar, for what that's worth.)
>
> Ok, so I'm going to need help to know where in all the different GCs to make 
> this call.  This seemed simpler at the expense of maybe causing a rehash at 
> some points when it might not be necessary.

For what GC is this wrong? I can see that it might yield more work than 
required, when performing a full GC, but not that it would do too little work. 
In other words, I can't see how it is wrong, as opposed to inaccurate. 
Littering GCs with JVMTI hooks so that we can optimize away an operation we do 
every young GC, from a full GC, does not really seem worth it IMO.

-

PR: https://git.openjdk.java.net/jdk/pull/967


Re: RFR: 8255452: Doing GC during JVMTI MethodExit event posting breaks return oop [v2]

2020-11-04 Thread Erik Österlund
On Tue, 3 Nov 2020 17:49:38 GMT, Serguei Spitsyn  wrote:

> Hi Erik,
> 
> I'm not sure, if this fragment is still needed:
> 
> ```
> 1620   if (state == NULL || !state->is_interp_only_mode()) {
> 1621 // for any thread that actually wants method exit, interp_only_mode 
> is set
> 1622 return;
> 1623   }
> ```

Seems like it is not needed. I removed it.

> Also, can it be that this condition is true:
> ` (state == NULL || !state->is_interp_only_mode())`
> but the top frame is interpreted?
> If so, then should we still safe/restore the result oop over a possible 
> safepoint?

It could definitely be that the top frame is interpreted even though that 
condition is true. However, we now enter InterpreterRuntime::post_method_exit 
as a JRT_BLOCK_ENTRY call, which performs no transition (similar to JRT_LEAF). 
So in that case we just return to the caller without doing anything, and no 
GC will happen in this path. It is only when we perform the JRT_BLOCK and 
JRT_BLOCK_END that we allow GCs to happen, and we save/restore the result 
across that section.

Thanks,


> Thanks,
> Serguei

-

PR: https://git.openjdk.java.net/jdk/pull/930


Re: RFR: 8255452: Doing GC during JVMTI MethodExit event posting breaks return oop [v4]

2020-11-04 Thread Erik Österlund
> The imasm::remove_activation() call does not deal with safepoints very well. 
> However, when the MethodExit JVMTI event is being called, we call into the 
> runtime in the middle of remove_activation(). If the value being returned is 
> an object type, then the top-of-stack contains the oop. However, the GC does 
> not traverse said oop in any oop map, because it is simply not expected that 
> we safepoint in the middle of remove_activation().
> 
> The JvmtiExport::post_method_exit() function we end up calling, reads the 
> top-of-stack oop, and puts it in a handle. Then it calls JVMTI callbacks, 
> that eventually call Java and a bunch of stuff that safepoints. So after the 
> JVMTI callback, we can expect the top-of-stack oop to be broken. 
> Unfortunately, when we continue, we therefore end up returning a broken oop.
> 
> Notably, the fact that InterpreterRuntime::post_method_exit is a JRT_ENTRY, 
> is wrong, as we can safepoint on the way back to Java, which will break the 
> return oop in a similar way. So this patch makes it a JRT_BLOCK_ENTRY, moving 
> the transition to VM and back, into a block of code that is protected against 
> GC. Before the JRT_BLOCK is called, we stash away the return oop, and after 
> the JRT_BLOCK_END, we restore the top-of-stack oop. In the path when 
> InterpreterRuntime::post_method_exit is called when throwing an exception, we 
> don't have the same problem of retaining an oop result, and hence the 
> JRT_BLOCK/JRT_BLOCK_END section is not performed in this case; the logic is 
> the same as before for this path. 
> 
> This is a JVMTI bug that has probably been around for a long time. It crashes 
> with all GCs, but was discovered recently after concurrent stack processing, 
> as StefanK has been running better GC stressing code in JVMTI, and the bug 
> reproduced more easily with concurrent stack processing, as the timings were 
> a bit different. The following reproducer failed pretty much 100% of the time:
> while true; do make test JTREG="RETAIN=all" 
> TEST=test/hotspot/jtreg/vmTestbase/nsk/jdi/MethodExitEvent/returnValue/returnValue003/returnValue003.java
>  TEST_OPTS_JAVA_OPTIONS="-XX:+UseZGC -Xmx2g -XX:ZCollectionInterval=0.0001 
> -XX:ZFragmentationLimit=0.01 -XX:+VerifyOops -XX:+ZVerifyViews -Xint" ; done 
> 
> With my fix I can run this repeatedly without any more failures. I have also 
> sanity checked the patch by running tier 1-5, so that it does not introduce 
> any new issues on its own. I have also used Stefan's nice technique of 
> stressing with jcmd technique that was used to trigger crashes with other 
> GCs, to make sure said crashes no longer reproduce either.

Erik Österlund has updated the pull request incrementally with one additional 
commit since the last revision:

  Serguei CR2: Don't check interpreted only

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/930/files
  - new: https://git.openjdk.java.net/jdk/pull/930/files/4d68c624..95306514

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=930&range=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=930&range=02-03

  Stats: 5 lines in 1 file changed: 0 ins; 5 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/930.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/930/head:pull/930

PR: https://git.openjdk.java.net/jdk/pull/930


Re: RFR: 8255452: Doing GC during JVMTI MethodExit event posting breaks return oop [v2]

2020-11-03 Thread Erik Österlund
On Mon, 2 Nov 2020 21:00:23 GMT, Serguei Spitsyn  wrote:

> Erik,
> 
> Thank you for the update! It looks more elegant.
> 
> One concern is that after the move of this fragment to the 
> post_method_exit_inner:
> 
> ```
> 1614   if (state == NULL || !state->is_interp_only_mode()) {
> 1615 // for any thread that actually wants method exit, interp_only_mode 
> is set
> 1616 return;
> 1617   }
> ```
> 
> there is no guarantee that the current frame is interpreted below:
> 
> ```
> 1580 if (!exception_exit) {
> 1581   oop oop_result;
> 1582   BasicType type = 
> current_frame.interpreter_frame_result(&oop_result, &value);
>  . . .
> 1597   if (result.not_null() && !mh->is_native()) {
> 1598 // We have to restore the oop on the stack for interpreter frames
> 1599 *(oop*)current_frame.interpreter_frame_tos_address() = result();
> 1600   }
> ```
> 
> Probably, extra checks for current_frame.is_interpreted_frame() in these 
> fragments will be sufficient.

That makes sense. Added a check in the latest version that we are in interp 
only mode.

-

PR: https://git.openjdk.java.net/jdk/pull/930


Re: RFR: 8255452: Doing GC during JVMTI MethodExit event posting breaks return oop [v3]

2020-11-03 Thread Erik Österlund
> The imasm::remove_activation() call does not deal with safepoints very well. 
> However, when the MethodExit JVMTI event is being called, we call into the 
> runtime in the middle of remove_activation(). If the value being returned is 
> an object type, then the top-of-stack contains the oop. However, the GC does 
> not traverse said oop in any oop map, because it is simply not expected that 
> we safepoint in the middle of remove_activation().
> 
> The JvmtiExport::post_method_exit() function we end up calling, reads the 
> top-of-stack oop, and puts it in a handle. Then it calls JVMTI callbacks, 
> that eventually call Java and a bunch of stuff that safepoints. So after the 
> JVMTI callback, we can expect the top-of-stack oop to be broken. 
> Unfortunately, when we continue, we therefore end up returning a broken oop.
> 
> Notably, the fact that InterpreterRuntime::post_method_exit is a JRT_ENTRY, 
> is wrong, as we can safepoint on the way back to Java, which will break the 
> return oop in a similar way. So this patch makes it a JRT_BLOCK_ENTRY, moving 
> the transition to VM and back, into a block of code that is protected against 
> GC. Before the JRT_BLOCK is called, we stash away the return oop, and after 
> the JRT_BLOCK_END, we restore the top-of-stack oop. In the path when 
> InterpreterRuntime::post_method_exit is called when throwing an exception, we 
> don't have the same problem of retaining an oop result, and hence the 
> JRT_BLOCK/JRT_BLOCK_END section is not performed in this case; the logic is 
> the same as before for this path. 
> 
> This is a JVMTI bug that has probably been around for a long time. It crashes 
> with all GCs, but was discovered recently after concurrent stack processing, 
> as StefanK has been running better GC stressing code in JVMTI, and the bug 
> reproduced more easily with concurrent stack processing, as the timings were 
> a bit different. The following reproducer failed pretty much 100% of the time:
> while true; do make test JTREG="RETAIN=all" 
> TEST=test/hotspot/jtreg/vmTestbase/nsk/jdi/MethodExitEvent/returnValue/returnValue003/returnValue003.java
>  TEST_OPTS_JAVA_OPTIONS="-XX:+UseZGC -Xmx2g -XX:ZCollectionInterval=0.0001 
> -XX:ZFragmentationLimit=0.01 -XX:+VerifyOops -XX:+ZVerifyViews -Xint" ; done 
> 
> With my fix I can run this repeatedly without any more failures. I have also 
> sanity checked the patch by running tier 1-5, so that it does not introduce 
> any new issues on its own. I have also used Stefan's nice technique of 
> stressing with jcmd technique that was used to trigger crashes with other 
> GCs, to make sure said crashes no longer reproduce either.

Erik Österlund has updated the pull request incrementally with one additional 
commit since the last revision:

  Serguei CR1: Check interpreted only

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/930/files
  - new: https://git.openjdk.java.net/jdk/pull/930/files/ae6355fd..4d68c624

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=930&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=930&range=01-02

  Stats: 6 lines in 1 file changed: 6 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/930.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/930/head:pull/930

PR: https://git.openjdk.java.net/jdk/pull/930


Re: RFR: 8212879: Make JVMTI TagMap table concurrent [v3]

2020-11-03 Thread Erik Österlund
On Tue, 3 Nov 2020 12:58:22 GMT, Coleen Phillimore  wrote:

>> This change turns the HashTable that JVMTI uses for object tagging into a 
>> regular Hotspot hashtable - the one in hashtable.hpp with resizing and 
>> rehashing.   Instead of pointing directly to oops so that GC has to walk the 
>> table to follow oops and then to rehash the table, this table points to 
>> WeakHandle.  GC walks the backing OopStorages concurrently.
>> 
>> The hash function for the table is a hash of the lower 32 bits of the 
>> address.  A flag is set during GC (gc_notification if in a safepoint, and 
>> through a call to JvmtiTagMap::needs_processing()) so that the table is 
>> rehashed at the next use.
>> 
>> The gc_notification mechanism of weak oop processing is used to notify Jvmti 
>> to post ObjectFree events.  In concurrent GCs there can be a window of time 
>> between weak oop marking where the oop is unmarked, so dead (the phantom 
>> load in peek returns NULL) but the gc_notification hasn't been done yet.  In 
>> this window, a heap walk or GetObjectsWithTags call would not find an object 
>> before the ObjectFree event is posted.  This is dealt with in two ways:
>> 
>> 1. In the Heap walk, there's an unconditional table walk to post events if 
>> events are needed to post.
>> 2. For GetObjectWithTags, if a dead oop is found in the table and posting is 
>> required, we use the VM thread to post the event.
>> 
>> Event posting cannot be done in a JavaThread because the posting needs to be 
>> done while holding the table lock, so that the JvmtiEnv state doesn't change 
>> before posting is done.  ObjectFree callbacks are limited in what they can 
>> do as per the JVMTI Specification.  The allowed callbacks to the VM already 
>> have code to allow NonJava threads.
>> 
>> To avoid rehashing, I also tried to use object->identity_hash() but this 
>> breaks because entries can be added to the table during heapwalk, where the 
>> objects use marking.  The starting markWord is saved and restored.  Adding a 
>> hashcode during this operation makes restoring the former markWord (locked, 
>> inflated, etc) too complicated.  Plus we don't want all these objects to 
>> have hashcodes because locking operations after tagging would have to always 
>> use inflated locks.
>> 
>> Much of this change is to remove serial weak oop processing for the 
>> weakProcessor, ZGC and Shenandoah.  The GCs have been stress tested with 
>> jvmti code.
>> 
>> It has also been tested with tier1-6.
>> 
>> Thank you to Stefan, Erik and Kim for their help with this change.
>
> Coleen Phillimore has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   More review comments from Stefan and ErikO

Marked as reviewed by eosterlund (Reviewer).

-

PR: https://git.openjdk.java.net/jdk/pull/967


Re: RFR: 8212879: Make JVMTI TagMap table concurrent [v2]

2020-11-03 Thread Erik Österlund
On Mon, 2 Nov 2020 16:22:57 GMT, Coleen Phillimore  wrote:

>> src/hotspot/share/prims/jvmtiTagMap.cpp line 345:
>> 
>>> 343: 
>>> 344:   // Check if we have to process for concurrent GCs.
>>> 345:   check_hashmap(false);
>> 
>> Maybe add a comment stating the parameter name, as was done in other 
>> callsites for check_hashmap.
>
> Ok, but will I run afoul of the ZGC people, who put the parameter-name 
> comment after the parameter, while the rest of the code puts it before?

ZGC people passionately place the comment after the argument value.

-

PR: https://git.openjdk.java.net/jdk/pull/967


Re: RFR: 8255452: Doing GC during JVMTI MethodExit event posting breaks return oop [v2]

2020-11-02 Thread Erik Österlund
On Mon, 2 Nov 2020 16:19:59 GMT, Coleen Phillimore  wrote:

>> Erik Österlund has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   Coleen CR1: Refactoring
>
> src/hotspot/share/prims/jvmtiExport.cpp line 1570:
> 
>> 1568:   // return a flag when a method terminates by throwing an exception
>> 1569:   // i.e. if an exception is thrown and it's not caught by the current 
>> method
>> 1570:   bool exception_exit = state->is_exception_detected() && 
>> !state->is_exception_caught();
> 
> So this only applies to the case where the post_method_exit comes from 
> remove_activation?  Good to have it only on this path in this case.

I'm not sure. There might be other cases, when remove_activation is called by 
the exception code. That's why I didn't want to change it to just true in this 
path.

-

PR: https://git.openjdk.java.net/jdk/pull/930


Re: RFR: 8212879: Make JVMTI TagMap table concurrent [v2]

2020-11-02 Thread Erik Österlund
On Mon, 2 Nov 2020 12:50:23 GMT, Coleen Phillimore  wrote:

>> src/hotspot/share/prims/jvmtiTagMap.cpp line 954:
>> 
>>> 952:  o->klass()->external_name());
>>> 953: return;
>>> 954:   }
>> 
>> Why is this done as a part of this RFE? Is this a bug fix that should be 
>> done as a separate patch?
>
> Because it crashed with my changes and didn't without.  I cannot recollect 
> why.

I thought that we didn't load the archived heap from CDS, if JVMTI heap walker 
capabilities are in place, as we didn't want this kind of interactions. But 
maybe I'm missing something, since you said having this if statement here made 
a difference.

-

PR: https://git.openjdk.java.net/jdk/pull/967


Re: RFR: 8212879: Make JVMTI TagMap table concurrent [v2]

2020-11-02 Thread Erik Österlund
On Mon, 2 Nov 2020 15:58:15 GMT, Coleen Phillimore  wrote:

>> This change turns the HashTable that JVMTI uses for object tagging into a 
>> regular Hotspot hashtable - the one in hashtable.hpp with resizing and 
>> rehashing.   Instead of pointing directly to oops so that GC has to walk the 
>> table to follow oops and then to rehash the table, this table points to 
>> WeakHandle.  GC walks the backing OopStorages concurrently.
>> 
>> The hash function for the table is a hash of the lower 32 bits of the 
>> address.  A flag is set during GC (gc_notification if in a safepoint, and 
>> through a call to JvmtiTagMap::needs_processing()) so that the table is 
>> rehashed at the next use.
>> 
>> The gc_notification mechanism of weak oop processing is used to notify Jvmti 
>> to post ObjectFree events.  In concurrent GCs there can be a window of time 
>> between weak oop marking where the oop is unmarked, so dead (the phantom 
>> load in peek returns NULL) but the gc_notification hasn't been done yet.  In 
>> this window, a heap walk or GetObjectsWithTags call would not find an object 
>> before the ObjectFree event is posted.  This is dealt with in two ways:
>> 
>> 1. In the Heap walk, there's an unconditional table walk to post events if 
>> events are needed to post.
>> 2. For GetObjectWithTags, if a dead oop is found in the table and posting is 
>> required, we use the VM thread to post the event.
>> 
>> Event posting cannot be done in a JavaThread because the posting needs to be 
>> done while holding the table lock, so that the JvmtiEnv state doesn't change 
>> before posting is done.  ObjectFree callbacks are limited in what they can 
>> do as per the JVMTI Specification.  The allowed callbacks to the VM already 
>> have code to allow NonJava threads.
>> 
>> To avoid rehashing, I also tried to use object->identity_hash() but this 
>> breaks because entries can be added to the table during heapwalk, where the 
>> objects use marking.  The starting markWord is saved and restored.  Adding a 
>> hashcode during this operation makes restoring the former markWord (locked, 
>> inflated, etc) too complicated.  Plus we don't want all these objects to 
>> have hashcodes because locking operations after tagging would have to always 
>> use inflated locks.
>> 
>> Much of this change is to remove serial weak oop processing for the 
>> weakProcessor, ZGC and Shenandoah.  The GCs have been stress tested with 
>> jvmti code.
>> 
>> It has also been tested with tier1-6.
>> 
>> Thank you to Stefan, Erik and Kim for their help with this change.
>
> Coleen Phillimore has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Code review comments from StefanK.

Looks great in general. Great work Coleen, and thanks again for fixing this. I 
like all the red lines in the GC code. I added a few nits/questions.

test/hotspot/jtreg/vmTestbase/nsk/jvmti/scenarios/capability/CM02/cm02t001/cm02t001.cpp
 line 656:

> 654: result = NSK_FALSE;
> 655: 
> 656: printf("Object free events %d\n", ObjectFreeEventsCount);

Is this old debug info you forgot to remove? Other code seems to use 
NSK_DISPLAY macros instead.

src/hotspot/share/prims/jvmtiTagMap.cpp line 345:

> 343: 
> 344:   // Check if we have to process for concurrent GCs.
> 345:   check_hashmap(false);

Maybe add a comment stating the parameter name, as was done in other callsites 
for check_hashmap.

src/hotspot/share/prims/jvmtiTagMap.cpp line 3009:

> 3007:   // Lock each hashmap from concurrent posting and cleaning
> 3008:   MutexLocker ml(tag_map->lock(), 
> Mutex::_no_safepoint_check_flag);
> 3009:   tag_map->hashmap()->unlink_and_post(tag_map->env());

This could call unlink_and_post_locked instead of manually locking.

-

Changes requested by eosterlund (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/967


Re: RFR: 8255452: Doing GC during JVMTI MethodExit event posting breaks return oop [v2]

2020-11-02 Thread Erik Österlund
On Sat, 31 Oct 2020 09:54:09 GMT, Serguei Spitsyn  wrote:

> Hi Erik,
> 
> Nice discovery! Indeed, this is a long standing issue. It looks good in 
> general.
> I agree with Coleen, it would be nice if there is an elegant way to move the 
> oop_result saving/restoring code to InterpreterRuntime::post_method_exit. 
> Otherwise, I'm okay with what you have now.
> It is also nice discovery of the issue with clearing the expression stack. I 
> think, it was my mistake in the initial implementation of the 
> ForceEarlyReturn when I followed the PopFrame implementation pattern. It is 
> good to separate it from the current fix.
> 
> Thanks,
> Serguei

Thanks for reviewing this Serguei. And thanks for confirming our suspicions 
regarding clearing of the expression stack. I wasn't sure if anyone would be 
around that knew how it ended up there!
I made the refactoring that you and Coleen wanted, I think. Hope you like it!

-

PR: https://git.openjdk.java.net/jdk/pull/930


Re: RFR: 8255452: Doing GC during JVMTI MethodExit event posting breaks return oop [v2]

2020-11-02 Thread Erik Österlund
On Fri, 30 Oct 2020 14:09:46 GMT, Erik Österlund  wrote:

>> Oh that's actually horrible.   I wonder if it's possible to hoist saving the 
>> result oop into the InterpreterRuntime entry.  And pass the Handle into 
>> JvmtiExport::post_method_exit().
>
> I tried that first, and ended up with a bunch of non-trivial code duplication 
> instead, as reading the oop is done in both paths but for different reasons: 
> once to preserve/restore it (interpreter remove_activation entry), and once 
> inside of JvmtiExport::post_method_exit() so that it can be passed into the 
> MethodExit callback. I will give it another shot and see if it is possible to 
> refactor it in a better way.

I uploaded a CR that does pretty much what you suggested, ish. Hope you like it!

-

PR: https://git.openjdk.java.net/jdk/pull/930


Re: RFR: 8255452: Doing GC during JVMTI MethodExit event posting breaks return oop [v2]

2020-11-02 Thread Erik Österlund
On Sat, 31 Oct 2020 09:54:09 GMT, Serguei Spitsyn  wrote:

>> Erik Österlund has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   Coleen CR1: Refactoring
>
> Hi Erik,
> 
> Nice discovery! Indeed, this is a long standing issue. It looks good in 
> general.
> I agree with Coleen, it would be nice if there is an elegant way to move the 
> oop_result saving/restoring code to InterpreterRuntime::post_method_exit. 
> Otherwise, I'm okay with what you have now.
> It is also nice discovery of the issue with clearing the expression stack. I 
> think, it was my mistake in the initial implementation of the 
> ForceEarlyReturn when I followed the PopFrame implementation pattern. It is 
> good to separate it from the current fix.
> 
> Thanks,
> Serguei

I uploaded a new commit to perform some refactoring as requested by Coleen and 
Serguei. I made the oop save/restore + JRT_BLOCK logic belong only to the path 
taken from InterpreterRuntime::post_method_exit. An inner posting method is 
called both from that path and from 
JvmtiExport::notice_unwind_due_to_exception. I think the result is an 
improvement in terms of how clear it is. I didn't want to move logic all the 
way back to InterpreterRuntime::post_method_exit though, as I don't think it 
looks pretty to have large chunks of JVMTI implementation details in the 
interpreterRuntime.cpp file. So I did basically what you suggested, with the 
slight difference of moving all the JVMTI implementation into the JVMTI file 
instead, which is just called from InterpreterRuntime::post_method_exit. Hope 
you are okay with this!

-

PR: https://git.openjdk.java.net/jdk/pull/930


Re: RFR: 8255452: Doing GC during JVMTI MethodExit event posting breaks return oop [v2]

2020-11-02 Thread Erik Österlund
> The imasm::remove_activation() call does not deal with safepoints very well. 
> However, when the MethodExit JVMTI event is being called, we call into the 
> runtime in the middle of remove_activation(). If the value being returned is 
> an object type, then the top-of-stack contains the oop. However, the GC does 
> not traverse said oop in any oop map, because it is simply not expected that 
> we safepoint in the middle of remove_activation().
> 
> The JvmtiExport::post_method_exit() function we end up calling, reads the 
> top-of-stack oop, and puts it in a handle. Then it calls JVMTI callbacks, 
> that eventually call Java and a bunch of stuff that safepoints. So after the 
> JVMTI callback, we can expect the top-of-stack oop to be broken. 
> Unfortunately, when we continue, we therefore end up returning a broken oop.
> 
> Notably, the fact that InterpreterRuntime::post_method_exit is a JRT_ENTRY, 
> is wrong, as we can safepoint on the way back to Java, which will break the 
> return oop in a similar way. So this patch makes it a JRT_BLOCK_ENTRY, moving 
> the transition to VM and back, into a block of code that is protected against 
> GC. Before the JRT_BLOCK is called, we stash away the return oop, and after 
> the JRT_BLOCK_END, we restore the top-of-stack oop. In the path when 
> InterpreterRuntime::post_method_exit is called when throwing an exception, we 
> don't have the same problem of retaining an oop result, and hence the 
> JRT_BLOCK/JRT_BLOCK_END section is not performed in this case; the logic is 
> the same as before for this path. 
> 
> This is a JVMTI bug that has probably been around for a long time. It crashes 
> with all GCs, but was discovered recently after concurrent stack processing, 
> as StefanK has been running better GC stressing code in JVMTI, and the bug 
> reproduced more easily with concurrent stack processing, as the timings were 
> a bit different. The following reproducer failed pretty much 100% of the time:
> while true; do make test JTREG="RETAIN=all" 
> TEST=test/hotspot/jtreg/vmTestbase/nsk/jdi/MethodExitEvent/returnValue/returnValue003/returnValue003.java
>  TEST_OPTS_JAVA_OPTIONS="-XX:+UseZGC -Xmx2g -XX:ZCollectionInterval=0.0001 
> -XX:ZFragmentationLimit=0.01 -XX:+VerifyOops -XX:+ZVerifyViews -Xint" ; done 
> 
> With my fix I can run this repeatedly without any more failures. I have also 
> sanity checked the patch by running tier 1-5, so that it does not introduce 
> any new issues on its own. I have also used Stefan's nice technique of 
> stressing with jcmd technique that was used to trigger crashes with other 
> GCs, to make sure said crashes no longer reproduce either.

Erik Österlund has updated the pull request incrementally with one additional 
commit since the last revision:

  Coleen CR1: Refactoring

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/930/files
  - new: https://git.openjdk.java.net/jdk/pull/930/files/d7500082..ae6355fd

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=930&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=930&range=00-01

  Stats: 41 lines in 2 files changed: 13 ins; 19 del; 9 mod
  Patch: https://git.openjdk.java.net/jdk/pull/930.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/930/head:pull/930

PR: https://git.openjdk.java.net/jdk/pull/930


Re: RFR: 8255452: Doing GC during JVMTI MethodExit event posting breaks return oop

2020-10-30 Thread Erik Österlund
On Fri, 30 Oct 2020 16:02:41 GMT, Erik Österlund  wrote:

> > Hi Erik,
> > is it possible for GC to mistake a primitive value for a reference when 
> > posting the exit event?
> > My understanding is: we are at a random bci of a method that is forced to 
> > return early. The expression stack is emptied and the return value is 
> > pushed on the expression stack then we call into the interpreter runtime to 
> > post the JVMTI method exit event during which we come to a safepoint for 
> > GC. The oop map for the bci does not cover this forced early return and if 
> > the return value is an object then the reference pushed on the expression 
> > stack before is not updated by GC. With your fix the value is updated if it 
> > is a reference.
> > If this is correct then to me it appears as if GC can also crash because 
> > the oop map for the random bci tells there has to be a reference at the 
> > stack position of the return value if it actually is a primitive value.
> 
> I think what you are saying is true. Note though that the return value of 
> ForceEarlyReturn is installed with a handshake. The handshake polls of the 
> interpreter are emitted in loop backedges and returns. At loop backedges, the 
> expression stack is empty (required by OSR), and at returns the types match 
> correctly. However, if an arbitrary bytecode performs a runtime call with 
> call_VM() while the bottom of the expression stack is an oop, then I think 
> there is an issue. At that call_VM, the early return value could get 
> installed, and when the C++ function returns, we check for early returns, 
> further dispatching to an unwind routine that posts the MethodExit 
> notification. If we GC during this MethodExit notification, then I think you 
> can crash the GC. The GC code generates an oop map for the frame, checking 
> what the types in the expression stack should be. The early return int is 
> pushed on the slot intersecting with the bottom entry in the expression 
> stack. That bottom entry could be an oop, and the early return value could 
> be an int. Then the early return int will be passed to the oop closure, 
> which should result in a crash.
> 
> So I suspect that almost always, the handshake installing the 
> ForceEarlyReturn value is installed with a handshake in a bytecode backedge 
> or at a return, where the interpreter safepoint polls for the fast path code. 
> Then you won't notice the issue. But in the rare scenario that the 
> ForceEarlyReturn value is installed in a slow path call from a random 
> bytecode... I can't see how that would work correctly.

Looking more closely, I gotta say I have no idea why there is a call to clear 
the expression stack at all in 
TemplateInterpreterGenerator::generate_earlyret_entry_for(). It is unclear to 
me what problem, if any, it solves. It does, however, seem to introduce 
this problem of having a forced int return value intersect with an oop in the 
expression stack for a BCI performing slow-path calls into the VM. Simply 
removing the code that clears the expression stack, removes the issue, and I 
can't see that it introduces any other issue. Anyway, I think this is a 
separate bug. Do you mind if I push the fix for that bug, as a different RFR? 
It will likely involve poking around at more platform dependent code.

-

PR: https://git.openjdk.java.net/jdk/pull/930


Re: RFR: 8255452: Doing GC during JVMTI MethodExit event posting breaks return oop

2020-10-30 Thread Erik Österlund
On Fri, 30 Oct 2020 14:20:42 GMT, Erik Österlund  wrote:

>> Marked as reviewed by dlong (Reviewer).
>
>> I think you've discovered JDK-6449023. And your fix looks like the workaround 
>> I tried:
>> https://bugs.openjdk.java.net/browse/JDK-6449023?focusedCommentId=14206078&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14206078
> 
> Oh wow. I had no idea people have been having issues with this since 2009! 
> Thanks for the pointer. Well, let's hope we can finally close it now after 
> marinating the bug for 11 years.

> Hi Erik,
> 
> is it possible for GC to mistake a primitive value for a reference when 
> posting the exit event?
> 
> My understanding is: we are at a random bci of a method that is forced to 
> return early. The expression stack is emptied and the return value is pushed 
> on the expression stack, then we call into the interpreter runtime to post the 
> JVMTI method exit event during which we come to a safepoint for GC. The oop 
> map for the bci does not cover this forced early return and if the return 
> value is an object then the reference pushed on the expression stack before 
> is not updated by GC. With your fix the value is updated if it is a reference.
> 
> If this is correct then to me it appears as if GC can also crash because the 
> oop map for the random bci tells there has to be a reference at the stack 
> position of the return value if it actually is a primitive value.

I think what you are saying is true. Note though that the return value of 
ForceEarlyReturn is installed with a handshake. The handshake polls of the 
interpreter are emitted in loop backedges and returns. At loop backedges, the 
expression stack is empty (required by OSR), and at returns the types match 
correctly. However, if an arbitrary bytecode performs a runtime call with 
call_VM() while the bottom of the expression stack is an oop, then I think 
there is an issue. At that call_VM, the early return value could get installed, 
and when the C++ function returns, we check for early returns, further 
dispatching to an unwind routine that posts the MethodExit notification. If we 
GC during this MethodExit notification, then I think you can crash the GC. The 
GC code generates an oop map for the frame, checking what the types in the 
expression stack should be. The early return int is pushed on the slot 
intersecting with the bottom entry in the expression stack. That bottom entry 
could be an oop, and the early return value could be an int. Then the early 
return int will be passed to the oop closure, which should result in a crash.

So I suspect that almost always, the handshake installing the ForceEarlyReturn 
value is installed with a handshake in a bytecode backedge or at a return, 
where the interpreter safepoint polls for the fast path code. Then you won't 
notice the issue. But in the rare scenario that the ForceEarlyReturn value is 
installed in a slow path call from a random bytecode... I can't see how that 
would work correctly.

-

PR: https://git.openjdk.java.net/jdk/pull/930


Re: RFR: 8255452: Doing GC during JVMTI MethodExit event posting breaks return oop

2020-10-30 Thread Erik Österlund
On Fri, 30 Oct 2020 08:49:08 GMT, Dean Long  wrote:

>> The imasm::remove_activation() call does not deal with safepoints very well. 
>> However, when the MethodExit JVMTI event is being called, we call into the 
>> runtime in the middle of remove_activation(). If the value being returned is 
>> an object type, then the top-of-stack contains the oop. However, the GC does 
>> not traverse said oop in any oop map, because it is simply not expected that 
>> we safepoint in the middle of remove_activation().
>> 
>> The JvmtiExport::post_method_exit() function we end up calling, reads the 
>> top-of-stack oop, and puts it in a handle. Then it calls JVMTI callbacks, 
>> that eventually call Java and a bunch of stuff that safepoints. So after the 
>> JVMTI callback, we can expect the top-of-stack oop to be broken. 
>> Unfortunately, when we continue, we therefore end up returning a broken oop.
>> 
>> Notably, the fact that InterpreterRuntime::post_method_exit is a JRT_ENTRY, 
>> is wrong, as we can safepoint on the way back to Java, which will break the 
>> return oop in a similar way. So this patch makes it a JRT_BLOCK_ENTRY, 
>> moving the transition to VM and back, into a block of code that is protected 
>> against GC. Before the JRT_BLOCK is called, we stash away the return oop, 
>> and after the JRT_BLOCK_END, we restore the top-of-stack oop. In the path 
>> when InterpreterRuntime::post_method_exit is called when throwing an 
>> exception, we don't have the same problem of retaining an oop result, and 
>> hence the JRT_BLOCK/JRT_BLOCK_END section is not performed in this case; the 
>> logic is the same as before for this path. 
>> 
>> This is a JVMTI bug that has probably been around for a long time. It 
>> crashes with all GCs, but was discovered recently after concurrent stack 
>> processing, as StefanK has been running better GC stressing code in JVMTI, 
>> and the bug reproduced more easily with concurrent stack processing, as the 
>> timings were a bit different. The following reproducer failed pretty much 
>> 100% of the time:
>> while true; do make test JTREG="RETAIN=all" 
>> TEST=test/hotspot/jtreg/vmTestbase/nsk/jdi/MethodExitEvent/returnValue/returnValue003/returnValue003.java
>>  TEST_OPTS_JAVA_OPTIONS="-XX:+UseZGC -Xmx2g -XX:ZCollectionInterval=0.0001 
>> -XX:ZFragmentationLimit=0.01 -XX:+VerifyOops -XX:+ZVerifyViews -Xint" ; done 
>> 
>> With my fix I can run this repeatedly without any more failures. I have also 
>> sanity checked the patch by running tier 1-5, so that it does not introduce 
>> any new issues on its own. I have also used Stefan's nice external GC 
>> stressing with jcmd technique that was used to trigger crashes with other 
>> GCs, to make sure said crashes no longer reproduce either.
>
> Marked as reviewed by dlong (Reviewer).

> I think you've discovered JDK-6449023. And your fix looks like the workaround 
> I tried:
> https://bugs.openjdk.java.net/browse/JDK-6449023?focusedCommentId=14206078&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14206078

Oh wow. I had no idea people have been having issues with this since 2009! 
Thanks for the pointer. Well, let's hope we can finally close it now after 
marinating the bug for 11 years.

-

PR: https://git.openjdk.java.net/jdk/pull/930


Re: RFR: 8255452: Doing GC during JVMTI MethodExit event posting breaks return oop

2020-10-30 Thread Erik Österlund
On Fri, 30 Oct 2020 00:58:06 GMT, Coleen Phillimore  wrote:

>> Thanks for having a look coleen. In fact, not doing the JRT_BLOCK for the 
>> exception entry is intentional, because that entry goes through a different 
>> JRT_ENTRY (not JRT_BLOCK_ENTRY), that already transitions. So if I do the 
>> JRT_BLOCK for the exception path, it asserts saying hey you are already in 
>> VM.
>
> Oh that's actually horrible.   I wonder if it's possible to hoist saving the 
> result oop into the InterpreterRuntime entry.  And pass the Handle into 
> JvmtiExport::post_method_exit().

I tried that first, and ended up with a bunch of non-trivial code duplication 
instead, as reading the oop is done in both paths but for different reasons. 
once to preserve/restore it (interpreter remove_activation entry), and once 
inside JvmtiExport::post_method_exit() so that it can be passed into the 
MethodExit callback. I will give it another shot and see if it is possible to refactor 
it in a better way.

-

PR: https://git.openjdk.java.net/jdk/pull/930


Re: RFR: 8255452: Doing GC during JVMTI MethodExit event posting breaks return oop

2020-10-29 Thread Erik Österlund
On Thu, 29 Oct 2020 21:23:12 GMT, Coleen Phillimore  wrote:

>> The imasm::remove_activation() call does not deal with safepoints very well. 
>> However, when the MethodExit JVMTI event is being called, we call into the 
>> runtime in the middle of remove_activation(). If the value being returned is 
>> an object type, then the top-of-stack contains the oop. However, the GC does 
>> not traverse said oop in any oop map, because it is simply not expected that 
>> we safepoint in the middle of remove_activation().
>> 
>> The JvmtiExport::post_method_exit() function we end up calling, reads the 
>> top-of-stack oop, and puts it in a handle. Then it calls JVMTI callbacks, 
>> that eventually call Java and a bunch of stuff that safepoints. So after the 
>> JVMTI callback, we can expect the top-of-stack oop to be broken. 
>> Unfortunately, when we continue, we therefore end up returning a broken oop.
>> 
>> Notably, the fact that InterpreterRuntime::post_method_exit is a JRT_ENTRY, 
>> is wrong, as we can safepoint on the way back to Java, which will break the 
>> return oop in a similar way. So this patch makes it a JRT_BLOCK_ENTRY, 
>> moving the transition to VM and back, into a block of code that is protected 
>> against GC. Before the JRT_BLOCK is called, we stash away the return oop, 
>> and after the JRT_BLOCK_END, we restore the top-of-stack oop. In the path 
>> when InterpreterRuntime::post_method_exit is called when throwing an 
>> exception, we don't have the same problem of retaining an oop result, and 
>> hence the JRT_BLOCK/JRT_BLOCK_END section is not performed in this case; the 
>> logic is the same as before for this path. 
>> 
>> This is a JVMTI bug that has probably been around for a long time. It 
>> crashes with all GCs, but was discovered recently after concurrent stack 
>> processing, as StefanK has been running better GC stressing code in JVMTI, 
>> and the bug reproduced more easily with concurrent stack processing, as the 
>> timings were a bit different. The following reproducer failed pretty much 
>> 100% of the time:
>> while true; do make test JTREG="RETAIN=all" 
>> TEST=test/hotspot/jtreg/vmTestbase/nsk/jdi/MethodExitEvent/returnValue/returnValue003/returnValue003.java
>>  TEST_OPTS_JAVA_OPTIONS="-XX:+UseZGC -Xmx2g -XX:ZCollectionInterval=0.0001 
>> -XX:ZFragmentationLimit=0.01 -XX:+VerifyOops -XX:+ZVerifyViews -Xint" ; done 
>> 
>> With my fix I can run this repeatedly without any more failures. I have also 
>> sanity checked the patch by running tier 1-5, so that it does not introduce 
>> any new issues on its own. I have also used Stefan's nice external GC 
>> stressing with jcmd technique that was used to trigger crashes with other 
>> GCs, to make sure said crashes no longer reproduce either.
>
> src/hotspot/share/prims/jvmtiExport.cpp line 1600:
> 
>> 1598: 
>> 1599:   if (exception_exit) {
>> 1600: post_method_exit_inner(thread, mh, state, exception_exit, 
>> current_frame, result, value);
> 
> I think for exception exit, you also need JRT_BLOCK because you want the 
> transition to thread_in_VM for this code, since JRT_BLOCK_ENTRY doesn't do 
> the transition.  It should be safe for exception exit and retain the old 
> behavior.

Thanks for having a look coleen. In fact, not doing the JRT_BLOCK for the 
exception entry is intentional, because that entry goes through a different 
JRT_ENTRY (not JRT_BLOCK_ENTRY), that already transitions. So if I do the 
JRT_BLOCK for the exception path, it asserts saying hey you are already in VM.

-

PR: https://git.openjdk.java.net/jdk/pull/930


Integrated: 8255243: Reinforce escape barrier interactions with ZGC conc stack processing

2020-10-29 Thread Erik Österlund
On Fri, 23 Oct 2020 10:25:43 GMT, Erik Österlund  wrote:

> The escape barrier reallocates scalarized objects potentially deep into the 
> stack of a remote thread. Each allocation can safepoint, causing referenced 
> frames to be invalid. Some sprinklings were added that deal with that, but I 
> believe it was subsequently broken with the integration of the new vector 
> API, that has its own new deoptimization code that did not know about this. 
> Not surprisingly, the integration of the new vector API had no idea about 
> this subtlety, and allocates an object, and then reads an object deep from 
> the stack of a remote thread (using an escape barrier). I suppose the issue 
> is that all these 3 things were integrated at almost the same time. The 
> problematic code sequence is in VectorSupport::allocate_vector() in 
> vectorSupport.cpp, which is called from Deoptimization::realloc_objects(). It 
> first allocates an oop (possibly safepointing), and then reads a vector oop 
> from the stack. This is usually fine, but not through the escape barrier, 
> with concurrent stack scanning. While I have not seen any crashes yet, I can 
> see from code inspection that there is no way that this works correctly.
> 
> In order to make this less fragile for future changes, we should really have 
> a RAII object that keeps the target thread's stack of the escape barrier, 
> stable and processed, across safepoints. This patch fixes that. Then it 
> becomes much easier to reason about its correctness, compared to hoping the 
> various hooks are applied after each safepoint.
> 
> With this new robustness fix, the thread running the escape barrier, keeps 
> the target thread stack processed, straight through safepoints on the 
> requesting thread, making it easy and intuitive to understand why this works 
> correctly. The RAII object basically just has to cover the code block that 
> pokes at the remote stack and goes in and out of safepoints, arbitrarily. 
> Arguably, this escape barrier doesn't need to be blazingly fast, and can 
> afford keeping stacks sane through its operation.

This pull request has now been integrated.

Changeset: 5b185585
Author: Erik Österlund 
URL:   https://git.openjdk.java.net/jdk/commit/5b185585
Stats: 268 lines in 15 files changed: 164 ins; 72 del; 32 mod

8255243: Reinforce escape barrier interactions with ZGC conc stack processing

Co-authored-by: Richard Reingruber 
Reviewed-by: rrich, sspitsyn

-

PR: https://git.openjdk.java.net/jdk/pull/832


RFR: 8255452: Doing GC during JVMTI MethodExit event posting breaks return oop

2020-10-29 Thread Erik Österlund
The imasm::remove_activation() call does not deal with safepoints very well. 
However, when the MethodExit JVMTI event is being called, we call into the 
runtime in the middle of remove_activation(). If the value being returned is an 
object type, then the top-of-stack contains the oop. However, the GC does not 
traverse said oop in any oop map, because it is simply not expected that we 
safepoint in the middle of remove_activation().

The JvmtiExport::post_method_exit() function we end up calling, reads the 
top-of-stack oop, and puts it in a handle. Then it calls JVMTI callbacks, that 
eventually call Java and a bunch of stuff that safepoints. So after the JVMTI 
callback, we can expect the top-of-stack oop to be broken. Unfortunately, when 
we continue, we therefore end up returning a broken oop.

Notably, the fact that InterpreterRuntime::post_method_exit is a JRT_ENTRY, is 
wrong, as we can safepoint on the way back to Java, which will break the return 
oop in a similar way. So this patch makes it a JRT_BLOCK_ENTRY, moving the 
transition to VM and back, into a block of code that is protected against GC. 
Before the JRT_BLOCK is called, we stash away the return oop, and after the 
JRT_BLOCK_END, we restore the top-of-stack oop. In the path when 
InterpreterRuntime::post_method_exit is called when throwing an exception, we 
don't have the same problem of retaining an oop result, and hence the 
JRT_BLOCK/JRT_BLOCK_END section is not performed in this case; the logic is the 
same as before for this path. 

This is a JVMTI bug that has probably been around for a long time. It crashes 
with all GCs, but was discovered recently after concurrent stack processing, as 
StefanK has been running better GC stressing code in JVMTI, and the bug 
reproduced more easily with concurrent stack processing, as the timings were a 
bit different. The following reproducer failed pretty much 100% of the time:
while true; do make test JTREG="RETAIN=all" 
TEST=test/hotspot/jtreg/vmTestbase/nsk/jdi/MethodExitEvent/returnValue/returnValue003/returnValue003.java
 TEST_OPTS_JAVA_OPTIONS="-XX:+UseZGC -Xmx2g -XX:ZCollectionInterval=0.0001 
-XX:ZFragmentationLimit=0.01 -XX:+VerifyOops -XX:+ZVerifyViews -Xint" ; done 

With my fix I can run this repeatedly without any more failures. I have also 
sanity checked the patch by running tier 1-5, so that it does not introduce 
any new issues on its own. I have also used Stefan's nice external GC stressing 
with jcmd technique that was used to trigger crashes with other GCs, to make 
sure said crashes no longer reproduce either.

-

Commit messages:
 - 8255452: Doing GC during JVMTI MethodExit event posting breaks return oop

Changes: https://git.openjdk.java.net/jdk/pull/930/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=930&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8255452
  Stats: 46 lines in 3 files changed: 39 ins; 4 del; 3 mod
  Patch: https://git.openjdk.java.net/jdk/pull/930.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/930/head:pull/930

PR: https://git.openjdk.java.net/jdk/pull/930


Re: RFR: 8255243: Reinforce escape barrier interactions with ZGC conc stack processing [v2]

2020-10-28 Thread Erik Österlund
On Wed, 28 Oct 2020 19:44:33 GMT, Serguei Spitsyn  wrote:

> Hi Erik and Richard,
> 
> Changes in the serviceability files looks fine.
> 
> Thanks,
> 
> Serguei
> 
> 

Thanks for the review Serguei!

-

PR: https://git.openjdk.java.net/jdk/pull/832


Re: RFR: 8255243: Reinforce escape barrier interactions with ZGC conc stack processing [v2]

2020-10-27 Thread Erik Österlund
> The escape barrier reallocates scalarized objects potentially deep into the 
> stack of a remote thread. Each allocation can safepoint, causing referenced 
> frames to be invalid. Some sprinklings were added that deal with that, but I 
> believe it was subsequently broken with the integration of the new vector 
> API, that has its own new deoptimization code that did not know about this. 
> Not surprisingly, the integration of the new vector API had no idea about 
> this subtlety, and allocates an object, and then reads an object deep from 
> the stack of a remote thread (using an escape barrier). I suppose the issue 
> is that all these 3 things were integrated at almost the same time. The 
> problematic code sequence is in VectorSupport::allocate_vector() in 
> vectorSupport.cpp, which is called from Deoptimization::realloc_objects(). It 
> first allocates an oop (possibly safepointing), and then reads a vector oop 
> from the stack. This is usually fine, but not through the escape barrier, 
> with concurrent stack scanning. While I have not seen any crashes yet, I can 
> see from code inspection that there is no way that this works correctly.
> 
> In order to make this less fragile for future changes, we should really have 
> a RAII object that keeps the target thread's stack of the escape barrier, 
> stable and processed, across safepoints. This patch fixes that. Then it 
> becomes much easier to reason about its correctness, compared to hoping the 
> various hooks are applied after each safepoint.
> 
> With this new robustness fix, the thread running the escape barrier, keeps 
> the target thread stack processed, straight through safepoints on the 
> requesting thread, making it easy and intuitive to understand why this works 
> correctly. The RAII object basically just has to cover the code block that 
> pokes at the remote stack and goes in and out of safepoints, arbitrarily. 
> Arguably, this escape barrier doesn't need to be blazingly fast, and can 
> afford keeping stacks sane through its operation.

Erik Österlund has updated the pull request incrementally with two additional 
commits since the last revision:

 - Better encapsulate object deoptimization in EscapeBarrier also to facilitate 
correct interaction with concurrent thread stack processing.
   
   The Stackwalk for object deoptimization in VM_GetOrSetLocal::doit_prologue 
is not prepared for concurrent thread stack processing.
   EscapeBarrier::deoptimize_objects(int depth) is extended to cover a range of 
frames from depth d1 to depth d2. It is also prepared for concurrent thread 
stack processing. With this change it is used to deoptimize objects in the 
prologue of VM_GetOrSetLocal.
 - Review comments

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/832/files
  - new: https://git.openjdk.java.net/jdk/pull/832/files/5159c9a2..7db0bab1

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=832&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=832&range=00-01

  Stats: 120 lines in 9 files changed: 27 ins; 63 del; 30 mod
  Patch: https://git.openjdk.java.net/jdk/pull/832.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/832/head:pull/832

PR: https://git.openjdk.java.net/jdk/pull/832


Re: RFR: 8255243: Reinforce escape barrier interactions with ZGC conc stack processing [v2]

2020-10-27 Thread Erik Österlund
On Mon, 26 Oct 2020 15:19:51 GMT, Richard Reingruber  wrote:

>>> Hi Erik, the last commit in 
>>> https://github.com/reinrich/jdk/commits/pr-832-with-better-encapsulation 
>>> would be the refactoring I would like to do. It removes the code not 
>>> compliant with concurrent thread stack processing from 
>>> VM_GetOrSetLocal::doit_prologue(). Instead 
>>> EscapeBarrier::deoptimize_objects(int d1, int d2) is called. You added 
>>> already a KeepStackGCProcessedMark to that method and I changed it to 
>>> accept a range [d1, d2] of frames to do the object deoptimization for.
>>> 
>>> I'm not sure how to handle this from a process point of view. Can the 
>>> refactoring be done within this change? Should a new item or subtask be 
>>> created for it. I'd be glad if you could give an advice on that.
>>> 
>>> Thanks, Richard.
>> 
>> If you are okay with it, I can add your refactorings into this change, and 
>> add you as a co-author of the change. Sounds good?
>> 
>> Thanks,
>
> It does sound good indeed to me if you don't mind doing that. Thanks!
> I have run the tests dedicated to EscapeBarriers with ZGC enabled and also 
> the DeoptimizeObjectsALot stress testing. I will run some more serviceability 
> tests and my team's CI testing until tomorrow.

Thanks @reinrich. I uploaded your patch to this PR, and will add you as 
contributor. Also addressed your review comments. Hope your testing went fine.

-

PR: https://git.openjdk.java.net/jdk/pull/832


Re: RFR: 8223312: Utilize handshakes instead of is_thread_fully_suspended [v3]

2020-10-22 Thread Erik Österlund
On Thu, 22 Oct 2020 08:42:40 GMT, Richard Reingruber  wrote:

>> Stack frames are counted beginning at 0. The top frame has depth 0. So 
>> object deoptimization happens in the top frame.
>> 
>> Still the used method is not optimal because it assumes that objects of 
>> frames within the given depth are accessed and their escape state is 
>> changed. But potentially caller methods were optimized on the escape state, 
>> so it searches for caller frames passing ArgEscape objects and 
>> deoptimizes these too. With ForceEarlyReturn no objects are accessed but it 
>> is so uncommon that I did not bother optimizing this. Should I?
>
> @robehn you haven't messed up. Hope I haven't either. I've tested
> 
> ==
> Test summary
> ==
>TEST  TOTAL  PASS  FAIL ERROR  
>  
>jtreg:test/hotspot/jtreg:hotspot_serviceability 197   197 0 0  
>  
>jtreg:test/jdk:jdk_svc 1176  1176 0 0  
>  
>jtreg:test/jdk:jdk_jdi  174   174 0 0  
>  
>jtreg:test/hotspot/jtreg:vmTestbase_nsk_jdi1141  1141 0 0  
>  
>jtreg:test/hotspot/jtreg:vmTestbase_nsk_jvmti   648   648 0 0  
>  
>jtreg:test/hotspot/jtreg:vmTestbase_nsk_jdwp113   113 0 0  
>  
> ==
> TEST SUCCESS
> jdk_jdi now includes jdk/com/sun/jdi/EATests.java which tests 
> PopFrame/ForceEarlyReturn with object reallocation with and without 
> reallocation failures.

Ah. I see now the loop uses <= instead of <. So my reasoning was right but off 
by 1. Passing in 0 really means deopt 1 frame. Which means everything is fine 
and working the way I expect it to.

-

PR: https://git.openjdk.java.net/jdk/pull/729


Re: RFR: 8223312: Utilize handshakes instead of is_thread_fully_suspended [v3]

2020-10-22 Thread Erik Österlund
On Wed, 21 Oct 2020 08:40:47 GMT, Robbin Ehn  wrote:

>> The main point of this change-set is to make it easier to implement S/R on 
>> top of handshakes.
>> Which is a prerequisite for removing _suspend_flag (which duplicates the 
>> handshake functionality).
>> But we also remove some complicated S/R methods.
>> 
>> We basically just put in everything in the handshake closure, so the diff 
>> just looks much worse than what it is.
>> 
>> TraceSuspendDebugBits has an ifdef, but in both cases it now just returns.
>> But I was unsure if I should remove now or when is_ext_suspend_completed() 
>> is removed.
>> 
>> Passes multiple t1-5 runs, locally it passes many 
>> jck:vm/nsk_jvmti/nsk_jdi/jdk-jdi runs.
>
> Robbin Ehn has updated the pull request with a new target base due to a merge 
> or a rebase. The pull request now contains seven commits:
> 
>  - Fixed merge miss
>  - Merge branch 'master' into 
> 8223312-Utilize-handshakes-instead-of-is_thread_fully_suspended
>  - Merge fix from Richard
>  - Merge branch 'master' into 
> 8223312-Utilize-handshakes-instead-of-is_thread_fully_suspended
>  - Removed TraceSuspendDebugBits
>  - Removed unused method is_ext_suspend_completed_with_lock
>  - Utilize handshakes instead of is_thread_fully_suspended

Looks good. Awesome fix IMO.

-

Marked as reviewed by eosterlund (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/729


Re: RFR: 8223312: Utilize handshakes instead of is_thread_fully_suspended [v3]

2020-10-22 Thread Erik Österlund
On Wed, 21 Oct 2020 08:40:47 GMT, Robbin Ehn  wrote:

>> The main point of this change-set is to make it easier to implement S/R on 
>> top of handshakes.
>> Which is a prerequisite for removing _suspend_flag (which duplicates the 
>> handshake functionality).
>> But we also remove some complicated S/R methods.
>> 
>> We basically just put in everything in the handshake closure, so the diff 
>> just looks much worse than what it is.
>> 
>> TraceSuspendDebugBits has an ifdef, but in both cases it now just returns.
>> But I was unsure if I should remove now or when is_ext_suspend_completed() 
>> is removed.
>> 
>> Passes multiple t1-5 runs, locally it passes many 
>> jck:vm/nsk_jvmti/nsk_jdi/jdk-jdi runs.
>
> Robbin Ehn has updated the pull request with a new target base due to a merge 
> or a rebase. The pull request now contains seven commits:
> 
>  - Fixed merge miss
>  - Merge branch 'master' into 
> 8223312-Utilize-handshakes-instead-of-is_thread_fully_suspended
>  - Merge fix from Richard
>  - Merge branch 'master' into 
> 8223312-Utilize-handshakes-instead-of-is_thread_fully_suspended
>  - Removed TraceSuspendDebugBits
>  - Removed unused method is_ext_suspend_completed_with_lock
>  - Utilize handshakes instead of is_thread_fully_suspended

src/hotspot/share/prims/jvmtiEnv.cpp line 1663:

> 1661:   return JVMTI_ERROR_OUT_OF_MEMORY;
> 1662: }
> 1663: if (!eb.deoptimize_objects(1)) {

Oh and why is the depth 1 here, when two frames are deoptimized? Maybe I missed 
something.

-

PR: https://git.openjdk.java.net/jdk/pull/729


Re: RFR: 8223312: Utilize handshakes instead of is_thread_fully_suspended [v3]

2020-10-22 Thread Erik Österlund
On Wed, 21 Oct 2020 08:40:47 GMT, Robbin Ehn  wrote:

>> The main point of this change-set is to make it easier to implement S/R on 
>> top of handshakes.
>> Which is a prerequisite for removing _suspend_flag (which duplicates the 
>> handshake functionality).
>> But we also remove some complicated S/R methods.
>> 
>> We basically just put in everything in the handshake closure, so the diff 
>> just looks much worse than what it is.
>> 
>> TraceSuspendDebugBits has an ifdef, but in both cases it now just returns.
>> But I was unsure if I should remove now or when is_ext_suspend_completed() 
>> is removed.
>> 
>> Passes multiple t1-5 runs, locally it passes many 
>> jck:vm/nsk_jvmti/nsk_jdi/jdk-jdi runs.
>
> Robbin Ehn has updated the pull request with a new target base due to a merge 
> or a rebase. The pull request now contains seven commits:
> 
>  - Fixed merge miss
>  - Merge branch 'master' into 
> 8223312-Utilize-handshakes-instead-of-is_thread_fully_suspended
>  - Merge fix from Richard
>  - Merge branch 'master' into 
> 8223312-Utilize-handshakes-instead-of-is_thread_fully_suspended
>  - Removed TraceSuspendDebugBits
>  - Removed unused method is_ext_suspend_completed_with_lock
>  - Utilize handshakes instead of is_thread_fully_suspended

Just wondering why the escape barrier for force early return uses a stack depth 
of 0. Either that is wrong, or the escape barrier is not needed in the first 
place here. I think.

src/hotspot/share/prims/jvmtiEnvBase.cpp line 1390:

> 1388:   return JVMTI_ERROR_OUT_OF_MEMORY;
> 1389: }
> 1390: if (!eb.deoptimize_objects(0)) {

Why is the depth 0 here? That makes no sense to me. My understanding of this is 
that we have extracted the object deopt that would "normally" (since last 
week?) be done in JvmtiEnvBase::check_top_frame. And it is walking 1 frame, so 
shouldn't the depth be 1?

-

PR: https://git.openjdk.java.net/jdk/pull/729


Re: RFR: 8254668: JVMTI process frames on thread without started processing

2020-10-13 Thread Erik Österlund
On Tue, 13 Oct 2020 09:25:55 GMT, Stefan Karlsson  wrote:

> I hit the following assert in some tests runs that I've been doing:
> # Internal Error 
> (/home/stefank/git/alt/open/src/hotspot/share/runtime/stackWatermark.inline.hpp:67),
>  pid=828170,
> tid=828734 # assert(processing_started()) failed: Processing should already 
> have started
> 
> The stack traces for this has been:
> Native frames: (J=compiled Java code, A=aot compiled Java code, 
> j=interpreted, Vv=VM code, C=native code)
> V [libjvm.so+0x1626d75] StackWatermarkSet::on_iteration(JavaThread*, frame 
> const&)+0xd5
> V [libjvm.so+0xad791a] frame::sender(RegisterMap*) const+0x7a
> V [libjvm.so+0xacd3f8] frame::real_sender(RegisterMap*) const+0x18
> V [libjvm.so+0x1804c4a] vframe::sender() const+0xea
> V [libjvm.so+0x175f47b] JavaThread::last_java_vframe(RegisterMap*)+0x5b
> V [libjvm.so+0x10e10fc] JvmtiEnvBase::vframeFor(JavaThread*, int)+0x4c
> V [libjvm.so+0x10e6972] JvmtiEnvBase::check_top_frame(Thread*, JavaThread*, 
> jvalue, TosState, Handle*)+0xe2
> V [libjvm.so+0x10e759c] JvmtiEnvBase::force_early_return(JavaThread*, jvalue, 
> TosState)+0x11c
> V [libjvm.so+0x105b8f5] jvmti_ForceEarlyReturnObject+0x215
> V  [libjvm.so+0x1626d75]  StackWatermarkSet::on_iteration(JavaThread*, frame 
> const&)+0xd5
> V  [libjvm.so+0xad791a]  frame::sender(RegisterMap*) const+0x7a
> V  [libjvm.so+0xacd3f8]  frame::real_sender(RegisterMap*) const+0x18
> V  [libjvm.so+0x1804c4a]  vframe::sender() const+0xea
> V  [libjvm.so+0x1804d00]  vframe::java_sender() const+0x10
> V  [libjvm.so+0x10e1115]  JvmtiEnvBase::vframeFor(JavaThread*, int)+0x65
> V  [libjvm.so+0x10d475f]  JvmtiEnv::NotifyFramePop(JavaThread*, int)+0x9f
> V  [libjvm.so+0x106b6aa]  jvmti_NotifyFramePop+0x23a
> The code inspects the top frame of a suspended java thread. However, there's 
> nothing in the code that starts the
> watermark processing of the thread, so the code asserts when sender calls 
> on_iteration.
> We only have to call start_processing/on_iteration when oops are being read. 
> The failing code does *not* inspect any
> oops, so I turn off the on_iteration call by setting process_frame to false.
> To notify the readers of the code that vframeFor doesn't process the oops, 
> I've renamed the function to
> vframeForNoProcess to give a visual cue.
> I found this bug when running this command line:
> makec ../build/fastdebug/ test TEST=test/hotspot/jtreg/vmTestbase/nsk/jvmti
> JTREG="JAVA_OPTIONS=-XX:+UseZGC -Xmx2g -XX:ZCollectionInterval=1 
> -XX:ZFragmentationLimit=0.01"
> JTREG_EXTRA_PROBLEM_LISTS=ProblemList-zgc.txt
> Five tests consistently assert with this command line. All tests pass
> with the proposed fix.
> Recommendations of tests to run are welcome. I intend to get this run through 
> tier1-3, but haven't yet.

Looks good. Thanks for fixing this.

-

Marked as reviewed by eosterlund (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/627


Integrated: 8253180: ZGC: Implementation of JEP 376: ZGC: Concurrent Thread-Stack Processing

2020-10-09 Thread Erik Österlund
On Tue, 22 Sep 2020 11:43:41 GMT, Erik Österlund  wrote:

> This PR is the implementation of "JEP 376: ZGC: Concurrent Thread-Stack 
> Processing" (cf.
> https://openjdk.java.net/jeps/376).
> Basically, this patch modifies the epilog safepoint when returning from a 
> frame (supporting interpreter frames, c1, c2,
> and native wrapper frames), to compare the stack pointer against a 
> thread-local value. This turns return polls into
> more of a swiss army knife that can be used to poll for safepoints, 
> handshakes, but also returns into not yet safe to
> expose frames, denoted by a "stack watermark".  ZGC will leave frames (and 
> other thread oops) in a state of a mess in
> the GC checkpoint safepoints, rather than processing all threads and their 
> stacks. Processing is initialized
> automagically when threads wake up for a safepoint, or get poked by a 
> handshake or safepoint. Said initialization
> processes a few (3) frames and other thread oops. The rest - the bulk of the 
> frame processing, is deferred until it is
> actually needed. It is needed when a frame is exposed to either 1) execution 
> (returns or unwinding due to exception
> handling), or 2) stack walker APIs. A hook is then run to go and finish the 
> lazy processing of frames.  Mutator and GC
> threads can compete for processing. The processing is therefore performed 
> under a per-thread lock. Note that disarming
> of the poll word (that the returns are comparing against) is only performed 
> by the thread itself. So sliding the
> watermark up will require one runtime call for a thread to note that nothing 
> needs to be done, and then update the poll
> word accordingly. Downgrading the poll word concurrently by other threads was 
> simply not worth the complexity it
> brought (and is only possible on TSO machines). So left that one out.
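The deferred-processing scheme described above can be modeled as a small toy 
(illustrative only; the names and data structures are mine, not HotSpot's): the 
GC checkpoint leaves all frames unprocessed, wakeup eagerly fixes a few top 
frames, and the rest are fixed on demand when a return crosses the watermark, 
under a per-thread lock because mutator and GC may compete.

```cpp
#include <algorithm>
#include <cstddef>
#include <mutex>
#include <vector>

// Toy model of the lazy stack processing scheme described above.
struct ToyStack {
  std::vector<bool> processed;   // one entry per frame; index 0 is the top
  std::size_t watermark = 0;     // frames below this index are safe to expose
  std::mutex lock;

  explicit ToyStack(std::size_t frames)
    : processed(frames, true), watermark(frames) {}

  void gc_checkpoint() {                     // GC leaves the stack in a "mess"
    std::lock_guard<std::mutex> g(lock);
    std::fill(processed.begin(), processed.end(), false);
    watermark = 0;
  }

  void start_processing(std::size_t eager = 3) {  // wakeup fixes a few frames
    std::lock_guard<std::mutex> g(lock);
    watermark = std::min(eager, processed.size());
    for (std::size_t i = 0; i < watermark; i++) processed[i] = true;
  }

  void on_return_into(std::size_t frame) {   // return poll hit: catch up lazily
    std::lock_guard<std::mutex> g(lock);
    while (watermark <= frame) processed[watermark++] = true;
  }
};
```

The real mechanism additionally covers stack walker APIs and exception 
unwinding, but the ordering is the same: no frame is exposed before its slot in 
the "processed" state has been fixed up.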

This pull request has now been integrated.

Changeset: b9873e18
Author: Erik Österlund 
URL:   https://git.openjdk.java.net/jdk/commit/b9873e18
Stats: 2740 lines in 131 files changed: 2167 ins; 311 del; 262 mod

8253180: ZGC: Implementation of JEP 376: ZGC: Concurrent Thread-Stack Processing

Reviewed-by: stefank, pliden, rehn, neliasso, coleenp, smonteith

-

PR: https://git.openjdk.java.net/jdk/pull/296


Re: RFR: 8253180: ZGC: Implementation of JEP 376: ZGC: Concurrent Thread-Stack Processing [v13]

2020-10-09 Thread Erik Österlund

Erik Österlund has updated the pull request with a new target base due to a 
merge or a rebase. The pull request now
contains 19 commits:

 - Merge branch 'master' into 8253180_conc_stack_scanning
 - Review: Andrew CR 1
 - Review: David CR 1
 - Review: Deal with new assert from mainline
 - Merge branch 'master' into 8253180_conc_stack_scanning
 - Review: StackWalker hook
 - Review: Kim CR 1 and exception handling fix
 - Review: Move barrier detach
 - Review: Remove assert that has outstayed its welcome
 - Merge branch 'master' into 8253180_conc_stack_scanning
 - ... and 9 more: https://git.openjdk.java.net/jdk/compare/a2f65190...66070372

-

Changes: https://git.openjdk.java.net/jdk/pull/296/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=296&range=12
  Stats: 2740 lines in 131 files changed: 2167 ins; 311 del; 262 mod
  Patch: https://git.openjdk.java.net/jdk/pull/296.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/296/head:pull/296

PR: https://git.openjdk.java.net/jdk/pull/296


Re: RFR: 8253180: ZGC: Implementation of JEP 376: ZGC: Concurrent Thread-Stack Processing [v12]

2020-10-08 Thread Erik Österlund
On Wed, 7 Oct 2020 18:03:10 GMT, Stuart Monteith  wrote:

> I've been reviewing this and stepping through the debugger. It looks OK to me.

Thanks for the review Stuart.

-

PR: https://git.openjdk.java.net/jdk/pull/296


Re: RFR: 8253180: ZGC: Implementation of JEP 376: ZGC: Concurrent Thread-Stack Processing [v8]

2020-10-07 Thread Erik Österlund
On Tue, 6 Oct 2020 12:18:39 GMT, Erik Österlund  wrote:

>>> Hi Erik,
>>> Can you give an overview of the use of the "poll word" and its relation to 
>>> the "poll page" please?
>>> Thanks,
>>> David
>> 
>> Hi David,
>> 
>> Thanks for reviewing this code.
>> 
>> There are various polls in the VM. We have runtime transitions, interpreter 
>> transitions, transitions at returns, native
>> wrappers, transitions in nmethods... and sometimes they are a bit different.
>> The "poll word" encapsulates enough information to be able to poll for 
>> returns (stack watermark barrier), or poll for
>> normal handshakes/safepoints, with a conditional branch. So really, we could 
>> use the "poll word" for every single poll.
>> A low order bit is a boolean saying if handshake/safepoint is armed, and the 
>> rest of the word denotes the watermark for
>> which frame has armed returns.  The "poll page" is for polls that do not use 
>> conditional branches, but instead uses an
>> indirect load. It is used still in nmethod loop polls, because I 
>> experimentally found it to perform worse with
>> conditional branches on one machine, and did not want to risk regressions. 
>> It is also used for VM configurations that
>> do not yet support stack watermark barriers, such as Graal, PPC, S390 and 32 
>> bit platforms. They will hopefully
>> eventually support this mechanism, but having the poll page allows a more 
>> smooth transition. And unless it is crystal
>> clear that the performance of the conditional branch loop poll really is 
>> fast enough on sufficiently many machines, we
>> might keep it until that changes.  Hope this makes sense.  Thanks,
>
>> _Mailing list message from [Andrew Haley](mailto:a...@redhat.com) on 
>> [hotspot-dev](mailto:hotspot-...@openjdk.java.net):_
>> 
>> On 06/10/2020 08:22, Erik ?sterlund wrote:
>> 
>> > > This PR the implementation of "JEP 376: ZGC: Concurrent Thread-Stack 
>> > > Processing" (cf.
>> > > https://openjdk.java.net/jeps/376).
>> 
>> One small thing: the couple of uses of lea(InternalAddress) should really be 
>> adr;
>> this generates much better code.
> 
> Hi Andrew,
> 
> Thanks for having a look. I applied your patch. Having said that, this is run 
> on the safepoint slow path, so should be
> a rather cold path, where threads have to wear coats and gloves. But it does 
> not hurt to optimize the encoding further,
> I suppose.  Thanks,

Re: RFR: 8253180: ZGC: Implementation of JEP 376: ZGC: Concurrent Thread-Stack Processing [v8]

2020-10-06 Thread Erik Österlund
On Tue, 6 Oct 2020 07:35:16 GMT, Erik Österlund  wrote:

>> Hi Erik,
>> Can you give an overview of the use of the "poll word" and its relation to 
>> the "poll page" please?
>> Thanks,
>> David
> 
> Hi David,
> 
> Thanks for reviewing this code.
> 
> There are various polls in the VM. We have runtime transitions, interpreter 
> transitions, transitions at returns, native
> wrappers, transitions in nmethods... and sometimes they are a bit different.
> The "poll word" encapsulates enough information to be able to poll for 
> returns (stack watermark barrier), or poll for
> normal handshakes/safepoints, with a conditional branch. So really, we could 
> use the "poll word" for every single poll.
> A low order bit is a boolean saying if handshake/safepoint is armed, and the 
> rest of the word denotes the watermark for
> which frame has armed returns.  The "poll page" is for polls that do not use 
> conditional branches, but instead uses an
> indirect load. It is used still in nmethod loop polls, because I 
> experimentally found it to perform worse with
> conditional branches on one machine, and did not want to risk regressions. It 
> is also used for VM configurations that
> do not yet support stack watermark barriers, such as Graal, PPC, S390 and 32 
> bit platforms. They will hopefully
> eventually support this mechanism, but having the poll page allows a more 
> smooth transition. And unless it is crystal
> clear that the performance of the conditional branch loop poll really is fast 
> enough on sufficiently many machines, we
> might keep it until that changes.  Hope this makes sense.  Thanks,

> _Mailing list message from [Andrew Haley](mailto:a...@redhat.com) on 
> [hotspot-dev](mailto:hotspot-...@openjdk.java.net):_
> 
> On 06/10/2020 08:22, Erik ?sterlund wrote:
> 
> > > This PR the implementation of "JEP 376: ZGC: Concurrent Thread-Stack 
> > > Processing" (cf.
> > > https://openjdk.java.net/jeps/376).
> 
> One small thing: the couple of uses of lea(InternalAddress) should really be 
> adr;
> this generates much better code.

Hi Andrew,

Thanks for having a look. I applied your patch. Having said that, this is run 
on the safepoint slow path, so should be
a rather cold path, where threads have to wear coats and gloves. But it does 
not hurt to optimize the encoding further,
I suppose.

Thanks,

-

PR: https://git.openjdk.java.net/jdk/pull/296


Re: RFR: 8253180: ZGC: Implementation of JEP 376: ZGC: Concurrent Thread-Stack Processing [v12]

2020-10-06 Thread Erik Österlund

Erik Österlund has updated the pull request incrementally with one additional 
commit since the last revision:

  Review: Andrew CR 1

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/296/files
  - new: https://git.openjdk.java.net/jdk/pull/296/files/2816b76b..54fe1f8b

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=296&range=11
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=296&range=10-11

  Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/296.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/296/head:pull/296

PR: https://git.openjdk.java.net/jdk/pull/296


Re: RFR: 8253180: ZGC: Implementation of JEP 376: ZGC: Concurrent Thread-Stack Processing [v8]

2020-10-06 Thread Erik Österlund
On Tue, 6 Oct 2020 02:57:00 GMT, David Holmes  wrote:

> Hi Erik,
> Can you give an overview of the use of the "poll word" and its relation to 
> the "poll page" please?
> Thanks,
> David

Hi David,

Thanks for reviewing this code.

There are various polls in the VM. We have runtime transitions, interpreter 
transitions, transitions at returns, native
wrappers, transitions in nmethods... and sometimes they are a bit different.

The "poll word" encapsulates enough information to be able to poll for returns 
(stack watermark barrier), or poll for
normal handshakes/safepoints, with a conditional branch. So really, we could 
use the "poll word" for every single poll.
A low order bit is a boolean saying if handshake/safepoint is armed, and the 
rest of the word denotes the watermark for
which frame has armed returns.

The "poll page" is for polls that do not use conditional branches, but instead 
uses an indirect load. It is used still
in nmethod loop polls, because I experimentally found it to perform worse with 
conditional branches on one machine, and
did not want to risk regressions. It is also used for VM configurations that do 
not yet support stack watermark
barriers, such as Graal, PPC, S390 and 32 bit platforms. They will hopefully 
eventually support this mechanism, but
having the poll page allows a more smooth transition. And unless it is crystal 
clear that the performance of the
conditional branch loop poll really is fast enough on sufficiently many 
machines, we might keep it until that changes.

Hope this makes sense.

Thanks,
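A rough sketch of that encoding (names and the exact comparison direction are 
my own; HotSpot's real code lives in safepointMechanism.cpp): the low-order bit 
flags an armed handshake/safepoint, the remaining bits carry the watermark that 
arms returns, and stack pointers are word-aligned so the low bit is free.

```cpp
#include <cstdint>

// Illustrative sketch of the dual-purpose poll word described above.
const std::uintptr_t ARMED_BIT = 1;

std::uintptr_t compute_poll_word(bool armed, std::uintptr_t watermark) {
  return watermark | (armed ? ARMED_BIT : 0);
}

// Ordinary polls test just the flag bit...
bool handshake_poll_armed(std::uintptr_t poll_word) {
  return (poll_word & ARMED_BIT) != 0;
}

// ...while return polls compare the frame's stack pointer against the
// whole word with a single conditional branch: returning into a frame
// above the watermark must take the slow path.
bool return_poll_must_trap(std::uintptr_t poll_word, std::uintptr_t sp) {
  return sp > poll_word;
}
```

So one thread-local word serves both the stack watermark barrier and the 
regular safepoint/handshake polls, each with one cheap check.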

-

PR: https://git.openjdk.java.net/jdk/pull/296


Re: RFR: 8253180: ZGC: Implementation of JEP 376: ZGC: Concurrent Thread-Stack Processing [v10]

2020-10-06 Thread Erik Österlund
On Tue, 6 Oct 2020 02:40:12 GMT, David Holmes  wrote:

>> Erik Österlund has updated the pull request with a new target base due to a 
>> merge or a rebase. The pull request now
>> contains 16 commits:
>>  - Review: Deal with new assert from mainline
>>  - Merge branch 'master' into 8253180_conc_stack_scanning
>>  - Review: StackWalker hook
>>  - Review: Kim CR 1 and exception handling fix
>>  - Review: Move barrier detach
>>  - Review: Remove assert that has outstayed its welcome
>>  - Merge branch 'master' into 8253180_conc_stack_scanning
>>  - Review: Albert CR2 and defensive programming
>>  - Review: StefanK CR 3
>>  - Review: Per CR 1
>>  - ... and 6 more: 
>> https://git.openjdk.java.net/jdk/compare/9604ee82...e633cb94
>
> src/hotspot/share/runtime/stackWatermark.hpp line 91:
> 
>> 89:   JavaThread* _jt;
>> 90:   StackWatermarkFramesIterator* _iterator;
>> 91:   Mutex _lock;
> 
> How are you guaranteeing that the Mutex is unused at the time the 
> StackWatermark is deleted?

The StackWatermarks are deleted when the thread is deleted (and its destructor 
runs). Hence, I'm relying on the Threads
SMR project here. Anyone that pokes around at the StackWatermark is either the 
current thread, or a thread that has a
ThreadsListHandle containing the thread, making it safe to access that thread 
without it being concurrently deleted.
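As a loose analogy only (Threads SMR is hazard-pointer based, not 
reference-counted), the lifetime guarantee resembles holding a counted snapshot 
of the thread list while poking at a thread:

```cpp
#include <memory>
#include <vector>

// Toy analogy for the guarantee described above: while an accessor holds
// a handle obtained from a snapshot of the thread list, the thread object
// (and thus its StackWatermark) cannot be deleted underneath it.
struct ToyThread {
  int id;
  explicit ToyThread(int i) : id(i) {}
};

using ThreadsSnapshot = std::vector<std::shared_ptr<ToyThread>>;

std::shared_ptr<ToyThread> find_thread(const ThreadsSnapshot& snapshot, int id) {
  for (const auto& t : snapshot)
    if (t->id == id) return t;  // returned handle keeps the thread alive
  return nullptr;
}
```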

> src/hotspot/share/runtime/safepointMechanism.cpp line 101:
> 
>> 99: uintptr_t SafepointMechanism::compute_poll_word(bool armed, uintptr_t 
>> stack_watermark) {
>> 100:   if (armed) {
>> 101: log_debug(stackbarrier)("Computed armed at %d", 
>> Thread::current()->osthread()->thread_id());
> 
> s/at/for/ ?

Fixed.

-

PR: https://git.openjdk.java.net/jdk/pull/296


Re: RFR: 8253180: ZGC: Implementation of JEP 376: ZGC: Concurrent Thread-Stack Processing [v11]

2020-10-06 Thread Erik Österlund

Erik Österlund has updated the pull request incrementally with one additional 
commit since the last revision:

  Review: David CR 1

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/296/files
  - new: https://git.openjdk.java.net/jdk/pull/296/files/e633cb94..2816b76b

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=296&range=10
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=296&range=09-10

  Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod
  Patch: https://git.openjdk.java.net/jdk/pull/296.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/296/head:pull/296

PR: https://git.openjdk.java.net/jdk/pull/296


Re: RFR: 8253180: ZGC: Implementation of JEP 376: ZGC: Concurrent Thread-Stack Processing [v10]

2020-10-06 Thread Erik Österlund
On Tue, 6 Oct 2020 02:26:16 GMT, David Holmes  wrote:

>> Erik Österlund has updated the pull request with a new target base due to a 
>> merge or a rebase. The pull request now
>> contains 16 commits:
>>  - Review: Deal with new assert from mainline
>>  - Merge branch 'master' into 8253180_conc_stack_scanning
>>  - Review: StackWalker hook
>>  - Review: Kim CR 1 and exception handling fix
>>  - Review: Move barrier detach
>>  - Review: Remove assert that has outstayed its welcome
>>  - Merge branch 'master' into 8253180_conc_stack_scanning
>>  - Review: Albert CR2 and defensive programming
>>  - Review: StefanK CR 3
>>  - Review: Per CR 1
>>  - ... and 6 more: 
>> https://git.openjdk.java.net/jdk/compare/9604ee82...e633cb94
>
> src/hotspot/share/runtime/safepointMechanism.cpp line 89:
> 
>> 87:   //
>> 88:   // The call has been carefully placed here to cater for a few 
>> situations:
>> 89:   // 1) After we exit from block after a global pool
> 
> Typo: pool -> poll

Fixed.

> src/hotspot/share/runtime/stackWatermark.cpp line 223:
> 
>> 221: void StackWatermark::yield_processing() {
>> 222:   update_watermark();
>> 223:   MutexUnlocker mul(&_lock, Mutex::_no_safepoint_check_flag);
> 
> This seems a little dubious - is it just a heuristic? There is no guarantee 
> that unlocking the Mutex will allow another
> thread to claim it before this thread re-locks it.

It is indeed just a heuristic. There is no need for a guarantee.
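A minimal sketch of that heuristic (illustrative names, not the real 
StackWatermark code): publish progress first, then briefly drop and retake the 
lock so a competing thread has a window to claim it. Whether another thread 
actually wins the lock in that window is best-effort only.

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>

// Sketch of yield_processing as discussed above: correctness does not
// depend on the unlock/relock window being taken by anyone.
void yield_processing(std::mutex& lock,
                      std::atomic<std::uintptr_t>& watermark,
                      std::uintptr_t progress) {
  watermark.store(progress, std::memory_order_release);
  lock.unlock();  // window where a competing mutator/GC thread may enter
  lock.lock();    // resume processing where we left off
}
```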

-

PR: https://git.openjdk.java.net/jdk/pull/296


Re: RFR: 8253180: ZGC: Implementation of JEP 376: ZGC: Concurrent Thread-Stack Processing [v10]

2020-10-05 Thread Erik Österlund

Erik Österlund has updated the pull request with a new target base due to a 
merge or a rebase. The pull request now
contains 16 commits:

 - Review: Deal with new assert from mainline
 - Merge branch 'master' into 8253180_conc_stack_scanning
 - Review: StackWalker hook
 - Review: Kim CR 1 and exception handling fix
 - Review: Move barrier detach
 - Review: Remove assert that has outstayed its welcome
 - Merge branch 'master' into 8253180_conc_stack_scanning
 - Review: Albert CR2 and defensive programming
 - Review: StefanK CR 3
 - Review: Per CR 1
 - ... and 6 more: https://git.openjdk.java.net/jdk/compare/9604ee82...e633cb94

-

Changes: https://git.openjdk.java.net/jdk/pull/296/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=296&range=09
  Stats: 2740 lines in 131 files changed: 2167 ins; 311 del; 262 mod
  Patch: https://git.openjdk.java.net/jdk/pull/296.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/296/head:pull/296

PR: https://git.openjdk.java.net/jdk/pull/296


Re: RFR: 8253180: ZGC: Implementation of JEP 376: ZGC: Concurrent Thread-Stack Processing [v8]

2020-10-01 Thread Erik Österlund
On Thu, 1 Oct 2020 09:50:45 GMT, Erik Österlund  wrote:

>> Marked as reviewed by rehn (Reviewer).
>
>> _Mailing list message from [Kim Barrett](mailto:kim.barr...@oracle.com) on
>> [hotspot-dev](mailto:hotspot-...@openjdk.java.net):_
>> I've only looked at scattered pieces, but what I've looked at seemed to be
>> in good shape. Only a few minor comments.
>> 
>> --
>> src/hotspot/share/runtime/frame.cpp
>> 456 // for(StackFrameStream fst(thread); !fst.is_done(); fst.next()) {
>> 
>> Needs to be updated for the new constructor arguments. Just in general, the
>> class documentation seems to need some updating for this change.
> 
> Fixed.
> 
>> --
>> src/hotspot/share/runtime/frame.cpp
>> 466 StackFrameStream(JavaThread *thread, bool update, bool process_frames);
>> 
>> Something to consider is that bool parameters like that, especially when
>> there are multiple, are error prone. An alternative is distinct enums, which
>> likely also obviates the need for comments in calls.
> 
> Coleen also had the same comment, and we agreed to file a follow-up RFE to 
> clean that up. This also applies to all the
> existing parameters passed in. So I would still like to do that, but in a 
> follow-up RFE. Said RFE will also make the
> parameter use in RegisterMap and vframeStream explicit.
>> --
>> src/hotspot/share/runtime/thread.hpp
>> 956 void set_processed_thread(Thread *thread) { _processed_thread = thread; }
>> 
>> I think this should assert that either _processed_thread or thread are NULL.
>> Or maybe the RememberProcessedThread constructor should be asserting that
>> _cur_thr->processed_thread() is NULL.
> 
> Fixed.
> 
>> --
> 
> Thanks for the review Kim!
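The distinct-enum alternative suggested above could look roughly like this 
(sketch only; the actual follow-up RFE may choose different names):

```cpp
// Sketch of replacing adjacent bool parameters with distinct enum
// classes, so call sites are self-documenting and the two arguments
// cannot be swapped silently. Names are illustrative.
enum class UpdateMap     { skip, include };
enum class ProcessFrames { skip, process };

struct StackFrameStreamSketch {
  UpdateMap     _update;
  ProcessFrames _process;
  StackFrameStreamSketch(UpdateMap u, ProcessFrames p)
    : _update(u), _process(p) {}
};

// A call site now reads:
//   StackFrameStreamSketch fst(UpdateMap::include, ProcessFrames::skip);
// instead of the ambiguous bool pair in: StackFrameStream fst(thread, true, false);
```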

In my last PR update, I included a fix to an exception handling problem that I 
encountered after lots of stress testing
that I have been running for a while now. I managed to catch the issue, get a 
reliable reproducer, and fix it.

The root problem is that the hook I had placed in 
SharedRuntime::exception_handler_for_return_address was being skipped.
The reason is that the stack is not walkable at this point, so the hook simply 
bails out. This had some unexpected
consequences. After looking closer at this code, I found that if we did have a 
walkable stack when we call
SharedRuntime::raw_exception_handler_for_return_address, that would have been 
the only hook we need at all for
exception handling. It is always the common root point where we unwind into a 
caller frame due to an exception throwing
into the caller, and we need to look up the rethrow handler of the caller. 
However, we are indeed not walkable here. To
deal with this, I have rearranged the exception hooks a bit. First of all, I 
have deleted all before_unwind hooks for
exception handling, because they should not be needed if the after_unwind hook 
is reliably called on the caller side
instead. And those hooks do indeed need to be there, because we do not always 
have a point where we can call
before_unwind (e.g. C1 unwind exception code, that just unwinds and looks up 
the rethrow handler via
SharedRuntime::exception_handler_for_return_address). I have then traced all 
paths from
SharedRuntime::raw_exception_handler_for_return_address into runtime rethrow 
handlers called, for each rethrow
exception handler PC exposed in the function. They are:

* OptoRuntime::rethrow_C when unwinding into C2 code
* exception_handler_for_pc_helper via Runtime1::handle_exception_from_callee_id 
when unwinding into C1 code
* JavaCallWrapper::~JavaCallWrapper when unwinding into a Java call stub
* InterpreterRuntime::exception_handler_for_exception when unwinding into an 
interpreted method
* Deoptimization::fetch_unroll_info (with exec_mode == Unpack_exception) when 
unwinding into a deoptimized nmethod

Each rethrow handler returned has a corresponding comment saying which runtime
rethrow handler it will end up in, once the stack has become walkable and we
have transferred control into the caller. And all of those runtime hooks
now have an after_unwind() hook.

The good news is that now the responsibility for who calls the unwind hook for 
exceptions is clearer: it is never done
by the callee, and always done by the caller, in its rethrow handler, at which 
point the stack is walkable.
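The caller-side contract described above can be sketched as a minimal model
(all names here are invented for illustration — `after_unwind`,
`rethrow_handler`, `ModelThread` are not the actual HotSpot signatures): the
unwind hook is never run by the callee, only from a runtime rethrow handler,
once the stack is walkable again.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Minimal model of the caller-side contract (all names invented):
// the unwind hook is never run by the callee; it runs from the runtime
// rethrow handler of the caller, once the stack is walkable again.
struct ModelThread {
  bool stack_walkable = false;
  std::vector<std::string> log;
};

void after_unwind(ModelThread* t) {
  // Mirrors the "assert there is a last_Java_frame" hardening: the hook
  // refuses to run when the stack is not walkable.
  assert(t->stack_walkable && "unwind hook requires a walkable stack");
  t->log.push_back("after_unwind");
}

// Stand-in for one of the runtime rethrow handlers listed above
// (e.g. the one C2 code unwinds into).
void rethrow_handler(ModelThread* t) {
  t->stack_walkable = true;  // control has transferred into the caller
  after_unwind(t);
}
```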

In order to avoid further issues where an unwind hook is ignored, I have 
changed them to assert that there is a
last_Java_frame present. Previously I did not assert that, because there was 
shared code bet

Re: RFR: 8253180: ZGC: Implementation of JEP 376: ZGC: Concurrent Thread-Stack Processing [v8]

2020-10-01 Thread Erik Österlund
On Tue, 29 Sep 2020 16:09:48 GMT, Robbin Ehn  wrote:

>> Erik Österlund has updated the pull request with a new target base due to a 
>> merge or a rebase. The pull request now
>> contains 12 commits:
>>  - Review: Move barrier detach
>>  - Review: Remove assert that has outstayed its welcome
>>  - Merge branch 'master' into 8253180_conc_stack_scanning
>>  - Review: Albert CR2 and defensive programming
>>  - Review: StefanK CR 3
>>  - Review: Per CR 1
>>  - Merge branch 'master' into 8253180_conc_stack_scanning
>>  - Review: Albert CR 1
>>  - Review: SteafanK CR 2
>>  - Merge branch 'master' into 8253180_conc_stack_scanning
>>  - ... and 2 more: 
>> https://git.openjdk.java.net/jdk/compare/6bddeb70...2ffbd764
>
> Marked as reviewed by rehn (Reviewer).

> _Mailing list message from [Kim Barrett](mailto:kim.barr...@oracle.com) on
> [hotspot-dev](mailto:hotspot-...@openjdk.java.net):_
> I've only looked at scattered pieces, but what I've looked at seemed to be
> in good shape. Only a few minor comments.
> 
> --
> src/hotspot/share/runtime/frame.cpp
> 456 // for(StackFrameStream fst(thread); !fst.is_done(); fst.next()) {
> 
> Needs to be updated for the new constructor arguments. Just in general, the
> class documentation seems to need some updating for this change.

Fixed.

> --
> src/hotspot/share/runtime/frame.cpp
> 466 StackFrameStream(JavaThread *thread, bool update, bool process_frames);
> 
> Something to consider is that bool parameters like that, especially when
> there are multiple, are error prone. An alternative is distinct enums, which
> likely also obviates the need for comments in calls.

Coleen also had the same comment, and we agreed to file a follow-up RFE to 
clean that up. This also applies to all the
existing parameters passed in. So I would still like to do that, but in a 
follow-up RFE. Said RFE will also make the
parameter use in RegisterMap and vframeStream explicit.
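As a sketch of what that follow-up cleanup might look like (the type and
constant names below are hypothetical, not taken from the RFE), distinct enum
classes make each call site self-documenting and turn swapped arguments into
compile errors:

```cpp
#include <cassert>

// Hypothetical replacement for the adjacent bool parameters discussed
// above; the names are illustrative, not the actual RFE's choices.
enum class UpdateMap     { skip, include };
enum class ProcessFrames { skip, process };

struct StackFrameStreamSketch {
  bool _update_map;
  bool _process_frames;
  StackFrameStreamSketch(UpdateMap u, ProcessFrames p)
    : _update_map(u == UpdateMap::include),
      _process_frames(p == ProcessFrames::process) {}
};

// A call site now reads:
//   StackFrameStreamSketch fst(UpdateMap::include, ProcessFrames::process);
// instead of the ambiguous:
//   StackFrameStream fst(thread, true, true);
```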

> --
> src/hotspot/share/runtime/thread.hpp
> 956 void set_processed_thread(Thread *thread) { _processed_thread = thread; }
> 
> I think this should assert that either _processed_thread or thread are NULL.
> Or maybe the RememberProcessedThread constructor should be asserting that
> _cur_thr->processed_thread() is NULL.

Fixed.
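A sketch of the assertion being suggested (the surrounding class is invented
here; only the transition rule is the point): the slot may transition
NULL -> thread or thread -> NULL, but never be silently overwritten.

```cpp
#include <cassert>
#include <cstddef>

// Sketch of the suggested guard (surrounding code invented): a
// processed-thread slot may transition NULL -> thread or thread -> NULL,
// but never be silently overwritten with a different thread.
struct ModelThread {};

struct ProcessedThreadSlot {
  ModelThread* _processed_thread = nullptr;
  void set_processed_thread(ModelThread* thread) {
    assert(_processed_thread == nullptr || thread == nullptr);
    _processed_thread = thread;
  }
};
```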

> --

Thanks for the review Kim!

-

PR: https://git.openjdk.java.net/jdk/pull/296


Re: RFR: 8253180: ZGC: Implementation of JEP 376: ZGC: Concurrent Thread-Stack Processing [v9]

2020-10-01 Thread Erik Österlund
> This PR is the implementation of "JEP 376: ZGC: Concurrent Thread-Stack 
> Processing" (cf.
> https://openjdk.java.net/jeps/376).
> Basically, this patch modifies the epilog safepoint when returning from a 
> frame (supporting interpreter frames, c1, c2,
> and native wrapper frames), to compare the stack pointer against a 
> thread-local value. This turns return polls into
> more of a swiss army knife that can be used to poll for safepoints, 
> handshakes, but also returns into not yet safe to
> expose frames, denoted by a "stack watermark".  ZGC will leave frames (and 
> other thread oops) in a state of a mess in
> the GC checkpoint safepoints, rather than processing all threads and their 
> stacks. Processing is initialized
> automagically when threads wake up for a safepoint, or get poked by a 
> handshake or safepoint. Said initialization
> processes a few (3) frames and other thread oops. The rest - the bulk of the 
> frame processing, is deferred until it is
> actually needed. It is needed when a frame is exposed to either 1) execution 
> (returns or unwinding due to exception
> handling), or 2) stack walker APIs. A hook is then run to go and finish the 
> lazy processing of frames.  Mutator and GC
> threads can compete for processing. The processing is therefore performed 
> under a per-thread lock. Note that disarming
> of the poll word (that the returns are comparing against) is only performed 
> by the thread itself. So sliding the
> watermark up will require one runtime call for a thread to note that nothing 
> needs to be done, and then update the poll
> word accordingly. Downgrading the poll word concurrently by other threads was 
> simply not worth the complexity it
> brought (and is only possible on TSO machines). So left that one out.
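The return poll described above can be modeled roughly like this (a
simplification with invented names; the real poll word also folds in
safepoint and handshake state, and this assumes stacks grow toward lower
addresses): returning into a frame at or above the per-thread watermark must
take the slow path, since that frame may not have been processed yet.

```cpp
#include <cassert>
#include <cstdint>

// Rough model of the return poll (names invented; the real poll word also
// encodes safepoint/handshake state). Stacks grow toward lower addresses,
// so frames at or above the watermark may not be processed yet.
struct WatermarkThread {
  uintptr_t poll_word;  // per-thread stack watermark
};

// true  -> must call into the runtime to finish processing the caller frame
// false -> the caller frame is below the watermark and already safe
bool return_needs_slow_path(const WatermarkThread& t, uintptr_t caller_sp) {
  return caller_sp >= t.poll_word;
}
```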

Erik Österlund has updated the pull request incrementally with one additional 
commit since the last revision:

  Review: Kim CR 1 and exception handling fix

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/296/files
  - new: https://git.openjdk.java.net/jdk/pull/296/files/2ffbd764..83c40895

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=296&range=08
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=296&range=07-08

  Stats: 82 lines in 9 files changed: 48 ins; 26 del; 8 mod
  Patch: https://git.openjdk.java.net/jdk/pull/296.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/296/head:pull/296

PR: https://git.openjdk.java.net/jdk/pull/296


Re: RFR: 8253180: ZGC: Implementation of JEP 376: ZGC: Concurrent Thread-Stack Processing [v8]

2020-09-29 Thread Erik Österlund
> This PR is the implementation of "JEP 376: ZGC: Concurrent Thread-Stack 
> Processing" (cf.
> https://openjdk.java.net/jeps/376).
> Basically, this patch modifies the epilog safepoint when returning from a 
> frame (supporting interpreter frames, c1, c2,
> and native wrapper frames), to compare the stack pointer against a 
> thread-local value. This turns return polls into
> more of a swiss army knife that can be used to poll for safepoints, 
> handshakes, but also returns into not yet safe to
> expose frames, denoted by a "stack watermark".  ZGC will leave frames (and 
> other thread oops) in a state of a mess in
> the GC checkpoint safepoints, rather than processing all threads and their 
> stacks. Processing is initialized
> automagically when threads wake up for a safepoint, or get poked by a 
> handshake or safepoint. Said initialization
> processes a few (3) frames and other thread oops. The rest - the bulk of the 
> frame processing, is deferred until it is
> actually needed. It is needed when a frame is exposed to either 1) execution 
> (returns or unwinding due to exception
> handling), or 2) stack walker APIs. A hook is then run to go and finish the 
> lazy processing of frames.  Mutator and GC
> threads can compete for processing. The processing is therefore performed 
> under a per-thread lock. Note that disarming
> of the poll word (that the returns are comparing against) is only performed 
> by the thread itself. So sliding the
> watermark up will require one runtime call for a thread to note that nothing 
> needs to be done, and then update the poll
> word accordingly. Downgrading the poll word concurrently by other threads was 
> simply not worth the complexity it
> brought (and is only possible on TSO machines). So left that one out.

Erik Österlund has updated the pull request with a new target base due to a 
merge or a rebase. The pull request now
contains 12 commits:

 - Review: Move barrier detach
 - Review: Remove assert that has outstayed its welcome
 - Merge branch 'master' into 8253180_conc_stack_scanning
 - Review: Albert CR2 and defensive programming
 - Review: StefanK CR 3
 - Review: Per CR 1
 - Merge branch 'master' into 8253180_conc_stack_scanning
 - Review: Albert CR 1
 - Review: SteafanK CR 2
 - Merge branch 'master' into 8253180_conc_stack_scanning
 - ... and 2 more: https://git.openjdk.java.net/jdk/compare/6bddeb70...2ffbd764

-

Changes: https://git.openjdk.java.net/jdk/pull/296/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=296&range=07
  Stats: 2705 lines in 129 files changed: 2137 ins; 310 del; 258 mod
  Patch: https://git.openjdk.java.net/jdk/pull/296.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/296/head:pull/296

PR: https://git.openjdk.java.net/jdk/pull/296


Re: RFR: 8253180: ZGC: Implementation of JEP 376: ZGC: Concurrent Thread-Stack Processing [v6]

2020-09-29 Thread Erik Österlund
On Tue, 29 Sep 2020 14:39:23 GMT, Zhengyu Gu  wrote:

>>> Hi Erik,
>>> 
>>> I have been playing with this patch for past a few days. Great work!
>>> 
>>> I found that this patch seems to break an early assumption.
>>> We have a comment in JavaThread::exit() says:
>>> 
>>>   // We need to cache the thread name for logging purposes below as once
>>>   // we have called on_thread_detach this thread must not access any oops.
>>> 
>>> Then in method :
>>> 
>>> ```
>>> 
>>> void Threads::remove(JavaThread* p, bool is_daemon)  {
>>>  ...
>>>   BarrierSet::barrier_set()->on_thread_detach(p);
>>> 
>>>   // Extra scope needed for Thread_lock, so we can check
>>>   // that we do not remove thread without safepoint code notice
>>>   { MonitorLocker ml(Threads_lock);
>>> ..
>>> }
>>> ```
>>> 
>>> It calls barrier's on_thread_detach(), acquires Threads_lock.
>>> The lock acquisition triggers stack processing, which potentially accesses
>>> oops.
>>> 
>>> ```
>>> 
>>> V  [libjvm.so+0x10c6f5e]  StackWatermark::start_processing()+0x6a
>>> V  [libjvm.so+0x10c77e8]  StackWatermarkSet::start_processing(JavaThread*, 
>>> StackWatermarkKind)+0x82
>>> V  [libjvm.so+0xfd7757]  SafepointMechanism::process(JavaThread*)+0x37
>>> V  [libjvm.so+0xfd796b]  
>>> SafepointMechanism::process_if_requested_slow(JavaThread*)+0x1d
>>> V  [libjvm.so+0x4b3683]  
>>> SafepointMechanism::process_if_requested(JavaThread*)+0x2b
>>> V  [libjvm.so+0xe87f0d]  
>>> ThreadBlockInVMWithDeadlockCheck::~ThreadBlockInVMWithDeadlockCheck()+0x5f
>>> V  [libjvm.so+0xe86700]  Mutex::lock_contended(Thread*)+0x12c
>>> V  [libjvm.so+0xe867d8]  Mutex::lock(Thread*)+0x96
>>> V  [libjvm.so+0xe86823]  Mutex::lock()+0x23
>>> V  [libjvm.so+0x29b4bc]  MutexLocker::MutexLocker(Mutex*, 
>>> Mutex::SafepointCheckFlag)+0xe2
>>> V  [libjvm.so+0x29b533]  MonitorLocker::MonitorLocker(Monitor*, 
>>> Mutex::SafepointCheckFlag)+0x29
>>> V  [libjvm.so+0x119f2ce]  Threads::remove(JavaThread*, bool)+0x56
>>> V  [libjvm.so+0x1198a2b]  JavaThread::exit(bool, JavaThread::ExitType)+0x905
>>> V  [libjvm.so+0x1197fde]  JavaThread::post_run()+0x22
>>> V  [libjvm.so+0x1193eae]  Thread::call_run()+0x230
>>> V  [libjvm.so+0xef3e38]  thread_native_entry(Thread*)+0x1e4
>>> ```
>>> 
>>> This is a problem for Shenandoah, as it flushes SATB buffers during 
>>> on_thread_detach() and does not expect to see any
>>> more SATB traffic.
>>> Thanks.
>> 
>> What oop are you encountering here? You should have no frames left at this 
>> point, and all oops should have been
>> cleared. At least that is the theory, and that was why the thread oop was
>> moved out of the thread (to enforce that). So I
>> am curious what oop you have found to still be around at this point.  
>> Anyway, you can try moving the GC hook into the
>> critical section. That should help.

Re: RFR: 8253180: ZGC: Implementation of JEP 376: ZGC: Concurrent Thread-Stack Processing [v6]

2020-09-29 Thread Erik Österlund
On Tue, 29 Sep 2020 13:13:55 GMT, Zhengyu Gu  wrote:

>> I see; thank you for the explanation.
>
> Hi Erik,
> 
> I have been playing with this patch for past a few days.  Great work!
> 
> I found that this patch seems to break an early assumption.
> We have a comment in JavaThread::exit() says:
> 
> 
>   // We need to cache the thread name for logging purposes below as once
>   // we have called on_thread_detach this thread must not access any oops.
> 
> 
> Then in method :
> 
> void Threads::remove(JavaThread* p, bool is_daemon)  {
>  ...
>   BarrierSet::barrier_set()->on_thread_detach(p);
> 
>   // Extra scope needed for Thread_lock, so we can check
>   // that we do not remove thread without safepoint code notice
>   { MonitorLocker ml(Threads_lock);
> ..
> }
> 
> It calls barrier's on_thread_detach(), acquires Threads_lock.
> The lock acquisition triggers stack processing, which potentially accesses oops.
> 
> V  [libjvm.so+0x10c6f5e]  StackWatermark::start_processing()+0x6a
> V  [libjvm.so+0x10c77e8]  StackWatermarkSet::start_processing(JavaThread*, 
> StackWatermarkKind)+0x82
> V  [libjvm.so+0xfd7757]  SafepointMechanism::process(JavaThread*)+0x37
> V  [libjvm.so+0xfd796b]  
> SafepointMechanism::process_if_requested_slow(JavaThread*)+0x1d
> V  [libjvm.so+0x4b3683]  
> SafepointMechanism::process_if_requested(JavaThread*)+0x2b
> V  [libjvm.so+0xe87f0d]  
> ThreadBlockInVMWithDeadlockCheck::~ThreadBlockInVMWithDeadlockCheck()+0x5f
> V  [libjvm.so+0xe86700]  Mutex::lock_contended(Thread*)+0x12c
> V  [libjvm.so+0xe867d8]  Mutex::lock(Thread*)+0x96
> V  [libjvm.so+0xe86823]  Mutex::lock()+0x23
> V  [libjvm.so+0x29b4bc]  MutexLocker::MutexLocker(Mutex*, 
> Mutex::SafepointCheckFlag)+0xe2
> V  [libjvm.so+0x29b533]  MonitorLocker::MonitorLocker(Monitor*, 
> Mutex::SafepointCheckFlag)+0x29
> V  [libjvm.so+0x119f2ce]  Threads::remove(JavaThread*, bool)+0x56
> V  [libjvm.so+0x1198a2b]  JavaThread::exit(bool, JavaThread::ExitType)+0x905
> V  [libjvm.so+0x1197fde]  JavaThread::post_run()+0x22
> V  [libjvm.so+0x1193eae]  Thread::call_run()+0x230
> V  [libjvm.so+0xef3e38]  thread_native_entry(Thread*)+0x1e4
> 
> 
> This is a problem for Shenandoah, as it flushes SATB buffers during 
> on_thread_detach() and does not expect to see any
> more SATB traffic.
> Thanks.

What oop are you encountering here? You should have no frames left at this 
point, and all oops should have been
cleared. At least that is the theory, and that was why the thread oop was moved 
out of the thread (to enforce that). So I
am curious what oop you have found to still be around at this point.

Anyway, you can try moving the GC hook into the critical section. That should 
help.

-

PR: https://git.openjdk.java.net/jdk/pull/296


Re: RFR: 8253180: ZGC: Implementation of JEP 376: ZGC: Concurrent Thread-Stack Processing [v7]

2020-09-29 Thread Erik Österlund
On Tue, 29 Sep 2020 09:22:18 GMT, Roman Kennke  wrote:

>> I've also had problems with this assert in the past, and I think that the 
>> underlying assumption of the assert is wrong.
>> I would not miss it.
>
> IMO it's ok to remove it.
> However, it can be argued that is_in() should not check for 'in committed 
> memory' but 'in reserved space for heap', IOW
> 'is this a pointer into our heap memory region?'.  Or maybe this would be 
> CH::is_in_reserved() and we should change the
> assert to that? Also, IMO we should not do anything that may trip barriers on 
> oop-iterators or similar GC-only paths.

Almost always, what you want to do, is to assert the validity of heap pointers, 
where they are expected to be valid
(dereferenceable pointer to a not freed object). And then you actually do want 
to check that they are within the
committed boundaries of the heap. Otherwise you have a bug. We don't want to 
make such reasonable assertions weaker,
because we cater for asserts that want oops to look valid pretty much all the 
time, even when they are stale pointers
from a previous cycle, pointing into freed memory (until it is fixed). And I 
would much rather cater for asserts with
good underlying reasoning (the oop should actually be okay here or something is 
wrong), than to cater for asserts that
seem to just have the wrong underlying reasoning, making everyone else suffer 
for it, and letting bugs slip past all
the other asserts.

The is_in_reserved check no longer exists. And I don't think I want to bring it 
back from the dead for this assert.
Because I don't think the assert is right. The GCs already have their own 
verification code for roots (and the heap),
that does this job, but much better. So I don't think it makes sense for the 
actual shared code oop iterator to go in
and put constraints on what an invalid (not yet fixed up) stale pointer should 
and should not look like, for all GC
implementations. Let the GC's own verification code take care of that. It has a 
much better idea about what a stale
pointer should and should not look like.

And I do agree that we should not use barriers in this kind of code path. I 
just didn't know what else to do here to
make the assert happy. Sorry about that. Well now I know. I am going to delete 
it, unless anyone opposes that solution.

-

PR: https://git.openjdk.java.net/jdk/pull/296


Re: RFR: 8253180: ZGC: Implementation of JEP 376: ZGC: Concurrent Thread-Stack Processing [v7]

2020-09-29 Thread Erik Österlund
On Mon, 28 Sep 2020 17:20:49 GMT, Aleksey Shipilev  wrote:

>> We call frame::oops_do from the GC to make bad oops good. But the oops_do 
>> logic asserts the oop is good *before* the
>> closure is applied. Every time I do something, I run into issues with this 
>> assert. It seems like an invalid assumption
>> to me, that oops are meant to always be good, even before they are fixed. I 
>> vote for removing this assert. It has
>> caused a lot of pain. Any takers?
>
> I think the previous assert was intentionally weaker: `is_in` checks that 
> argument points to a committed area in the
> heap, which can include the oops not yet fixed. It does not check if oop is 
> valid as far as GC is concerned. So maybe
> reverting `NativeAccess::oop_load` is enough?

It was checking is_in_or_null before and after. Our is_in checks that it is in 
committed memory, and that the pointer
is not stale w.r.t. color, as that is very likely to be a bug. We reluctantly 
made it relaxed for our safepoints that
flip colors. I think it was once again for this very same usual suspect assert. 
But now that this is concurrent, its
memory can get concurrently freed.

I think this is the 7th time I try to make this assert happy. I run into it all 
the time. IMO, the underlying
assumption of the assert, is wrong. This is an iterator used by GC to fix 
totally invalid stale pointers into valid
pointers. You really can't expect that they are valid before fixing them. That 
just happened to be true for other GCs.
It makes little sense to me. That is why my preferred solution is to remove the 
assert. I would not miss it.

We have had similar issues with the oop_iterate framework asserting oops are 
valid before the closure is applied. At
least there, it is possible to opt out...

Would anyone miss the assert?

-

PR: https://git.openjdk.java.net/jdk/pull/296


Re: RFR: 8253180: ZGC: Implementation of JEP 376: ZGC: Concurrent Thread-Stack Processing [v7]

2020-09-28 Thread Erik Österlund
On Mon, 28 Sep 2020 15:22:55 GMT, Roman Kennke  wrote:

>> Erik Österlund has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   Review: Albert CR2 and defensive programming
>
> src/hotspot/share/compiler/oopMap.cpp line 319:
> 
>> 317: #ifdef ASSERT
>> 318:     if ((((uintptr_t)loc & (sizeof(oop)-1)) != 0) ||
>> 319:         !Universe::heap()->is_in_or_null((oop)NativeAccess<>::oop_load(loc))) {
> 
> This NativeAccess call causes the troubles with Shenandoah GC. It triggers 
> evacuation of the object referenced by the
> stack location during a safepoint, which we'd rather avoid. Why is it needed?

We call frame::oops_do from the GC to make bad oops good. But the oops_do logic 
asserts the oop is good *before* the
closure is applied. Every time I do something, I run into issues with this 
assert. It seems like an invalid assumption
to me, that oops are meant to always be good, even before they are fixed. I 
vote for removing this assert. It has
caused a lot of pain. Any takers?

-

PR: https://git.openjdk.java.net/jdk/pull/296


Re: RFR: 8253180: ZGC: Implementation of JEP 376: ZGC: Concurrent Thread-Stack Processing [v7]

2020-09-28 Thread Erik Österlund
> This PR is the implementation of "JEP 376: ZGC: Concurrent Thread-Stack 
> Processing" (cf.
> https://openjdk.java.net/jeps/376).
> Basically, this patch modifies the epilog safepoint when returning from a 
> frame (supporting interpreter frames, c1, c2,
> and native wrapper frames), to compare the stack pointer against a 
> thread-local value. This turns return polls into
> more of a swiss army knife that can be used to poll for safepoints, 
> handshakes, but also returns into not yet safe to
> expose frames, denoted by a "stack watermark".  ZGC will leave frames (and 
> other thread oops) in a state of a mess in
> the GC checkpoint safepoints, rather than processing all threads and their 
> stacks. Processing is initialized
> automagically when threads wake up for a safepoint, or get poked by a 
> handshake or safepoint. Said initialization
> processes a few (3) frames and other thread oops. The rest - the bulk of the 
> frame processing, is deferred until it is
> actually needed. It is needed when a frame is exposed to either 1) execution 
> (returns or unwinding due to exception
> handling), or 2) stack walker APIs. A hook is then run to go and finish the 
> lazy processing of frames.  Mutator and GC
> threads can compete for processing. The processing is therefore performed 
> under a per-thread lock. Note that disarming
> of the poll word (that the returns are comparing against) is only performed 
> by the thread itself. So sliding the
> watermark up will require one runtime call for a thread to note that nothing 
> needs to be done, and then update the poll
> word accordingly. Downgrading the poll word concurrently by other threads was 
> simply not worth the complexity it
> brought (and is only possible on TSO machines). So left that one out.

Erik Österlund has updated the pull request incrementally with one additional 
commit since the last revision:

  Review: Albert CR2 and defensive programming

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/296/files
  - new: https://git.openjdk.java.net/jdk/pull/296/files/078a207e..27ecb2eb

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=296&range=06
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=296&range=05-06

  Stats: 15 lines in 5 files changed: 9 ins; 3 del; 3 mod
  Patch: https://git.openjdk.java.net/jdk/pull/296.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/296/head:pull/296

PR: https://git.openjdk.java.net/jdk/pull/296


Re: RFR: 8253180: ZGC: Implementation of JEP 376: ZGC: Concurrent Thread-Stack Processing [v6]

2020-09-28 Thread Erik Österlund
On Thu, 24 Sep 2020 21:20:19 GMT, Albert Mingkun Yang  wrote:

> Thank you for the comments and diagrams; they make the code much more 
> digestible. From that diagram, I get the
> impression that the watermark is associated with the stack pointer, so it
> should be a 1:1 relation, but `class Thread`
> contains multiple watermarks, `StackWatermarks _stack_watermarks;`. I think 
> some high level description on the relation
> between the thread and a list of watermarks belong to that thread could be 
> beneficial.

I added some further comments explaining this.

> > The first time it reaches past the last frame it will report true, and the 
> > second time it will report false.
> 
> Why so? As I see it, once a stream becomes "done", it stays in that state 
> forever. Am I missing sth?
> 
> ```
> inline bool StackFrameStream::is_done() {
>   return (_is_done) ? true : (_is_done = _fr.is_first_frame(), false);
> }
> ```

When you step into the last frame of the iteration (first frame in the stack), 
the first time you ask is_done() it will
report false, the next time it will report true, despite still being in the 
same frame. Therefore, the is_done()
function is not idempotent, as I need it to be.
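The behaviour being described can be reproduced with a minimal model of the
quoted `is_done()` body (with `_fr.is_first_frame()` reduced to a flag, and
assuming we have already stepped into the last frame): the first query
answers false, and only the second answers true, so repeated queries on the
same frame disagree.

```cpp
#include <cassert>

// Minimal model of the quoted StackFrameStream::is_done() body, with
// _fr.is_first_frame() reduced to a flag. On the last frame the first
// query returns false and only the second returns true -- the function
// is not idempotent.
struct StreamModel {
  bool _is_done = false;
  bool _at_first_frame = true;  // assume we already stepped into the last frame
  bool is_done() {
    return (_is_done) ? true : (_is_done = _at_first_frame, false);
  }
};
```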

-

PR: https://git.openjdk.java.net/jdk/pull/296


Re: RFR: 8253180: ZGC: Implementation of JEP 376: ZGC: Concurrent Thread-Stack Processing [v6]

2020-09-24 Thread Erik Österlund
On Thu, 24 Sep 2020 14:28:44 GMT, Coleen Phillimore  wrote:

> To be clear, my comments about boolean parameters to vframeStream/RegMap can 
> be addressed in a follow up RFE.  That
> would be better.

Thanks for reviewing Coleen!

-

PR: https://git.openjdk.java.net/jdk/pull/296


Re: RFR: 8253180: ZGC: Implementation of JEP 376: ZGC: Concurrent Thread-Stack Processing [v6]

2020-09-24 Thread Erik Österlund
> This PR is the implementation of "JEP 376: ZGC: Concurrent Thread-Stack 
> Processing" (cf.
> https://openjdk.java.net/jeps/376).
> Basically, this patch modifies the epilog safepoint when returning from a 
> frame (supporting interpreter frames, c1, c2,
> and native wrapper frames), to compare the stack pointer against a 
> thread-local value. This turns return polls into
> more of a swiss army knife that can be used to poll for safepoints, 
> handshakes, but also returns into not yet safe to
> expose frames, denoted by a "stack watermark".  ZGC will leave frames (and 
> other thread oops) in a state of a mess in
> the GC checkpoint safepoints, rather than processing all threads and their 
> stacks. Processing is initialized
> automagically when threads wake up for a safepoint, or get poked by a 
> handshake or safepoint. Said initialization
> processes a few (3) frames and other thread oops. The rest - the bulk of the 
> frame processing, is deferred until it is
> actually needed. It is needed when a frame is exposed to either 1) execution 
> (returns or unwinding due to exception
> handling), or 2) stack walker APIs. A hook is then run to go and finish the 
> lazy processing of frames.  Mutator and GC
> threads can compete for processing. The processing is therefore performed 
> under a per-thread lock. Note that disarming
> of the poll word (that the returns are comparing against) is only performed 
> by the thread itself. So sliding the
> watermark up will require one runtime call for a thread to note that nothing 
> needs to be done, and then update the poll
> word accordingly. Downgrading the poll word concurrently by other threads was 
> simply not worth the complexity it
> brought (and is only possible on TSO machines). So left that one out.

Erik Österlund has updated the pull request incrementally with one additional 
commit since the last revision:

  Review: StefanK CR 3

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/296/files
  - new: https://git.openjdk.java.net/jdk/pull/296/files/3db0dbcb..078a207e

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=296&range=05
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=296&range=04-05

  Stats: 12 lines in 3 files changed: 9 ins; 0 del; 3 mod
  Patch: https://git.openjdk.java.net/jdk/pull/296.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/296/head:pull/296

PR: https://git.openjdk.java.net/jdk/pull/296


Re: RFR: 8253180: ZGC: Implementation of JEP 376: ZGC: Concurrent Thread-Stack Processing [v5]

2020-09-24 Thread Erik Österlund
On Thu, 24 Sep 2020 12:49:06 GMT, Coleen Phillimore  wrote:

>> Erik Österlund has updated the pull request with a new target base due to a 
>> merge or a rebase. The pull request now
>> contains seven commits:
>>  - Review: Per CR 1
>>  - Merge branch 'master' into 8253180_conc_stack_scanning
>>  - Review: Albert CR 1
>>  - Review: SteafanK CR 2
>>  - Merge branch 'master' into 8253180_conc_stack_scanning
>>  - Review: SteafanK CR 1
>>  - 8253180: ZGC: Implementation of JEP 376: ZGC: Concurrent Thread-Stack 
>> Processing
>
> I looked at the runtime code and it is nicely non-invasive, and makes sense 
> where there are changes.

I uploaded an incremental change to reinstate the separate verification of the 
thread head vs frames that I had
before. Stefan wanted it merged into one function unless I could remember why I 
had split it. I did not remember why,
but now I do. Updated with comments this time to describe why this is split 
(the exception oop must be fixed before
fiddling with the frames).

-

PR: https://git.openjdk.java.net/jdk/pull/296


Re: RFR: 8253180: ZGC: Implementation of JEP 376: ZGC: Concurrent Thread-Stack Processing [v5]

2020-09-24 Thread Erik Österlund
On Thu, 24 Sep 2020 12:45:00 GMT, Coleen Phillimore  wrote:

>> Erik Österlund has updated the pull request with a new target base due to a 
>> merge or a rebase. The pull request now
>> contains seven commits:
>>  - Review: Per CR 1
>>  - Merge branch 'master' into 8253180_conc_stack_scanning
>>  - Review: Albert CR 1
>>  - Review: SteafanK CR 2
>>  - Merge branch 'master' into 8253180_conc_stack_scanning
>>  - Review: SteafanK CR 1
>>  - 8253180: ZGC: Implementation of JEP 376: ZGC: Concurrent Thread-Stack 
>> Processing
>
> src/hotspot/share/runtime/vframe.hpp line 340:
> 
>> 338:  public:
>> 339:   // Constructors
>> 340:   vframeStream(JavaThread* thread, bool stop_at_java_call_stub = false, 
>> bool process_frames = true);
> 
> I was wondering if you supply arguments to all callers of vframeStream now?  
> It seems like having default parameters is
> an invitation to errors. Same with RegMap.

I have tried to have explicit arguments as much as possible. But RegMap and 
vframeStream seemed to clutter my whole
patch if I made them explicit (they are already implicit today). I agree though 
that having them explicit would be
better. But what do you think about doing that in a follow-up cleanup RFE 
instead? It might reduce the amount of
scrolling needed to review this patch. The defaults are on the safe 
conservative side, so not supplying the extra
parameter should not harm you.

> src/hotspot/share/classfile/javaClasses.cpp line 2440:
> 
>> 2438:   // trace as utilizing vframe.
>> 2439: #ifdef ASSERT
>> 2440:   vframeStream st(thread, false /* stop_at_java_call_stub */, false /* 
>> process_frames */);
> 
> This is a nit, but could you put the /* stop_at_java_call_stub*/ and /* 
> process_frames */ before the values? Having a
> bunch of bool parameters seems like it could also become buggy.  In 
> linkResolver.hpp we have enums so that the values
> can be checked and are more explicit at the call sites.  Not this change, but 
> can we fix this later to do the same
> thing?

Per and Stefan would kill me if I put the comments before the value (they 
explicitly prefer it on the right side of the
value). Personally I have no strong preference. And yes, this also sounds like 
a good follow-up cleanup RFE.
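The enum-based cleanup discussed in this exchange could look like the following sketch. The enum and class names are illustrative only, not what HotSpot actually uses:

```cpp
#include <cassert>

// Sketch of replacing adjacent bool parameters with enums so call sites are
// self-describing and cannot silently swap arguments.
enum class StopAtJavaCallStub { no, yes };
enum class ProcessFrames      { no, yes };

struct VframeStreamSketch {
  bool stop_at_java_call_stub;
  bool process_frames;
  VframeStreamSketch(StopAtJavaCallStub stop, ProcessFrames process)
    : stop_at_java_call_stub(stop == StopAtJavaCallStub::yes),
      process_frames(process == ProcessFrames::yes) {}
};

// A call site now reads unambiguously:
//   VframeStreamSketch st(StopAtJavaCallStub::no, ProcessFrames::yes);
// and mixing up the argument order becomes a compile error rather than a bug.
```

This removes both the need for `/* ... */` comments at call sites and the risk of transposed booleans that the review raises.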

-

PR: https://git.openjdk.java.net/jdk/pull/296


Re: RFR: 8253180: ZGC: Implementation of JEP 376: ZGC: Concurrent Thread-Stack Processing [v4]

2020-09-24 Thread Erik Österlund
On Thu, 24 Sep 2020 12:32:36 GMT, Nils Eliasson  wrote:

> I did a pre-review of the compiler parts, and have now done a second 
> reading.
> 
> The compiler parts look good!

Thanks for the review, Nils.

-

PR: https://git.openjdk.java.net/jdk/pull/296


Re: RFR: 8253180: ZGC: Implementation of JEP 376: ZGC: Concurrent Thread-Stack Processing [v5]

2020-09-24 Thread Erik Österlund
> This PR is the implementation of "JEP 376: ZGC: Concurrent Thread-Stack 
> Processing" (cf.
> https://openjdk.java.net/jeps/376).
> Basically, this patch modifies the epilog safepoint when returning from a 
> frame (supporting interpreter frames, c1, c2,
> and native wrapper frames), to compare the stack pointer against a 
> thread-local value. This turns return polls into
> more of a swiss army knife that can be used to poll for safepoints, 
> handshakes, but also returns into not yet safe to
> expose frames, denoted by a "stack watermark".  ZGC will leave frames (and 
> other thread oops) in a state of a mess in
> the GC checkpoint safepoints, rather than processing all threads and their 
> stacks. Processing is initialized
> automagically when threads wake up for a safepoint, or get poked by a 
> handshake or safepoint. Said initialization
> processes a few (3) frames and other thread oops. The rest - the bulk of the 
> frame processing, is deferred until it is
> actually needed. It is needed when a frame is exposed to either 1) execution 
> (returns or unwinding due to exception
> handling), or 2) stack walker APIs. A hook is then run to go and finish the 
> lazy processing of frames.  Mutator and GC
> threads can compete for processing. The processing is therefore performed 
> under a per-thread lock. Note that disarming
> of the poll word (that the returns are comparing against) is only performed 
> by the thread itself. So sliding the
> watermark up will require one runtime call for a thread to note that nothing 
> needs to be done, and then update the poll
> word accordingly. Downgrading the poll word concurrently by other threads was 
> simply not worth the complexity it
> brought (and is only possible on TSO machines). So left that one out.

Erik Österlund has updated the pull request with a new target base due to a 
merge or a rebase. The pull request now
contains seven commits:

 - Review: Per CR 1
 - Merge branch 'master' into 8253180_conc_stack_scanning
 - Review: Albert CR 1
 - Review: SteafanK CR 2
 - Merge branch 'master' into 8253180_conc_stack_scanning
 - Review: SteafanK CR 1
 - 8253180: ZGC: Implementation of JEP 376: ZGC: Concurrent Thread-Stack 
Processing

-

Changes: https://git.openjdk.java.net/jdk/pull/296/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=296&range=04
  Stats: 2665 lines in 129 files changed: 2116 ins; 289 del; 260 mod
  Patch: https://git.openjdk.java.net/jdk/pull/296.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/296/head:pull/296

PR: https://git.openjdk.java.net/jdk/pull/296


Re: RFR: 8253180: ZGC: Implementation of JEP 376: ZGC: Concurrent Thread-Stack Processing [v4]

2020-09-24 Thread Erik Österlund
On Thu, 24 Sep 2020 11:29:36 GMT, Per Liden  wrote:

> I had also pre-reviewed, but did a final semi-quick review. Looks good 
> overall. Just found a couple of minor nitpicks.

Thanks for the review, Per. I uploaded fixes for your nitpicks.

-

PR: https://git.openjdk.java.net/jdk/pull/296


Re: RFR: 8253180: ZGC: Implementation of JEP 376: ZGC: Concurrent Thread-Stack Processing [v4]

2020-09-24 Thread Erik Österlund
On Thu, 24 Sep 2020 12:23:50 GMT, Erik Österlund  wrote:

> Looks good.

Thanks for the review Robbin.

-

PR: https://git.openjdk.java.net/jdk/pull/296


Re: RFR: 8253180: ZGC: Implementation of JEP 376: ZGC: Concurrent Thread-Stack Processing [v3]

2020-09-24 Thread Erik Österlund
On Wed, 23 Sep 2020 15:45:12 GMT, Albert Mingkun Yang  wrote:

>> Marked as reviewed by stefank (Reviewer).
>
> Given names like `StackWatermarkSet::lowest_watermark`, I wonder if some 
> diagrams could be provided in the code
> comments for `class StackWatermarks` so that readers will have the correct 
> mental pictures of the layout of stack
> frames and watermarks.  Some docs on `volatile uint32_t _state;` in 
> `StackWatermark` would be nice, such as what info
> is encoded in this state, typical state transitions, who can access this 
> state (tying it to volatile) etc.
> I wonder if it's possible to remove `bool _is_done;` in 
> `StackWatermarkFramesIterator`; just calling `is_done()` on the
> underlying frame stream seems cleaner.

Thanks for reviewing Albert.

> Given names like `StackWatermarkSet::lowest_watermark`, I wonder if some 
> diagrams could be provided in the code
> comments for `class StackWatermarks` so that readers will have the correct 
> mental pictures of the layout of stack
> frames and watermarks.

Added some further comments and diagrams.
 
> Some docs on `volatile uint32_t _state;` in `StackWatermark` would be nice, 
> such as what info is encoded in this state,
> typical state transitions, who can access this state (tying it to volatile) 
> etc.

Added some comments about the state.

> I wonder if it's possible to remove `bool _is_done;` in 
> `StackWatermarkFramesIterator`; just calling `is_done()` on the
> underlying frame stream seems cleaner.

Unfortunately this is not possible. The underlying frame stream is_done() 
function is not idempotent. It alters the
state of the iteration as a side effect of asking the question whether it is 
done or not. The first time it reaches
past the last frame it will report true, and the second time it will report 
false. So for that use to be valid, it is
assumed that you only call is_done() once. But the way I use it, I 
ask this question potentially more than
once after we are done, and expect it to return true consistently. That's why 
I read it once from the stream and cache
the result that I use. One might argue that the underlying stream class has 
undesirable behaviour, but fixing that
seemed outside the scope of this RFE. I have had too many rabbit holes already.
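The caching described here can be sketched as follows. All names are hypothetical; the point is only the contrast between a side-effecting query and a cached, idempotent one:

```cpp
#include <cassert>

// Model of the underlying stream's non-idempotent is_done(): the first query
// past the last frame answers true, and a repeated query answers false.
struct NonIdempotentStream {
  int frames_left;
  bool end_reported = false;
  bool is_done() {
    if (frames_left > 0) return false;
    bool first_query = !end_reported;
    end_reported = true;
    return first_query;   // true exactly once, then false
  }
};

// The iterator queries the stream once and caches the answer, so callers can
// ask is_done() repeatedly and get a consistent result.
struct CachingIterator {
  NonIdempotentStream stream;
  bool _is_done = false;
  void advance() {
    if (!_is_done) {
      _is_done = stream.is_done();
      if (!_is_done) stream.frames_left--;
    }
  }
  bool is_done() const { return _is_done; }  // idempotent
};
```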

-

PR: https://git.openjdk.java.net/jdk/pull/296


Re: RFR: 8253180: ZGC: Implementation of JEP 376: ZGC: Concurrent Thread-Stack Processing [v4]

2020-09-24 Thread Erik Österlund
> This PR is the implementation of "JEP 376: ZGC: Concurrent Thread-Stack 
> Processing" (cf.
> https://openjdk.java.net/jeps/376).
> Basically, this patch modifies the epilog safepoint when returning from a 
> frame (supporting interpreter frames, c1, c2,
> and native wrapper frames), to compare the stack pointer against a 
> thread-local value. This turns return polls into
> more of a swiss army knife that can be used to poll for safepoints, 
> handshakes, but also returns into not yet safe to
> expose frames, denoted by a "stack watermark".  ZGC will leave frames (and 
> other thread oops) in a state of a mess in
> the GC checkpoint safepoints, rather than processing all threads and their 
> stacks. Processing is initialized
> automagically when threads wake up for a safepoint, or get poked by a 
> handshake or safepoint. Said initialization
> processes a few (3) frames and other thread oops. The rest - the bulk of the 
> frame processing, is deferred until it is
> actually needed. It is needed when a frame is exposed to either 1) execution 
> (returns or unwinding due to exception
> handling), or 2) stack walker APIs. A hook is then run to go and finish the 
> lazy processing of frames.  Mutator and GC
> threads can compete for processing. The processing is therefore performed 
> under a per-thread lock. Note that disarming
> of the poll word (that the returns are comparing against) is only performed 
> by the thread itself. So sliding the
> watermark up will require one runtime call for a thread to note that nothing 
> needs to be done, and then update the poll
> word accordingly. Downgrading the poll word concurrently by other threads was 
> simply not worth the complexity it
> brought (and is only possible on TSO machines). So left that one out.

Erik Österlund has updated the pull request incrementally with one additional 
commit since the last revision:

  Review: Albert CR 1

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/296/files
  - new: https://git.openjdk.java.net/jdk/pull/296/files/71db0303..4e2aabbb

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=296&range=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=296&range=02-03

  Stats: 36 lines in 2 files changed: 35 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/296.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/296/head:pull/296

PR: https://git.openjdk.java.net/jdk/pull/296


Re: RFR: 8253180: ZGC: Implementation of JEP 376: ZGC: Concurrent Thread-Stack Processing [v3]

2020-09-23 Thread Erik Österlund
On Wed, 23 Sep 2020 12:59:40 GMT, Erik Österlund  wrote:

>> src/hotspot/share/runtime/stackWatermarkSet.cpp line 112:
>> 
>>> 110: return;
>>> 111:   }
>>> 112:   verify_poll_context();
>> 
>> There's a verify_poll_context here, but no update_poll_values call. I guess 
>> this is because we could be iterating over a
>> thread that is not Thread::current()? But in that case, should we really be 
>> "verifying the poll context" of the current
>> thread?
>
> Correct - I'm not updating the poll values, because it's not necessarily the 
> current thread we are processing. But the
> verify_poll_context() code does correctly always refer to the current frame. 
> It verifies we are in the right context to
> perform this kind of processing, yet the processing could be for a different 
> thread.

Updating the name to verify_processing_context to better reflect what this does.

-

PR: https://git.openjdk.java.net/jdk/pull/296


Re: RFR: 8253180: ZGC: Implementation of JEP 376: ZGC: Concurrent Thread-Stack Processing [v3]

2020-09-23 Thread Erik Österlund
> This PR is the implementation of "JEP 376: ZGC: Concurrent Thread-Stack 
> Processing" (cf.
> https://openjdk.java.net/jeps/376).
> Basically, this patch modifies the epilog safepoint when returning from a 
> frame (supporting interpreter frames, c1, c2,
> and native wrapper frames), to compare the stack pointer against a 
> thread-local value. This turns return polls into
> more of a swiss army knife that can be used to poll for safepoints, 
> handshakes, but also returns into not yet safe to
> expose frames, denoted by a "stack watermark".  ZGC will leave frames (and 
> other thread oops) in a state of a mess in
> the GC checkpoint safepoints, rather than processing all threads and their 
> stacks. Processing is initialized
> automagically when threads wake up for a safepoint, or get poked by a 
> handshake or safepoint. Said initialization
> processes a few (3) frames and other thread oops. The rest - the bulk of the 
> frame processing, is deferred until it is
> actually needed. It is needed when a frame is exposed to either 1) execution 
> (returns or unwinding due to exception
> handling), or 2) stack walker APIs. A hook is then run to go and finish the 
> lazy processing of frames.  Mutator and GC
> threads can compete for processing. The processing is therefore performed 
> under a per-thread lock. Note that disarming
> of the poll word (that the returns are comparing against) is only performed 
> by the thread itself. So sliding the
> watermark up will require one runtime call for a thread to note that nothing 
> needs to be done, and then update the poll
> word accordingly. Downgrading the poll word concurrently by other threads was 
> simply not worth the complexity it
> brought (and is only possible on TSO machines). So left that one out.

Erik Österlund has updated the pull request incrementally with one additional 
commit since the last revision:

  Review: SteafanK CR 2

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/296/files
  - new: https://git.openjdk.java.net/jdk/pull/296/files/e77e8e88..71db0303

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=296&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=296&range=01-02

  Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod
  Patch: https://git.openjdk.java.net/jdk/pull/296.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/296/head:pull/296

PR: https://git.openjdk.java.net/jdk/pull/296


Re: RFR: 8253180: ZGC: Implementation of JEP 376: ZGC: Concurrent Thread-Stack Processing [v3]

2020-09-23 Thread Erik Österlund
On Wed, 23 Sep 2020 10:27:05 GMT, Stefan Karlsson  wrote:

>> Erik Österlund has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   Review: SteafanK CR 2
>
> I've gone over the entire patch, but I'm leaving the compiler parts to others 
> to review.

Thanks for reviewing @stefank, and thanks for all the pre-reviewing as well. I 
have updated the PR to resolve all your
comments.

-

PR: https://git.openjdk.java.net/jdk/pull/296


Re: RFR: 8253180: ZGC: Implementation of JEP 376: ZGC: Concurrent Thread-Stack Processing [v2]

2020-09-23 Thread Erik Österlund
On Wed, 23 Sep 2020 08:33:20 GMT, Erik Österlund  wrote:

>> src/hotspot/share/compiler/oopMap.cpp line 243:
>> 
>>> 241:   } else {
>>> 242: all_do(fr, reg_map, f, process_derived_oop, _nothing_cl);
>>> 243:   }
>> 
>> I wonder if we shouldn't hide the StackWatermarkSet in the GC code, and not 
>> "activate" the DerivedPointerTable when a
>> SWS is used? Isn't it the case that we already don't enable the table for ZGC? 
>> Couldn't this simply be: `
>>   if (DerivedPointerTable::is_active()) {
>> all_do(fr, reg_map, f, add_derived_oop, _nothing_cl);
>>   } else {
>> all_do(fr, reg_map, f, process_derived_oop, _nothing_cl);
>>   }
>> `
>
> The problem isn't the GC code deciding to use or not use the derived pointer 
> "table". It's the shared code uses of it
> that is the problem here. The table is explicitly activated by shared code, 
> when for example using the JFR leak
> profiler, JVMTI heap walks, etc. This code makes the selection a GC choice 
> when invoked by GC code, and a shared
> runtime choice when invoked by the shared runtime.

After discussions off-line, I think passing in an explicit derived pointer 
iteration mode into oops_do is much better
than trying to figure out who the caller is from this context.
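The resolution above, passing an explicit mode instead of inferring the caller, can be sketched like this. HotSpot did later introduce an enum for derived-pointer iteration; the member names and the string-returning dispatch below are illustrative only:

```cpp
#include <cassert>
#include <cstring>

// Sketch: the caller of oops_do states explicitly how derived oops should be
// handled, rather than oops_do guessing whether it was invoked by the GC or
// by the shared runtime.
enum class DerivedPointerMode {
  at_safepoint,   // record derived oops in the DerivedPointerTable
  directly,       // process derived oops immediately (concurrent GC caller)
  ignore          // skip derived oops entirely
};

// Returns the name of the closure the mode would select, for demonstration.
const char* derived_oop_closure_for(DerivedPointerMode mode) {
  switch (mode) {
    case DerivedPointerMode::at_safepoint: return "add_derived_oop";
    case DerivedPointerMode::directly:     return "process_derived_oop";
    case DerivedPointerMode::ignore:       return "ignore_derived_oop";
  }
  return "";
}
```

The shared-runtime callers (JFR leak profiler, JVMTI heap walks) pass the table-recording mode, while concurrent GC code passes the direct-processing mode, so neither has to be inferred from context.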

-

PR: https://git.openjdk.java.net/jdk/pull/296


Re: RFR: 8253180: ZGC: Implementation of JEP 376: ZGC: Concurrent Thread-Stack Processing [v2]

2020-09-23 Thread Erik Österlund
> This PR is the implementation of "JEP 376: ZGC: Concurrent Thread-Stack 
> Processing" (cf.
> https://openjdk.java.net/jeps/376).
> Basically, this patch modifies the epilog safepoint when returning from a 
> frame (supporting interpreter frames, c1, c2,
> and native wrapper frames), to compare the stack pointer against a 
> thread-local value. This turns return polls into
> more of a swiss army knife that can be used to poll for safepoints, 
> handshakes, but also returns into not yet safe to
> expose frames, denoted by a "stack watermark".  ZGC will leave frames (and 
> other thread oops) in a state of a mess in
> the GC checkpoint safepoints, rather than processing all threads and their 
> stacks. Processing is initialized
> automagically when threads wake up for a safepoint, or get poked by a 
> handshake or safepoint. Said initialization
> processes a few (3) frames and other thread oops. The rest - the bulk of the 
> frame processing, is deferred until it is
> actually needed. It is needed when a frame is exposed to either 1) execution 
> (returns or unwinding due to exception
> handling), or 2) stack walker APIs. A hook is then run to go and finish the 
> lazy processing of frames.  Mutator and GC
> threads can compete for processing. The processing is therefore performed 
> under a per-thread lock. Note that disarming
> of the poll word (that the returns are comparing against) is only performed 
> by the thread itself. So sliding the
> watermark up will require one runtime call for a thread to note that nothing 
> needs to be done, and then update the poll
> word accordingly. Downgrading the poll word concurrently by other threads was 
> simply not worth the complexity it
> brought (and is only possible on TSO machines). So left that one out.

Erik Österlund has updated the pull request with a new target base due to a 
merge or a rebase. The pull request now
contains three commits:

 - Merge branch 'master' into 8253180_conc_stack_scanning
 - Review: SteafanK CR 1
 - 8253180: ZGC: Implementation of JEP 376: ZGC: Concurrent Thread-Stack 
Processing

-

Changes: https://git.openjdk.java.net/jdk/pull/296/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=296&range=01
  Stats: 2629 lines in 130 files changed: 2079 ins; 289 del; 261 mod
  Patch: https://git.openjdk.java.net/jdk/pull/296.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/296/head:pull/296

PR: https://git.openjdk.java.net/jdk/pull/296


Re: RFR: 8253180: ZGC: Implementation of JEP 376: ZGC: Concurrent Thread-Stack Processing

2020-09-23 Thread Erik Österlund
On Wed, 23 Sep 2020 08:51:08 GMT, Stefan Karlsson  wrote:

>> This PR is the implementation of "JEP 376: ZGC: Concurrent Thread-Stack 
>> Processing" (cf.
>> https://openjdk.java.net/jeps/376).
>> Basically, this patch modifies the epilog safepoint when returning from a 
>> frame (supporting interpreter frames, c1, c2,
>> and native wrapper frames), to compare the stack pointer against a 
>> thread-local value. This turns return polls into
>> more of a swiss army knife that can be used to poll for safepoints, 
>> handshakes, but also returns into not yet safe to
>> expose frames, denoted by a "stack watermark".  ZGC will leave frames (and 
>> other thread oops) in a state of a mess in
>> the GC checkpoint safepoints, rather than processing all threads and their 
>> stacks. Processing is initialized
>> automagically when threads wake up for a safepoint, or get poked by a 
>> handshake or safepoint. Said initialization
>> processes a few (3) frames and other thread oops. The rest - the bulk of the 
>> frame processing, is deferred until it is
>> actually needed. It is needed when a frame is exposed to either 1) execution 
>> (returns or unwinding due to exception
>> handling), or 2) stack walker APIs. A hook is then run to go and finish the 
>> lazy processing of frames.  Mutator and GC
>> threads can compete for processing. The processing is therefore performed 
>> under a per-thread lock. Note that disarming
>> of the poll word (that the returns are comparing against) is only performed 
>> by the thread itself. So sliding the
>> watermark up will require one runtime call for a thread to note that nothing 
>> needs to be done, and then update the poll
>> word accordingly. Downgrading the poll word concurrently by other threads 
>> was simply not worth the complexity it
>> brought (and is only possible on TSO machines). So left that one out.
>
> src/hotspot/share/jfr/leakprofiler/checkpoint/rootResolver.cpp line 275:
> 
>> 273:
>> 274: // Traverse the execution stack
>> 275: for (StackFrameStream fst(jt, true /* update */, true /* 
>> process_frames */); !fst.is_done(); fst.next()) {
> 
> Noticed that lines 261-262 refers to the array that was removed with:
> JDK-8252656: Replace RegisterArrayForGC mechanism with plain Handles
> 
> And the comment in block 299-304 might need some love.
> 
> It's not directly related to this patch, but something that has been moved 
> around in preparation for this patch. I
> wouldn't be opposed to cleaning that up with this patch.

Will do.

> src/hotspot/share/runtime/stackWatermarkSet.cpp line 112:
> 
>> 110: return;
>> 111:   }
>> 112:   verify_poll_context();
> 
> There's a verify_poll_context here, but no update_poll_values call. I guess 
> this is because we could be iterating over a
> thread that is not Thread::current()? But in that case, should we really be 
> "verifying the poll context" of the current
> thread?

Correct - I'm not updating the poll values, because it's not necessarily the 
current thread we are processing. But the
verify_poll_context() code does correctly always refer to the current frame. It 
verifies we are in the right context to
perform this kind of processing, yet the processing could be for a different 
thread.

> src/hotspot/share/runtime/stackWatermarkSet.cpp line 125:
> 
>> 123: watermark->start_processing();
>> 124:   }
>> 125: }
> 
> No update poll values here?

Like the previous scenario, start_processing may be called on a different 
thread than Thread::current. The thread the
stack belongs to will always update the poll values when it wakes up from the 
safepoint. Other threads have no business
updating it.

> src/hotspot/share/utilities/vmError.cpp line 754:
> 
>> 752:Thread* thread = ((NamedThread *)_thread)->processed_thread();
>> 753:if (thread != NULL && thread->is_Java_thread()) {
>> 754:  JavaThread* jt = static_cast(thread);
> 
> as_Java_thread() - maybe just search for this in the patch

Done.

-

PR: https://git.openjdk.java.net/jdk/pull/296


Re: RFR: 8253180: ZGC: Implementation of JEP 376: ZGC: Concurrent Thread-Stack Processing

2020-09-23 Thread Erik Österlund
On Wed, 23 Sep 2020 07:39:55 GMT, Stefan Karlsson  wrote:

>> This PR is the implementation of "JEP 376: ZGC: Concurrent Thread-Stack 
>> Processing" (cf.
>> https://openjdk.java.net/jeps/376).
>> Basically, this patch modifies the epilog safepoint when returning from a 
>> frame (supporting interpreter frames, c1, c2,
>> and native wrapper frames), to compare the stack pointer against a 
>> thread-local value. This turns return polls into
>> more of a swiss army knife that can be used to poll for safepoints, 
>> handshakes, but also returns into not yet safe to
>> expose frames, denoted by a "stack watermark".  ZGC will leave frames (and 
>> other thread oops) in a state of a mess in
>> the GC checkpoint safepoints, rather than processing all threads and their 
>> stacks. Processing is initialized
>> automagically when threads wake up for a safepoint, or get poked by a 
>> handshake or safepoint. Said initialization
>> processes a few (3) frames and other thread oops. The rest - the bulk of the 
>> frame processing, is deferred until it is
>> actually needed. It is needed when a frame is exposed to either 1) execution 
>> (returns or unwinding due to exception
>> handling), or 2) stack walker APIs. A hook is then run to go and finish the 
>> lazy processing of frames.  Mutator and GC
>> threads can compete for processing. The processing is therefore performed 
>> under a per-thread lock. Note that disarming
>> of the poll word (that the returns are comparing against) is only performed 
>> by the thread itself. So sliding the
>> watermark up will require one runtime call for a thread to note that nothing 
>> needs to be done, and then update the poll
>> word accordingly. Downgrading the poll word concurrently by other threads 
>> was simply not worth the complexity it
>> brought (and is only possible on TSO machines). So left that one out.
>
> src/hotspot/share/gc/z/zCollectedHeap.cpp line 235:
> 
>> 233:   return true;
>> 234: }
>> 235:
> 
> Weird placement between store barrier functions. But even weirder that we 
> still have those functions. I'll remove them
> with JDK-8253516.

Perfect, thanks.

> src/hotspot/share/gc/z/zDriver.cpp line 108:
> 
>> 106: return false;
>> 107:   }
>> 108:
> 
> Group needs_inactive_gc_locker, skip_thread_oop_barriers, and 
> allow_coalesced_vm_operations together?
> 
> Add a comment about why we chose to skip coalescing here.

Per explicitly wanted skip_thread_oop_barriers grouped with 
needs_inactive_gc_locker. But I should remove
allow_coalesced_vm_operations; it is no longer used since I have removed VM 
operation coalescing completely already.

-

PR: https://git.openjdk.java.net/jdk/pull/296


Re: RFR: 8253180: ZGC: Implementation of JEP 376: ZGC: Concurrent Thread-Stack Processing

2020-09-23 Thread Erik Österlund
On Wed, 23 Sep 2020 07:17:31 GMT, Stefan Karlsson  wrote:

>> This PR is the implementation of "JEP 376: ZGC: Concurrent Thread-Stack 
>> Processing" (cf.
>> https://openjdk.java.net/jeps/376).
>> Basically, this patch modifies the epilog safepoint when returning from a 
>> frame (supporting interpreter frames, c1, c2,
>> and native wrapper frames), to compare the stack pointer against a 
>> thread-local value. This turns return polls into
>> more of a swiss army knife that can be used to poll for safepoints, 
>> handshakes, but also returns into not yet safe to
>> expose frames, denoted by a "stack watermark".  ZGC will leave frames (and 
>> other thread oops) in a state of a mess in
>> the GC checkpoint safepoints, rather than processing all threads and their 
>> stacks. Processing is initialized
>> automagically when threads wake up for a safepoint, or get poked by a 
>> handshake or safepoint. Said initialization
>> processes a few (3) frames and other thread oops. The rest - the bulk of the 
>> frame processing, is deferred until it is
>> actually needed. It is needed when a frame is exposed to either 1) execution 
>> (returns or unwinding due to exception
>> handling), or 2) stack walker APIs. A hook is then run to go and finish the 
>> lazy processing of frames.  Mutator and GC
>> threads can compete for processing. The processing is therefore performed 
>> under a per-thread lock. Note that disarming
>> of the poll word (that the returns are comparing against) is only performed 
>> by the thread itself. So sliding the
>> watermark up will require one runtime call for a thread to note that nothing 
>> needs to be done, and then update the poll
>> word accordingly. Downgrading the poll word concurrently by other threads 
>> was simply not worth the complexity it
>> brought (and is only possible on TSO machines). So left that one out.
>
> src/hotspot/share/gc/z/zStackWatermark.cpp line 78:
> 
>> 76:   ZThreadLocalData::do_invisible_root(_jt, 
>> ZBarrier::load_barrier_on_invisible_root_oop_field);
>> 77:
>> 78:   ZVerify::verify_thread_frames_bad(_jt);
> 
> Every time I read the name verify_thread_no_frames_bad I read it as "verify 
> that no frames are bad", but that's not at
> all what that function does.
> Is there still a reason why verify_thread_no_frames_bad and 
> verify_thread_frames_bad are split up into two functions?
> Or could we fuse them into a ZVerify::verify_thread_bad?

I will try this.

-

PR: https://git.openjdk.java.net/jdk/pull/296


Re: RFR: 8253180: ZGC: Implementation of JEP 376: ZGC: Concurrent Thread-Stack Processing

2020-09-23 Thread Erik Österlund
On Tue, 22 Sep 2020 13:45:16 GMT, Stefan Karlsson  wrote:

>> This PR is the implementation of "JEP 376: ZGC: Concurrent Thread-Stack 
>> Processing" (cf.
>> https://openjdk.java.net/jeps/376).
>> Basically, this patch modifies the epilog safepoint when returning from a 
>> frame (supporting interpreter frames, c1, c2,
>> and native wrapper frames), to compare the stack pointer against a 
>> thread-local value. This turns return polls into
>> more of a swiss army knife that can be used to poll for safepoints, 
>> handshakes, but also returns into not yet safe to
>> expose frames, denoted by a "stack watermark".  ZGC will leave frames (and 
>> other thread oops) in a state of a mess in
>> the GC checkpoint safepoints, rather than processing all threads and their 
>> stacks. Processing is initialized
>> automagically when threads wake up for a safepoint, or get poked by a 
>> handshake or safepoint. Said initialization
>> processes a few (3) frames and other thread oops. The rest - the bulk of the 
>> frame processing, is deferred until it is
>> actually needed. It is needed when a frame is exposed to either 1) execution 
>> (returns or unwinding due to exception
>> handling), or 2) stack walker APIs. A hook is then run to go and finish the 
>> lazy processing of frames.  Mutator and GC
>> threads can compete for processing. The processing is therefore performed 
>> under a per-thread lock. Note that disarming
>> of the poll word (that the returns are comparing against) is only performed 
>> by the thread itself. So sliding the
>> watermark up will require one runtime call for a thread to note that nothing 
>> needs to be done, and then update the poll
>> word accordingly. Downgrading the poll word concurrently by other threads 
>> was simply not worth the complexity it
>> brought (and is only possible on TSO machines). So left that one out.
>
> src/hotspot/share/compiler/oopMap.cpp line 243:
> 
>> 241:   } else {
>> 242: all_do(fr, reg_map, f, process_derived_oop, _nothing_cl);
>> 243:   }
> 
> I wonder if we shouldn't hide the StackWatermarkSet in the GC code, and not 
> "activate" the DerivedPointerTable when a
> SWS is used? Isn't it the case that we already don't enable the table for ZGC? 
> Couldn't this simply be: `
>   if (DerivedPointerTable::is_active()) {
> all_do(fr, reg_map, f, add_derived_oop, _nothing_cl);
>   } else {
> all_do(fr, reg_map, f, process_derived_oop, _nothing_cl);
>   }
> `

The problem isn't the GC code deciding to use or not use the derived pointer 
"table". It's the shared-code uses of it that are the problem here. The table is 
explicitly activated by shared code, 
when for example using the JFR leak
profiler, JVMTI heap walks, etc. This code makes the selection a GC choice when 
invoked by GC code, and a shared
runtime choice when invoked by the shared runtime.

-

PR: https://git.openjdk.java.net/jdk/pull/296


Re: RFR: 8253180: ZGC: Implementation of JEP 376: ZGC: Concurrent Thread-Stack Processing

2020-09-23 Thread Erik Österlund
On Tue, 22 Sep 2020 13:17:05 GMT, Stefan Karlsson  wrote:

>> This PR is the implementation of "JEP 376: ZGC: Concurrent Thread-Stack 
>> Processing" (cf.
>> https://openjdk.java.net/jeps/376).
>> Basically, this patch modifies the epilog safepoint when returning from a 
>> frame (supporting interpreter frames, c1, c2,
>> and native wrapper frames), to compare the stack pointer against a 
>> thread-local value. This turns return polls into
>> more of a swiss army knife that can be used to poll for safepoints, 
>> handshakes, but also returns into not yet safe to
>> expose frames, denoted by a "stack watermark".  ZGC will leave frames (and 
>> other thread oops) in a state of a mess in
>> the GC checkpoint safepoints, rather than processing all threads and their 
>> stacks. Processing is initialized
>> automagically when threads wake up for a safepoint, or get poked by a 
>> handshake or safepoint. Said initialization
>> processes a few (3) frames and other thread oops. The rest - the bulk of the 
>> frame processing, is deferred until it is
>> actually needed. It is needed when a frame is exposed to either 1) execution 
>> (returns or unwinding due to exception
>> handling), or 2) stack walker APIs. A hook is then run to go and finish the 
>> lazy processing of frames.  Mutator and GC
>> threads can compete for processing. The processing is therefore performed 
>> under a per-thread lock. Note that disarming
>> of the poll word (that the returns are comparing against) is only performed 
>> by the thread itself. So sliding the
>> watermark up will require one runtime call for a thread to note that nothing 
>> needs to be done, and then update the poll
>> word accordingly. Downgrading the poll word concurrently by other threads 
>> was simply not worth the complexity it
>> brought (and is only possible on TSO machines). So left that one out.
>
> src/hotspot/cpu/x86/templateInterpreterGenerator_x86_64.cpp line 194:
> 
>> 192:
>> 193: Label slow_path;
>> 194: __ safepoint_poll(slow_path, r15_thread, true /* at_return */, 
>> false /* in_nmethod */);
> 
> Why is this tagged with 'true /* at_return */? Same for line 240. The _x86_32 
> version uses false.

Good point.

> src/hotspot/share/c1/c1_Runtime1.cpp line 515:
> 
>> 513:   if (thread->last_frame().is_runtime_frame()) {
>> 514: // The Runtime1::handle_exception_from_callee_id handler is invoked 
>> after the
>> 515: // frame has been unwinded. It instead builds its own stub frame, 
>> to call the
> 
> Unwinded -> unwound, maybe? Same below and probably other places as well.

Good point.

-

PR: https://git.openjdk.java.net/jdk/pull/296


Re: RFR: 8253180: ZGC: Implementation of JEP 376: ZGC: Concurrent Thread-Stack Processing

2020-09-22 Thread Erik Österlund
On Tue, 22 Sep 2020 14:38:05 GMT, Daniel D. Daugherty  
wrote:

> @fisk - You are missing testing information for this PR.

Sorry about that. I have performed testing with ZGC enabled from tier1-7 
(multiple times), and standard testing
tier1-5. I have also stress tested the change with gc-test-suite, with full 
verification on (there is a bunch of it),
and the GC interval reduced to very small (GC all the time). Recent versions of 
this patch I also tested from tier1-8
(2 times). I have also perf tested this in dev-submit to check for regressions 
(none found), as well as stress tested
SPECjbb2015 on a 3 TB machine with ~2200 threads. I have also manually stress 
tested the changes on an AArch64 machine,
using gc-test-suite.

-

PR: https://git.openjdk.java.net/jdk/pull/296


RFR: 8253180: ZGC: Implementation of JEP 376: ZGC: Concurrent Thread-Stack Processing

2020-09-22 Thread Erik Österlund
This PR is the implementation of "JEP 376: ZGC: Concurrent Thread-Stack 
Processing" (cf.
https://openjdk.java.net/jeps/376).

Basically, this patch modifies the epilog safepoint when returning from a frame 
(supporting interpreter frames, c1, c2,
and native wrapper frames), to compare the stack pointer against a thread-local 
value. This turns return polls into
more of a swiss army knife that can be used to poll for safepoints, handshakes, 
but also returns into not yet safe to
expose frames, denoted by a "stack watermark".

ZGC will leave frames (and other thread oops) in a messy state at the GC 
checkpoint safepoints, rather than
processing all threads and their stacks. Processing is initialized 
automagically when threads wake up for a safepoint,
or get poked by a handshake or safepoint. Said initialization processes a few 
(3) frames and other thread oops. The
rest (the bulk of the frame processing) is deferred until it is actually 
needed. It is needed when a frame is exposed
to either 1) execution (returns or unwinding due to exception handling), or 2) 
stack walker APIs. A hook is then run to
go and finish the lazy processing of frames.

Mutator and GC threads can compete for processing. The processing is therefore 
performed under a per-thread lock. Note
that disarming of the poll word (that the returns are comparing against) is 
only performed by the thread itself. So
sliding the watermark up will require one runtime call for a thread to note 
that nothing needs to be done, and then
update the poll word accordingly. Downgrading the poll word concurrently from 
other threads was simply not worth the complexity it brought (and is only 
possible on TSO machines), so I left that one out.

-

Commit messages:
 - 8253180: ZGC: Implementation of JEP 376: ZGC: Concurrent Thread-Stack 
Processing

Changes: https://git.openjdk.java.net/jdk/pull/296/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk=296=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8253180
  Stats: 2560 lines in 131 files changed: 2046 ins; 279 del; 235 mod
  Patch: https://git.openjdk.java.net/jdk/pull/296.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/296/head:pull/296

PR: https://git.openjdk.java.net/jdk/pull/296


Re: RFR: 8247281: migrate ObjectMonitor::_object to OopStorage [v6]

2020-09-16 Thread Erik Österlund
On Wed, 16 Sep 2020 16:13:23 GMT, Daniel D. Daugherty  
wrote:

>> This RFE is to migrate the following field to OopStorage:
>> 
>> class ObjectMonitor {
>> 
>>   void* volatile _object; // backward object pointer - strong root
>> 
>> Unlike the previous patches in this series, there are a lot of collateral
>> changes so this is not a trivial review. Sorry for the tedious parts of
>> the review. Since Erik and I are both contributors to this patch, we
>> would like at least 1 GC team reviewer and 1 Runtime team reviewer.
>> 
>> This changeset was tested with Mach5 Tier[1-3],4,5,6,7,8 testing
>> along with JDK-8252980 and JDK-8252981. I also ran it through my
>> inflation stress kit for 48 hours on my Linux-X64 machine.
>
> Daniel D. Daugherty has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Minor changes/deletions to address kimbarrett and sspitsyn CR comments.

Marked as reviewed by eosterlund (Reviewer).

-

PR: https://git.openjdk.java.net/jdk/pull/135


Re: RFR: 8247281: migrate ObjectMonitor::_object to OopStorage [v5]

2020-09-16 Thread Erik Österlund
On Wed, 16 Sep 2020 00:21:02 GMT, Serguei Spitsyn  wrote:

>> Daniel D. Daugherty has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   rkennke, coleenp, fisk CR - delete random assert() that knows too much 
>> about markWords.
>
> Marked as reviewed by sspitsyn (Reviewer).

I added a release note (https://bugs.openjdk.java.net/browse/JDK-8253225) 
describing that these roots are now weak, and
hence won't be reported. Please have a look at that, to make sure what I am 
describing makes sense.

-

PR: https://git.openjdk.java.net/jdk/pull/135


Re: RFR: 8247281: migrate ObjectMonitor::_object to OopStorage [v4]

2020-09-14 Thread Erik Österlund
On Mon, 14 Sep 2020 20:51:58 GMT, Roman Kennke  wrote:

>> @fisk - please chime in here...
>
> I agree. Also, the assert becomes true somewhat obviously because of its 
> first clause, which is already guaranteed
> because of its placement in the surrounding else-branch (unless something 
> really weird happens).

I am also 100% okay with just removing the random assertion. We have a whole 
bunch of other block iteration code in GC
code that does not sprinkle random asserts about what patterns of markWords are 
supposedly good or bad in live/dead
objects. And I also don't get what kind of bug this would catch.

-

PR: https://git.openjdk.java.net/jdk/pull/135


Re: RFR: 8247281: migrate ObjectMonitor::_object to OopStorage [v2]

2020-09-14 Thread Erik Österlund
On Mon, 14 Sep 2020 14:24:02 GMT, Roman Kennke  wrote:

> > @rkennke - Thanks for the review. I believe @fisk is going to address
> > your comments.
> 
> Actually, if I understand it correctly, OopStorage already gives us full 
> barriers, so we should be covered, and we can
> simply revert JDK-8251451: 
> http://cr.openjdk.java.net/~rkennke/8247281-shenandoah.patch
> (Is there a better way to propose amendments to a PR?!)
> 
> I believe this probably also means that we don't need to scan object monitor 
> lists during thread-scans, and let SATB
> (or whatever concurrent marking) take care of it. @fisk WDYT?

Absolutely. GC should no longer have to know anything about ObjectMonitors, 
only the automatically plugged in
OopStorage that it processes. GC should not walk any monitor lists at all.

-

PR: https://git.openjdk.java.net/jdk/pull/135


Re: RFR: 8247281: migrate ObjectMonitor::_object to OopStorage [v2]

2020-09-14 Thread Erik Österlund
On Mon, 14 Sep 2020 02:01:58 GMT, David Holmes  wrote:

> Not sure about the JVM TI changes!

Here is a generic comment. You mention that the specification doesn't make it 
clear whether the roots reported are
strong or weak. This is true. There is no mention about roots being strong or 
weak here. Notably, all listed roots, are
strong though. I believe that with the chosen interpretation of "root", it is 
implied that it is strong, and hence why
strong vs weak roots is not discussed. According to the memory management 
glossary
(https://www.memorymanagement.org/glossary/r.html#glossary-r), this is the 
definition of a root: "In tracing garbage
collection, a root holds a reference or set of references to objects that are a 
priori reachable. The root set is used
as the starting point in determining all reachable data." Note that this 
definition of root makes it implicit that it 
is also strong. And after a long discussion with GC nerds in Stockholm, this is 
in fact also why we have IN_NATIVE oop
accesses, rather than IN_ROOT oop accesses. Because the opinions differed too 
much about what a "root" is. But to me it
really does look like the JVMTI interpretation of a root, is a strong root, and 
hence that this is precisely what the
text is talking about when mentioning roots, and hence also why all the 
enumerated roots, are indeed strong roots. Same
comments for the HPROF_GC_ROOT_* discussion, as for the JVMTI_HEAP_ROOT_* 
discussion. That is also why I think it is
now a bug to report them, as they are no longer "roots" in the JVMTI/HPROF 
sense. But you are right, filing a CSR for
the behavioural change is probably a good idea.

-

PR: https://git.openjdk.java.net/jdk/pull/135


Re: RFR: 8247281: migrate ObjectMonitor::_object to OopStorage [v2]

2020-09-14 Thread Erik Österlund
On Mon, 14 Sep 2020 14:15:41 GMT, Erik Österlund  wrote:

>> @rkennke - Thanks for the review. I believe @fisk is going to address
>> your comments.
>
> Hi Kim,
> 
> Here is a partial reply to your review. Thanks for reviewing!
> Hmm seems like your email was only sent to shenandoah-dev. Not sure if that 
> was intended. I'm not subscribed to that
> mailing list, so I will send my reply through github and hope for the best.
>> _Mailing list message from [Kim Barrett](mailto:kim.barr...@oracle.com) on
>> [shenandoah-dev](mailto:shenandoah-...@openjdk.java.net):_
>> --
>> src/hotspot/share/oops/weakHandle.cpp
>> 36 WeakHandle::WeakHandle(OopStorage* storage, oop obj) :
>> 37 _obj(storage->allocate()) {
>> 38 assert(obj != NULL, "no need to create weak null oop");
>> 
>> Please format this differently so the ctor-init-list is more easily
>> distinguished from the body. I don't care that much which of the several
>> alternatives is used.
>> 
>> --
>> src/hotspot/share/runtime/objectMonitor.cpp
>> 244 // Check that object() and set_object() are called from the right 
>> context:
>> 245 static void check_object_context() {
>> 
>> This seems like checking we would normally only do in a debug build. Is this
>> really supposed to be done in product builds too? (It's written to support
>> that, just wondering if that's really what we want.) Maybe these aren't
>> called very often so it doesn't matter? I also see that guarantee (rather
>> than assert) is used a fair amount in this and related code.
> 
> I don't think I have a preference here. As you say, it seems a bit mixed. I 
> would be okay with both. Do you want them
> to be asserts?
>> --
>> src/hotspot/share/runtime/objectMonitor.cpp
>> 251 JavaThread* jt = (JavaThread*)self;
>> 
>> Use self->as_Java_thread().
>> 
>> Later: Coleen already commented on this and it's been fixed.
>> 
>> --
>> src/hotspot/share/runtime/objectMonitor.cpp
>> 249 guarantee(self->is_Java_thread() || self->is_VM_thread(), "must be");
>> 250 if (self->is_Java_thread()) {
>> 
>> Maybe instead
>> 
>> if (self->is_Java_thread()) {
>> ...
>> } else {
>> guarantee(self->is_VM_thread(), "must be");
>> }
>> 
>> --
>> src/hotspot/share/runtime/objectMonitor.cpp
>> Both ObjectMonitor::object() and ObjectMonitor::object_peek() have
>> 265 if (_object.is_null()) {
>> 266 return NULL;
>> 
>> Should we really be calling those functions when in such a state? That seems
>> like it might be a bug in the callers?
>> 
>> OK, I think I see some places where object_peek() might need such protection
>> because of races, in src/hotspot/share/runtime/synchronizer.cpp. And
>> because we don't seem to ever release() the underlying WeakHandles.
>> 
>> 514 if (m->object_peek() == NULL) {
>> 703 if (cur_om->object_peek() == NULL) {
>> 
>> But it still seems like it might be a bug to call object() in such a state.
> 
> The problem is Type Stable Memory (TSM). The idea is that all monitors are 
> allocated in blocks of many (when the object
> they are associated with is not yet known), and are then re-used, switching 
> object association back and forth. They are
> also never freed.  The changes in this RFR were originally tied together with 
> a patch that also removes TSM. But this
> has been split into separate parts by Dan in order to perform the OopStorage 
> migration first, and then removing TSM. So
> in this patch, we are still dealing with block-allocation of monitors before 
> they have an associated object, and
> changing of object association. Therefore, what you are saying is 100% true, 
> but I would like to defer fixing that
> until the follow-up patch that removes TSM. In that patch, the accessors are 
> changed to do exactly what you expect them
> to (resolve/peek), as the ObjectMonitor instances are allocated with a single 
> object association in mind, installed in
> the constructor, and get deleted after deflation, with destuctors run. So I 
> hope we can agree to wait until that
> subsequent patch with addressing such concerns. We thought reviewing the 
> wh
