Re: RFR: 8287135: Calculation of jmm_GetMemoryUsage is wrong

2022-05-27 Thread Thomas Stuefe
On Fri, 27 May 2022 06:25:45 GMT, Yi Yang  wrote:

> 
> If we remove `CompressedClassSpace`, we can only 
> `getMemoryPool("Metaspace")`. Although metaspace is not baked in the 
> specification, IMHO it's easier for developers to understand what is 
> `metaspace` compared to the concepts of `Non Class Space` and `Compressed 
> Class Space`.

I personally think that would be totally fine.

-

PR: https://git.openjdk.java.net/jdk/pull/8831


Re: RFR: 8287135: Calculation of jmm_GetMemoryUsage is wrong

2022-05-27 Thread Thomas Stuefe
On Fri, 27 May 2022 06:22:36 GMT, Ioi Lam  wrote:

> > The basic problem is that we have two non-heap pools:
> > 
> > * `MetaspacePool`
> >   
> >   * consists of `ClassType` and `NonClassType` parts
> > * `CompressedKlassSpacePool`
> > 
> > but the `CompressedKlassSpacePool` is actually the "ClassType" part of the 
> > `MetaspacePool`!
> > I think the right fix is to just convert the `MetaspacePool` into 
> > `NonClassMetaspacePool` and only report the non-class values.
> > AFAICS this will only be visible via the MXBean.
> 
> When CDS is enabled, the CompressedKlassSpacePool actually contains more than 
> Klass objects. It can contain other metadata such as Method, ConstantPool, 
> etc. So calling these pools "class" and "non-class" is not 100% correct.
> 
> Is there any reason for separating the CompressedKlassSpacePool from other 
> metadata? I know in the past it was possible to run out of spaces in 
> CompressedKlassSpace, so there might be a reason to monitor it separately?

Yes, you can run out of Class space separately from Metaspace.

-

PR: https://git.openjdk.java.net/jdk/pull/8831


Re: RFR: 8287135: Calculation of jmm_GetMemoryUsage is wrong

2022-05-26 Thread Thomas Stuefe
On Thu, 26 May 2022 11:32:46 GMT, Yi Yang  wrote:

>> I think the problem is the definition of the pools. We seem to have nested 
>> pools but it is far from clear that this API/mechanism was designed/intended 
>> to support nested pools.
>
> In any case, it's unreasonable for getNonHeapMemoryUsage to count a piece of 
> memory repeatedly, right? In the extreme case, we might add all nested pools 
> of metaspace, and we will get Metaspace x2 + CodeCache when using 
> getNonHeapMemoryUsage, which should be far beyond what developers expect. In 
> a real scenario, some monitor platforms draw a line chart by collecting 
> getHeapMemoryUsage and getNonHeapMemoryUsage. It's confusing for end-users of 
> the monitor platform to understand why getHeapMemoryUsage is desired while 
> getNonHeapMemoryUsage is bigger than expectation.
> 
> From the developer's point of view, MemoryMXBean.getNonHeapMemoryUsage is 
> expected to obtain the size of non-heap area. Given that 
> MemoryMXBean.getHeapMemoryUsage is clearly stated which is heap area:
> 
>  * Returns the current memory usage of the heap that
>  * is used for object allocation.  The heap consists
>  * of one or more memory pools.
> 
> 
> I propose to revise the Java doc to describe the definition of non-heap area 
> more precisely:
> 
> /**
>  * Returns the current memory usage of non-heap memory that
>  * contains code cache and metaspace.
>  * The non-heap memory consists of one or more memory pools.

Beside my general points, I think this patch makes sense. I agree with 
@kelthuzadx that counting Classspace twice is wrong, whatever the intention of 
this API originally was.

-

PR: https://git.openjdk.java.net/jdk/pull/8831


Re: RFR: 8287135: Calculation of jmm_GetMemoryUsage is wrong

2022-05-26 Thread Thomas Stuefe
On Thu, 26 May 2022 12:47:42 GMT, David Holmes  wrote:

>> In any case, it's unreasonable for getNonHeapMemoryUsage to count a piece of 
>> memory repeatedly, right? In the extreme case, we might add all nested pools 
>> of metaspace, and we will get Metaspace x2 + CodeCache when using 
>> getNonHeapMemoryUsage, which should be far beyond what developers expect. In 
>> a real scenario, some monitor platforms draw a line chart by collecting 
>> getHeapMemoryUsage and getNonHeapMemoryUsage. It's confusing for end-users 
>> of the monitor platform to understand why getHeapMemoryUsage is desired 
>> while getNonHeapMemoryUsage is bigger than expectation.
>> 
>> From the developer's point of view, MemoryMXBean.getNonHeapMemoryUsage is 
>> expected to obtain the size of non-heap area. Given that 
>> MemoryMXBean.getHeapMemoryUsage is clearly stated which is heap area:
>> 
>>  * Returns the current memory usage of the heap that
>>  * is used for object allocation.  The heap consists
>>  * of one or more memory pools.
>> 
>> 
>> I propose to revise the Java doc to describe the definition of non-heap area 
>> more precisely:
>> 
>> /**
>>  * Returns the current memory usage of non-heap memory that
>>  * contains code cache and metaspace.
>>  * The non-heap memory consists of one or more memory pools.
>
> No, things like code-cache and metaspace do not get baked into the 
> specification. "non-heap" is deliberately vague and open-ended.
> 
> I can agree the calculation may be wrong but we need to determine exactly why 
> it is wrong. To me the pool definitions seem wrong - they shouldn't be 
> nested, so then you couldn't double-count. But we really need the experts on 
> this code to chime in.

Sorry for chiming in in a not very helpful way, but I am not sure what even the 
point is of this API. 

I'm seriously interested in who uses it, and for what. Calculating real memory 
usage is notoriously difficult. This one seems to be just an arbitrary 
selection of stuff. It leaves out C-heap usage (VM, outer JDK, and third party 
libs), native mappings, sometimes massive system library overhead, text 
segments, thread stacks... that it counts class space twice is probably its 
smallest problem...

-

PR: https://git.openjdk.java.net/jdk/pull/8831


Re: RFR: 8286212: Cgroup v1 initialization causes NPE on some systems [v2]

2022-05-19 Thread Thomas Stuefe
On Wed, 18 May 2022 18:10:45 GMT, Severin Gehwolf  wrote:

>> @iklam yes I meant `Find the longest common prefix`. Fixed the comment.
>
> I'm not convinced the extra function makes the code more readable, but here 
> it is. I can revert back if this is too much.

@jerboaa Feel free to go back to your original variant. This was only a 
proposal.

-

PR: https://git.openjdk.java.net/jdk/pull/8629


Re: RFR: 8283849: AsyncGetCallTrace may crash JVM on guarantee [v12]

2022-05-17 Thread Thomas Stuefe
On Tue, 17 May 2022 13:53:36 GMT, Jaroslav Bachorik  
wrote:

>> A gist of the fix is to allow relaxed special handling of code blob lookup 
>> when done for ASGCT. 
>> 
>> Currently, a guarantee will fail when we happen to hit a zombie method which 
>> is still on stack. While this would indicate a serious error for the normal 
>> execution flow, in case of ASGCT being in progress when the executing thread 
>> can be expected at any possible method this is something which may happen 
>> and we really should not take the profiled JVM down due to it.
>> 
>> 
>> Unfortunately, I am not able to create a simple reproducer for the crash 
>> other that testing in our production where the crash is happening 
>> sporadically.
>> However, thanks to @parttimenerd and his [ASGCT stress 
>> test](https://github.com/parttimenerd/asgct2-tester.git) the problem can be 
>> reproduced quite reliably.
>> 
>> 
>> 
>> _Note: This is a followup PR for #8061_
>
> Jaroslav Bachorik has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Change comment wording

Looks good to me now.

-

Marked as reviewed by stuefe (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/8549


Re: RFR: 8283849: AsyncGetCallTrace may crash JVM on guarantee [v12]

2022-05-17 Thread Thomas Stuefe
On Tue, 17 May 2022 13:53:36 GMT, Jaroslav Bachorik  
wrote:

>> A gist of the fix is to allow relaxed special handling of code blob lookup 
>> when done for ASGCT. 
>> 
>> Currently, a guarantee will fail when we happen to hit a zombie method which 
>> is still on stack. While this would indicate a serious error for the normal 
>> execution flow, in case of ASGCT being in progress when the executing thread 
>> can be expected at any possible method this is something which may happen 
>> and we really should not take the profiled JVM down due to it.
>> 
>> 
>> Unfortunately, I am not able to create a simple reproducer for the crash 
>> other that testing in our production where the crash is happening 
>> sporadically.
>> However, thanks to @parttimenerd and his [ASGCT stress 
>> test](https://github.com/parttimenerd/asgct2-tester.git) the problem can be 
>> reproduced quite reliably.
>> 
>> 
>> 
>> _Note: This is a followup PR for #8061_
>
> Jaroslav Bachorik has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Change comment wording

src/hotspot/share/code/codeCache.cpp line 662:

> 660:   bool is_zombie = result != NULL && result->is_zombie();
> 661:   bool is_result_safe = !is_zombie || result->is_locked_by_vm() || 
> VMError::is_error_reported();
> 662:   guarantee(is_result_safe || is_in_asgct(), "unsafe access to zombie 
> method");

Why is this a guarantee? Does it still need to be one? We usually avoid paying 
for assert setup in release VMs, its rather uncommon to use guarantee.

-

PR: https://git.openjdk.java.net/jdk/pull/8549


Re: RFR: 8283849: AsyncGetCallTrace may crash JVM on guarantee [v9]

2022-05-17 Thread Thomas Stuefe
On Tue, 17 May 2022 14:54:23 GMT, Johannes Bechberger  
wrote:

>> But you aren't keeping it on the stack versus TLS you are using a stack 
>> object to set the TLS. All we have done in this final form is replace:
>> 
>> 
>> thread->set_in_asgct(true);
>> 
>> thread->set_in_asgct(false);
>> 
>> with
>> 
>> 
>> ASGCTMark mark(thread);
>> 
>> 
>> but now ASGCTMark has to worry about whether `thread` is NULL, whether it is 
>> the current javaThread, when we've already done all that for `thread`.
>> 
>> And this is the only place we will ever use ASGCTMark. Now if we had lots of 
>> return points in `` then RAII would definitely be better to ensure we 
>> can't forget to reset the state. But we don't. So use of RAII adds 
>> complexity cost now just in case in the future RAII might be useful.
>
> But the crash-handler does not prevent that `guarantee` aborts the VM, as we 
> discussed already in my related PR. My comment was more a suggestion and 
> related to "And this is the only place we will ever use ASGCTMark" by 
> @dholmes-ora . I agree with you that changing JFR should be done in another 
> PR.

> AFAIK, currently JFR is 'wrapping' the code in a crash-handler. We can take a 
> look at reusing this approach there in some follow-up PR but for now I would 
> really prefer getting this one merged without attaching any more 
> bells

@jbachorik No, please don't do that. That approach is very unsafe and should be 
used with extreme care. I would actually prefer for it to get removed 
completely, not to be reused.

-

PR: https://git.openjdk.java.net/jdk/pull/8549


Re: RFR: 8283849: AsyncGetCallTrace may crash JVM on guarantee [v9]

2022-05-17 Thread Thomas Stuefe
On Tue, 17 May 2022 12:23:56 GMT, David Holmes  wrote:

>> Hi David,
>> 
>>> You can't call `JavaThread::current()` in any of this code as it is not 
>>> safe from a signal handling context. (There is an existing use in ASGCT 
>>> that is also unsafe.) I suggest not having the no-arg constructor and 
>>> require the current JavaThread, or null, to be passed in (having already 
>>> been safely obtained by the caller). You can then assert using 
>>> `Thread::current_or_null_safe()`.
>>> 
>>> Personally I find the ASGCTMark complete overkill for this situation as 
>>> there is only ever going to be a single use - sorry @tstuefe - it is just 
>>> complicating things IMO.
>> 
>> No problem, that's why we talk :)
>> 
>> To make an argument for this: if you compare the first iterations of this 
>> patch to the current one, the current one is a lot simpler. I really dislike 
>> funneling arguments through many layers down to a utility function, only to 
>> fine control one specific behavioral aspect of that tiny function, and that 
>> aspect is only relevant for one outlier use case. I dislike this more than 
>> the RAII object.
>> 
>> Look at it this way. We have a thread-specific state (we are in AGCT and 
>> deep down somewhere we want to do something different depending on that 
>> state). We need to pass that information down. The only difference is where 
>> we keep the "we are in thread-specific state" info - on the stack as an 
>> argument chain, or anchored to TLS.
>> 
>> Putting it into arguments is one way, but I find this rather ugly. Now every 
>> caller of this function has to care. With the RAII, we have the information 
>> as a mark exactly where it matters - in the entrance frame of AGCT - and no 
>> outside code needs to care.
>> 
>> Cheers, Thomas
>
> But you aren't keeping it on the stack versus TLS you are using a stack 
> object to set the TLS. All we have done in this final form is replace:
> 
> 
> thread->set_in_asgct(true);
> 
> thread->set_in_asgct(false);
> 
> with
> 
> 
> ASGCTMark mark(thread);
> 
> 
> but now ASGCTMark has to worry about whether `thread` is NULL, whether it is 
> the current javaThread, when we've already done all that for `thread`.
> 
> And this is the only place we will ever use ASGCTMark. Now if we had lots of 
> return points in `` then RAII would definitely be better to ensure we 
> can't forget to reset the state. But we don't. So use of RAII adds complexity 
> cost now just in case in the future RAII might be useful.

@dholmes-ora Oh, you are only objecting to the RAII mark, not to the fact that 
we store the state in Thread? Ok, sure. In my view, a mark prevents the "on" 
state from escaping the frame, e.g. if stray return calls sneak in with future 
changes. But I don't have strong emotions here.

-

PR: https://git.openjdk.java.net/jdk/pull/8549


Re: RFR: 8283849: AsyncGetCallTrace may crash JVM on guarantee [v9]

2022-05-17 Thread Thomas Stuefe
On Tue, 17 May 2022 08:13:11 GMT, Jaroslav Bachorik  
wrote:

>> src/hotspot/share/prims/forte.hpp line 53:
>> 
>>> 51:   ASGCTMark() : ASGCTMark(JavaThread::current()) {}
>>> 52:   ~ASGCTMark() {
>>> 53: JavaThread::current()->set_in_asgct(false);
>> 
>> You can't call `JavaThread::current()` in any of this code as it is not safe 
>> from a signal handling context. (There is an existing use in ASGCT that is 
>> also unsafe.) I suggest not having the no-arg constructor and require the 
>> current JavaThread, or null, to be passed in (having already been safely 
>> obtained by the caller). You can then assert using 
>> `Thread::current_or_null_safe()`.
>> 
>> Personally I find the ASGCTMark complete overkill for this situation as 
>> there is only ever going to be a single use - sorry @tstuefe - it is just 
>> complicating things IMO.
>
> Ok, I fixed the `ASGCTMark` to be safe to use from a signal handler.
> 
> I have no strong opinion about whether we should keep it or not - it makes 
> the code in `forte.cpp` slightly more resilient in case of further 
> modifications for the price of more complexity introduced by the mark class 路

Hi David,

> You can't call `JavaThread::current()` in any of this code as it is not safe 
> from a signal handling context. (There is an existing use in ASGCT that is 
> also unsafe.) I suggest not having the no-arg constructor and require the 
> current JavaThread, or null, to be passed in (having already been safely 
> obtained by the caller). You can then assert using 
> `Thread::current_or_null_safe()`.
> 
> Personally I find the ASGCTMark complete overkill for this situation as there 
> is only ever going to be a single use - sorry @tstuefe - it is just 
> complicating things IMO.

No problem, that's why we talk :)

To make an argument for this: if you compare the first iterations of this patch 
to the current one, the current one is a lot simpler. I really dislike 
funneling arguments through many layers down to a utility function, only to 
fine control one specific behavioral aspect of that tiny function, and that 
aspect is only relevant for one outlier use case. I dislike this more than the 
RAII object.

Look at it this way. We have a thread-specific state (we are in AGCT and deep 
down somewhere we want to do something different depending on that state). We 
need to pass that information down. The only difference is where we keep the 
"we are in thread-specific state" info - on the stack as an argument chain, or 
anchored to TLS.

Putting it into arguments is one way, but I find this rather ugly. Now every 
caller of this function has to care. With the RAII, we have the information as 
a mark exactly where it matters - in the entrance frame of AGCT - and no 
outside code needs to care.

Cheers, Thomas

-

PR: https://git.openjdk.java.net/jdk/pull/8549


Re: RFR: 8286212: Cgroup v1 initialization causes NPE on some systems [v2]

2022-05-17 Thread Thomas Stuefe
On Tue, 17 May 2022 06:18:25 GMT, Ioi Lam  wrote:

>> Severin Gehwolf has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   Use stringStream to simplify controller path assembly
>
> src/hotspot/os/linux/cgroupV1Subsystem_linux.cpp line 92:
> 
>> 90:   }
>> 91:   ss.print_raw(_root, last_matching_slash_pos);
>> 92:   _path = os::strdup(ss.base());
> 
> Do you mean `Find the longest common prefix`? Maybe give an example in the 
> comments? Text parsing in C code is really difficult to understand.

Maybe factor out the search like this

// Return length of common starting substring. E.g. "cat" for ("cattle", 
"catnip");
static int find_common_starting_substring(const char* s1, const char* s2) {
...
}

That way its clearer and we can find and move this to utilities if we ever need 
this in other places.

-

PR: https://git.openjdk.java.net/jdk/pull/8629


Re: RFR: 8286212: Cgroup v1 initialization causes NPE on some systems [v2]

2022-05-13 Thread Thomas Stuefe
On Thu, 12 May 2022 18:09:40 GMT, Severin Gehwolf  wrote:

>> Please review this change to the cgroup v1 subsystem which makes it more 
>> resilient on some of the stranger systems. Unfortunately, I wasn't able to 
>> re-create a similar system as the reporter. The idea of using the longest 
>> substring match for the cgroupv1 file paths was based on the fact that on 
>> systemd systems processes run in separate scopes and the maven forked test 
>> runner might exhibit this property. For that it makes sense to use the 
>> common ancestor path. Nothing changes in the common cases where the 
>> `cgroup_path` matches `_root` and when the `_root` is `/` (container the 
>> former, host system the latter).
>> 
>> In addition, the code paths which are susceptible to throw NPE have been 
>> hardened to catch those situations. Should it happen in the future it makes 
>> more sense (to me) to not have accurate container detection in favor of 
>> continuing to keep running.
>> 
>> Finally, with the added unit-tests a bug was uncovered on the "substring" 
>> match case of cgroup paths in hotspot. `p` returned from `strstr` can never 
>> point to `_root` as it's used as the "needle" to find in "haystack" 
>> `cgroup_path` (not the other way round).
>> 
>> Testing:
>> - [x] Added unit tests
>> - [x] GHA
>> - [x] Container tests on cgroups v1 Linux. Continue to pass
>
> Severin Gehwolf has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Use stringStream to simplify controller path assembly

Thanks for taking my suggestion! Much simpler now :-)

-

PR: https://git.openjdk.java.net/jdk/pull/8629


Re: RFR: 8286212: Cgroup v1 initialization causes NPE on some systems

2022-05-12 Thread Thomas Stuefe
On Tue, 10 May 2022 12:29:10 GMT, Severin Gehwolf  wrote:

> Please review this change to the cgroup v1 subsystem which makes it more 
> resilient on some of the stranger systems. Unfortunately, I wasn't able to 
> re-create a similar system as the reporter. The idea of using the longest 
> substring match for the cgroupv1 file paths was based on the fact that on 
> systemd systems processes run in separate scopes and the maven forked test 
> runner might exhibit this property. For that it makes sense to use the common 
> ancestor path. Nothing changes in the common cases where the `cgroup_path` 
> matches `_root` and when the `_root` is `/` (container the former, host 
> system the latter).
> 
> In addition, the code paths which are susceptible to throw NPE have been 
> hardened to catch those situations. Should it happen in the future it makes 
> more sense (to me) to not have accurate container detection in favor of 
> continuing to keep running.
> 
> Finally, with the added unit-tests a bug was uncovered on the "substring" 
> match case of cgroup paths in hotspot. `p` returned from `strstr` can never 
> point to `_root` as it's used as the "needle" to find in "haystack" 
> `cgroup_path` (not the other way round).
> 
> Testing:
> - [x] Added unit tests
> - [x] GHA
> - [x] Container tests on cgroups v1 Linux. Continue to pass

src/hotspot/os/linux/cgroupV1Subsystem_linux.cpp line 113:

> 111:   }
> 112:   buf[MAXPATHLEN-1] = '\0';
> 113:   _path = os::strdup(buf);

I think this code can be simplified a lot with stringStream and without strtok, 
so no need for fixed buffers (which may fail with longer path names) and no 
need for writable string copies on the stack.

Something like this:

stringStream ss;
ss.print_raw(_mount_point);
const char* p1 = _root;
const char* p2 = cgroup_path;
int last_matching_dash_pos = -1;
for (int i = 0; *p1 == *p2 && *p1 != 0; i ++) {
if (*p1 == '/') {
last_matching_dash_pos = i;
}
p1++; p2++;
}
ss.print_raw(_root, last_matching_dash_pos);
// Now use ss.base() to access the assembled string

-

PR: https://git.openjdk.java.net/jdk/pull/8629


Re: RFR: 8283849: AsyncGetCallTrace may crash JVM on guarantee [v6]

2022-05-06 Thread Thomas Stuefe
On Fri, 6 May 2022 09:40:37 GMT, Jaroslav Bachorik  
wrote:

>> A gist of the fix is to allow relaxed special handling of code blob lookup 
>> when done for ASGCT. 
>> 
>> Currently, a guarantee will fail when we happen to hit a zombie method which 
>> is still on stack. While this would indicate a serious error for the normal 
>> execution flow, in case of ASGCT being in progress when the executing thread 
>> can be expected at any possible method this is something which may happen 
>> and we really should not take the profiled JVM down due to it.
>> 
>> 
>> Unfortunately, I am not able to create a simple reproducer for the crash 
>> other that testing in our production where the crash is happening 
>> sporadically.
>> However, thanks to @parttimenerd and his [ASGCT stress 
>> test](https://github.com/parttimenerd/asgct2-tester.git) the problem can be 
>> reproduced quite reliably.
>> 
>> 
>> 
>> _Note: This is a followup PR for #8061_
>
> Jaroslav Bachorik has refreshed the contents of this pull request, and 
> previous commits have been removed. The incremental views will show 
> differences compared to the previous content of the PR. The pull request 
> contains one new commit since the last revision:
> 
>   Move 'in_asgct' flag to JavaThread

src/hotspot/share/prims/forte.cpp line 594:

> 592:   }
> 593: 
> 594:   thread->set_in_asgct(true);

Suggestion: Use a small RAII Mark object instead. That way you prevent future 
errors if someone should add a return in the mids of AGCT.

-

PR: https://git.openjdk.java.net/jdk/pull/8549


Re: RFR: 8283849: AsyncGetCallTrace may crash JVM on guarantee [v6]

2022-05-06 Thread Thomas Stuefe
On Fri, 6 May 2022 14:20:30 GMT, Jaroslav Bachorik  
wrote:

>> Functionally this looks good now - thanks.
>> 
>> The only concern is the overhead added to `find_blob` to account for this 
>> very special case. Can you do some benchmarking of this?
>> 
>> Thanks.
>
> Hi @dholmes-ora , thanks for taking time to review this.
> 
> I am running `jmh-jdk-microbenchmarks` now but I am going to be away for a 
> week now so I will get back to you with the results only then.

Hi @jbachorik, thanks a lot for taking my suggestion. This looks way less 
intrusive now!

-

PR: https://git.openjdk.java.net/jdk/pull/8549


Re: RFR: JDK-8285921: serviceability/dcmd/jvmti/AttachFailed/AttachReturnError.java fails on Alpine

2022-05-03 Thread Thomas Stuefe
On Tue, 3 May 2022 07:47:28 GMT, Matthias Baesken  wrote:

> The test  serviceability/dcmd/jvmti/AttachFailed/AttachReturnError.java fails 
> on Alpine Linux using musl.
> Reason is the difference how dlclose is done 
> https://wiki.musl-libc.org/functional-differences-from-glibc.html
> "musl’s dynamic loader loads libraries permanently for the lifetime of the 
> process, until it exits or calls exec. dlclose is a no-op. "
> 
> This leads to "libReturnError.so found in stdout" when running the test on 
> Alpine/musl , unlike on other Linux versions without musl.

Looks good.

-

Marked as reviewed by stuefe (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/8515


Re: RFR: 8284828: Use `os::ThreadCrashProtection` to protect AsyncGetCallTrace from crashing [v9]

2022-04-16 Thread Thomas Stuefe
On Fri, 15 Apr 2022 10:04:12 GMT, Johannes Bechberger  
wrote:

> > Thanks for the JFR insight, Marcus. I can see the CrashProtection being a 
> > stop-gap measure under time constraints. Maybe worth rethinking now. From 
> > your description, the way it is used in JFR differ significantly from how 
> > the async-profiler would use it.
> 
> I came to the same conclusion. Therefore I don't think that adding 
> CrashProtection is worth the effort.
> 
> > Privately, I start wondering whether the benefits of async-profiler over 
> > JFR are worth the risk of corrupting the VM or the work needed to harden 
> > out AGCT. Their feature-set seems to overlap a lot. I may not see the whole 
> > picture though, I'm not a profiler expert.
> 
> ASGCT makes it possible to develop your own profiler. The implementation 
> overlaps but the feature set does not. You cannot build your own profiler 
> with JFR. AsyncProfiler has with ASGCT the possibility to capture call traces 
> with different intervals that are related to perf-events which is not 
> possible in JFR.
> 
> ASGCT should be a best effort API that has its caveats, one of them being 
> that it might crash the VM.

I disagree on this one, partly because you a crash is not the worst that could 
happen, that would be data corruption due to some freakish causal chain. And 
partly because that crashiness of AGCT is little known outside our community, 
so customers are probably not aware of this risk.

> I think the goal is to make ASGCT as safe as possible to use without altering 
> the normal behaviour (and speed) of the VM. Calling the API should not alter 
> anything on the heap or in any other data structures. This is also true for 
> JFR and many stability fixes for ASGCT (like using SafeFetch in some places) 
> should also benefit the stability of JFR.
> 
> I therefore want to close this PR (and the related issue) and propose my "[Do 
> not call 
> JavaThread::thread_from_jni_environment](https://github.com/openjdk/jdk/pull/8225/commits/b16e85d2251ae8cf7433ff146ea209d094a37c8a)"
>  commit in a new PR as adding PRs for all the errors I find in JFR and ASGCT 
> seems to be the way forward.

Sounds reasonable. I'm sorry for frustrating your efforts here.

Cheers, Thomas

-

PR: https://git.openjdk.java.net/jdk/pull/8225


Re: RFR: 8284828: Use `os::ThreadCrashProtection` to protect AsyncGetCallTrace from crashing [v9]

2022-04-15 Thread Thomas Stuefe
On Thu, 14 Apr 2022 17:08:32 GMT, Markus Grönlund  wrote:

> For additional context, I should add that the CrashProtection mechanism was 
> mainly put in place as a result of having to deliver JFR from JRockit into 
> Hotspot under a deadline, upholding feature-parity. The stack walking code 
> was in really bad shape back then. Over the years, it has been hardened and 
> improved much, and I have not seen any reported issues about JFR crashes in 
> many years (we log when crashing in production).
> 
> An important difference is that AGCT allows more thread states compared to 
> JFR, so there can be issues in that area that are not seen in JFR.

Thanks for the JFR insight, Marcus. I can see the CrashProtection being a 
stop-gap measure under time constraints. Maybe worth rethinking now. From your 
description, the way it is used in JFR differ significantly from how the 
async-profiler would use it.

Privately, I start wondering whether the benefits of async-profiler over JFR 
are worth the risk of corrupting the VM or the work needed to harden out AGCT. 
Their feature-set seems to overlap a lot. I may not see the whole picture 
though, I'm not a profiler expert.

-

PR: https://git.openjdk.java.net/jdk/pull/8225


Re: RFR: 8284828: Use `os::ThreadCrashProtection` to protect AsyncGetCallTrace from crashing [v6]

2022-04-14 Thread Thomas Stuefe
On Thu, 14 Apr 2022 14:33:35 GMT, Johannes Bechberger  
wrote:

> I compared the code of ASGCT and JFR and found only one instance of a method 
> that is used in ASGCT, but not in JFR and which does affect the VM state. It 
> is the `JavaThread::block_if_vm_exited` method which is transitively called, 
> a fix for this is coming. Therefore there is, in my opinion, no reason why we 
> cannot wrap AsyncGetCallTrace in ThreadCrashProtection.

Sorry, you did address none of my arguments. I see a reasoning chain that 
starts by JFR being "good enough" and from there concluding that using it in 
AGCT its good enough too. Well, maybe it was a bad idea in the first place. And 
right now, this code is only used by JFR. Reusing it perpetuates this pattern 
which others will follow. Because its so convenient just to catch all crashes. 

Does the code use Resource Area? Then its not sigjmp safe. Does it do any kind 
of dynamic allocation which could leak? Then its not sigjmp safe. Does it use 
locks? Again, not sigjmp safe. Does it use any kind of RAII objects? You guess 
it, its not sigjmp safe. Does it call libc functions that cause it to change 
state? Probably unsafe too.

And even if you today can prove that the whole code is safe, how do you prevent 
bitrot? These are general purpose hotspot functions. Someone may tomorrow use 
malloc, or RA. How would you know?

The fact that we never saw a problem in JFR does not prove much. A corruption 
may never show symptoms. It may only effect certain platforms, certain 
conditions, certain flukes of the memory layout god. There may be that one 
crash that you really should not have ignored or sigjmp'ed out of, but you 
never saw it in your tests, because it only happens to that one customer. What 
I really worry about is the delayed effect. Hiding the crash may cause delayed 
errors that are really hard to track down. Effectively transforming what would 
be a simple crash with an hs-err file and a core into a maintenance nightmare. 
That is why sigjmp is used very sparingly, and only in very controlled 
environments.

Put it another way: to really prove that what you do is safe, you have to 
answer, for every possible crash location:
1) is the crash itself benign and can be ignored?
2) is it safe to jump out of the crash code back to the entrance of AGCT, or 
would that interruption corrupt the VM or a system library?

My point is, if you can answer these questions, you can fix the crash. 
Therefore it makes no sense to wholesale catch signals. Either the code is safe 
and does not crash, then no need for catch-all. Or it is not, then its 
important to understand where we crash and why. Ignoring unknown crashes is not 
a good idea.

-

PR: https://git.openjdk.java.net/jdk/pull/8225


Re: RFR: 8284828: Use `os::ThreadCrashProtection` to protect AsyncGetCallTrace from crashing [v4]

2022-04-14 Thread Thomas Stuefe
On Thu, 14 Apr 2022 11:30:13 GMT, Johannes Bechberger  
wrote:

> > We have only one other subsystem that does this kind of wholesale catching 
> > of error signals and then unwinding the stack (JFR) and presumably they 
> > fine-combed their coding and are sure exactly what they do under the sigjmp 
> > guard. Are we also certain?
> 
> They share almost all of their code. Allocating on the heap is not safe in 
> the signal handler and should be considered an error.
>
> My goal is remove all code that (potentially) modifies the heap or the state 
> of the VM from AsyncGetCallTrace, with the focus on the methods that are 
> called from AsyncGetCallTrace but from JFR. Both are so similar that this 
> practical.
> 
> Regarding the `NoHandleMark`: I must have missed this.
> 
> > I think the safer approach, albeit much more work intensive, would be to 
> > make sure we do not crash. Starting with removing RA allocation. I tried to 
> > make RA signal safe with https://bugs.openjdk.java.net/browse/JDK-8282405, 
> > but that got totally stuck, so better just remove it altogether from AGCT. 
> > Over using SafeFetch or plain defensive coding to avoid crashing.
> 
> They are complementary goals. My goal is to improve ASGCT and its stability 
> but this PR should help prevent rare segmentation faults to appear in 
> production.

Sorry, I remain unconvinced.

Catching and ignoring error signals is akin to silencing the smoke detector 
because the noise bothers you. Your house still burns.

With this patch, we would ignore error signals and the VM would continue 
running, with a potentially corrupted state of VM and/or system libraries. That 
may cause silent data corruption, or even security holes. A hard fast crash is 
always better.

We deliberately never return from error handling for this very reason. Once we 
report a crash, we never even unwind the stack. There had been attempts in the 
past to return from error handling to normal VM mode for this reason and that, 
and we always avoided it.

The sigjmp technique is fine if you can guarantee that your code does not 
change state beyond what lives on the thread stack. No global state can change 
then, not in the VM itself, nor in system libraries. In order to prove that 
your catch-all patch is safe, you would have to fine-comb AGCT and all the code 
it calls and make it safe to always jump out of. I believe the work involved 
would be much higher than just fixing the crashes, and it would be much more 
prone to bitrot.

I believe there is no easy way here. I think if you want to make AGCT safe, you 
have to analyze and solve the individual crash points. Because if you catch 
them wholesale, how do you know which crashes are safe to ignore, and which 
should stop the VM? And even if the crash is safe to ignore, how do you know 
that by sigjumping back you won't leave the VM in an inconsistent state?

-

PR: https://git.openjdk.java.net/jdk/pull/8225


Re: RFR: 8284828: Use `os::ThreadCrashProtection` to protect AsyncGetCallTrace from crashing [v4]

2022-04-14 Thread Thomas Stuefe
On Thu, 14 Apr 2022 10:16:28 GMT, Ludovic Henry  wrote:

> At
> 
> https://github.com/openjdk/jdk/blob/77a21cb5658961b20011b7125a1ed896622faa4d/src/hotspot/share/prims/forte.cpp#L461
> 
> , you have a NoHandleMark which wouldn’t get cleaned up in case of crash.
> For other such cases, it could be worth having some form of assertion in 
> StackObj destructor to make sure you don’t call it when you have a crash 
> protection. That assert would trigger only in debug and even when there is no 
> crash so it should help us root out such issue faster.

Darn, @luhenry is right. And the more I think about this, the more doubts I get 
about this approach. 

We have only one other subsystem that does this kind of wholesale catching of 
error signals and then unwinding the stack (JFR) and presumably they 
fine-combed their coding and are sure exactly what they do under the sigjmp 
guard. Are we also certain? 

Because as long as we cannot guarantee that we really don't use RAII, or 
actually anything needing cleanup (e.g. dynamic memory allocation), and that we 
don't accidentally abandon a lock, we cannot use sigjmp. This precludes use of 
resource area, and had that not been an issue? We cannot even use malloc here.

Under these circumstances, crashing may be better than not crashing, since not 
crashing could leave the thread in a corrupted state or in a deadlock.

I think the safer approach, albeit much more work intensive, would be to make 
sure we do not crash. Starting with removing RA allocation. I tried to make RA 
signal safe with https://bugs.openjdk.java.net/browse/JDK-8282405, but that got 
totally stuck, so better just remove it altogether from AGCT. Over using 
SafeFetch or plain defensive coding to avoid crashing.

Sorry for this. If I'm wrong, please someone correct me. This looked like a 
simple solution at first.

-

PR: https://git.openjdk.java.net/jdk/pull/8225


Re: RFR: 8284828: Use `os::ThreadCrashProtection` to protect AsyncGetCallTrace from crashing

2022-04-13 Thread Thomas Stuefe
On Wed, 13 Apr 2022 14:53:54 GMT, Johannes Bechberger  
wrote:

> Move the AsyncGetCallTrace method implementation into a separate method and 
> wrap its call in non-assert compilation mode in `os::ThreadCrashProtection` 
> like it is done in 
> [JFR](https://github.com/openjdk/jdk/blob/965ea8d9cd29aee41ba2b1b0b0c67bb67eca22dd/src/hotspot/share/jfr/periodic/sampling/jfrThreadSampler.cpp#L165).
> This prevents AsyncGetCallTrace from crashing on segmentation faults (but not 
> on `guarantee`s).
> 
> If a crash is observed, then the `num_frames` field of the trace is set to 
> `ticks_unknown_state` (-7) to signal a state that cannot be properly handled. 
> `ticks_unknown_state` is currently also used for signaling unknown thread 
> states but this should not be a problem, as the semantic is the same. If 
> `num_frames` already has an error code then this error code is not changed. 
> This helps to distinguish between errors in walking threads in Java and 
> non-Java mode, as `num_frames` is set there before the walking to the 
> appropriate error code.
> 
> _Thanks for @tstuefe for suggesting this._

I'm actually ambivalent about the "production only" thing.

I dislike having different behavior between debug and release, I prefer to test 
what later runs in the field. Also, if we prevent crashes because we want to 
ignore them, we should ignore them in debug too, otherwise, we burn error 
analysis cycles needlessly. 

Optionally, I'd make the behavior of ThreadCrashProtection switchable at 
runtime with a diagnostic XX switch. That switch would control the jumping back 
in the signal handler. That way, if you encounter strange bugs while using 
async profiler, one can disable the crash guard and enable asserts and crashes. 
If you decide to do that, we should also do tests for it, so maybe leave it for 
a future RFE.

-

PR: https://git.openjdk.java.net/jdk/pull/8225


Re: RFR: 8284828: Use `os::ThreadCrashProtection` to protect AsyncGetCallTrace from crashing

2022-04-13 Thread Thomas Stuefe
On Wed, 13 Apr 2022 15:11:41 GMT, Jaroslav Bachorik  
wrote:

>> Move the AsyncGetCallTrace method implementation into a separate method and 
>> wrap its call in non-assert compilation mode in `os::ThreadCrashProtection` 
>> like it is done in 
>> [JFR](https://github.com/openjdk/jdk/blob/965ea8d9cd29aee41ba2b1b0b0c67bb67eca22dd/src/hotspot/share/jfr/periodic/sampling/jfrThreadSampler.cpp#L165).
>> This prevents AsyncGetCallTrace from crashing on segmentation faults (but 
>> not on `guarantee`s).
>> 
>> If a crash is observed, then the `num_frames` field of the trace is set to 
>> `ticks_unknown_state` (-7) to signal a state that cannot be properly 
>> handled. `ticks_unknown_state` is currently also used for signaling unknown 
>> thread states but this should not be a problem, as the semantic is the same. 
>> If `num_frames` already has an error code then this error code is not 
>> changed. This helps to distinguish between errors in walking threads in Java 
>> and non-Java mode, as `num_frames` is set there before the walking to the 
>> appropriate error code.
>> 
>> _Thanks for @tstuefe for suggesting this._
>
> src/hotspot/share/prims/forte.cpp line 671:
> 
>> 669: #ifndef ASSERT
>> 670:   trace->num_frames = ticks_unknown_state;
>> 671:   AsyncGetCallTraceCallBack cb(trace, depth, ucontext);
> 
> ~Isn't [this 
> assert](https://github.com/openjdk/jdk/blob/e245f9d2007b0a6c9962b6bf4488ba4d4ce47e92/src/hotspot/os/posix/os_posix.cpp#L1158)
>  failing? It seems like the crash protection is only for the JFR sampler 
> thread.~
> 
> OIC - this is only when asserts are not enabled.

I think if we start using ThreadCrashProtection like a general-purpose class, 
we should make it one and remove the JFR dependencies.

-

PR: https://git.openjdk.java.net/jdk/pull/8225


Re: RFR: 8284828: Use `os::ThreadCrashProtection` to protect AsyncGetCallTrace from crashing

2022-04-13 Thread Thomas Stuefe
On Wed, 13 Apr 2022 14:53:54 GMT, Johannes Bechberger  
wrote:

> Move the AsyncGetCallTrace method implementation into a separate method and 
> wrap its call in non-assert compilation mode in `os::ThreadCrashProtection` 
> like it is done in 
> [JFR](https://github.com/openjdk/jdk/blob/965ea8d9cd29aee41ba2b1b0b0c67bb67eca22dd/src/hotspot/share/jfr/periodic/sampling/jfrThreadSampler.cpp#L165).
> This prevents AsyncGetCallTrace from crashing on segmentation faults (but not 
> on `guarantee`s).
> 
> If a crash is observed, then the `num_frames` field of the trace is set to 
> `ticks_unknown_state` (-7) to signal a state that cannot be properly handled. 
> `ticks_unknown_state` is currently also used for signaling unknown thread 
> states but this should not be a problem, as the semantic is the same. If 
> `num_frames` already has an error code then this error code is not changed. 
> This helps to distinguish between errors in walking threads in Java and 
> non-Java mode, as `num_frames` is set there before the walking to the 
> appropriate error code.
> 
> _Thanks for @tstuefe for suggesting this._

src/hotspot/share/prims/forte.cpp line 508:

> 506: 
> 507: 
> 508: void asyncGetCallTraceImpl(ASGCT_CallTrace *trace, jint depth, void* 
> ucontext) {

make it static? No need to export this.

src/hotspot/share/prims/forte.cpp line 669:

> 667: JNIEXPORT
> 668: void AsyncGetCallTrace(ASGCT_CallTrace *trace, jint depth, void* 
> ucontext) {
> 669: #ifndef ASSERT

Since ThreadCrashProtection uses Thread::current, here I would bail for 
Thread::current_or_null_safe() == NULL

src/hotspot/share/prims/forte.cpp line 674:

> 672:   os::ThreadCrashProtection crash_protection;
> 673:   if (!crash_protection.call(cb)) {
> 674: if (trace->num_frames < -10 || trace->num_frames >= 0) {

Where does this -10 come from?

-

PR: https://git.openjdk.java.net/jdk/pull/8225


Re: RFR: 8283710: JVMTI: Use BitSet for object marking [v8]

2022-04-11 Thread Thomas Stuefe
On Mon, 11 Apr 2022 11:55:33 GMT, Roman Kennke  wrote:

>> JVMTI heap walking marks objects in order to track which have been visited 
>> already. In order to do that, it uses bits in the object header. Those are 
>> the same bits that are also used by some GCs to mark objects (the lowest two 
>> bits, also used by locking code). Some GCs also use the bits in order to 
>> indicate 'forwarded' objects, where the upper bits of the header represent 
>> the forward-pointer. In the case of Shenandoah, it's even more problematic 
>> because this happens concurrently, even while JVMTI heap walks can 
>> intercept. So far we carefully worked around that problem, but it becomes 
>> very problematic in Lilliput, where accesses to the Klass* also requires to 
>> decode the header, and figure out what bits means what.
>> 
>> In addition to that, marking objects in their header requires that the 
>> original header gets saved and restored. We only do that for 'interesting' 
>> headers, that is headers that have a stack-lock, monitor or hash-code. All 
>> other headers are reset to their default value. This means we are losing 
>> object's GC age. This is not catastrophic, but nontheless interferes with 
>> GC. 
>> 
>> JFR already has a datastructure called BitSet to support object marking 
>> without messing with object's headers. We can use that in JVMTI too.
>> 
>> Testing:
>>  - [x] tier1
>>  - [x] tier2
>>  - [x] tier3
>>  - [x] serviceability/jvmti
>>  - [x] vmTestbase/nsk/jvmti
>
> Roman Kennke has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Add some basic tests for ObjectBitSet

All good

-

Marked as reviewed by stuefe (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/7964


Re: RFR: 8283710: JVMTI: Use BitSet for object marking [v6]

2022-04-09 Thread Thomas Stuefe
On Fri, 8 Apr 2022 17:34:57 GMT, Roman Kennke  wrote:

> Yes, I get that. But the fragments and fragment-table are themselves inner 
> classes that take a MEMFLAGS template. I could (perhaps) either use a 
> constexpr MEMFLAGS arg and pass this through, or do at some point a switch 
> like:
> 
> ```
> switch (_flags) {
>   case mtServiceability:
>   ... new BitMapFragmentTable(); break;
>   case mtServiceability:
>   ... new BitMapFragmentTable(); break;
>   default: ShouldNotReachHere();
> }
> ```
> 
> Which seems kinda-ugly but would work (I think), and avoid making the outer 
> class template-ized.

I see what you mean. This MEMFLAGS template parameter is deeply interwoven into 
everything. I'd just live with the current solution. It uses established 
pattern, so at least nobody is surprised.

I think the basic problem is that CHeapObj itself is a template class. 
Rethinking MEMFLAGS seems worthwhile for a future RFE. As I wrote, one approach 
could be to make them a property of the current thread, and switchable and 
stackable via a Mark class. That way, everything allocated within a given range 
of frames would count toward a given category. No need to decide on a 
fine-granular basis. No need for templates. Maybe no need even to have a 
MEMFLAGS argument for every allocation.

-

PR: https://git.openjdk.java.net/jdk/pull/7964


Re: RFR: 8283710: JVMTI: Use BitSet for object marking [v6]

2022-04-08 Thread Thomas Stuefe
On Thu, 7 Apr 2022 18:28:42 GMT, Roman Kennke  wrote:

>> JVMTI heap walking marks objects in order to track which have been visited 
>> already. In order to do that, it uses bits in the object header. Those are 
>> the same bits that are also used by some GCs to mark objects (the lowest two 
>> bits, also used by locking code). Some GCs also use the bits in order to 
>> indicate 'forwarded' objects, where the upper bits of the header represent 
>> the forward-pointer. In the case of Shenandoah, it's even more problematic 
>> because this happens concurrently, even while JVMTI heap walks can 
>> intercept. So far we carefully worked around that problem, but it becomes 
>> very problematic in Lilliput, where accesses to the Klass* also requires to 
>> decode the header, and figure out what bits means what.
>> 
>> In addition to that, marking objects in their header requires that the 
>> original header gets saved and restored. We only do that for 'interesting' 
>> headers, that is headers that have a stack-lock, monitor or hash-code. All 
>> other headers are reset to their default value. This means we are losing 
>> object's GC age. This is not catastrophic, but nontheless interferes with 
>> GC. 
>> 
>> JFR already has a datastructure called BitSet to support object marking 
>> without messing with object's headers. We can use that in JVMTI too.
>> 
>> Testing:
>>  - [x] tier1
>>  - [x] tier2
>>  - [x] tier3
>>  - [x] serviceability/jvmti
>>  - [x] vmTestbase/nsk/jvmti
>
> Roman Kennke has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Move JFRBitSet typedef into its own header; Make _bitset a direct member, 
> not dynamically allocated

> > We must call new on it somewhere. I am not opposed to making this an 
> > mtFlags template then. This is a small point in this improvement.
> 
> Yes, at some point we need to allocate the fragments and fragments table. We 
> could _perhaps_ work around it by passing MEMFLAGS as constexpr, but I don't 
> think this would change generated code much.

(all bikeshedding since we all agree the current version is fine, no?)

No, sorry, I meant make BitSet a normal class. Not derived from CHeapObj, since 
it gets not newed, only used via composition or on a stack. And give it a 
runtime parm to set the MEMFLAG.

So: 

class BitSet {
  const MEMFLAGS _flags;
public:
  BitSet(MEMFLAGS f = mtInternal) : _flags(f) {...}
};


and use those flags for the c-heap allocation of the fragments.

-

PR: https://git.openjdk.java.net/jdk/pull/7964


Re: RFR: 8284330: jcmd may not be able to find processes in the container [v3]

2022-04-08 Thread Thomas Stuefe
On Fri, 8 Apr 2022 08:34:24 GMT, Yasumasa Suenaga  wrote:

>> jcmd uses 
>> src/jdk.internal.jvmstat/linux/classes/sun/jvmstat/PlatformSupportImpl.java 
>> to scan temporary directories to find out processes in the container. It 
>> checks inode to ensure the temp directory is not conflicted. However inode 
>> maybe same value between the container and others. Thus we should check 
>> device id for that case.
>> 
>> For example I saw following case on [distroless 
>> cc-debian11](https://github.com/GoogleContainerTools/distroless/blob/main/cc/README.md)
>>  container. I started rescue:jdk19 container with sharing PID namespace. 
>> `/proc/1/root/tmp` is different from `/tmp` on rescue:jdk19, but they are 
>> same inode value. However we can see the differense in device id.
>> 
>> 
>> $ podman run -it --rm --entrypoint=sh --pid=container:fa39662f7352 
>> rescue:jdk19
>> / #
>> / # stat /tmp
>>   File: /tmp
>>   Size: 29  Blocks: 0  IO Block: 4096   directory
>> Device: efh/239dInode: 135674931   Links: 1
>> Access: (1777/drwxrwxrwt)  Uid: (0/root)   Gid: (0/root)
>> Access: 2022-04-05 08:51:37.0
>> Modify: 2022-04-05 08:51:37.0
>> Change: 2022-04-05 08:51:37.0
>> 
>> / # stat /proc/1/root/tmp
>>   File: /proc/1/root/tmp
>>   Size: 29  Blocks: 0  IO Block: 4096   directory
>> Device: e1h/225dInode: 135674931   Links: 1
>> Access: (1777/drwxrwxrwt)  Uid: (0/root)   Gid: (0/root)
>> Access: 2022-04-05 08:51:37.0
>> Modify: 2022-04-05 08:50:42.0
>> Change: 2022-04-05 08:50:42.0
>
> Yasumasa Suenaga has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Update the comment

still good

-

PR: https://git.openjdk.java.net/jdk/pull/8103


Re: RFR: 8283710: JVMTI: Use BitSet for object marking [v6]

2022-04-08 Thread Thomas Stuefe
On Thu, 7 Apr 2022 18:28:42 GMT, Roman Kennke  wrote:

>> JVMTI heap walking marks objects in order to track which have been visited 
>> already. In order to do that, it uses bits in the object header. Those are 
>> the same bits that are also used by some GCs to mark objects (the lowest two 
>> bits, also used by locking code). Some GCs also use the bits in order to 
>> indicate 'forwarded' objects, where the upper bits of the header represent 
>> the forward-pointer. In the case of Shenandoah, it's even more problematic 
>> because this happens concurrently, even while JVMTI heap walks can 
>> intercept. So far we carefully worked around that problem, but it becomes 
>> very problematic in Lilliput, where accesses to the Klass* also requires to 
>> decode the header, and figure out what bits means what.
>> 
>> In addition to that, marking objects in their header requires that the 
>> original header gets saved and restored. We only do that for 'interesting' 
>> headers, that is headers that have a stack-lock, monitor or hash-code. All 
>> other headers are reset to their default value. This means we are losing 
>> object's GC age. This is not catastrophic, but nontheless interferes with 
>> GC. 
>> 
>> JFR already has a datastructure called BitSet to support object marking 
>> without messing with object's headers. We can use that in JVMTI too.
>> 
>> Testing:
>>  - [x] tier1
>>  - [x] tier2
>>  - [x] tier3
>>  - [x] serviceability/jvmti
>>  - [x] vmTestbase/nsk/jvmti
>
> Roman Kennke has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Move JFRBitSet typedef into its own header; Make _bitset a direct member, 
> not dynamically allocated

Third alternative would be not to derive from CHeapObj at all, since we don't 
new it anywhere if I see correctly.

-

PR: https://git.openjdk.java.net/jdk/pull/7964


Re: RFR: 8283710: JVMTI: Use BitSet for object marking [v6]

2022-04-08 Thread Thomas Stuefe
On Fri, 8 Apr 2022 14:17:06 GMT, Roman Kennke  wrote:

> > Here's an idea. Add an mtflag called mtBitMap and use that. There's plenty 
> > of bits in mt flags.
> 
> Naa, I don't think this is justified. It isn't used as widely or frequently 
> to warrant its own flag.

Plus, it would muddy the water, since what we actually want is the memory 
accounted toward the subsystem on behalf of which we create the data structure. 
mtBitMap would be a bit like "mtArray" for GrowableArray.

Note that there was the idea, and an associated JBS issue 
(https://bugs.openjdk.java.net/browse/JDK-8281819) of making NMT categories 
groupable in a one- or multi-layered hierarchy. That way, one could account to 
"mtGC" -> "mtBitSet", for instance.

For now, I opt to either live with what we have with the typedef'ed BitSet, and 
leave MEMFLAG cleanups to a later RFE.

Alternatively, make the MEMFLAG a runtime parameter of the bitset, use that for 
the heap allocated fragments, but use "mtInternal" for the actual class. I 
mean, those are just a few bytes.

-

PR: https://git.openjdk.java.net/jdk/pull/7964


Re: RFR: 8283710: JVMTI: Use BitSet for object marking [v6]

2022-04-08 Thread Thomas Stuefe
On Thu, 7 Apr 2022 18:28:42 GMT, Roman Kennke  wrote:

>> JVMTI heap walking marks objects in order to track which have been visited 
>> already. In order to do that, it uses bits in the object header. Those are 
>> the same bits that are also used by some GCs to mark objects (the lowest two 
>> bits, also used by locking code). Some GCs also use the bits in order to 
>> indicate 'forwarded' objects, where the upper bits of the header represent 
>> the forward-pointer. In the case of Shenandoah, it's even more problematic 
>> because this happens concurrently, even while JVMTI heap walks can 
>> intercept. So far we carefully worked around that problem, but it becomes 
>> very problematic in Lilliput, where accesses to the Klass* also requires to 
>> decode the header, and figure out what bits means what.
>> 
>> In addition to that, marking objects in their header requires that the 
>> original header gets saved and restored. We only do that for 'interesting' 
>> headers, that is headers that have a stack-lock, monitor or hash-code. All 
>> other headers are reset to their default value. This means we are losing 
>> object's GC age. This is not catastrophic, but nontheless interferes with 
>> GC. 
>> 
>> JFR already has a datastructure called BitSet to support object marking 
>> without messing with object's headers. We can use that in JVMTI too.
>> 
>> Testing:
>>  - [x] tier1
>>  - [x] tier2
>>  - [x] tier3
>>  - [x] serviceability/jvmti
>>  - [x] vmTestbase/nsk/jvmti
>
> Roman Kennke has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Move JFRBitSet typedef into its own header; Make _bitset a direct member, 
> not dynamically allocated

Hi Thomas,

> > A general thought, maybe for a future RFE:
> > We now have BitMap and BitSet, which do almost the same thing. Instead of 
> > having a new class BitSet, whose job is to be sparse, we could give BitMap 
> > the ability to be sparse. We'd save code, reuse the rich BitMap interface 
> > and get all the test code for free.
> > One way to do that would be to make BitMap responsible for committing its 
> > underlying backing memory, in chunks, on-demand only. To do that BitMap 
> > would need a secondary map - a commit map - to remember which fragments of 
> > its backing memory are committed. Commit map would have a bit per 64M 
> > fragment.
> 
> Please don't do this. `BitMap` (well, `BitMapView`) is also used as a 
> lightweight way to overlay it on arbitrary memory temporarily (i.e. 
> remembered sets bitmaps). This typically boils down to no additional memory 
> usage (and overhead) at all. So any additional memory usage or initialization 
> should not be added lightly. Also in most other cases, `BitMap` is used for 
> very tiny bitmaps anyway, this would just add unnecessary functionality (and 
> overhead) for 99% of the usage of `BitMap`.
> 
> Further GCs are using `BitMap` to implement such sparse bit sets already, 
> managing their memory with the corresponding Java heap memory; so adding this 
> functionality at the `BitMap` level would just duplicate functionality with 
> probably not-so-hilarious side effects.
> 

Yes, you are right. I had some more time to think about this and came to the 
same conclusion. Also abstracting bitwise getters and setters as I thought 
originally would probably not be enough.

> I also _do_ think that the use cases for a dense `BitMap` are significantly 
> different from (huge) sparse "BitMap"s to warrant its own separate class (not 
> saying that one couldn't use the other internally, or that if that `BitSet` 
> could be reused in GCs if configurable enough).
> 
> I am also not sure that the automatic lazy backing of the OS isn't sufficient 
> for these use cases (e.g. JVMTI) too.

I'm not sure about this. We often run against the commit charge of the 
underlying OS, which seems to be more frequent than running out of physical 
memory. I think voluntary uncommit does make sense. Whether the libc actually 
uncommits on free() is a different question, but in this case, I think they do, 
the fragments are large enough.

Thanks, Thomas

-

PR: https://git.openjdk.java.net/jdk/pull/7964


Re: RFR: 8284330: jcmd may not be able to find processes in the container [v2]

2022-04-07 Thread Thomas Stuefe
On Wed, 6 Apr 2022 12:44:35 GMT, Yasumasa Suenaga  wrote:

>> jcmd uses 
>> src/jdk.internal.jvmstat/linux/classes/sun/jvmstat/PlatformSupportImpl.java 
>> to scan temporary directories to find out processes in the container. It 
>> checks inode to ensure the temp directory is not conflicted. However inode 
>> maybe same value between the container and others. Thus we should check 
>> device id for that case.
>> 
>> For example I saw following case on [distroless 
>> cc-debian11](https://github.com/GoogleContainerTools/distroless/blob/main/cc/README.md)
>>  container. I started rescue:jdk19 container with sharing PID namespace. 
>> `/proc/1/root/tmp` is different from `/tmp` on rescue:jdk19, but they are 
>> same inode value. However we can see the differense in device id.
>> 
>> 
>> $ podman run -it --rm --entrypoint=sh --pid=container:fa39662f7352 
>> rescue:jdk19
>> / #
>> / # stat /tmp
>>   File: /tmp
>>   Size: 29  Blocks: 0  IO Block: 4096   directory
>> Device: efh/239dInode: 135674931   Links: 1
>> Access: (1777/drwxrwxrwt)  Uid: (0/root)   Gid: (0/root)
>> Access: 2022-04-05 08:51:37.0
>> Modify: 2022-04-05 08:51:37.0
>> Change: 2022-04-05 08:51:37.0
>> 
>> / # stat /proc/1/root/tmp
>>   File: /proc/1/root/tmp
>>   Size: 29  Blocks: 0  IO Block: 4096   directory
>> Device: e1h/225dInode: 135674931   Links: 1
>> Access: (1777/drwxrwxrwt)  Uid: (0/root)   Gid: (0/root)
>> Access: 2022-04-05 08:51:37.0
>> Modify: 2022-04-05 08:50:42.0
>> Change: 2022-04-05 08:50:42.0
>
> Yasumasa Suenaga has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Fix comments

Looks good to me (with @iklam's Comment changes). Thans for fixing.

-

Marked as reviewed by stuefe (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/8103


Re: RFR: 8283710: JVMTI: Use BitSet for object marking [v6]

2022-04-07 Thread Thomas Stuefe
On Thu, 7 Apr 2022 18:28:42 GMT, Roman Kennke  wrote:

>> JVMTI heap walking marks objects in order to track which have been visited 
>> already. In order to do that, it uses bits in the object header. Those are 
>> the same bits that are also used by some GCs to mark objects (the lowest two 
>> bits, also used by locking code). Some GCs also use the bits in order to 
>> indicate 'forwarded' objects, where the upper bits of the header represent 
>> the forward-pointer. In the case of Shenandoah, it's even more problematic 
>> because this happens concurrently, even while JVMTI heap walks can 
>> intercept. So far we carefully worked around that problem, but it becomes 
>> very problematic in Lilliput, where accesses to the Klass* also requires to 
>> decode the header, and figure out what bits means what.
>> 
>> In addition to that, marking objects in their header requires that the 
>> original header gets saved and restored. We only do that for 'interesting' 
>> headers, that is headers that have a stack-lock, monitor or hash-code. All 
>> other headers are reset to their default value. This means we are losing 
>> object's GC age. This is not catastrophic, but nontheless interferes with 
>> GC. 
>> 
>> JFR already has a datastructure called BitSet to support object marking 
>> without messing with object's headers. We can use that in JVMTI too.
>> 
>> Testing:
>>  - [x] tier1
>>  - [x] tier2
>>  - [x] tier3
>>  - [x] serviceability/jvmti
>>  - [x] vmTestbase/nsk/jvmti
>
> Roman Kennke has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Move JFRBitSet typedef into its own header; Make _bitset a direct member, 
> not dynamically allocated

BTW, one neat solution to the MEMFLAGS problem would be to make it a property 
of the thread, and stackable. With sort of a MemflagMark. Each Thread could 
have a property "default MEMFLAGS", set with a MemFlagMark, and that gets used 
by default with every malloc in this thread. This way we may be able to get rid 
of a lot (most?) manual MEMFLAG declarations.

-

PR: https://git.openjdk.java.net/jdk/pull/7964


Re: RFR: 8283710: JVMTI: Use BitSet for object marking

2022-04-07 Thread Thomas Stuefe
On Fri, 8 Apr 2022 02:01:50 GMT, Kim Barrett  wrote:

> > > One open question is which MEMFLAGS to use. mtTracing doesn't seem to be 
> > > exactly right. Should I templatize BitSet and make JVMTI use 
> > > mtServiceability and JRF use mtTracing as it did before?
> > 
> > 
> > Yes, I think templatizing for MEMFLAGS makes sense (concurrentHashTable.hpp 
> > is too).
> 
> I haven't had time to look at the code, but I don't know about this. Slapping 
> a template parameter on everything isn't necessarily a good idea. We recently 
> (JDK-8283368) undid exactly this sort of thing in the cardset code, instead 
> making the MEMFLAGS value a runtime parameter provided at construction time. 
> This avoids a bunch of generated code duplication, additional template 
> syntax, and allows more implementation be put in .cpp files because it isn't 
> a template.

I never liked MEMFLAGS as template parameter. A runtime parameter would it make 
easier to use a general-purpose data structure on behalf of a subsystem and 
account its memory to that subsystem, while still being able to pass it around 
as a simple pointer to other utility functions.

But here Roman squirreled the template definition away into a typedef, so I 
think we could cleanup MEMFLAG usage in a separate RFE?

(we also should move MEMFLAGS to an own header file, btw, to avoid having to 
pull allocation.hpp every time)

-

PR: https://git.openjdk.java.net/jdk/pull/7964


Re: RFR: 8283710: JVMTI: Use BitSet for object marking [v5]

2022-04-07 Thread Thomas Stuefe
On Thu, 7 Apr 2022 17:20:41 GMT, Roman Kennke  wrote:

>> JVMTI heap walking marks objects in order to track which have been visited 
>> already. In order to do that, it uses bits in the object header. Those are 
>> the same bits that are also used by some GCs to mark objects (the lowest two 
>> bits, also used by locking code). Some GCs also use the bits in order to 
>> indicate 'forwarded' objects, where the upper bits of the header represent 
>> the forward-pointer. In the case of Shenandoah, it's even more problematic 
>> because this happens concurrently, even while JVMTI heap walks can 
>> intercept. So far we carefully worked around that problem, but it becomes 
>> very problematic in Lilliput, where accesses to the Klass* also requires to 
>> decode the header, and figure out what bits means what.
>> 
>> In addition to that, marking objects in their header requires that the 
>> original header gets saved and restored. We only do that for 'interesting' 
>> headers, that is headers that have a stack-lock, monitor or hash-code. All 
>> other headers are reset to their default value. This means we are losing 
>> object's GC age. This is not catastrophic, but nontheless interferes with 
>> GC. 
>> 
>> JFR already has a datastructure called BitSet to support object marking 
>> without messing with object's headers. We can use that in JVMTI too.
>> 
>> Testing:
>>  - [x] tier1
>>  - [x] tier2
>>  - [x] tier3
>>  - [x] serviceability/jvmti
>>  - [x] vmTestbase/nsk/jvmti
>
> Roman Kennke has updated the pull request incrementally with four additional 
> commits since the last revision:
> 
>  - Add comment describing ObjectBitSet
>  - Refactor JVMTI usage of ObjectBitSet
>  - Typedef ObjectBitSet to JFRBitSet in JFR code
>  - Rename BitSet to ObjectBitSet

About https://github.com/openjdk/jdk/pull/8148, could you wait until that one 
is in? Would make it easier to backport Zhengyus change.

-

PR: https://git.openjdk.java.net/jdk/pull/7964


Re: RFR: 8283710: JVMTI: Use BitSet for object marking [v5]

2022-04-07 Thread Thomas Stuefe
On Thu, 7 Apr 2022 17:22:12 GMT, Thomas Stuefe  wrote:

>> Roman Kennke has updated the pull request incrementally with four additional 
>> commits since the last revision:
>> 
>>  - Add comment describing ObjectBitSet
>>  - Refactor JVMTI usage of ObjectBitSet
>>  - Typedef ObjectBitSet to JFRBitSet in JFR code
>>  - Rename BitSet to ObjectBitSet
>
> src/hotspot/share/jfr/leakprofiler/chains/bfsClosure.hpp line 37:
> 
>> 35: class ObjectBitSet;
>> 36: 
>> 37: typedef ObjectBitSet JFRBitSet;
> 
> Can you include memory/allocation.hpp for the MEMFLAGS? iterator.hpp already 
> does, but better to be include-clean

Optional: move this typedef to an own jfr/leakprofiler/utilities/jfrbitset.hpp 
and include that one instead of objectbitset.inline.hpp in the callee cpp files

-

PR: https://git.openjdk.java.net/jdk/pull/7964


Re: RFR: 8283710: JVMTI: Use BitSet for object marking [v5]

2022-04-07 Thread Thomas Stuefe
On Thu, 7 Apr 2022 17:20:41 GMT, Roman Kennke  wrote:

>> JVMTI heap walking marks objects in order to track which have been visited 
>> already. In order to do that, it uses bits in the object header. Those are 
>> the same bits that are also used by some GCs to mark objects (the lowest two 
>> bits, also used by locking code). Some GCs also use the bits in order to 
>> indicate 'forwarded' objects, where the upper bits of the header represent 
>> the forward-pointer. In the case of Shenandoah, it's even more problematic 
>> because this happens concurrently, even while JVMTI heap walks can 
>> intercept. So far we carefully worked around that problem, but it becomes 
>> very problematic in Lilliput, where accesses to the Klass* also requires to 
>> decode the header, and figure out what bits means what.
>> 
>> In addition to that, marking objects in their header requires that the 
>> original header gets saved and restored. We only do that for 'interesting' 
>> headers, that is headers that have a stack-lock, monitor or hash-code. All 
>> other headers are reset to their default value. This means we are losing 
>> object's GC age. This is not catastrophic, but nontheless interferes with 
>> GC. 
>> 
>> JFR already has a datastructure called BitSet to support object marking 
>> without messing with object's headers. We can use that in JVMTI too.
>> 
>> Testing:
>>  - [x] tier1
>>  - [x] tier2
>>  - [x] tier3
>>  - [x] serviceability/jvmti
>>  - [x] vmTestbase/nsk/jvmti
>
> Roman Kennke has updated the pull request incrementally with four additional 
> commits since the last revision:
> 
>  - Add comment describing ObjectBitSet
>  - Refactor JVMTI usage of ObjectBitSet
>  - Typedef ObjectBitSet to JFRBitSet in JFR code
>  - Rename BitSet to ObjectBitSet

Looks better :) Placed in the VMOp this is less confusing. Also, less code.

Small remarks inline, feel free to ignore them. We are entering nitpicking 
territory here.

Cheers, Thomas

src/hotspot/share/jfr/leakprofiler/chains/bfsClosure.hpp line 37:

> 35: class ObjectBitSet;
> 36: 
> 37: typedef ObjectBitSet JFRBitSet;

Can you include memory/allocation.hpp for the MEMFLAGS? iterator.hpp already 
does, but better to be include-clean

src/hotspot/share/prims/jvmtiTagMap.cpp line 2252:

> 2250:   GrowableArray* _visit_stack; // the visit stack
> 2251: 
> 2252:   JVMTIBitSet* _bitset;

Small nit: You may consider making this a simple member, then you can get rid 
of the dynamic allocation and deallocation. Just feed the address of the member 
to CallbackInvoker. 

If you are worried about stack size, Bitset is not that big, 100ish bytes or 
so. Should still be fine to put the VMOp on the stack.

-

Marked as reviewed by stuefe (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/7964


Re: RFR: 8283710: JVMTI: Use BitSet for object marking [v4]

2022-04-07 Thread Thomas Stuefe
On Mon, 4 Apr 2022 17:51:28 GMT, Roman Kennke  wrote:

>> JVMTI heap walking marks objects in order to track which have been visited 
>> already. In order to do that, it uses bits in the object header. Those are 
>> the same bits that are also used by some GCs to mark objects (the lowest two 
>> bits, also used by locking code). Some GCs also use the bits in order to 
>> indicate 'forwarded' objects, where the upper bits of the header represent 
>> the forward-pointer. In the case of Shenandoah, it's even more problematic 
>> because this happens concurrently, even while JVMTI heap walks can 
>> intercept. So far we carefully worked around that problem, but it becomes 
>> very problematic in Lilliput, where accesses to the Klass* also requires to 
>> decode the header, and figure out what bits means what.
>> 
>> In addition to that, marking objects in their header requires that the 
>> original header gets saved and restored. We only do that for 'interesting' 
>> headers, that is headers that have a stack-lock, monitor or hash-code. All 
>> other headers are reset to their default value. This means we are losing 
>> object's GC age. This is not catastrophic, but nontheless interferes with 
>> GC. 
>> 
>> JFR already has a datastructure called BitSet to support object marking 
>> without messing with object's headers. We can use that in JVMTI too.
>> 
>> Testing:
>>  - [x] tier1
>>  - [x] tier2
>>  - [x] tier3
>>  - [x] serviceability/jvmti
>>  - [x] vmTestbase/nsk/jvmti
>
> Roman Kennke has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Templatize BitSet for MEMFLAGS

If you add my proposed comment for bitset.hpp, or some variation of it, then 
the rest looks good to me in its current form. Thanks for considering my 
proposals!

Thanks, Thomas

-

Marked as reviewed by stuefe (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/7964


Re: RFR: 8283710: JVMTI: Use BitSet for object marking [v4]

2022-04-07 Thread Thomas Stuefe
On Mon, 4 Apr 2022 17:51:28 GMT, Roman Kennke  wrote:

>> JVMTI heap walking marks objects in order to track which have been visited 
>> already. In order to do that, it uses bits in the object header. Those are 
>> the same bits that are also used by some GCs to mark objects (the lowest two 
>> bits, also used by locking code). Some GCs also use the bits in order to 
>> indicate 'forwarded' objects, where the upper bits of the header represent 
>> the forward-pointer. In the case of Shenandoah, it's even more problematic 
>> because this happens concurrently, even while JVMTI heap walks can 
>> intercept. So far we carefully worked around that problem, but it becomes 
>> very problematic in Lilliput, where accesses to the Klass* also requires to 
>> decode the header, and figure out what bits means what.
>> 
>> In addition to that, marking objects in their header requires that the 
>> original header gets saved and restored. We only do that for 'interesting' 
>> headers, that is headers that have a stack-lock, monitor or hash-code. All 
>> other headers are reset to their default value. This means we are losing 
>> object's GC age. This is not catastrophic, but nontheless interferes with 
>> GC. 
>> 
>> JFR already has a datastructure called BitSet to support object marking 
>> without messing with object's headers. We can use that in JVMTI too.
>> 
>> Testing:
>>  - [x] tier1
>>  - [x] tier2
>>  - [x] tier3
>>  - [x] serviceability/jvmti
>>  - [x] vmTestbase/nsk/jvmti
>
> Roman Kennke has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Templatize BitSet for MEMFLAGS

Hi @coleenp and @rkennke,

Since the JVMTI heap walk usually walks the whole heap, the bitmap may not be 
as sparse as we think. It would materialize a lot - possibly all - of its 
fragments. The increased memory consumption during JVMTI heap walk may mess 
with JVMTI agents using heapwalking for analysis in OOM situations.

But I'm not against this. It is clearly more maintainable. A lot clearer. And 
if it helps lilliput, we save a lot more memory than we pay here.

---

About this PR:

You moved Bitset to general-purpose land. Now its a general purpose class, and 
should be more versatile and better documented. A comment would be good, at 
least for the structure itself. Proposal (feel free to modify/extend):


BitSet is a sparse bitmap describing a memory range. It holds one bit per 
xxx-aligned
address. Its underlying backing memory is allocated on-demand only, in yyy-sized
fragments. Fragments are never deleted. The underlying memory is allocated from 
C-Heap.


I would also rename the function, since it is very unclear what the difference 
is between BitMap and BitSet. E.g. "AddressMap" "SparseAddressMap", or similar.

fragment size xxx and alignment yyy would ideally be configurable at compile 
time. At the moment, it feels very "heapish". It uses oop. It uses 
LogMinObjAlignmentInBytes. Ideally we would not include oop.hpp, and not know 
about oops.

Especially the dependency on LogMinObjAlignmentInBytes is not optimal: that is 
a runtime variable depending on `-XX:ObjectAlignmentInBytes`. It would be good 
if that could be a compile time constant. Right now, IIUC, 
`BitSet::addr_to_bit()` will always load the alignment from memory.

As a general purpose structure, some gtests would be nice. Maybe in some future 
RFE.

---

A general thought, maybe for a future RFE:

We now have BitMap and BitSet, which do almost the same thing. Instead of 
having a new class BitSet, whose job is to be sparse, we could give BitMap the 
ability to be sparse. We'd save code, reuse the rich BitMap interface and get 
all the test code for free.

One way to do that would be to make BitMap responsible for committing its 
underlying backing memory, in chunks, on-demand only. To do that BitMap would 
need a secondary map - a commit map - to remember which fragments of its 
backing memory are committed. Commit map would have a bit per 64M fragment.

As a sketch:


class BitMap {

   // bitmap holding one bit per 64M fragment of the underlying bitmap backing 
memory
   BitMap _commit_map;
   
   inline void set_bit(idx_t bit) {
// - find out which fragment the bit lives in
// - if (commitmap[fragmentid] == 0) { commit_fragment(); 
commitmap[fragmentid]=1; }
// - set bit
   }
   
   // same for clear_bit()
   
   bool at(idx_t index) const {
// - find out which fragment the bit lives in
// - if (commitmap[fragmentid] == 0) { return false; } // bit has never 
been set, is therefore zero
// - else return the actual bit
   }

}


The sparse-handling part would have to be switchable at compile time so that we 
won't pay for it in non-sparse BitMap.

I believe this solution could be simpler than a new separate utility class, and 
be more versatile. Would be interesting to know @kimbarrett 's opinion. 

But this can be done of course in a separate RFE. If at all.



Re: RFR: 8282475: SafeFetch should not rely on existence of Thread::current [v7]

2022-03-21 Thread Thomas Stuefe
On Mon, 21 Mar 2022 12:44:58 GMT, Thomas Stuefe  wrote:

>> Johannes Bechberger has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   Remove two unnecessary lines
>
> I'm currently implementing Andrews proposal for a static safefetch 
> (https://github.com/openjdk/jdk/pull/7865, still in draft, but almost done). 
> That will be more generic solution since we don't have to deal with thread wx 
> state at all. That's why we closed this PR.

> The conversation here is some what hard to follow. I do see that "foreign 
> threads" was mentioned by @tstuefe in the context of AGCT but I have to 
> assume he misspoke there (assuming a foreign thread is one not attached to 
> the VM) as AGCT only works for attached JavaThreads. The signal handler that 
> will call AGCT has to be prepared to find any kind of thread in any state, 
> but AGCT should only be called on the right kinds of thread in the right 
> state.

Sure, AGCT can be limited to VM threads - or maybe already is. But tracking 
non-VM threads could be a valid use case. 

We have downstream in the SapMachine a facility where we track callstacks from 
malloc sites - independently from NMT or the VM. With the explicit purpose of 
catching mallocs from non-VM threads too. For collecting the stack trace, we 
use some VM utilities, SafeFetch among them. That is a very useful facility. I 
could argue a similar case for the Async Profiler: why should profiling be 
limited to Java threads? In the end, if it eats performance, it hurts, 
regardless whether its a java thread or a non-VM-attached thread. Could be a 
concurrent native thread burning CPU, why would that not be interesting.

Our concern was with SafeFetch, and AGCT is only one example. SafeFetch should 
be as safe as possible. Error reporting alone is a sufficient reason.

-

PR: https://git.openjdk.java.net/jdk/pull/7727


Re: RFR: 8282475: SafeFetch should not rely on existence of Thread::current [v7]

2022-03-21 Thread Thomas Stuefe
On Fri, 11 Mar 2022 07:52:16 GMT, Johannes Bechberger  
wrote:

>> The WXMode for the current thread (on MacOS aarch64) is currently stored in 
>> the thread class which is unnecessary as the WXMode is bound to the current 
>> OS thread, not the current instance of the thread class.
>> This pull request moves the storage of the current WXMode into a thread 
>> local global variable in `os` and changes all related code. SafeFetch 
>> depended on the existence of a thread object only because of the WXMode. 
>> This pull request therefore removes the dependency, making SafeFetch usable 
>> in more contexts.
>
> Johannes Bechberger has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Remove two unnecessary lines

I'm currently implementing Andrews proposal for a static safefetch 
(https://github.com/openjdk/jdk/pull/7865, still in draft, but almost done). 
That will be more generic solution since we don't have to deal with thread wx 
state at all. That's why we closed this PR.

-

PR: https://git.openjdk.java.net/jdk/pull/7727


Re: RFR: 8282475: SafeFetch should not rely on existence of Thread::current [v7]

2022-03-14 Thread Thomas Stuefe
On Mon, 14 Mar 2022 08:19:41 GMT, Florian Weimer  wrote:

> > Thanks a lot, Florian! I got it to work under Linux x64.
> 
> Great!
> 
> > My error was that I had declared the label in C++ as `extern void* 
> > SafeFetch_continuation`. Declaring it as `extern char 
> > _SafeFetch32_continuation[] __attribute__ ((visibility ("hidden")));` as 
> > you suggested does the trick. I'm not sure I understand the difference.
> 
> Your approach might have worked as well, but you would have to use 
> `_continuation` on the C++ side. Arrays work directly because of 
> pointer decay.

Ah, that makes sense. I wondered why the address did not look like a code 
pointer in C++.

Anyway, got Linux x86_32 working too. Now I am working on aarch64. 

> 
> Anyway, from what I've seen, the array is more idiomatic.
> 
> > > It doesn't hurt, but the Itanium ABI does not mangle such global data 
> > > symbols, so it's not strictly needed.
> > 
> > 
> > I don't understand this remark, what does Itanium have to do with this?
> 
> The [C++ ABI definition](https://github.com/itanium-cxx-abi/cxx-abi) is 
> probably Itanium's most lasting contribution to computing. I think it's used 
> on most non-Windows systems these days, not just on Linux, and of course on 
> all kinds of CPUs.

Interesting to know. Thanks!

-

PR: https://git.openjdk.java.net/jdk/pull/7727


Re: RFR: 8282475: SafeFetch should not rely on existence of Thread::current [v7]

2022-03-14 Thread Thomas Stuefe
On Fri, 11 Mar 2022 07:52:16 GMT, Johannes Bechberger  
wrote:

>> The WXMode for the current thread (on MacOS aarch64) is currently stored in 
>> the thread class which is unnecessary as the WXMode is bound to the current 
>> OS thread, not the current instance of the thread class.
>> This pull request moves the storage of the current WXMode into a thread 
>> local global variable in `os` and changes all related code. SafeFetch 
>> depended on the existence of a thread object only because of the WXMode. 
>> This pull request therefore removes the dependency, making SafeFetch usable 
>> in more contexts.
>
> Johannes Bechberger has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Remove two unnecessary lines

Hi Florian,

> If you have a PR, please Cc: me on it, I will have a look.

Thanks a lot, Florian! I got it to work under Linux x64.

My error was that I had declared the label in C++ as `extern void* 
SafeFetch_continuation`. Declaring it as `extern char 
_SafeFetch32_continuation[] __attribute__ ((visibility ("hidden")));` as you 
suggested does the trick. I'm not sure I understand the difference.

>> extern "C"

> It doesn't hurt, but the Itanium ABI does not mangle such global data 
> symbols, so it's not strictly needed.

I don't understand this remark, what does Itanium have to do with this?

-

PR: https://git.openjdk.java.net/jdk/pull/7727


Re: RFR: 8282475: SafeFetch should not rely on existence of Thread::current [v7]

2022-03-11 Thread Thomas Stuefe
On Fri, 11 Mar 2022 10:44:25 GMT, Thomas Stuefe  wrote:

> > On 3/11/22 10:12, Thomas Stuefe wrote: We could also redefine SafeFetch on 
> > MacOS/AArch64 to not need WX. We could do this by statically generating 
> > SafeFetch on that platform, and it wouldn't be in the JIT region at all. 
> > Why not just do that? Do you mean using inline assembly?
> > I'd use out-of-line assembly, as I do for atomic compare-and-swap on linux: 
> > https://github.com/openjdk/jdk/blob/master/src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.S
> >  But I guess inline would work.
> 
> Oh, this is neat. It would work on all platforms too, or on all we care to 
> implement it for. And it would nicely solve the initialization window problem 
> since it would work before stub routines are generated. We could throw 
> `CanUseSafeFetch` away.
> 
> It seems we already do static assembly on bsd aarch. So there is already a 
> path to follow.
> 
> But this could also be done as a follow up enhancement. I still like the OS 
> TLS variable idea.

I spent some time doing a static implementation of SafeFetch on Linux x64, and 
its not super trivial. The problem is that we need to know addresses of 
instructions inside that function. I can set labels in assembly, and I can 
export them, but so far I have been unable to use them as addresses in C++ 
code. I will research some more.

-

PR: https://git.openjdk.java.net/jdk/pull/7727


Re: RFR: 8282475: SafeFetch should not rely on existence of Thread::current [v6]

2022-03-11 Thread Thomas Stuefe
On Fri, 11 Mar 2022 16:34:29 GMT, Anton Kozlov  wrote:

> > blocking SIGSEGV and SIGBUS - or other synchronous error signals like 
> > SIGFPE - and then triggering said signal is UB. What happens is 
> > OS-dependent. I saw processes vanishing, or hang, or core. It makes sense, 
> > since what is the kernel supposed to do. It cannot deliver the signal, and 
> > deferring it would require returning to the faulting instruction, that 
> > would just re-fault.
> > For some more details see e.g. 
> > https://bugs.openjdk.java.net/browse/JDK-8252533
> 
> This UB looks reasonable. My point is that a native thread would run fine 
> with SIGSEGV blocked. But then JVM decides it can do SafeFetch, and things 
> gets nasty.

Blocking synchronous error signals makes zero sense even for normal programs, 
since you lose the ability to get cores. For the JVM in particular, it also 
blocks facilities like polling pages, or dynamically querying CPU abilities. So 
a JVM would not even start with synchronous error signals blocked.

> 
> > > Is there a crash that is fixed by the change? I just spotted it is an 
> > > enhancement, not a bug. Just trying to understand the problem.
> > 
> > 
> > Yes, this issue is a breakout from 
> > https://bugs.openjdk.java.net/browse/JDK-8282306, where we'd like to use 
> > SafeFetch to make stack walking in AsyncGetCallTrace more robust. AGCT is 
> > called from the signal handler, and it may run in any number of situations 
> > (e.g. in foreign threads, or threads that are in the process of getting 
> > dismantled, etc).
> 
> I mean, some way to verify the issue is fixed, e.g. a test that does not fail 
> anymore.

No, tests do not exist. Unfortunately, otherwise this regression would have 
been detected right away and we would not need this PR.

We have a test though that tests SafeFetch during error handling. That test can 
be tweaked for this purpose. So, test does not exist yet, but can be easily 
written. 

> 
> I see AsyncGetCallTrace to assume the JavaThread very soon, or do I look at 
> the wrong place? 
> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/forte.cpp#L569
> 
> > Another situation is error handling itself. When writing an hs-err file, we 
> > use SafeFetch to do carefully tiptoe around the possibly corrupt VM state. 
> > If the original crash happened in a foreign thread, we still want some of 
> > these reports to work (e.g. dumping register content or printing stacks). 
> > So SafeFetch should be as robust as possible.
> 
> OK, thanks. I think we also handle recursive segfaults recover after 
> interpretation of the corrupted VM state. Otherwise, implementing the 
> printing functions would be too tedious and hard with SafeFetch alone. But I 
> see it's used in printing register content, at least.

Secondary error handling is a very coarse-grained tool. If an error reporting 
step crashes out, we continue with the next step. Has disadvantages though. The 
total number of retries is very limited. And a faulting error reporting step 
still hurts, because its report is compromised. E.g. if the call stack printing 
crashes out, we have no call stack. This is not an abstract problem. Its a very 
concrete and typical problem.

I spend a large part of my work with hs-err reports. They are of very high 
importance to us. We (SAP) have invested a lot of time and effort in hardening 
out OpenJDK error reporting, and SafeFetch is an important part of that. For 
example, we provided the facility that made SafeFetch usable in signal 
handling. It would be nice if our work was not compromised. Please let us find 
a way forward here.

-

PR: https://git.openjdk.java.net/jdk/pull/7727


Re: RFR: 8282475: SafeFetch should not rely on existence of Thread::current [v7]

2022-03-11 Thread Thomas Stuefe
On Fri, 11 Mar 2022 10:27:25 GMT, Andrew Haley  wrote:

> On 3/11/22 10:12, Thomas Stuefe wrote: We could also redefine SafeFetch on 
> MacOS/AArch64 to not need WX. We could do this by statically generating 
> SafeFetch on that platform, and it wouldn't be in the JIT region at all. Why 
> not just do that? Do you mean using inline assembly?
> I'd use out-of-line assembly, as I do for atomic compare-and-swap on linux: 
> https://github.com/openjdk/jdk/blob/master/src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.S
>  But I guess inline would work.

Oh, this is neat. It would work on all platforms too, or on all we care to 
implement it for. And it would nicely solve the initialization window problem 
since it would work before stub routines are generated. We could throw 
`CanUseSafeFetch` away.

It seems we already do static assembly on bsd aarch. So there is already a path 
to follow.

But this could also be done as a follow up enhancement. I still like the OS TLS 
variable idea.

-

PR: https://git.openjdk.java.net/jdk/pull/7727


Re: RFR: 8282475: SafeFetch should not rely on existence of Thread::current [v6]

2022-03-11 Thread Thomas Stuefe
On Fri, 11 Mar 2022 09:33:40 GMT, Andrew Haley  wrote:

> We could also redefine SafeFetch on MacOS/AArch64 to not need WX. We could do 
> this by statically generating SafeFetch on that platform, and it wouldn't be 
> in the JIT region at all. Why not just do that?

Do you mean using inline assembly?

-

PR: https://git.openjdk.java.net/jdk/pull/7727


Re: RFR: 8282475: SafeFetch should not rely on existence of Thread::current [v6]

2022-03-10 Thread Thomas Stuefe
On Thu, 10 Mar 2022 14:09:24 GMT, Anton Kozlov  wrote:

> > > Is it possible to change SafeFetch only? Switch to WXExec before calling 
> > > the stub and switch WXWrite back unconditionally? We won't need to 
> > > provide assert in ThreadWXEnable. But SafeFetch can check the assumption 
> > > with assert via Thread, if it exists.
> > 
> > 
> > But SafeFetch could be used from outside code as well as VM code. In case 
> > of the latter, prior state can either be WXWrite or WXExec. It needs to 
> > restore the prior state after the call.
> 
> I'm not sure I understand what is the "outside code". The SafeFetch is the 
> private hotspot function, it cannot be linked with non-JVM code, isn't it?

Sorry for being imprecise. I meant SafeFetch is triggered from within a signal 
handler that runs on a foreign thread. E.g. AGCT or error handling.

-

PR: https://git.openjdk.java.net/jdk/pull/7727


Re: RFR: 8282475: SafeFetch should not rely on existence of Thread::current [v6]

2022-03-10 Thread Thomas Stuefe
On Thu, 10 Mar 2022 12:31:06 GMT, Anton Kozlov  wrote:

> > Signal handlers are per-process not per-thread, so a thread need not be 
> > attached to the VM for our signal handlers to get involved - that is why 
> > they call Thread::current_or_null_safe().
> 
> Oh, right, thanks. I was too concentrated on thinking another platforms like 
> windows, that missed the example will work for *nix.
> 
> A general question to the bug. The signal mask is per-thread, and a native 
> thread may block the JVM signal. I think safefetch will fail, if somehow we 
> manage to call it on this thread (without jsig). So I'm not sure the 
> safefetch is really safe on all platforms and in all contexts, that is, it 
> always recovers from the read failure if called on a random thread.

To expand on @dholmes-ora answer: blocking SIGSEGV and SIGBUS - or other 
synchronous error signals like SIGFPE - and then triggering said signal is UB. 
What happens is OS-dependent. I saw processes vanishing, or hang, or core. It 
makes sense, since what is the kernel supposed to do. It cannot deliver the 
signal, and deferring it would require returning to the faulting instruction, 
that would just re-fault. 

For some more details see e.g. https://bugs.openjdk.java.net/browse/JDK-8252533

> 
> Is there a crash that is fixed by the change? I just spotted it is an 
> enhancement, not a bug. Just trying to understand the problem.

Yes, this issue is a breakout from 
https://bugs.openjdk.java.net/browse/JDK-8282306, where we'd like to use 
SafeFetch to make stack walking in AsyncGetCallTrace more robust. AGCT is 
called from the signal handler, and it may run in any number of situations 
(e.g. in foreign threads, or threads which are in the process of getting 
dismantled, etc).

Another situation is error handling itself. When writing an hs-err file, we use 
SafeFetch to do carefully tiptoe around the possibly corrupt VM state. If the 
original crash happened in a foreign thread, we still want some of these 
reports to work (e.g. dumping register content or printing stacks). So 
SafeFetch should be as robust as possible.

> I'm not opposing the refactoring, it has some advantages, but I'd want to 
> separate functional change from the refactoring. This would aid backporting, 
> at least.

I agree that a minimal patch would be good. I feel partly guilty for this 
expanding discussion. I'm fine with a minimal change, without any refactoring, 
in whatever form @parttimenerd choses - be it OS thread local or the approach 
proposed by you, @AntonKozlov.

Cheers Thomas

-

PR: https://git.openjdk.java.net/jdk/pull/7727


Re: RFR: 8282475: SafeFetch should not rely on existence of Thread::current [v6]

2022-03-10 Thread Thomas Stuefe
On Wed, 9 Mar 2022 08:35:41 GMT, Johannes Bechberger  
wrote:

>> The WXMode for the current thread (on MacOS aarch64) is currently stored in 
>> the thread class which is unnecessary as the WXMode is bound to the current 
>> OS thread, not the current instance of the thread class.
>> This pull request moves the storage of the current WXMode into a thread 
>> local global variable in `os` and changes all related code. SafeFetch 
>> depended on the existence of a thread object only because of the WXMode. 
>> This pull request therefore removes the dependency, making SafeFetch usable 
>> in more contexts.
>
> Johannes Bechberger has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   current_thread_wx -> ThreadWX

Hi Anton,

thanks a lot for your explanations. You made some things clearer to me. My 
answers are inline.

> > Why don't we just switch it on once, for a thread that conceivably may call 
> > into generated code, and be done with? Why is this fine granular switching 
> > even needed? I find it difficult to imagine an attack vector that exploits 
> > having this always enabled for a thread. After all, we have to mark code 
> > cache with MAP_JIT already, so it is not like we can execute arbitrary 
> > memory ranges.
> 
> A java thread executes the code (interpreter, JIT) and changes the code (e.g. 
> it could make a nmethod nonentrant, change inline cache). Code modifications 
> are done in VM (runtime) call. So WX state is tied to the java thread state. 
> The WX management is done more to satisfy the platform requirement than to 
> make the system more secure.

Okay, that was the piece I was missing. I was assuming that we have either 
executing or modifying threads and that a thread was either one or the other 
(compiler threads compile, java threads run). You are saying that Java Threads 
may write too. And CompilerThreads, in the case of SafeFetch at least, may run 
generated code. So switching has to be done as a stack mark. Have I gotten this 
right?

> 
> > Related to that, how much does this call cost? Is this a runtime call into 
> > the pthread library or does it get inlined somehow? Because things like 
> > SafeFetch are trimmed to be super cheap if the memory can be accessed. 
> > Doing a pthread call on every invocation may throw off the cost-benefit 
> > ratio.
> 
> SafeFetch is much more expensive compared the direct memory access. So I 
> assume it's used only when the real chance exists the access may fail, and 
> the average cost of SafeFetch is much higher than the best case.

Yes, we only do this when necessary, but it is supposed to be reasonably cheap 
if memory is accessible. Its Load (the safefetch blob) -> unconditional jump to 
the blob -> load target memory -> jump back. Depending on what the pthread 
library call does, and if it's a real function call into a library, it would be 
more expensive than that.

> 
> Yes, WX management is offered via a pthread function call. I haven't 
> investigated if the native compiler can inline the call. The WX management 
> itself considerably cheap [#2200 
> (comment)](https://github.com/openjdk/jdk/pull/2200#issuecomment-773382787).
> 
> > Why and where do we need nesting? This would be so much easier if we could 
> > just not care.
> 
> We swtich the state to WXWrite at the entry in VM call, but a VM call may do 
> another VM call. E.g. a runtime VM calls the JNI GetLongField. So 
> GetLongField could be called from a code executing in Java (WXExec) and VM 
> (WXWrite) states, the WX state should be restored back on leaving JNI 
> function. The original state is tracked in Thread.

Okay, thanks again for clarifying.

> > Arguably we should restore, upon leaving the VM, the state that has been 
> > present before. Because a native thread may already have modified the wx 
> > state and overruling it is just rude. But I offhand don't see a way to do 
> > this since we have no way (?) to query the current state.
> 
> How in general safe to call SafeFetch on a native thread that has no Thread 
> object?

SafeFetch is safe to call if the stub routine exists. So it is safe after VM 
initialization. And that can be tested too. Callers, when in doubt, are 
encouraged to use `CanUseSafeFetch` to check if VM is still in 
pre-initialization time. `CanUseSafeFetch` + `SafeFetch` should never crash, 
regardless of when and by whom it was called. We also have 
`os::is_readable_pointer()`, which wraps these two calls for convenience.

> The JVM has not initialized the thread, so there could be no JVM signal 
> handler installed. Or using libjsig is mandatory for this case?

As David wrote, the Signal handler is per process. It is set as part of VM 
initialization before SafeFetch blob is generated. Foreign threads still enter 
the signal handler. So crashes in foreign threads still generate hs-err 
reports. Depending on how your support is organized that's either a bug or a 
feature :)

> 
> I also 

Re: RFR: 8282475: SafeFetch should not rely on existence of Thread::current [v6]

2022-03-09 Thread Thomas Stuefe
On Wed, 9 Mar 2022 19:03:16 GMT, Anton Kozlov  wrote:

>> Johannes Bechberger has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   current_thread_wx -> ThreadWX
>
> https://github.com/openjdk/jdk/compare/master...478ec1a7ca2c72e5916b28613a4875aa2ee1a793
>  touches more places than a targeted change in ThreadWXEnable... I'm not sure 
> the real nesting is required for a thread that is not registered properly 
> with VM. The initial state is always assumed for the NULL Thread. The 
> SafeFetch assembly does not do up-calls to VM. I don't see why we'd need 
> runtime tracking of WX state. The state is either WXExec for SafeFetch 
> assembly, or unknown -- which we assume to be WXWrite regardless of approach 
> taken.
> 
> Nesting was implemented to reduce the amount of changes in JVM (yes, WX code 
> scattered around the VM less than it could be :)), but it is possible to 
> avoid runtime WX tracking if you always know the state, like we do if Thread 
> == NULL.

Hi @AntonKozlov,

> [master...478ec1a](https://github.com/openjdk/jdk/compare/master...478ec1a7ca2c72e5916b28613a4875aa2ee1a793)
>  touches more places than a targeted change in ThreadWXEnable... I'm not sure 
> the real nesting is required for a thread that is not registered properly 
> with VM.

Arguably we should restore, upon leaving the VM, the state that has been 
present before. Because a native thread may already have modified the wx state 
and overruling it is just rude. But I offhand don't see a way to do this since 
we have no way (?) to query the current state.



> The change proposes to assume WXWrite as the initial state. Have you 
> considered to extend ThreadWXEnable to fix the assert failure? Something like 
> below (I have not tried to compile though). The refactoring looks OK, but it 
> makes sense to separate it from functional change.
> 
> ```
> class ThreadWXEnable  {
>   Thread* _thread;
>   WXMode _old_mode;
> 
> public:
>   ThreadWXEnable(WXMode new_mode, Thread* thread) :
> _thread(thread)
>   {
> if (_thread) {
>   _old_mode = _thread->enable_wx(new_mode);
> } else {
>   os::current_thread_enable_wx(new_mode);
>   _old_mode = WXWrite;
> }
>   }
>   ~ThreadWXEnable() {
> if (_thread) {
>   _thread->enable_wx(_old_mode);
> } else {
>   os::current_thread_enable_wx(_old_mode);
> }
>   }
> };
> ```

I honestly don't find this easier than the solution @parttimenerd proposes, 
using a OS thread local. Using an OS thread local makes this whole system 
independent from Thread, so you don't need to know about Thread and don't rely 
on Thread::current being present.

It also would be slightly faster. Using Thread, we'd access TLS to get 
Thread::current, then dereference that to read the wx state . OTOH using OS 
TLS, we access TLS to get the wx state directly. We save one dereference.

-

PR: https://git.openjdk.java.net/jdk/pull/7727


Re: RFR: 8282475: SafeFetch should not rely on existence of Thread::current [v6]

2022-03-09 Thread Thomas Stuefe
On Wed, 9 Mar 2022 19:03:16 GMT, Anton Kozlov  wrote:

>> Johannes Bechberger has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   current_thread_wx -> ThreadWX
>
> https://github.com/openjdk/jdk/compare/master...478ec1a7ca2c72e5916b28613a4875aa2ee1a793
>  touches more places than a targeted change in ThreadWXEnable... I'm not sure 
> the real nesting is required for a thread that is not registered properly 
> with VM. The initial state is always assumed for the NULL Thread. The 
> SafeFetch assembly does not do up-calls to VM. I don't see why we'd need 
> runtime tracking of WX state. The state is either WXExec for SafeFetch 
> assembly, or unknown -- which we assume to be WXWrite regardless of approach 
> taken.
> 
> Nesting was implemented to reduce the amount of changes in JVM (yes, WX code 
> scattered around the VM less than it could be :)), but it is possible to 
> avoid runtime WX tracking if you always know the state, like we do if Thread 
> == NULL.

@AntonKozlov can you give us please a bit more background about the wx state 
stuff?

- Why don't we just switch it on once, for a thread that conceivably may call 
into generated code, and be done with? Why is this fine granular switching even 
needed? I find it difficult to imagine an attack vector that exploits having 
this always enabled for a thread. After all, we have to mark code cache with 
MAP_JIT already, so it is not like we can execute arbitrary memory ranges.
- Related to that, how much does this call cost? Is this a runtime call into 
the pthread library or does it get inlined somehow? Because things like 
SafeFetch are trimmed to be super cheap if the memory can be accessed. Doing a 
pthread call on every invocation may throw off the cost benefit ratio.
- Why and where do we need nesting? This would be so much easier if we could 
just not care.

-

PR: https://git.openjdk.java.net/jdk/pull/7727


Re: RFR: 8282475: SafeFetch should not rely on existence of Thread::current [v4]

2022-03-09 Thread Thomas Stuefe
On Wed, 9 Mar 2022 11:00:15 GMT, David Holmes  wrote:

> @tstuefe do you have some examples of (3)? I don't like introducing a fake, 
> seemingly general-purpose, API for something that is very much platform 
> specific. I do dislike intensely the way the ThreadWX changes pollute shared 
> code, and as has been said in other reviews of that code, there really should 
> be a much cleaner/clearer place where these transitions occur - if we can 
> find it.

Examples for platform specifics hidden behind a common facade with one or two 
platforms missing are very common, but I assume you mean implementations that 
are stubbed out almost everywhere, right?

 "os::os_exception_wrapper" is a very good example, hiding the details of SEH - 
which only exists on Windows - from generic code. It is an empty stub 
implementation on all platforms but Windows x86. On all other platforms 
(including windows aarch) we don't have that facility.

- "os::breakpoint" is a bit similar, since native breakpoint support only 
exists on Windows

- "os::set_native_thread_name" used to initially only exist on windows and 
linux, leaving out aix, mac and solaris.

I can dig deeper, but I remember generic wrappers with only a few or only one 
platform implementation being around. Especially when we still had Solaris 
around. 

I remember because like you I never was too fond of that pattern, but I find it 
often the smaller evil.

-

PR: https://git.openjdk.java.net/jdk/pull/7727


Re: RFR: 8282475: SafeFetch should not rely on existence of Thread::current [v4]

2022-03-09 Thread Thomas Stuefe
On Wed, 9 Mar 2022 07:30:43 GMT, David Holmes  wrote:

>> I don't know why the Linux x86 build fails.
>> 
>> I tested the current version with code related to #7591 and it seems to fix 
>> the remaining problems (I tested it also with NMT enabled).
>
> @parttimenerd  please never force-push in an active review as it completely 
> destroys the review history and comment context!

Hi @dholmes-ora , @parttimenerd 

I'd like to argue again for my proposal from before.

All this is contrary to how we deal with platform dependencies normally. 
Normally,  we
1) either keep the whole code in platform specific places - including callers. 
If that is possible.
2) If there are very few callers in shared places, we keep implementation in 
platform branch and #ifdef the callsites in shared code out to the affected 
platforms
3) If there are many callers in shared code or if it looks like it may be 
useful on other platforms too at some point, we usually wrap the logic in a 
platform generic function behind a descriptive name, which we stub out for the 
unaffected platforms.

(3) is very common.

My proposal from before would make it possible to really hide all WX logic from 
shared code:

shared/runtime/os.hpp
``` 
class os {
...
// Platform specific hook to prepare the current thread for calling generated 
code
void enable_jit_calls_for_current_thread() NOT_MACOS_AARCH64({})
// Platform specific hook to clean up the current thread after calling into 
generated code
void disable_jit_calls_for_current_thread() NOT_MACOS_AARCH64({})

class ThreadEnableJitCallsMark: public StackObj {
public:
ThreadEnableJitCallsMark() { enable_jit_calls_for_current_thread(); }
~ThreadEnableJitCallsMark() { disable_jit_calls_for_current_thread(); }
}



(ThreadEnableJitCallsMark could be optionally spread out into separate include)


os.bsd_aarch64.cpp

void os::enable_jit_calls_for_current_thread() {
 ... blabla ... pthread_jit_write_protect_np();
}

void os::disable_jit_calls_for_current_thread() {
 ... blabla ... pthread_jit_write_protect_np();
}



Thats very little code. It effectively hides all platform details where they 
belong, away from shared code. In shared code, you use either one of 
`os::(enable|disable)_jit_calls_for_current_thread()` or the companion 
`os::ThreadEnableJitCallsMark`. Advantages would be:

- Call sites in shared code are now easier to mentally parse. 
`os::disable_jit_calls_for_current_thread()` is much clearer than 
`MACOS_AARCH64_ONLY(os::ThreadWX::Enable __wx(os::ThreadWX::Write));`.
- We don't need MAC_AARCH64_ONLY in shared code
- We don't need the enums in shared code. Dont need to understand what they do, 
nor the difference between "Write" and "Exec". 

Side note, I'd also suggest a different name for the RAII object to something 
like "xxxMark". That is more according to hotspot customs.



Cheers, Thomas

-

PR: https://git.openjdk.java.net/jdk/pull/7727


Re: RFR: 8282475: SafeFetch should not rely on existence of Thread::current [v5]

2022-03-09 Thread Thomas Stuefe
On Wed, 9 Mar 2022 08:16:51 GMT, Johannes Bechberger  
wrote:

> Oh, I did not know that. Sorry for that, I just wanted to rebase it and 
> forgot that this would change all the commit ids.

You couldn't know. Just merge "master", which achieves the same and leaves the 
history intact.

Additional tip. You can start a PR in draft mode. Reviews won't start until you 
un-draft. In draft mode, you have plenty of time to flesh out the patch and fix 
all bugs that show up in GHAs. In that phase, it's still okay to rebase. Once 
you consider the patch ready, you can un-draft the PR and the reviews start. 
Only then you should not rebase anymore.

Starting out with a draft PR has also the advantage of not using reviewer time 
on an incomplete patch.

-

PR: https://git.openjdk.java.net/jdk/pull/7727


Re: RFR: 8282475: SafeFetch should not rely on existence of Thread::current [v4]

2022-03-08 Thread Thomas Stuefe
On Tue, 8 Mar 2022 13:08:13 GMT, Johannes Bechberger  
wrote:

>> src/hotspot/os_cpu/bsd_aarch64/os_bsd_aarch64.hpp line 45:
>> 
>>> 43:   #ifdef __APPLE__
>>> 44: 
>>> 45:   class current_thread_wx {
>> 
>> This violates the style guide for class names. It would be CurrentThreadWX - 
>> but ThreadWX seems sufficient to me.
>
> But os is okay? I just use this name for grouping.

"os" is okay. Its historical. Its also more of a namespace really.

-

PR: https://git.openjdk.java.net/jdk/pull/7727


Re: RFR: 8282475: SafeFetch should not rely on existence of Thread::current

2022-03-07 Thread Thomas Stuefe
On Mon, 7 Mar 2022 11:29:08 GMT, Johannes Bechberger  
wrote:

> The WXMode for the current thread (on MacOS aarch64) is currently stored in 
> the thread class which is unnecessary as the WXMode is bound to the current 
> OS thread, not the current instance of the thread class.
> This pull request moves the storage of the current WXMode into a thread local 
> global variable in `os` and changes all related code. SafeFetch depended on 
> the existence of a thread object only because of the WXMode. This pull 
> request therefore removes the dependency, making SafeFetch usable in more 
> contexts.

Hi Johannes, just some drive-by comments, not a full review. Also please see my 
comment toward David, proposing a more generic interface in os instead.

Cheers, Thomas

src/hotspot/os_cpu/bsd_aarch64/os_bsd_aarch64.cpp line 537:

> 535: #endif
> 536: 
> 537: static THREAD_LOCAL WXMode _wx_state = WXUnknown;

All this wx coding inside bsd sources should be guarded with `__APPLE__` out of 
politeness toward the BSDs.

src/hotspot/os_cpu/bsd_aarch64/os_bsd_aarch64.cpp line 552:

> 550: _wx_state = new_state;
> 551: pthread_jit_write_protect_np(_wx_state == WXExec);
> 552:   }

I would simplify this:


if (_wx_state == unknown) {
  _wx_state = write; // No way to know but we assume the original state is 
"writable, not executable"
}
WXMode old = _wx_state;
_wx_state = new_state;
pthread_jit_write_protect_np(_wx_state == WXExec);
}


that is simpler and avoids calling pthread_jit_write_protect_np twice for the 
"unknown->exec" transition.

src/hotspot/os_cpu/bsd_aarch64/os_bsd_aarch64.cpp line 558:

> 556: void os::current_thread_reset_wx() {
> 557:   current_thread_change_wx(WXWrite);
> 558: }

I find the naming a  bit misleading. You use this as initialization, so I would 
call it "init" something.

Then, I'm not sure it is even needed. I know you just transformed it from the 
original `init_wx()`, so the question is directed more at the original authors 
(@AntonKozlov?).

AFAIU we use this to initialize wxstate for newly attached threads to "dont 
execute". But should this not already be the case? And if its not - e.g. 
because that thread had been calling into another library that also does call 
generated code - is it not impolite to switch the state to "executable false"? 
I know this is highly unlikely, I just try to understand.

src/hotspot/share/runtime/os.hpp line 943:

> 941:   static WXMode current_thread_change_wx(WXMode new_state);
> 942: 
> 943:   static void current_thread_reset_wx();

Please add comments what this is supposed to do

-

Changes requested by stuefe (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/7727


Re: RFR: 8282475: SafeFetch should not rely on existence of Thread::current

2022-03-07 Thread Thomas Stuefe
On Mon, 7 Mar 2022 11:29:08 GMT, Johannes Bechberger  
wrote:

> The WXMode for the current thread (on MacOS aarch64) is currently stored in 
> the thread class which is unnecessary as the WXMode is bound to the current 
> OS thread, not the current instance of the thread class.
> This pull request moves the storage of the current WXMode into a thread local 
> global variable in `os` and changes all related code. SafeFetch depended on 
> the existence of a thread object only because of the WXMode. This pull 
> request therefore removes the dependency, making SafeFetch usable in more 
> contexts.

Hi David,

> The general idea seems good (pity it touches so many files, but then I've 
> never liked any of this WX support precisely because it is so invasive of 
> shared code). I agree that safeFetch should not have become dependent on 
> Thread::current existing, but I have to wonder whether we can just skip the 
> WX code if there is no current thread? If the thread is not attached to the 
> VM then what does it even mean to manipulate the WX state of an unknown 
> thread?

We need to change the wx state of the current pthread in order to be able to 
execute stub routines. Otherwise, we would crash right away when trying to 
execute the SafeFetch stub.

And that is a valid requirement. Let's say we crash in a native thread, 
unrelated to and completely oblivious of the JVM it shares the process with. 
We'd still want to see e.g. native crash information, stack frames, maybe 
register region information etc - all that stuff that may require SafeFetch. In 
fact, this patch is related to Johannes other PR where he modified stack frame 
walking to check that the registers point into valid memory. 

> 
> That aside, with this change I think we can move the conditional WX code out 
> of the shared os.hpp and bury it down in os_bsd_aarch64.hpp where it actually 
> belongs.

Oh yes!

> 
> I'd even like to see threadWXSetters.inline.hpp moved to being in 
> src/os_cpu/bsd_aarch64/ if feasible - I'm not sure what include would be 
> needed for the callsites to function - os.hpp I presume?

I agree, all that wx stuff should be limited to os/bsd or os/bsd_aarch. 

We could have generic wrappers like:


class os {
...
// Platform does whatever needed to prepare for execution of generated code 
inside the current thread
os::pre_current_thread_jit_call() NOT_MACOS_AARCH64({})
// Platform does whatever needed to clean up after executing generated code 
inside the current thread
os::post_current_thread_jit_call() NOT_MACOS_AARCH64({})


(Macro does not yet exist, but MACOS_AARCH64_ONLY does)

--

Side note, I think we have reached a point where it would be cleaner to split 
xxxBSD and MacOS sources. E.g. this wx stuff should be limited to MacOS too, 
and we have more and more `__APPLE_` only sections.

Cheers, Thomas

-

PR: https://git.openjdk.java.net/jdk/pull/7727


Re: RFR: 8275775: Add jcmd VM.classes to print details of all classes [v10]

2022-03-05 Thread Thomas Stuefe
On Fri, 4 Mar 2022 09:05:36 GMT, Yi Yang  wrote:

>> Add VM.classes to print details of all classes, output looks like:
>> 
>> 1. jcmd VM.classes
>> 
>> KlassAddr Size State Flags LoaderName ClassName
>> 0x000800c0b400 62 inited W bootstrap 
>> java.lang.invoke.LambdaForm$MH/0x000800c0b400
>> 0x000800c0b000 62 inited W bootstrap 
>> java.lang.invoke.LambdaForm$DMH/0x000800c0b000
>> 0x000800c0ac00 62 inited W bootstrap 
>> java.lang.invoke.LambdaForm$MH/0x000800c0ac00
>> ...
>> 
>> 2. jcmd VM.classes verbose
>> 
>> KlassAddr Size State Flags LoaderName ClassName
>> 0x000800c0b400 62 inited W bootstrap 
>> java.lang.invoke.LambdaForm$MH/0x000800c0b400
>> java.lang.invoke.LambdaForm$MH/0x000800c0b400 {0x000800c0b400}
>>  - instance size: 2
>>  - klass size: 62
>>  - access: final synchronized
>>  - state: inited
>>  - name: 'java/lang/invoke/LambdaForm$MH+0x000800c0b400'
>>  - super: 'java/lang/Object'
>>  - sub:
>>  - arrays: NULL
>>  - methods: Array(0x7f620841f210)
>>  - method ordering: Array(0x000800a7e5a8)
>>  - default_methods: Array(0x)
>>  - local interfaces: Array(0x0008005af748)
>>  - trans. interfaces: Array(0x0008005af748)
>>  - constants: constant pool [41] {0x7f620841f030} for 
>> 'java/lang/invoke/LambdaForm$MH+0x000800c0b400' cache=0x7f620841f380
>>  - class loader data: loader data: 0x7f61c804a690 of 'bootstrap' has a 
>> class holder
>>  - source file: 'LambdaForm$MH'
>>  - class annotations: Array(0x)
>>  - class type annotations: Array(0x)
>>  - field annotations: Array(0x)
>>  - field type annotations: Array(0x)
>>  - inner classes: Array(0x0008005af6d8)
>>  - nest members: Array(0x0008005af6d8)
>>  - permitted subclasses: Array(0x0008005af6d8)
>>  - java mirror: a 'java/lang/Class'{0x00011f4b3968} = 
>> 'java/lang/invoke/LambdaForm$MH+0x000800c0b400'
>>  - vtable length 5 (start addr: 0x000800c0b5b8)
>>  - itable length 2 (start addr: 0x000800c0b5e0)
>>  -  static fields (1 words):
>>  - static final '_D_0' 'Ljava/lang/invoke/LambdaForm;' @112
>>  -  non-static fields (0 words):
>>  - non-static oop maps:
>> 0x000800c0b000 62 inited W bootstrap 
>> java.lang.invoke.LambdaForm$DMH/0x000800c0b000
>> java.lang.invoke.LambdaForm$DMH/0x000800c0b000 {0x000800c0b000}
>>  - instance size: 2
>>  - klass size: 62
>>  - access: final synchronized
>>  - state: inited
>>  - name: 'java/lang/invoke/LambdaForm$DMH+0x000800c0b000'
>>  - super: 'java/lang/Object'
>>  - sub:
>>  - arrays: NULL
>>  - methods: Array(0x7f620841ea68)
>>  - method ordering: Array(0x000800a7e5a8)
>>  - default_methods: Array(0x)
>>  - local interfaces: Array(0x0008005af748)
>>  - trans. interfaces: Array(0x0008005af748)
>>  - constants: constant pool [49] {0x7f620841e838} for 
>> 'java/lang/invoke/LambdaForm$DMH+0x000800c0b000' cache=0x7f620841ebe0
>>  - class loader data: loader data: 0x7f61c804a750 of 'bootstrap' has a 
>> class holder
>>  - source file: 'LambdaForm$DMH'
>>  - class annotations: Array(0x)
>>  - class type annotations: Array(0x)
>>  - field annotations: Array(0x)
>>  - field type annotations: Array(0x)
>>  - inner classes: Array(0x0008005af6d8)
>>  - nest members: Array(0x0008005af6d8)
>>  - permitted subclasses: Array(0x0008005af6d8)
>>  - java mirror: a 'java/lang/Class'{0x00011f4b0968} = 
>> 'java/lang/invoke/LambdaForm$DMH+0x000800c0b000'
>>  - vtable length 5 (start addr: 0x000800c0b1b8)
>>  - itable length 2 (start addr: 0x000800c0b1e0)
>>  -  static fields (1 words):
>>  - static final '_D_0' 'Ljava/lang/invoke/LambdaForm;' @112
>>  -  non-static fields (0 words):
>> ...
>
> Yi Yang has updated the pull request incrementally with one additional commit 
> since the last revision:
> 
>   use %4d

Looks good!

-

Marked as reviewed by stuefe (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/7105


Integrated: JDK-8280503: Use allStatic.hpp instead of allocation.hpp where possible

2022-01-25 Thread Thomas Stuefe
On Sat, 22 Jan 2022 13:33:24 GMT, Thomas Stuefe  wrote:

> JDK-8249944 moved AllStatic to its own header. We should use that one instead 
> of allocation.hpp where possible to reduce header dependencies.
> 
> This patch:
> - replaces includes of allocation.hpp with allstatic.hpp where appropiate
> - fixes up resulting errors since this changes uncovers missing dependencies. 
> Mainly, missing includes of debug.hpp, of globalDefinitions.hpp, and missing 
> outputStream definitions.
> 
> Changes are trivial but onerous. Done partly with a script, partly manually.
> 
> Test:
> - Checked the build with gtests on Linux x86, x64, minimal, zero, aarch64, 
> for both fastdebug and release. All builds of course without PCH.
> - GHAs

This pull request has now been integrated.

Changeset: 2155afe2
Author:Thomas Stuefe 
URL:   
https://git.openjdk.java.net/jdk/commit/2155afe2a87d718757b009d712361d7a63946a7f
Stats: 368 lines in 172 files changed: 35 ins; 0 del; 333 mod

8280503: Use allStatic.hpp instead of allocation.hpp where possible

Reviewed-by: dholmes, iklam

-

PR: https://git.openjdk.java.net/jdk/pull/7188


Re: RFR: JDK-8280503: Use allStatic.hpp instead of allocation.hpp where possible [v3]

2022-01-25 Thread Thomas Stuefe
On Mon, 24 Jan 2022 20:32:56 GMT, Ioi Lam  wrote:

> All builds in our CI passed.

Thanks a lot, Ioi!

-

PR: https://git.openjdk.java.net/jdk/pull/7188


Re: RFR: JDK-8280503: Use allStatic.hpp instead of allocation.hpp where possible [v2]

2022-01-23 Thread Thomas Stuefe
On Mon, 24 Jan 2022 05:44:50 GMT, Thomas Stuefe  wrote:

>> JDK-8249944 moved AllStatic to its own header. We should use that one 
>> instead of allocation.hpp where possible to reduce header dependencies.
>> 
>> This patch:
>> - replaces includes of allocation.hpp with allstatic.hpp where appropiate
>> - fixes up resulting errors since this changes uncovers missing 
>> dependencies. Mainly, missing includes of debug.hpp, of 
>> globalDefinitions.hpp, and missing outputStream definitions.
>> 
>> Changes are trivial but onerous. Done partly with a script, partly manually.
>> 
>> Test:
>> - Checked the build with gtests on Linux x86, x64, minimal, zero, aarch64, 
>> for both fastdebug and release. All builds of course without PCH.
>> - GHAs
>
> Thomas Stuefe has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   add missing includes for macos, windows

Hi Ioi,

I fixed Windows, but cannot test MacOS since I don't have the hardware. Could 
you please give this another try?

Found out that the reason we don't see failing builds in GHAs is that GHAs 
build with precompiled headers. We should change this if possible.

Thanks, Thomas

-

PR: https://git.openjdk.java.net/jdk/pull/7188


Re: RFR: JDK-8280503: Use allStatic.hpp instead of allocation.hpp where possible [v3]

2022-01-23 Thread Thomas Stuefe
> JDK-8249944 moved AllStatic to its own header. We should use that one instead 
> of allocation.hpp where possible to reduce header dependencies.
> 
> This patch:
> - replaces includes of allocation.hpp with allstatic.hpp where appropiate
> - fixes up resulting errors since this changes uncovers missing dependencies. 
> Mainly, missing includes of debug.hpp, of globalDefinitions.hpp, and missing 
> outputStream definitions.
> 
> Changes are trivial but onerous. Done partly with a script, partly manually.
> 
> Test:
> - Checked the build with gtests on Linux x86, x64, minimal, zero, aarch64, 
> for both fastdebug and release. All builds of course without PCH.
> - GHAs

Thomas Stuefe has updated the pull request incrementally with one additional 
commit since the last revision:

  Add missing header to zNUMA.hpp

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/7188/files
  - new: https://git.openjdk.java.net/jdk/pull/7188/files/2f60b989..e2b9b0d6

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=7188=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=7188=01-02

  Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/7188.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/7188/head:pull/7188

PR: https://git.openjdk.java.net/jdk/pull/7188


Re: RFR: JDK-8280503: Use allStatic.hpp instead of allocation.hpp where possible [v2]

2022-01-23 Thread Thomas Stuefe
On Mon, 24 Jan 2022 05:54:18 GMT, Ioi Lam  wrote:

> > > BTW, I have some scripts for checking how often a header file is 
> > > included. See https://github.com/iklam/tools/tree/main/headers
> > 
> > 
> > Does your tool tell you include chokepoints, maybe its just one central 
> > include pulling in allocation.hpp?
> 
> The whoincludes.tcl script can do that. Unfortunately it tells us that many 
> popular header (such as ostream.hpp that was itself included 976 times) 
> include allocations.hpp.
> 
> ```
> src/hotspot$ tclsh whoincludes.tcl allocation.hpp| head -20
> scanning997 allocation.hpp
>2 found976 ostream.hpp
>3 found960 exceptions.hpp
>4 found938 atomic.hpp
>5 found891 memRegion.hpp
>6 found877 iterator.hpp
>7 found871 arena.hpp
>8 found864 mutex.hpp
>9 found860 growableArray.hpp
>   10 found855 mutexLocker.hpp
>   11 found855 autoRestore.hpp
>   12 found848 padded.hpp
>   13 found841 linkedlist.hpp
>   14 found835 jfrAllocation.hpp
>   15 found832 resourceHash.hpp
>   16 found829 gcUtil.hpp
>   17 found825 threadHeapSampler.hpp
>   18 found825 thread.hpp
>   19 found825 filterQueue.hpp
>   20 found679 symbol.hpp
> ```

That's a useful script. Well, maybe we can break this up a bit more.

BTW, can you send me your configure line for your breaking builds? I'd like to 
reproduce them locally.

-

PR: https://git.openjdk.java.net/jdk/pull/7188


Re: RFR: JDK-8280503: Use allStatic.hpp instead of allocation.hpp where possible [v2]

2022-01-23 Thread Thomas Stuefe
> JDK-8249944 moved AllStatic to its own header. We should use that one instead 
> of allocation.hpp where possible to reduce header dependencies.
> 
> This patch:
> - replaces includes of allocation.hpp with allstatic.hpp where appropiate
> - fixes up resulting errors since this changes uncovers missing dependencies. 
> Mainly, missing includes of debug.hpp, of globalDefinitions.hpp, and missing 
> outputStream definitions.
> 
> Changes are trivial but onerous. Done partly with a script, partly manually.
> 
> Test:
> - Checked the build with gtests on Linux x86, x64, minimal, zero, aarch64, 
> for both fastdebug and release. All builds of course without PCH.
> - GHAs

Thomas Stuefe has updated the pull request incrementally with one additional 
commit since the last revision:

  add missing includes for macos, windows

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/7188/files
  - new: https://git.openjdk.java.net/jdk/pull/7188/files/37fd8fdb..2f60b989

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=7188=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=7188=00-01

  Stats: 2 lines in 2 files changed: 2 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/7188.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/7188/head:pull/7188

PR: https://git.openjdk.java.net/jdk/pull/7188


Re: RFR: JDK-8280503: Use allStatic.hpp instead of allocation.hpp where possible

2022-01-23 Thread Thomas Stuefe
On Mon, 24 Jan 2022 04:22:34 GMT, Ioi Lam  wrote:

> BTW, I have some scripts for checking how often a header file is included. 
> See https://github.com/iklam/tools/tree/main/headers
> 
> count_hotspot_headers.tcl shows that allocation.hpp was included by 1006 .o 
> files before this fix, and 996 files afterwards, so not a whole lot of 
> reduction. That's because we have over 300 headers that include 
> allocatons.hpp :-)

Yes. allocation.hpp could be split up more. E.g. MEMFLAGS and the NMT 
categories really should live somewhere else, I saw some places where 
allocation.hpp was included only because of them. StackObj may also be a good 
candidate for moving to an own small header.

Does your tool tell you include chokepoints, maybe its just one central include 
pulling in allocation.hpp?

> Unfortunately I am seeing failures on macos and windows:
> 
> macos:
> 
> src/hotspot/os/bsd/gc/z/zNUMA_bsd.cpp:25: 
> src/hotspot/share/gc/z/zNUMA.hpp:39:10: error: unknown type name 'uint32_t'
> 
> windows:
> 
> src\hotspot\os\windows\threadLocalStorage_windows.cpp(34): error C3861: 
> 'assert': identifier not found

Strange, since the GHAs went through. What are your build flags? We should be 
able to rely on GHAs for builds at least :(

The bugs are easy to fix though. Thanks for testing.

-

PR: https://git.openjdk.java.net/jdk/pull/7188


Re: RFR: JDK-8280503: Use allStatic.hpp instead of allocation.hpp where possible

2022-01-23 Thread Thomas Stuefe
On Mon, 24 Jan 2022 02:43:04 GMT, David Holmes  wrote:

> Hi Thomas,
> 
> Seems okay - hard to validate (and I expect allocation.hpp to be included 
> somewhere in most cases anyway).
> 

It shouldn't, that's the idea of JDK-8249944.

> Some of the changes seem to have nothing to do with:
> 
> -#include "memory/allocation.hpp" +#include "memory/allStatic.hpp"
> 
> ? Are these caused by transitive include changes?
> 

See my comment in code. All places that needed to be fixed.

> Thanks, David

Thanks for the review,

..Thomas

> src/hotspot/share/cds/archiveUtils.cpp line 44:
> 
>> 42: #include "utilities/debug.hpp"
>> 43: #include "utilities/formatBuffer.hpp"
>> 44: #include "utilities/globalDefinitions.hpp"
> 
> Seems unrelated to this issue.

archiveUtils.cpp misses debug.hpp (since it uses assert) and 
globalDefinitions.hpp (since it uses types from there) but missed these 
includes. This had been hidden by one of its includes pulling allocation.hpp. 
Same goes for the other seemingly unrelated fixups: all missing includes hidden 
by including allocation.hpp

-

PR: https://git.openjdk.java.net/jdk/pull/7188


RFR: JDK-8280503: Use allStatic.hpp instead of allocation.hpp where possible

2022-01-22 Thread Thomas Stuefe
JDK-8249944 moved AllStatic to its own header. We should use that one instead 
of allocation.hpp where possible to reduce header dependencies.

This patch:
- replaces includes of allocation.hpp with allstatic.hpp where appropiate
- fixes up resulting errors since this changes uncovers missing dependencies. 
Mainly, missing includes of debug.hpp, of globalDefinitions.hpp, and missing 
outputStream definitions.

Changes are trivial but onerous. Done partly with a script, partly manually.

Test:
- Checked the build with gtests on Linux x86, x64, minimal, zero, aarch64, for 
both fastdebug and release. All builds of course without PCH.
- GHAs

-

Commit messages:
 - fix copyrights
 - start

Changes: https://git.openjdk.java.net/jdk/pull/7188/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk=7188=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8280503
  Stats: 365 lines in 170 files changed: 32 ins; 0 del; 333 mod
  Patch: https://git.openjdk.java.net/jdk/pull/7188.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/7188/head:pull/7188

PR: https://git.openjdk.java.net/jdk/pull/7188


Re: RFR: 8279124: VM does not handle SIGQUIT during initialization [v7]

2022-01-22 Thread Thomas Stuefe
On Sat, 22 Jan 2022 01:17:41 GMT, Xin Liu  wrote:

>> In early stage of initialization, HotSpot doesn't handle SIGQUIT. The 
>> default signal preposition on Linux is to quit the process and generate 
>> coredump.
>> 
>> There are 2 applications for this signal.
>> 1. There's a handshake protocol between sun.tools.attach and HotSpot.  
>> VirtualMachineImpl sends SIGQUIT(3) if the AttachListener has not been 
>> initialized. It expects "Signal Dispatcher" to handle SIGQUIT and create 
>> AttachListener.
>> 2. POSIX systems use SIGQUIT as SIGBREAK. After AttachListener is up, 
>> HotSpot will reinterpret the signal for thread dump.
>> 
>> It is possible that HotSpot is still initializing in Threads::create_vm() 
>> when SIGQUIT arrives. We should change JVM_HANDLE_XXX_SIGNAL to catch 
>> SIGQUIT and ignore it. It is installed os::init_2() and should cover the 
>> early stage of initialization. Later on, os::initialize_jdk_signal_support() 
>> still overwrites the signal handler of SIGQUIT if ReduceSignalUsage is 
>> false(default). 
>> 
>> Testing 
>> 
>> Before, this patch, once initialization takes long time, jcmd may quit the 
>> java process. 
>> 
>> $java -Xms64g -XX:+AlwaysPreTouch -Xlog:gc+heap=debug:stderr 
>> -XX:ParallelGCThreads=1 &
>> [1] 9589
>> [0.028s][debug][gc,heap] Minimum heap 68719476736  Initial heap 68719476736  
>> Maximum heap 68719476736
>> [0.034s][debug][gc,heap] Running G1 PreTouch with 1 workers for 16384 work 
>> units pre-touching 68719476736B.
>> $jcmd 9589 VM.flags
>> 9589:
>> [1]  + 9589 quit   java -Xms64g -XX:+AlwaysPreTouch 
>> -Xlog:gc+heap=debug:stderr
>> java.io.IOException: No such process
>> at jdk.attach/sun.tools.attach.VirtualMachineImpl.sendQuitTo(Native 
>> Method)
>> at 
>> jdk.attach/sun.tools.attach.VirtualMachineImpl.(VirtualMachineImpl.java:100)
>> at 
>> jdk.attach/sun.tools.attach.AttachProviderImpl.attachVirtualMachine(AttachProviderImpl.java:58)
>> at 
>> jdk.attach/com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:207)
>> at jdk.jcmd/sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:113)
>> at jdk.jcmd/sun.tools.jcmd.JCmd.main(JCmd.java:97)
>> 
>> 
>> With this patch, neither jcmd nor kill -3 will disrupt java process 45850. 
>> 
>> $java -Xms64g -XX:+AlwaysPreTouch -XX:ParallelGCThreads=1 
>> -Xlog:os+init=debug:stderr -version &
>> [1] 45850
>> $ kill -3 45850
>> [6.902s][info][os,init] ignore BREAK_SIGNAL in the initialization phase. 
>>  
>>  
>>
>> $jcmd 45850 VM.flags
>> 45850:
>> [19.920s][info][os,init] ignore BREAK_SIGNAL in the initialization phase.
>> [25.422s][info][os,init] ignore BREAK_SIGNAL in the initialization phase.
>> [26.522s][info][os,init] ignore BREAK_SIGNAL in the initialization phase.
>> [27.723s][info][os,init] ignore BREAK_SIGNAL in the initialization phase.
>> [29.023s][info][os,init] ignore BREAK_SIGNAL in the initialization phase.
>> [30.423s][info][os,init] ignore BREAK_SIGNAL in the initialization phase.
>> com.sun.tools.attach.AttachNotSupportedException: Unable to open socket file 
>> /proc/45850/root/tmp/.java_pid45850: target process 45850 doesn't respond 
>> within 10500ms or HotSpot VM not loaded
>> at 
>> jdk.attach/sun.tools.attach.VirtualMachineImpl.(VirtualMachineImpl.java:105)
>> at 
>> jdk.attach/sun.tools.attach.AttachProviderImpl.attachVirtualMachine(AttachProviderImpl.java:58)
>> at 
>> jdk.attach/com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:207)
>> at jdk.jcmd/sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:113)
>> at jdk.jcmd/sun.tools.jcmd.JCmd.main(JCmd.java:97)
>> $openjdk version "19-internal" 2022-09-20
>> OpenJDK Runtime Environment (fastdebug build 19-internal+0-adhoc.xxinliu.jdk)
>> OpenJDK 64-Bit Server VM (fastdebug build 19-internal+0-adhoc.xxinliu.jdk, 
>> mixed mode, sharing)
>> [1]  + 45850 done   java -Xms64g -XX:+AlwaysPreTouch 
>> -XX:ParallelGCThreads=1  -version
>
> Xin Liu has updated the pull request incrementally with one additional commit 
> since the last revision:
> 
>   Do not do check_signal_handler for SIGQUIT
>   
>   It's because SIGQUIT will be overwritten later. This patch fixed the 
> regresssion
>   runtime/jni/checked/TestCheckedJniExceptionCheck.java.
>   The test uses -Xcheck:jni and the warning message from 
> JniPeriodicCheckerTask
>   may mess up the expected outputs.

Good catch about Xcheck:jni.

This looks good to me. Ship it.

-

PR: https://git.openjdk.java.net/jdk/pull/7003


Re: RFR: 8279124: VM does not handle SIGQUIT during initialization [v5]

2022-01-21 Thread Thomas Stuefe
On Fri, 21 Jan 2022 09:00:49 GMT, David Holmes  wrote:

>> hi, David and Thomas, 
>> 
>> I am nervous about this change. It's not complex but I don't want to break 
>> the existing java applications. Bear with me. 
>> 
>> From my reading, HotSpot can chain a user-custom signal handler because of 
>> libjsig.so.  It's not like a linked-list chain. if the user installs a 
>> handler for a signal, libjsig just saves it. JVM_HANDLE_XXX_SIGNAL is 
>> invoked first and then the user-custom handler is called 
>> [here](https://github.com/openjdk/jdk/blob/master/src/hotspot/os/posix/signals_posix.cpp#L635)
>>  if it isn't handled.
>> 
>> The reason we can hoist this logic to line 564 because we **assume** that no 
>> user application would define a handler for BREAK_SIGNAL.  It doesn't work 
>> right now because os::initialize_jdk_signal_support() will overwrite the 
>> signal handler of BREAK_SIGNAL later.
>
> Hi Xin,
> Signal chaining doesn't work for BREAK_SIGNAL - from the signal chaining docs:
> 
> 
> Note:
> 
> The SIGQUIT, SIGTERM, SIGINT, and SIGHUP signals cannot be chained. If the 
> application must 
> handle these signals, then consider using the —Xrs option.

Exactly. If embedding application likes to handle SIGQUIT, it needs to set 
-Xrs. In that case, it does not matter if it installed the signal handler 
before or after VM was initialized. We just won't touch SIGQUIT at all in that 
case.

Wrt the position of handling the break signal: most of the coding between lines 
564 and 588 does not hurt; `PosixSignals::unblock_error_signals();` may 
actually be beneficial, though it is highly unlikely that we get problems with 
secondary crashes when handling SIGQUIT. So, line 588 would be an okay position 
for SIGQUIT handling too.

-

PR: https://git.openjdk.java.net/jdk/pull/7003


Re: RFR: 8279124: VM does not handle SIGQUIT during initialization [v5]

2022-01-19 Thread Thomas Stuefe
On Thu, 20 Jan 2022 06:05:22 GMT, Xin Liu  wrote:

>> In early stage of initialization, HotSpot doesn't handle SIGQUIT. The 
>> default signal preposition on Linux is to quit the process and generate 
>> coredump.
>> 
>> There are 2 applications for this signal.
>> 1. There's a handshake protocol between sun.tools.attach and HotSpot.  
>> VirtualMachineImpl sends SIGQUIT(3) if the AttachListener has not been 
>> initialized. It expects "Signal Dispatcher" to handle SIGQUIT and create 
>> AttachListener.
>> 2. POSIX systems use SIGQUIT as SIGBREAK. After AttachListener is up, 
>> HotSpot will reinterpret the signal for thread dump.
>> 
>> It is possible that HotSpot is still initializing in Threads::create_vm() 
>> when SIGQUIT arrives. We should change JVM_HANDLE_XXX_SIGNAL to catch 
>> SIGQUIT and ignore it. It is installed os::init_2() and should cover the 
>> early stage of initialization. Later on, os::initialize_jdk_signal_support() 
>> still overwrites the signal handler of SIGQUIT if ReduceSignalUsage is 
>> false(default). 
>> 
>> Testing 
>> 
>> Before, this patch, once initialization takes long time, jcmd may quit the 
>> java process. 
>> 
>> $java -Xms64g -XX:+AlwaysPreTouch -Xlog:gc+heap=debug:stderr 
>> -XX:ParallelGCThreads=1 &
>> [1] 9589
>> [0.028s][debug][gc,heap] Minimum heap 68719476736  Initial heap 68719476736  
>> Maximum heap 68719476736
>> [0.034s][debug][gc,heap] Running G1 PreTouch with 1 workers for 16384 work 
>> units pre-touching 68719476736B.
>> $jcmd 9589 VM.flags
>> 9589:
>> [1]  + 9589 quit   java -Xms64g -XX:+AlwaysPreTouch 
>> -Xlog:gc+heap=debug:stderr
>> java.io.IOException: No such process
>> at jdk.attach/sun.tools.attach.VirtualMachineImpl.sendQuitTo(Native 
>> Method)
>> at 
>> jdk.attach/sun.tools.attach.VirtualMachineImpl.(VirtualMachineImpl.java:100)
>> at 
>> jdk.attach/sun.tools.attach.AttachProviderImpl.attachVirtualMachine(AttachProviderImpl.java:58)
>> at 
>> jdk.attach/com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:207)
>> at jdk.jcmd/sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:113)
>> at jdk.jcmd/sun.tools.jcmd.JCmd.main(JCmd.java:97)
>> 
>> 
>> With this patch, neither jcmd nor kill -3 will disrupt java process 45850. 
>> 
>> $java -Xms64g -XX:+AlwaysPreTouch -XX:ParallelGCThreads=1 
>> -Xlog:os+init=debug:stderr -version &
>> [1] 45850
>> $ kill -3 45850
>> [6.902s][info][os,init] ignore BREAK_SIGNAL in the initialization phase. 
>>  
>>  
>>
>> $jcmd 45850 VM.flags
>> 45850:
>> [19.920s][info][os,init] ignore BREAK_SIGNAL in the initialization phase.
>> [25.422s][info][os,init] ignore BREAK_SIGNAL in the initialization phase.
>> [26.522s][info][os,init] ignore BREAK_SIGNAL in the initialization phase.
>> [27.723s][info][os,init] ignore BREAK_SIGNAL in the initialization phase.
>> [29.023s][info][os,init] ignore BREAK_SIGNAL in the initialization phase.
>> [30.423s][info][os,init] ignore BREAK_SIGNAL in the initialization phase.
>> com.sun.tools.attach.AttachNotSupportedException: Unable to open socket file 
>> /proc/45850/root/tmp/.java_pid45850: target process 45850 doesn't respond 
>> within 10500ms or HotSpot VM not loaded
>> at 
>> jdk.attach/sun.tools.attach.VirtualMachineImpl.(VirtualMachineImpl.java:105)
>> at 
>> jdk.attach/sun.tools.attach.AttachProviderImpl.attachVirtualMachine(AttachProviderImpl.java:58)
>> at 
>> jdk.attach/com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:207)
>> at jdk.jcmd/sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:113)
>> at jdk.jcmd/sun.tools.jcmd.JCmd.main(JCmd.java:97)
>> $openjdk version "19-internal" 2022-09-20
>> OpenJDK Runtime Environment (fastdebug build 19-internal+0-adhoc.xxinliu.jdk)
>> OpenJDK 64-Bit Server VM (fastdebug build 19-internal+0-adhoc.xxinliu.jdk, 
>> mixed mode, sharing)
>> [1]  + 45850 done   java -Xms64g -XX:+AlwaysPreTouch 
>> -XX:ParallelGCThreads=1  -version
>
> Xin Liu has updated the pull request incrementally with one additional commit 
> since the last revision:
> 
>   remove log_info from the signal handler.

Looks good now. Thanks!

Cheers, Thomas

-

Marked as reviewed by stuefe (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/7003


Re: RFR: 8279124: VM does not handle SIGQUIT during initialization [v3]

2022-01-19 Thread Thomas Stuefe
On Wed, 19 Jan 2022 12:39:31 GMT, Kevin Walls  wrote:

> So I like the change but would like to be clearer where the problem exists, 
> where (what platforms?) can we see no signals ignored or caught at startup, 
> and trigger the problem of crashing the VM with SIGQUIT.

I reproduced it with my artificial delay on Ubuntu 20.04. Cannot reproduce it 
with AlwaysPreTouch since my machine is too fast.

-

PR: https://git.openjdk.java.net/jdk/pull/7003


Re: RFR: 8279124: VM does not handle SIGQUIT during initialization [v2]

2022-01-19 Thread Thomas Stuefe
On Wed, 19 Jan 2022 09:20:07 GMT, Xin Liu  wrote:

>>> This is in the signal handler. Is it safe to print message or do something 
>>> complex?
>> 
>> raw write to stdout should be fine I think.
>> 
>>>I would not print anything here, first because it is in the signal handler 
>>>and secondly I don't think it is needed. No real thread dump usages are 
>>>going to hit a VM that hasn't initialized yet IMO.
>> 
>> Might be nice for the user to know why his jcmd or kill -3 gets ignored 
>> though.
>
> because JVM_HANDLE_XXX_SIGNAL is installed in os::init_2(), it's safe to use 
> log. I added a log entry with the tag os+init.  -Xlog:os+init=info will print 
> out a message "ignore BREAK_SIGNAL in the initialization phase." I also 
> update the testing example in the description section.

Sorry for being difficult, but I would not do UL logging here. UL is 
heavyweight, and a shifting target too, because folks like to extend it with 
new capabilities. God knows what log_info really does (initialization, network 
traffic, accessing thread locals...) or will do in the future.

I'd keep it simple stupid and either just use write to fd=2 or fprintf(stderr). 
The former is safe to use in signal handling contexts, not sure about the 
latter. Otherwise maybe David is right and we skip the output altogether.

-

PR: https://git.openjdk.java.net/jdk/pull/7003


Re: RFR: 8279124: VirtualMachineImpl should check signal handler has installed before sending SIGQUIT [v2]

2022-01-19 Thread Thomas Stuefe
On Wed, 19 Jan 2022 07:15:24 GMT, David Holmes  wrote:

>> This is in the signal handler. Is it safe to print message or do something 
>> complex?
>
> I would not print anything here, first because it is in the signal handler 
> and secondly I don't think it is needed. No real thread dump usages are going 
> to hit a VM that hasn't initialized yet IMO.

> This is in the signal handler. Is it safe to print message or do something 
> complex?

raw write to stdout should be fine I think.

>I would not print anything here, first because it is in the signal handler and 
>secondly I don't think it is needed. No real thread dump usages are going to 
>hit a VM that hasn't initialized yet IMO.

Might be nice for the user to know why his jcmd or kill -3 gets ignored though.

-

PR: https://git.openjdk.java.net/jdk/pull/7003


Re: RFR: 8279124: VirtualMachineImpl should check signal handler has installed before sending SIGQUIT [v2]

2022-01-19 Thread Thomas Stuefe
On Wed, 19 Jan 2022 07:59:28 GMT, Xin Liu  wrote:

> if we install JVM_HANDLE_XXX_SIGNAL conditionally, it will leave java 
> processes with -Xrs unprotected.

That's the idea. ReduceSignalUsage (Xrs) means that the hotspot should leave 
this signal and some others alone:


1. On Solaris and Linux, the signal masks for SIGINT, SIGTERM, SIGHUP,
   and SIGQUIT are not changed by the JVM, and signal handlers for
   these signals are not installed.

because, presumably, the user already installed a handler for this signal and 
does not want the JVM to change it. Matters if  the JVM is embedded into a 
foreign launcher which does its own signal handling.

-

PR: https://git.openjdk.java.net/jdk/pull/7003


Re: RFR: 8279124: VirtualMachineImpl should check signal handler has installed before sending SIGQUIT [v2]

2022-01-18 Thread Thomas Stuefe
On Wed, 19 Jan 2022 05:21:04 GMT, Xin Liu  wrote:

>> There is a handshake protocol between attach and HotSpot. Linux 
>> VirtualMachineImpl sends SIGQUIT(3) if the AttachListener has not been 
>> initialized. It expects "Signal Dispatcher" to handle SIGBREAK(same as 
>> SIGQUIT) and create AttachListener. However, it is possible that attach 
>> starts "handshake" before os::initialize_jdk_signal_support() is called. The 
>> signal handler which handles SIGQUIT has not been installed. Prior to 
>> os::initialize_jdk_signal_support(), universe_init() is called. Its time 
>> complexity will be linear to the initial size of heap with 'AlwaysPreTouch'. 
>>  It takes 20~30 seconds to initialize 128g heap on a server-class host(eg. 
>> EC2 m4.10xlarge). Many tools such jcmd, jstack etc may force initializing 
>> HotSpot quit prematurely. 
>> 
>> This patch checks '/proc/$pid/stat' SigCgt bitmask to ensure the signal will 
>> be caught by the target process before striking it with SIGQUIT.  It will 
>> make HotSpot more robust.  The fields of procfs are well 
>> [documented](https://www.kernel.org/doc/html/latest/filesystems/proc.html#id10)
>>  and have supported since Linux 2.6.30. libattach.so will not the only 
>> consumer of it. I see that os_perf_linux.cpp supports it in libjvm.so. 
>> 
>> 
>> Testing 
>> 
>> Before, this patch, once initialization takes long time, jcmd may quit the 
>> java process. 
>> 
>> $java -Xms64g -XX:+AlwaysPreTouch -Xlog:gc+heap=debug:stderr 
>> -XX:ParallelGCThreads=1 &
>> [1] 9589
>> [0.028s][debug][gc,heap] Minimum heap 68719476736  Initial heap 68719476736  
>> Maximum heap 68719476736
>> [0.034s][debug][gc,heap] Running G1 PreTouch with 1 workers for 16384 work 
>> units pre-touching 68719476736B.
>> $jcmd 9589 VM.flags
>> 9589:
>> [1]  + 9589 quit   java -Xms64g -XX:+AlwaysPreTouch 
>> -Xlog:gc+heap=debug:stderr
>> java.io.IOException: No such process
>> at jdk.attach/sun.tools.attach.VirtualMachineImpl.sendQuitTo(Native 
>> Method)
>> at 
>> jdk.attach/sun.tools.attach.VirtualMachineImpl.(VirtualMachineImpl.java:100)
>> at 
>> jdk.attach/sun.tools.attach.AttachProviderImpl.attachVirtualMachine(AttachProviderImpl.java:58)
>> at 
>> jdk.attach/com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:207)
>> at jdk.jcmd/sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:113)
>> at jdk.jcmd/sun.tools.jcmd.JCmd.main(JCmd.java:97)
>> 
>> With this patch, jcmd will timeout but won't disrupt 15274. 
>> 
>> $ java -Xms64g -XX:+AlwaysPreTouch -XX:ParallelGCThreads=1 &
>> [1] 15274
>> $ jcmd 15274 VM.flags
>> 15274:
>> com.sun.tools.attach.AttachNotSupportedException: Unable to open socket file 
>> /proc/15274/root/tmp/.java_pid15274: target process 15274 doesn't respond 
>> within 10500ms or HotSpot VM not loaded
>> at 
>> jdk.attach/sun.tools.attach.VirtualMachineImpl.(VirtualMachineImpl.java:105)
>> at 
>> jdk.attach/sun.tools.attach.AttachProviderImpl.attachVirtualMachine(AttachProviderImpl.java:58)
>> at 
>> jdk.attach/com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:207)
>> at jdk.jcmd/sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:113)
>> at jdk.jcmd/sun.tools.jcmd.JCmd.main(JCmd.java:97)
>
> Xin Liu has updated the pull request incrementally with one additional commit 
> since the last revision:
> 
>   Intercept SIGQUIT in the early stage of HotSpot.

Hi Xin,

thanks for taking my suggestion. Remarks inline.

Cheers, Thomas

p.s. may make sense to reformulate the JBS issue. "VM does not handle SIGQUIT 
during initialization", and extending the error description to make clear this 
affects anyone sending SIGQUIT, not only jcmd.

src/hotspot/os/posix/signals_posix.cpp line 597:

> 595: #endif
> 596: 
> 597:   if (!signal_was_handled && sig == BREAK_SIGNAL) {

I would print out a little message, since callers of SIGQUIT may expect 
something to happen here. Maybe "Thread dumps not yet available"?
Also, just for clarity, assert(!ReduceSignalHandling)? (but we don't do that 
for the other signals we omit handling either, so maybe not)

src/hotspot/os/posix/signals_posix.cpp line 1291:

> 1289:   // that an attach client accidentally forces HotSpot to quit 
> prematurely.
> 1290:   set_signal_handler(BREAK_SIGNAL);
> 1291: 

I would do this only if ReduceSignalHandling is inactive, since in that case we 
must not overwrite a SIGQUIT handler the app may have installed before us. We 
should have parsed arguments already at this point, no?

-

Changes requested by stuefe (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/7003


Re: RFR: 8279124: VirtualMachineImpl should check signal handler has installed before sending SIGQUIT

2022-01-17 Thread Thomas Stuefe
On Mon, 17 Jan 2022 09:00:25 GMT, Kevin Walls  wrote:

> > I propose a simpler and more robust way to fix it though
> 
> Great, this is the kind of thing I was heading towards with the conversation 
> in the bug text. Although not sure why I could not reproduce the problem, 
> with various different JDK versions.

Ah, I missed your conversation.

I reproduced this by adding a delay during initialization and sending sigquit 
manually. The bug is not restricted to jcmd, sigquit handling is broken during 
initialization. Folks tend to send sigquit to unresponsive VMs to get thread 
dumps, so coring is unfortunate (another reason not to fix it in jcmd itself).

Cheers, Thomas

-

PR: https://git.openjdk.java.net/jdk/pull/7003


Re: RFR: 8279124: VirtualMachineImpl should check signal handler has installed before sending SIGQUIT

2022-01-17 Thread Thomas Stuefe
On Mon, 10 Jan 2022 05:19:26 GMT, Xin Liu  wrote:

> There is a handshake protocol between attach and HotSpot. Linux 
> VirtualMachineImpl sends SIGQUIT(3) if the AttachListener has not been 
> initialized. It expects "Signal Dispatcher" to handle SIGBREAK(same as 
> SIGQUIT) and create AttachListener. However, it is possible that attach 
> starts "handshake" before os::initialize_jdk_signal_support() is called. The 
> signal handler which handles SIGQUIT has not been installed. Prior to 
> os::initialize_jdk_signal_support(), universe_init() is called. Its time 
> complexity will be linear to the initial size of heap with 'AlwaysPreTouch'.  
> It takes 20~30 seconds to initialize 128g heap on a server-class host(eg. EC2 
> m4.10xlarge). Many tools such jcmd, jstack etc may force initializing HotSpot 
> quit prematurely. 
> 
> This patch checks '/proc/$pid/stat' SigCgt bitmask to ensure the signal will 
> be caught by the target process before striking it with SIGQUIT.  It will 
> make HotSpot more robust.  The fields of procfs are well 
> [documented](https://www.kernel.org/doc/html/latest/filesystems/proc.html#id10)
>  and have supported since Linux 2.6.30. libattach.so will not the only 
> consumer of it. I see that os_perf_linux.cpp supports it in libjvm.so. 
> 
> 
> Testing 
> 
> Before, this patch, once initialization takes long time, jcmd may quit the 
> java process. 
> 
> $java -Xms64g -XX:+AlwaysPreTouch -Xlog:gc+heap=debug:stderr 
> -XX:ParallelGCThreads=1 &
> [1] 9589
> [0.028s][debug][gc,heap] Minimum heap 68719476736  Initial heap 68719476736  
> Maximum heap 68719476736
> [0.034s][debug][gc,heap] Running G1 PreTouch with 1 workers for 16384 work 
> units pre-touching 68719476736B.
> $jcmd 9589 VM.flags
> 9589:
> [1]  + 9589 quit   java -Xms64g -XX:+AlwaysPreTouch 
> -Xlog:gc+heap=debug:stderr
> java.io.IOException: No such process
> at jdk.attach/sun.tools.attach.VirtualMachineImpl.sendQuitTo(Native 
> Method)
> at 
> jdk.attach/sun.tools.attach.VirtualMachineImpl.(VirtualMachineImpl.java:100)
> at 
> jdk.attach/sun.tools.attach.AttachProviderImpl.attachVirtualMachine(AttachProviderImpl.java:58)
> at 
> jdk.attach/com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:207)
> at jdk.jcmd/sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:113)
> at jdk.jcmd/sun.tools.jcmd.JCmd.main(JCmd.java:97)
> 
> With this patch, jcmd will timeout but won't disrupt 15274. 
> 
> $ java -Xms64g -XX:+AlwaysPreTouch -XX:ParallelGCThreads=1 &
> [1] 15274
> $ jcmd 15274 VM.flags
> 15274:
> com.sun.tools.attach.AttachNotSupportedException: Unable to open socket file 
> /proc/15274/root/tmp/.java_pid15274: target process 15274 doesn't respond 
> within 10500ms or HotSpot VM not loaded
> at 
> jdk.attach/sun.tools.attach.VirtualMachineImpl.(VirtualMachineImpl.java:105)
> at 
> jdk.attach/sun.tools.attach.AttachProviderImpl.attachVirtualMachine(AttachProviderImpl.java:58)
> at 
> jdk.attach/com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:207)
> at jdk.jcmd/sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:113)
> at jdk.jcmd/sun.tools.jcmd.JCmd.main(JCmd.java:97)

Hi @navyxliu,

nice catch. I can see how this can be annoying.

I propose a simpler and more robust way to fix it though. We (1) set up general 
hotspot signal handling very early, then (2) proceed to initialize the heap - 
which you have shown can take some time - then (3) set up SIGQUIT handling. We 
core if we get a quit signal before (3).

I would add SIGQUIT handling to the general signal handler too, just to cover 
the time frame between (1) and (3). At (3), it would be overwritten, but that 
would be fine. Before (3), we could just ignore SIGQUIT, or print out some 
generic information (I assume thread dumps are not yet possible at this stage).

Since the documented behavior for the JVM is to threaddump on SIGQUIT unless we 
run with -Xrs, I think this is acceptable behavior. Not printing thread dump or 
only printing partial information is preferable to quitting with core.

Then, this would work for any client that sends sigquit to a JVM, not just 
those using the attach framework. And it would work on all Posix platforms, not 
just Linux. And we'd would not have to rely on parsing the proc fs.

Als note that a solution implemented in the client as you did has the 
disadvantage that I need a modern jcmd for it to work. However, I often just 
use whatever jcmd is in the path. Better to handle this in the receiving 
hotspot.

I sketched out a simple patch to test if what I propose can work:


diff --git a/src/hotspot/os/posix/signals_posix.cpp 
b/src/hotspot/os/posix/signals_posix.cpp
index 2c020a79408..3f0dd91e03b 100644
--- a/src/hotspot/os/posix/signals_posix.cpp
+++ b/src/hotspot/os/posix/signals_posix.cpp
@@ -615,6 +615,11 @@ int JVM_HANDLE_XXX_SIGNAL(int sig, siginfo_t* info,
 #endif // ZERO
   }
 
+if (sig == BREAK_SIGNAL) {
+  

Integrated: JDK-8280002: jmap -histo may leak stream

2022-01-14 Thread Thomas Stuefe
On Fri, 14 Jan 2022 10:04:53 GMT, Thomas Stuefe  wrote:

> Very trivial fix to a handle/memory leak. 
> 
> JDK-8215624 added parallel heap iteration to both `jmap -histo` and `jcmd 
> GC.class_histogram`. When called with an explicit file and an invalid 
> argument for number of threads, it leaks the file (bit of memory and a 
> handle).
> 
> Reproduce with:
> 
> `jmap -histo:parallel=notanumber,file=xx.txt`
> 
> Can only be reproduced with jmap. jcmd is safe, arguments are handled 
> correctly in shared code.

This pull request has now been integrated.

Changeset: c359c358
Author:Thomas Stuefe 
URL:   
https://git.openjdk.java.net/jdk/commit/c359c358c8ebaf7b1dddbc4b499a7aae65ba6736
Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod

8280002: jmap -histo may leak stream

Reviewed-by: shade, sspitsyn

-

PR: https://git.openjdk.java.net/jdk/pull/7078


Re: RFR: JDK-8280002: jmap -histo may leak stream

2022-01-14 Thread Thomas Stuefe
On Fri, 14 Jan 2022 12:52:11 GMT, Aleksey Shipilev  wrote:

>> Very trivial fix to a handle/memory leak. 
>> 
>> JDK-8215624 added parallel heap iteration to both `jmap -histo` and `jcmd 
>> GC.class_histogram`. When called with an explicit file and an invalid 
>> argument for number of threads, it leaks the file (bit of memory and a 
>> handle).
>> 
>> Reproduce with:
>> 
>> `jmap -histo:parallel=notanumber,file=xx.txt`
>> 
>> Can only be reproduced with jmap. jcmd is safe, arguments are handled 
>> correctly in shared code.
>
> This is not as trivial, AFAICS. Note that the existing code `delete fs` after 
> checking `if (os != NULL && os != out)`. Does it mean this patch can 
> effectively `delete out`? Or can it call `delete nullptr`?

Thanks, @shipilev  and @sspitsyn !

-

PR: https://git.openjdk.java.net/jdk/pull/7078


Re: RFR: JDK-8280002: jmap -histo may leak stream [v2]

2022-01-14 Thread Thomas Stuefe
On Fri, 14 Jan 2022 16:07:23 GMT, Aleksey Shipilev  wrote:

> Okay then!

Thanks, Alexey!

After your concerns I consider this to be not trivial enough, so I wait for a 
second reviewer.

-

PR: https://git.openjdk.java.net/jdk/pull/7078


Re: RFR: JDK-8280002: jmap -histo may leak stream [v2]

2022-01-14 Thread Thomas Stuefe
> Very trivial fix to a handle/memory leak. 
> 
> JDK-8215624 added parallel heap iteration to both `jmap -histo` and `jcmd 
> GC.class_histogram`. When called with an explicit file and an invalid 
> argument for number of threads, it leaks the file (bit of memory and a 
> handle).
> 
> Reproduce with:
> 
> `jmap -histo:parallel=notanumber,file=xx.txt`
> 
> Can only be reproduced with jmap. jcmd is safe, arguments are handled 
> correctly in shared code.

Thomas Stuefe has updated the pull request incrementally with one additional 
commit since the last revision:

  fix copyright

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/7078/files
  - new: https://git.openjdk.java.net/jdk/pull/7078/files/4d27428d..8fd4856d

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=7078=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=7078=00-01

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/7078.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/7078/head:pull/7078

PR: https://git.openjdk.java.net/jdk/pull/7078


Re: RFR: JDK-8280002: jmap -histo may leak stream

2022-01-14 Thread Thomas Stuefe
On Fri, 14 Jan 2022 12:52:11 GMT, Aleksey Shipilev  wrote:

> This is not as trivial, AFAICS. Note that the existing code `delete fs` after 
> checking `if (os != NULL && os != out)`. Does it mean this patch can 
> effectively `delete out`?

Thanks for looking at this. No, it is safe.

`fs` is a temporary fileStream which may or may not be NULL. It is safe to 
`delete` it. The delete you mention could be moved out of the condition and it 
would still be fine. The condition is only really needed for the output which 
should only appear if you don't write to stdout.

-

PR: https://git.openjdk.java.net/jdk/pull/7078


RFR: JDK-8280002: jmap -histo may leak stream

2022-01-14 Thread Thomas Stuefe
Very trivial fix to a handle/memory leak. 

JDK-8215624 added parallel heap iteration to both `jmap -histo` and `jcmd 
GC.class_histogram`. When called with an explicit file and an invalid argument 
for number of threads, it leaks the file (bit of memory and a handle).

Reproduce with:

`jmap -histo:parallel=notanumber,file=xx.txt`

Can only be reproduced with jmap. jcmd is safe, arguments are handled correctly 
in shared code.

-

Commit messages:
 - delete stream

Changes: https://git.openjdk.java.net/jdk/pull/7078/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk=7078=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8280002
  Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/7078.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/7078/head:pull/7078

PR: https://git.openjdk.java.net/jdk/pull/7078


Re: RFR: 8276982: VM.class_hierarchy jcmd help output and man page text needs clarifications/improvements

2021-12-14 Thread Thomas Stuefe
On Wed, 15 Dec 2021 03:08:40 GMT, Chris Plummer  wrote:

> Clarify the help output for VM.class_hierarchy. The old jcmd help output was 
> misleading, and because of this got translated incorrectly for the man page.

Looks good and trivial.

Thanks, Thomas

-

Marked as reviewed by stuefe (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6845


Re: RFR: JDK-8277029: JMM GetDiagnosticXXXInfo APIs should verify output array sizes [v2]

2021-12-07 Thread Thomas Stuefe
On Tue, 7 Dec 2021 11:13:01 GMT, Gilles Duboscq  wrote:

> Just out of curiosity, for such changes, should we in principle bump 
> `JMM_VERSION`? Or do we not care because libjvm and libmanagement are always 
> shipped together?

Hmm, maybe you are right, though someone from Oracle may answer that better. 
However, seeing that JMM is an internal interface between two internal 
components, I am not even sure why this version check is needed.

-

PR: https://git.openjdk.java.net/jdk/pull/6363


Re: RFR: 8265150: AsyncGetCallTrace crashes on ResourceMark

2021-11-29 Thread Thomas Stuefe
On Tue, 30 Nov 2021 02:37:47 GMT, Coleen Phillimore  wrote:

> This change seems to keep the test case in the bug from crashing in the 
> ResourceMark destructor.  We have a ResourceMark during stack walking in 
> AsyncGetCallTrace.  Also RegisterMap during jvmti shouldn't process oops, fix 
> care of @fisk.
> Testing tier1-6 in progress.

> > > Hi Coleen,
> > > This bypasses the currently observed problem, but we still have a 
> > > fundamentally unsafe mechanism in use here. :(
> > > Thanks, David
> > 
> > 
> > Does AsyncGetCallTrace get triggered asynchronously via signal?
> 
> Yes:
> 
> ```v
> C [libasyncProfiler.so+0x89b4] Profiler::getJavaTraceAsync(void*, 
> ASGCT_CallFrame*, int)+0xd4
> C [libasyncProfiler.so+0x9242] Profiler::recordSample(void*, unsigned long 
> long, int, Event*)+0xd2 
> C [libasyncProfiler.so+0x34f2c] PerfEvents::signalHandler(int, siginfo_t*, 
> void*)+0x8c 
> ```

What you could do is keep (on demand only) a secondary resource area per 
thread. On entering a context that may have been called by a signal handler, 
and with the current resource area in an unknown state, swap the current 
resource area pointer in Thread with that prepared secondary resource area, and 
upon leaving swap back. That way you never touch the original resource area.

Kind of like double buffering for signal contexts.

-

PR: https://git.openjdk.java.net/jdk/pull/6606


Re: RFR: 8265150: AsyncGetCallTrace crashes on ResourceMark

2021-11-29 Thread Thomas Stuefe
On Tue, 30 Nov 2021 04:39:58 GMT, David Holmes  wrote:

> Hi Coleen,
> 
> This bypasses the currently observed problem, but we still have a 
> fundamentally unsafe mechanism in use here. :(
> 
> Thanks, David

Does AsyncGetCallTrace get triggered asynchronously via signal?

-

PR: https://git.openjdk.java.net/jdk/pull/6606


Re: RFR: 8265150: AsyncGetCallTrace crashes on ResourceMark

2021-11-29 Thread Thomas Stuefe
On Tue, 30 Nov 2021 02:37:47 GMT, Coleen Phillimore  wrote:

> This change seems to keep the test case in the bug from crashing in the 
> ResourceMark destructor.  We have a ResourceMark during stack walking in 
> AsyncGetCallTrace.  Also RegisterMap during jvmti shouldn't process oops, fix 
> care of @fisk.
> Testing tier1-6 in progress.

LGTM

-

Marked as reviewed by stuefe (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6606


Integrated: JDK-8277029: JMM GetDiagnosticXXXInfo APIs should verify output array sizes

2021-11-16 Thread Thomas Stuefe
On Fri, 12 Nov 2021 08:25:04 GMT, Thomas Stuefe  wrote:

> jmm_GetDiagnosticCommandArgumentsInfo and jmm_GetDiagnosticCommandInfo are 
> used to query the hotspot about diagnostic commands. They provide output 
> arrays for the information:
> 
> 
> void jmm_GetDiagnosticCommandArgumentsInfo(JNIEnv *env,
>   jstring command, dcmdArgInfo* infoArray)
> 
> 
> but array size is implicitly assumed to be known to both caller and callee. 
> Caller and callee negotiate those sizes in prior steps, but things can go 
> wrong. E.g. I recently hunted a bug where `DCmd::number_arguments()` was off 
> - did not reflect the real number of its jcmd parameters - which led to a 
> hidden memory overwriter.
> 
> Thankfully, JDK-8264565 rewrote the dcmd framework to deal with this 
> particular issue (The VM I analyzed was older). Still, it would be good if we 
> had additional safety measures here.
> 
> -
> 
> Testing:
> - manual tests with artificially induced error in one dcmd for debug, release
> - GHAs (which include tier1 serviceability jcmd tests which use JMX and 
> exercise these APIs)

This pull request has now been integrated.

Changeset: b8d33a2a
Author:Thomas Stuefe 
URL:   
https://git.openjdk.java.net/jdk/commit/b8d33a2a4e4ac1be322644102e8f09ce1435b4fb
Stats: 9 lines in 3 files changed: 3 ins; 0 del; 6 mod

8277029: JMM GetDiagnosticXXXInfo APIs should verify output array sizes

Reviewed-by: dholmes, sspitsyn

-

PR: https://git.openjdk.java.net/jdk/pull/6363


Re: RFR: JDK-8277029: JMM GetDiagnosticXXXInfo APIs should verify output array sizes [v2]

2021-11-16 Thread Thomas Stuefe
On Tue, 16 Nov 2021 06:26:08 GMT, Thomas Stuefe  wrote:

>> jmm_GetDiagnosticCommandArgumentsInfo and jmm_GetDiagnosticCommandInfo are 
>> used to query the hotspot about diagnostic commands. They provide output 
>> arrays for the information:
>> 
>> 
>> void jmm_GetDiagnosticCommandArgumentsInfo(JNIEnv *env,
>>   jstring command, dcmdArgInfo* infoArray)
>> 
>> 
>> but array size is implicitly assumed to be known to both caller and callee. 
>> Caller and callee negotiate those sizes in prior steps, but things can go 
>> wrong. E.g. I recently hunted a bug where `DCmd::number_arguments()` was off 
>> - did not reflect the real number of its jcmd parameters - which led to a 
>> hidden memory overwriter.
>> 
>> Thankfully, JDK-8264565 rewrote the dcmd framework to deal with this 
>> particular issue (The VM I analyzed was older). Still, it would be good if 
>> we had additional safety measures here.
>> 
>> -
>> 
>> Testing:
>> - manual tests with artificially induced error in one dcmd for debug, release
>> - GHAs (which include tier1 serviceability jcmd tests which use JMX and 
>> exercise these APIs)
>
> Thomas Stuefe has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Remove changes to GetDiagnosticCommandInfo

Thanks David and Serguey

-

PR: https://git.openjdk.java.net/jdk/pull/6363


Re: RFR: JDK-8277029: JMM GetDiagnosticXXXInfo APIs should verify output array sizes [v2]

2021-11-15 Thread Thomas Stuefe
On Mon, 15 Nov 2021 10:01:33 GMT, David Holmes  wrote:

>> Thomas Stuefe has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Remove changes to GetDiagnosticCommandInfo
>
> Hi Thomas,
> 
> Sorry but only half of this makes sense to me.
> 
> Cheers,
> David

@dholmes-ora, @sspitsyn Thanks for your reviews. I removed the changes to 
GetDiagnosticCommandInfo as requested.

Thanks, Thomas

-

PR: https://git.openjdk.java.net/jdk/pull/6363


Re: RFR: JDK-8277029: JMM GetDiagnosticXXXInfo APIs should verify output array sizes [v2]

2021-11-15 Thread Thomas Stuefe
> jmm_GetDiagnosticCommandArgumentsInfo and jmm_GetDiagnosticCommandInfo are 
> used to query the hotspot about diagnostic commands. They provide output 
> arrays for the information:
> 
> 
> void jmm_GetDiagnosticCommandArgumentsInfo(JNIEnv *env,
>   jstring command, dcmdArgInfo* infoArray)
> 
> 
> but array size is implicitly assumed to be known to both caller and callee. 
> Caller and callee negotiate those sizes in prior steps, but things can go 
> wrong. E.g. I recently hunted a bug where `DCmd::number_arguments()` was off 
> - did not reflect the real number of its jcmd parameters - which led to a 
> hidden memory overwriter.
> 
> Thankfully, JDK-8264565 rewrote the dcmd framework to deal with this 
> particular issue (The VM I analyzed was older). Still, it would be good if we 
> had additional safety measures here.
> 
> -
> 
> Testing:
> - manual tests with artificially induced error in one dcmd for debug, release
> - GHAs (which include tier1 serviceability jcmd tests which use JMX and 
> exercise these APIs)

Thomas Stuefe has updated the pull request incrementally with one additional 
commit since the last revision:

  Remove changes to GetDiagnosticCommandInfo

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6363/files
  - new: https://git.openjdk.java.net/jdk/pull/6363/files/65dad518..3bdc6c89

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=6363=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=6363=00-01

  Stats: 9 lines in 3 files changed: 0 ins; 5 del; 4 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6363.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6363/head:pull/6363

PR: https://git.openjdk.java.net/jdk/pull/6363


Re: RFR: JDK-8277029: JMM GetDiagnosticXXXInfo APIs should verify output array sizes

2021-11-15 Thread Thomas Stuefe
On Mon, 15 Nov 2021 09:59:44 GMT, David Holmes  wrote:

>> jmm_GetDiagnosticCommandArgumentsInfo and jmm_GetDiagnosticCommandInfo are 
>> used to query the hotspot about diagnostic commands. They provide output 
>> arrays for the information:
>> 
>> 
>> void jmm_GetDiagnosticCommandArgumentsInfo(JNIEnv *env,
>>   jstring command, dcmdArgInfo* infoArray)
>> 
>> 
>> but array size is implicitly assumed to be known to both caller and callee. 
>> Caller and callee negotiate those sizes in prior steps, but things can go 
>> wrong. E.g. I recently hunted a bug where `DCmd::number_arguments()` was off 
>> - did not reflect the real number of its jcmd parameters - which led to a 
>> hidden memory overwriter.
>> 
>> Thankfully, JDK-8264565 rewrote the dcmd framework to deal with this 
>> particular issue (The VM I analyzed was older). Still, it would be good if 
>> we had additional safety measures here.
>> 
>> -
>> 
>> Testing:
>> - manual tests with artificially induced error in one dcmd for debug, release
>> - GHAs (which include tier1 serviceability jcmd tests which use JMX and 
>> exercise these APIs)
>
> src/hotspot/share/services/management.cpp line 1968:
> 
>> 1966: 
>> 1967: JVM_ENTRY(void, jmm_GetDiagnosticCommandInfo(JNIEnv *env, jobjectArray 
>> cmds,
>> 1968:   dcmdInfo* infoArray, jint count))
> 
> I do not see the point of the change in this case. This doesn't check for a 
> mismatch between an "external" and "internal" value but between two external 
> values. If you don't trust the caller to size infoArray correctly then how 
> can you trust them to pass in the right "count" ?

Well, it warns in case of programming errors. I agree, that programming error 
is very unlikely since the caller has all the information he needs. And he 
could pass in the wrong count. 

Idk. Mostly I changed this interface to follow the established pattern of 
always passing in an explicit output buffer size. And to keep it symmetric 
to``jmm_GetDiagnosticCommandArgumentsInfo` and `jmm_GetGCExtAttributeInfo`. If 
you object, I'll remove this part.

-

PR: https://git.openjdk.java.net/jdk/pull/6363


RFR: JDK-8277029: JMM GetDiagnosticXXXInfo APIs should verify output array sizes

2021-11-14 Thread Thomas Stuefe
jmm_GetDiagnosticCommandArgumentsInfo and jmm_GetDiagnosticCommandInfo are used 
to query the hotspot about diagnostic commands. They provide output arrays for 
the information:


void jmm_GetDiagnosticCommandArgumentsInfo(JNIEnv *env,
  jstring command, dcmdArgInfo* infoArray)


but array size is implicitly assumed to be known to both caller and callee. 
Caller and callee negotiate those sizes in prior steps, but things can go 
wrong. E.g. I recently hunted a bug where `DCmd::number_arguments()` was off - 
did not reflect the real number of its jcmd parameters - which led to a hidden 
memory overwriter.

Thankfully, JDK-8264565 rewrote the dcmd framework to deal with this particular 
issue (The VM I analyzed was older). Still, it would be good if we had 
additional safety measures here.

-

Testing:
- manual tests with artificially induced error in one dcmd for debug, release
- GHAs (which include tier1 serviceability jcmd tests which use JMX and 
exercise these APIs)

-

Commit messages:
 - explicitly pass output array size and check it in hotspot

Changes: https://git.openjdk.java.net/jdk/pull/6363/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk=6363=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8277029
  Stats: 18 lines in 3 files changed: 8 ins; 0 del; 10 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6363.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6363/head:pull/6363

PR: https://git.openjdk.java.net/jdk/pull/6363


Integrated: JDK-8276983: Small fixes to DumpAllocStat::print_stats

2021-11-14 Thread Thomas Stuefe
On Thu, 11 Nov 2021 06:15:32 GMT, Thomas Stuefe  wrote:

> When looking at CDS code in the context of Lilliput, I had to spend some time 
> in DumpAllocStat::print(). I noticed two small things which can be fixed 
> independently:
> 
> - the divide-by-zero check at lines 45ff is not needed, since `percent_of` 
> does this already. It also can cause the asserts at the end of the function 
> to fire wrongly.
> 
> - About those asserts: it makes sense to flush the debug message before scope 
> end, otherwise, we won't see the debug message if the asserts fire. If they 
> fire, the debug message would be helpful.
> 
> I also improved the assert message somewhat. I noticed that all sizes in this 
> method are `int`, but left it that way because changing it would have too 
> many ripple effects. I guess we won't get CDS archives beyond int_max any 
> time soon.
> 
> Thanks, Thomas

This pull request has now been integrated.

Changeset: 296780c7
Author:Thomas Stuefe 
URL:   
https://git.openjdk.java.net/jdk/commit/296780c7ae5c129d24997007600f428b697d3365
Stats: 13 lines in 1 file changed: 3 ins; 8 del; 2 mod

8276983: Small fixes to DumpAllocStat::print_stats

Reviewed-by: dholmes, iklam

-

PR: https://git.openjdk.java.net/jdk/pull/6347


Re: RFR: JDK-8276983: Small fixes to DumpAllocStat::print_stats

2021-11-14 Thread Thomas Stuefe
On Mon, 15 Nov 2021 00:39:53 GMT, David Holmes  wrote:

>> When looking at CDS code in the context of Lilliput, I had to spend some 
>> time in DumpAllocStat::print(). I noticed two small things which can be 
>> fixed independently:
>> 
>> - the divide-by-zero check at lines 45ff is not needed, since `percent_of` 
>> does this already. It also can cause the asserts at the end of the function 
>> to fire wrongly.
>> 
>> - About those asserts: it makes sense to flush the debug message before 
>> scope end, otherwise, we won't see the debug message if the asserts fire. If 
>> they fire, the debug message would be helpful.
>> 
>> I also improved the assert message somewhat. I noticed that all sizes in 
>> this method are `int`, but left it that way because changing it would have 
>> too many ripple effects. I guess we won't get CDS archives beyond int_max 
>> any time soon.
>> 
>> Thanks, Thomas
>
> Hi Thomas,
> 
> Seems okay. Relying on percent_of rather than passing in 1 does seem to 
> change what will be output, but AFAICS we should never be able to pass zero 
> in the first place.
> 
> Thanks,
> David

Thanks @dholmes-ora and @iklam !

-

PR: https://git.openjdk.java.net/jdk/pull/6347


RFR: JDK-8276983: Small fixes to DumpAllocStat::print_stats

2021-11-12 Thread Thomas Stuefe
When looking at CDS code in the context of Lilliput, I had to spend some time 
in DumpAllocStat::print(). I noticed two small things which can be fixed 
independently:

- the divide-by-zero check at lines 45ff is not needed, since `percent_of` does 
this already. It also can cause the asserts at the end of the function to fire 
wrongly.

- About those asserts: it makes sense to flush the debug message before scope 
end, otherwise, we won't see the debug message if the asserts fire. If they 
fire, the debug message would be helpful.

I also improved the assert message somewhat. I noticed that all sizes in this 
method are `int`, but left it that way because changing it would have too many 
ripple effects. I guess we won't get CDS archives beyond int_max any time soon.

Thanks, Thomas

-

Commit messages:
 - Small fixes to DumpAllocStat::print_stats

Changes: https://git.openjdk.java.net/jdk/pull/6347/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk=6347=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8276983
  Stats: 13 lines in 1 file changed: 3 ins; 8 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6347.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6347/head:pull/6347

PR: https://git.openjdk.java.net/jdk/pull/6347


Re: RFR: 8276265: jcmd man page is outdated

2021-11-09 Thread Thomas Stuefe
On Fri, 5 Nov 2021 06:19:56 GMT, David Holmes  wrote:

> It was noticed that a number of updates to the jcmd subcommands (some new, 
> some modified) had not been added to the manpage. The text was supplied by 
> @tstuefe - thanks Thomas. I applied the changes to the closed markdown 
> sources and regenerated the nroff file. For easier review I have generated a 
> html version that can be viewed here:

Your link did not work, I generated them and put them here for reviewer 
convenience: https://gist.github.com/tstuefe/8bd16a64061537d23e9ee556c7bd2965

Looks good. Most of the commands are ours (SAPs), but I had no idea that man 
pages existed.

> 
> There are further improvements that could be made, some mentioned in the bug 
> report. I also noticed a general confusion between arguments and options with 
> some subcommands, but I did not try to address that.
> 

jcmd commands in general feel a bit anarchic. Different styles etc. But I fear 
that cannot be changed because of backward compatibility.

Thanks, Thomas

-

Marked as reviewed by stuefe (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6271


Re: RFR: 8276337: Use override specifier in HeapDumper

2021-11-05 Thread Thomas Stuefe
On Fri, 5 Nov 2021 13:02:52 GMT, Leo Korinth  wrote:

> > In what way is it "safer" ?? What are you trying to guard against?
> 
> If you use `override`, you can not forget to update a method signature if you 
> change the method signature in the base class. You can also not by mistake 
> change the method signature in this class without realizing you should change 
> the base class. There is no downside of using `override` that I know of.

In addition to that, its a helpful documentation for the code reader.

-

PR: https://git.openjdk.java.net/jdk/pull/6274


  1   2   >