On Mon, 2 Mar 2026 14:05:25 GMT, Robert Toyonaga <[email protected]> wrote:

>> ## Summary
>> Now that we [fail fatally if `os::` memory functions encounter 
>> errors](https://bugs.openjdk.org/browse/JDK-8353564), it's possible to 
>> tighten NMT locking scope by swapping the order of `pd_release_memory` and 
>> `record_virtual_memory_release`.
>> See original discussion: 
>> https://github.com/openjdk/jdk/pull/24084#issuecomment-2752240832
>> 
>> ### The new ordering in os::release_memory:
>> 
>> **Thread A:**
>> _lock_
>> (A1) update NMT
>> _unlock_
>> (A2) pd_release
>> 
>> **Thread B:**
>> (B1) pd_reserve
>> _lock_
>> (B2) update NMT
>> _unlock_
>> 
>> The race having ordering A1, B1, B2, A2  is not possible. As long as 
>> Thread_A has not yet released the reservation, the OS prevents Thread_B from 
>> racing to re-reserve the same range. Meaning Thread_A does not need to call 
>> pd_release under the NMT lock protection. 
>> 
>> 
>> ## Further work
>> It's possible that the same lock scope tightening can be done for 
>> `os::unmap_memory` and `os::uncommit_memory`, but there would be no hard 
>> guarantee provided by the OS against races. We would need to rely on 
>> external synchronization in the caller to prevent concurrent modification of 
>> regions. In practice, it doesn't seem like any existing Hotspot code can 
>> lead to the races described **_below_**, but I'd like to collect more 
>> opinions here before changing `os::unmap_memory` and `os::uncommit_memory`.
>> 
>> ### Possible uncommit Race
>> **Thread_A:**
>> _lock_
>> (A1) update NMT with uncommit
>> _unlock_
>> (A2) pd_uncommit
>> 
>> **Thread_B:**
>> (B1) pd_commit
>> _lock_
>> (B2) update NMT with commit
>> _unlock_
>> 
>> Problematic ordering: A1, B1, B2, A2  
>> (The region is uncommitted but NMT thinks it's committed. The OS doesn't 
>> stop us from doing B1 before A2)
>> 
>> ### Possible unmap race
>> **Thread_A:**
>> _lock_
>> (A1) update NMT with release
>> _unlock_
>> (A2) pd_unmap_memory
>> 
>> **Thread_B:**
>> (B1) pd_map_memory
>> _lock_
>> (B2) update NMT with reserve + commit
>> _unlock_
>> 
>> Problematic ordering: A1, B1, B2, A2 
>> (The OS allows remapping over an existing mapping using MAP_FIXED, so B1 can 
>> happen before A2. This results in Thread_B's mapping being destroyed.)
>
> Robert Toyonaga has updated the pull request incrementally with two 
> additional commits since the last revision:
> 
>  - Revert "update copyright headers"
>    
>    This reverts commit 4ab201a476291b3075edc25940218623f555d113.
>  - update copyright headers

Ahhh, yes you are right. Back to your original suggestion, I think the address 
size and range in the fatal message together with call-stack trace that is 
printed on vm_exit are enough to let the user get info about the source of 
failure. So, an incorrect NMT report is tolerable.
IIRC, this locking issue happens more frequently on linux-aarch64. Have you 
tested on that environment?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/29962#issuecomment-4010431913

Reply via email to