On 7/13/18 12:35 PM, Markus Gaisbauer wrote:
Hello,
I am trying to use ThreadMXBean::getThreadAllocatedBytes
(com.sun.management) to get the amount of allocated memory of the
current thread in some performance critical code.
Unfortunately, the current implementation can be rather slow and the
duration of each call unpredictable. I ran a test in a JVM with 500
threads. Depending on which thread was queried,
getThreadAllocatedBytes took between 100 ns and 2500 ns.
The root cause of the problem is
ThreadsList::find_JavaThread_from_java_tid which performs a linear
scan through all Java threads in the current process. The more threads
a JVM has, the slower it gets. In the worst case, the thread with the
given TID is found as the last entry in the list.
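
To make the cost model concrete, here is a toy Java sketch of a lookup by thread ID over a plain list. It is not HotSpot code: LinearScanModel and probesToFind are hypothetical names, and the list only stands in for the ThreadsList. It shows that the number of entries inspected depends on where the thread sits in the list.

```java
import java.util.ArrayList;
import java.util.List;

public class LinearScanModel {
    /**
     * Returns how many entries were inspected before tid was found,
     * or -1 if tid is not in the list. This is a toy model only.
     */
    static int probesToFind(List<Long> threadIds, long tid) {
        int probes = 0;
        for (long candidate : threadIds) {
            probes++;
            if (candidate == tid) {
                return probes;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // 500 entries, matching the thread count used in the test above.
        List<Long> ids = new ArrayList<>();
        for (long tid = 1; tid <= 500; tid++) {
            ids.add(tid);
        }
        System.out.println(probesToFind(ids, 1L));   // first entry: 1 probe
        System.out.println(probesToFind(ids, 500L)); // last entry: 500 probes
    }
}
```

Which end of the real list is cheap depends on the JDK version, as the measurements below show.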
Before Java 10, the oldest thread is the slowest one to query.
Since Java 10, the youngest thread is the slowest one to query. I
think this was a side effect of introducing "Thread Safe Memory
Reclamation (Thread-SMR) support".
           Oldest Thread   Youngest Thread
Java 8     8740 ns         76 ns
Java 10    109 ns          2485 ns
It is good to see that the longest search is much faster. Erik and Robbin
will be pleased since speeding up traversal of the ThreadsList was one
of the things that we tried to do during the Thread-SMR project.
A first step is to get a new bug filed that documents the issue with
ThreadMXBean::getThreadAllocatedBytes(). Perhaps Gary or Serguei
will take care of that.
Dan
A common use case is to query the metric for the current thread (e.g.
before and after performing some operation). This case can be
optimized by introducing a new method: getCurrentThreadAllocatedBytes.
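
For reference, the before/after pattern with the existing API looks roughly like the sketch below. AllocationProbe, allocatedBy and sink are hypothetical names for illustration. The cast assumes a HotSpot-based JVM where the platform bean implements com.sun.management.ThreadMXBean and allocation measurement is supported and enabled.

```java
import java.lang.management.ManagementFactory;

public class AllocationProbe {
    // Assumption: HotSpot-based JVM, so the platform bean can be cast
    // to the com.sun.management extension interface.
    private static final com.sun.management.ThreadMXBean THREADS =
        (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();

    // Keeps the allocation reachable so it is not optimized away.
    static volatile Object sink;

    /** Returns the bytes allocated by the current thread while running operation. */
    static long allocatedBy(Runnable operation) {
        long tid = Thread.currentThread().getId();
        long before = THREADS.getThreadAllocatedBytes(tid); // lookup by TID
        operation.run();
        long after = THREADS.getThreadAllocatedBytes(tid);  // lookup by TID again
        return after - before;
    }

    public static void main(String[] args) {
        long bytes = allocatedBy(() -> sink = new byte[1 << 20]);
        System.out.println("allocated about " + bytes + " bytes");
    }
}
```

Each measurement pays for two lookups by TID, which is why the cost of the lookup matters in performance-critical code.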
I created a patch for http://hg.openjdk.java.net/jdk/jdk/ and by using
the new method I saw the following improvements in my test:
           Oldest Thread   Youngest Thread
Proposal   37 ns           37 ns
This is a 60x improvement over the worst case of the current API. In
the best case of the current API, the new method is still 3 times faster.
// Based on JVM_SetNativeThreadName in jvm.cpp.
JVM_ENTRY(jlong, jmm_GetCurrentThreadAllocatedMemory(JNIEnv *env, jobject currentThread))
  // We don't use a ThreadsListHandle here because the current thread
  // must be alive.
  oop java_thread = JNIHandles::resolve_non_null(currentThread);
  JavaThread* thr = java_lang_Thread::thread(java_thread);
  // 'thread' is the calling JavaThread*, set up by the JVM_ENTRY macro.
  if (thread == thr) {
    // Only supported for the current thread.
    return thr->cooked_allocated_bytes();
  }
  return -1;
JVM_END
The proposed method also fixes the problem that
getThreadAllocatedBytes itself allocates some memory on the current
thread (two long arrays, 24 bytes) and therefore can slightly skew
measurements. The new method, getCurrentThreadAllocatedBytes, returns
exactly the same value if it is called twice without allocating any
memory between those calls.
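
The self-allocation of the existing API can be observed with a sketch like the following. SelfAllocationCheck and backToBackDelta are hypothetical names, and the cast again assumes a HotSpot-based JVM. The exact delta depends on the JVM version and on JIT optimizations, so no particular value is claimed here.

```java
import java.lang.management.ManagementFactory;

public class SelfAllocationCheck {
    /** Returns the bytes attributed to the current thread between two back-to-back queries. */
    static long backToBackDelta() {
        // Assumption: HotSpot-based JVM exposing the com.sun.management extension.
        com.sun.management.ThreadMXBean bean =
            (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        long tid = Thread.currentThread().getId();

        // Warm up so one-time class loading does not dominate the result.
        for (int i = 0; i < 1_000; i++) {
            bean.getThreadAllocatedBytes(tid);
        }

        long first = bean.getThreadAllocatedBytes(tid);
        long second = bean.getThreadAllocatedBytes(tid);
        return second - first;
    }

    public static void main(String[] args) {
        System.out.println("bytes attributed to the query itself: " + backToBackDelta());
    }
}
```

A non-zero result means the query itself shows up in the number it reports.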
I also built a variation of this method that could be used to query
allocated memory more efficiently for anyone who already has a
java.lang.Thread object:
JVM_ENTRY(jlong, jmm_GetThreadAllocatedMemory(JNIEnv *env, jobject threadObj))
  // Based on code proposed in threadSMR.hpp.
  ThreadsListHandle tlh;
  JavaThread* thr = NULL;
  bool is_alive = tlh.cv_internal_thread_to_JavaThread(threadObj, &thr, NULL);
  if (is_alive) {
    return thr->cooked_allocated_bytes();
  }
  return -1;
JVM_END
This method took 70 ns in my test, which is 85% slower
than GetCurrentThreadAllocatedMemory but still 30% faster than the
best case of the current API. I currently have no immediate need for
this second method, but I think it would also be a valuable addition
to the API.
I attached a patch for getCurrentThreadAllocatedBytes. I can create a
second patch for also adding
getThreadAllocatedMemory(java.lang.Thread) to the API.
I am a first-time contributor and I am not 100% sure what process I
must follow to get a change like this into OpenJDK. Can someone have a
look at my proposal and help me through the process?
Best regards,
Markus